There is a clear trend in the scientific Python ecosystem right now: Python is slowly but surely coming to the browser. It's a real challenge, but we're getting there. In this post, I want to give an overview of where we are and where we're headed.
Why it's a good thing
Python is becoming one of the most popular open source platforms for scientific computing and data analysis. The language is easy to use, expressive, and open to the rest of the universe, and its scientific ecosystem is quite solid.
On the other hand, the Web is today the platform of choice for client-side applications. What we called Web 2.0 some time ago is now just the Web. Social networks have pervaded our lives and sharing things on the Web with our acquaintances is now entirely natural. The Web is also becoming a solid mobile platform, maybe not quite as powerful as native platforms yet, but it should eventually get there (at least that's my hope).
Yet scientific Python and the Web remain largely separate worlds today. It doesn't have to be that way. Technical difficulties can be overcome. We're not there yet, but we're getting there. Here's why.
The signs it's coming
It's like different pieces of a puzzle popping up here and there, suggesting a barely perceptible convergence. From time to time, Python timidly attempts to approach the Web platform.
The IPython notebook
The most dazzling attempt so far has decidedly been the IPython notebook. This impressively architected piece of software has been particularly well received by the community. The reason is that it was exactly the right answer to a desperate need. The fact that it's a browser-based technology is almost a detail, and yet that is precisely my point. Many people now live in the browser, so bringing Python there just seems absolutely right.
To me, the IPython notebook is the first revolution in the convergence between Python and the Web platform.
Web-based visualization libraries in Python
Another trend concerns interactive data visualization technologies coming to the browser.
Libraries like Bokeh, or online services like plot.ly, allow people to design figures in Python in order to obtain Web-based visualizations. The rationale behind these ideas is that scientists are no longer satisfied with static publication-ready plots: they want interactive plots. There really can be scientific information in interactivity (think about linked brushing for instance).
A widely popular interactive visualization library in the Web community is d3.js. It's no surprise that several Python libraries try to target it as a backend. There's Vincent, which lets you design a visualization in Python, export it to Vega (a visualization grammar), and finally generate a d3.js visualization from there.
And there's mpld3, a wonderful attempt to bring matplotlib and d3.js together. The idea is conceptually very simple: export a matplotlib figure to JSON, and generate a d3.js visualization from there. The end result is stunning: your matplotlib figures become intrinsically interactive. You no longer need a live Python server to pan and zoom in your figure: it's just there, in your browser.
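The "export to JSON" step can be sketched in a few lines of plain Python. This is only an illustration of the idea, not mpld3's actual API or file format: the `export_line_plot` helper and the field names in the dictionary are made up for this example, and a real exporter would walk a matplotlib figure rather than take raw lists.

```python
import json


def export_line_plot(x, y, title=""):
    """Describe a line plot as a JSON string (hypothetical schema)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    spec = {
        "title": title,
        # Axis limits let the browser-side renderer set up its scales.
        "axes": {
            "xlim": [min(x), max(x)],
            "ylim": [min(y), max(y)],
        },
        # Each line is stored as a list of [x, y] pairs.
        "lines": [{"data": [[float(a), float(b)] for a, b in zip(x, y)]}],
    }
    return json.dumps(spec)


x = [0, 1, 2, 3]
y = [0, 1, 4, 9]
payload = export_line_plot(x, y, title="Squares")
```

Once the figure is plain data like this, a JavaScript renderer can draw it and handle panning and zooming without talking to Python again.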
Finally, I want to mention Vispy, a project I'm currently involved in. The idea is to leverage the power of the graphics card (through OpenGL) for high-performance interactive visualization of potentially huge datasets. Although it is a Python project, we are thinking hard about ways to bring Vispy to the browser. The most promising (and challenging) approach is quite similar to mpld3: export a visualization to JSON, and render it in the browser with WebGL.
In the end, it seems like data visualization is today a hot topic that concentrates a large part of the various efforts to bring scientific Python to the browser.
I've given the arguments from the Python side. Here are those from the browser side.
There's no reason why scientific applications should not benefit from the Web platform. Joining Python and the Web could enable the creation of rich scientific applications, with ergonomic and portable HTML-based user interfaces backed by the powerful scientific Python ecosystem. Those applications would run in the browser, so deployment would become trivial compared to installing a Python distribution. Besides, they could run directly on tablets and smartphones. I think there would be a huge demand for this.
Very recently, Intel, Mozilla and Google announced that SIMD instructions were coming to Chrome and Firefox, and to emscripten as well.
WebCL is still a draft, but it should eventually bring OpenCL to the browser. General-purpose massively parallel computing on multicore CPUs and GPUs in the (desktop or mobile) browser is becoming a reality.
Finally, WebGL will interest big data visualization libraries: this technology gives direct access to OpenGL ES from the browser. WebGL support on mobile devices is still scarce, but it should hopefully improve in the future.
So we see that the pieces of the puzzle are being put together to enable high-performance computing and visualization in the browser. We are close to getting everything we need to bring scientific Python to the browser. The last challenge is the language barrier itself.
How it's coming
So, how could scientific Python come to the browser? Here are a few ideas. Some of them may not be reasonable. I have probably omitted interesting alternatives. And I have no idea which approach could actually succeed. I hope people more clever than me will eventually find out.
Python in the Cloud
The "easiest" solution is probably a cloud approach. Run Python in the cloud, and make it accessible from browser-based applications. I think this is one of the most reasonable approaches in the short and medium term. Many services already offer this technology: Wakari, PiCloud, PythonAnywhere, StarCluster, and others. The main drawbacks are the following:
- A live Internet connection is required.
- You might get high latency depending on your connection, which can be detrimental to real-time interactivity.
- Those services may sometimes be expensive (you need a cloud infrastructure!).
- You don't really "own" your code and your data.
The cloud approach is probably the best for heavy workflows, highly intensive computations, and huge amounts of data. For reasonably light computations, or small to medium amounts of data, purely offline self-contained solutions may be interesting alternatives.
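To make the cloud approach concrete, here is a minimal sketch of a server-side endpoint using only the Python standard library. It is not how any of the services above work internally: the `handle_payload` function, the `ComputeHandler` class, the `{"values": [...]}` request format and the port number are all assumptions chosen for this example, and error handling and authentication are left out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_payload(body: bytes) -> bytes:
    """Compute the mean of the numbers sent by the browser (illustrative task)."""
    values = json.loads(body)["values"]
    result = {"mean": sum(values) / len(values)}
    return json.dumps(result).encode("utf-8")


class ComputeHandler(BaseHTTPRequestHandler):
    """Answer POST requests carrying a JSON body with a JSON result."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8000) -> None:
    """Block forever, answering POST requests on the given port."""
    HTTPServer(("", port), ComputeHandler).serve_forever()


# The computation itself can be exercised without starting the server.
reply = handle_payload(b'{"values": [1, 2, 3, 4]}')
```

Calling `serve()` would start the server, and a browser-based application would then POST its data and read back the JSON result. Every interaction costs a network round trip, which is exactly where the latency drawback listed above comes from.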
A Python interpreter in the browser
Although impressive, these projects generally do not support 100% of the language syntax or 100% of the modules in the Python standard library. They also lack support for scientific Python modules like NumPy, SciPy, etc. The main challenge with those libraries is that they include C and even FORTRAN code.
A Python JIT compiler in the browser
NumPy in the browser
None of the solutions mentioned above tackles a major challenge: how to bring NumPy and the rest of the scientific Python stack to the browser?
A large part of those libraries is written in C or FORTRAN, and makes API calls to CPython. Bringing these entire projects as they are now to the browser seems extremely hard to me. Emscripten and asm.js might help, but to what extent I'm not sure.
An alternative would be to rewrite from scratch part of the core functionality of NumPy. The major missing piece is the multidimensional array data structure: the ndarray, and vectorized computations. Mikola Lysenko has a few projects in this spirit: cwise, and others. This work may be a starting point for bringing core NumPy-like functionality to the browser.
These projects could be combined with asm.js, SIMD.js or WebCL to achieve high performance.
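To give a feel for what rewriting the core data structure involves, here is a minimal sketch of a strided multidimensional array: a flat buffer plus a shape and strides, which is the layout NumPy's ndarray is built on. It is written in Python purely for illustration (a browser version would be JavaScript over typed arrays), and the `NDArray` class and its `apply` method are names made up for this example, not NumPy's API or that of the projects mentioned above.

```python
from array import array


class NDArray:
    """A tiny row-major strided array (illustration only)."""

    def __init__(self, data, shape):
        size = 1
        for n in shape:
            size *= n
        if len(data) != size:
            raise ValueError("data length does not match shape")
        self.data = array("d", data)  # flat buffer of float64 values
        self.shape = tuple(shape)
        # Row-major strides, counted in elements rather than bytes.
        strides = []
        step = 1
        for n in reversed(self.shape):
            strides.append(step)
            step *= n
        self.strides = tuple(reversed(strides))

    def __getitem__(self, index):
        # Map a multidimensional index to an offset in the flat buffer.
        # No bounds checking is done in this sketch.
        offset = sum(i * s for i, s in zip(index, self.strides))
        return self.data[offset]

    def apply(self, func):
        """Apply func to every element and return a new array of the same shape."""
        return NDArray([func(v) for v in self.data], self.shape)


a = NDArray(range(6), (2, 3))
b = a.apply(lambda v: v * 10)
```

Here `a[1, 2]` reads offset `1 * 3 + 2 * 1 = 5` in the flat buffer. The element-wise loop in `apply` is the part that a real implementation would want to hand over to something fast.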
I think there's a whole body of evidence showing that scientific Python will eventually come to the browser. How, I don't know, but I can definitely see several promising approaches.
There's some really exciting and challenging work down the road, and I can't wait to see what the community will bring to life.