Hardware-accelerated interactive data visualization in Python

There have been several interesting discussions recently about the future of visualization in Python. Jake Vanderplas wrote a detailed post about the current state of the art of visualization software in Python. Michael Droettboom, one of Matplotlib's core developers, followed up with a post about the challenges Matplotlib will need to tackle. Matplotlib was designed more than ten years ago, and now needs to embrace modern trends including web frontends, high-level R-style plotting interfaces, hardware-accelerated visualization, etc.

One trend that I find particularly interesting concerns hardware acceleration. Standard graphics cards are now extremely powerful and are routinely used in 2D and 3D video games, web browsers, and user interfaces. Mobile devices such as smartphones, tablets, and even the Raspberry Pi include a decent graphics processing unit (GPU). There is no reason why hardware acceleration should be absent from scientific visualization, where there is a pressing need to display huge data sets. Graphics cards are now the most powerful computing units in a computer, and they are specifically adapted to fast visualization.

Hardware-accelerated visualization is not always relevant. It is slightly less portable than a pure software implementation. It requires the graphics card drivers to be up to date, which is not always the case on non-technical users' computers. And it is not always necessary, as plots with a reasonable amount of data can still be smoothly displayed without using the graphics card at all. In fact, a clear distinction needs to be made between plotting and interactive visualization. In the former, the emphasis is on the generation of high-quality static plots ready for publication, which represent a short and precise summary of the data. In the latter, the objective is to let the user take a look at the raw data, find out what the interesting statistics might be, and spot unexpected patterns. This step comes before the generation of publication-ready plots, and benefits far more from hardware acceleration than common plotting does. Existing plotting libraries typically do not make the distinction between these two use cases.

Software

Hardware-accelerated visualization software is still a work in progress. As far as Python is concerned, libraries include: Galry, which I started a few months ago; Glumpy; Visvis; and PyQtGraph, which supports hardware-accelerated 3D visualization. All these libraries use OpenGL for hardware acceleration. Let's also mention Mayavi, which is specifically designed for 3D rendering.

I started Galry because I needed to plot very large data sets containing more than ten million points, and no existing library could support such large data. One may wonder why someone would need to plot more than ten million points when screens have merely a few million pixels. The answer is simple: to visualize large datasets interactively. For instance, displaying an outline of the whole dataset before zooming in on the regions of interest. Huge datasets are not rare today: a one-hour signal sampled at 20 kHz already represents 72 million points (a typical intracellular recording). This gets worse with multi-channel data, as in multi-electrode extracellular recordings, where there can be tens or even hundreds of signals in parallel (for example with recordings in the retina). These are typical cases where the graphics card can really be helpful for visualization.
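The arithmetic is easy to check with a quick back-of-the-envelope calculation (the channel count and sample size below are illustrative assumptions, not figures from any particular recording):

```python
# Back-of-the-envelope size of an electrophysiology recording.
sampling_rate = 20000   # Hz
duration = 3600         # one hour, in seconds
n_channels = 100        # hypothetical multi-electrode array

points_per_channel = sampling_rate * duration
print(points_per_channel)  # 72000000 -- 72 million points per channel

# Assuming 4 bytes per single-precision sample:
size_gb = points_per_channel * n_channels * 4 / 1e9
print("%.1f GB" % size_gb)  # 28.8 GB for 100 channels
```

Even a single channel is an order of magnitude beyond what typical plotting libraries handle smoothly.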

Example: multi-channel recordings viewer

In fact, I recently wrote a small prototype with Galry for quick visualization of multi-channel recordings. The data is assumed to be stored in an HDF5 file, which can weigh several gigabytes. Such a large file cannot fit entirely in system memory, let alone in graphics memory, so it needs to be fetched dynamically as one navigates within the plot. Reading data from the disk happens in a separate thread so that the interface is not frozen. When zooming out, the signals are automatically and dynamically undersampled. These techniques enable fast and smooth interactive visualization of huge datasets.
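To give a rough idea of the undersampling step (this is a generic sketch, not Galry's actual implementation): when zoomed out, each pixel column only needs the minimum and maximum of the samples it covers, so a signal can be decimated aggressively without losing visually important extremes.

```python
def minmax_decimate(signal, n_bins):
    """Reduce a signal to n_bins (min, max) pairs, one per pixel column.

    A generic sketch of min/max undersampling; a real viewer would
    apply this to chunks fetched from disk in a background thread.
    """
    bin_size = max(1, len(signal) // n_bins)
    out = []
    for start in range(0, len(signal), bin_size):
        chunk = signal[start:start + bin_size]
        out.append((min(chunk), max(chunk)))
    return out

# Example: a 1,000,000-sample sawtooth reduced to 4 (min, max) pairs.
signal = [i % 1000 for i in range(1000000)]
reduced = minmax_decimate(signal, 4)
print(reduced)  # [(0, 999), (0, 999), (0, 999), (0, 999)]
```

Only the decimated extremes are uploaded to the GPU; the full-resolution chunks are fetched from the file when the user zooms back in.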

Web-based interactive visualization

Another interesting direction concerns web browser frontends. We spend more and more time in the browser in general. The success of the IPython notebook shows that interactive computing can seamlessly happen in the browser. There are excellent web-based visualization libraries out there, like D3 or threeJS. The WebGL standard, although young, brings native, plugin-free hardware-accelerated graphics to the browser via OpenGL ES 2.0. It is supported in most recent browsers, and even Internet Explorer might support it in the future. So the technology exists for bringing high-performance interactive data visualization to the browser. There is definitely some work to do here.

In Python, web-based interactive visualization could be done in two ways. In the first, VNC-like approach, user actions are captured by Javascript and transferred to Python, which generates a static visualization and returns it to the browser as a compressed image. In the second approach, the actual visualization happens entirely in the browser. The first approach is probably more generic and simpler to implement, whereas the second could be useful for static notebooks or webpages. From a data-sharing perspective, a pure HTML/Javascript interactive visualization that does not require a Python backend engine would be quite valuable. The two approaches are complementary.
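The VNC-like approach boils down to a small message protocol. The sketch below is entirely hypothetical: `render_png` stands in for a real renderer (e.g. an offscreen OpenGL framebuffer readback), and the event fields are made up; a real system would also need a transport such as a websocket.

```python
import base64
import json

def render_png(viewport):
    """Placeholder for a real renderer; returns dummy PNG-like bytes
    instead of an actual rendered frame."""
    return b"\x89PNG" + json.dumps(viewport).encode()

def handle_event(message):
    """Server side: decode a user action sent by Javascript, update
    the viewport, render, and reply with a base64-encoded image."""
    event = json.loads(message)
    viewport = {"pan_x": event["pan_x"], "zoom": event["zoom"]}
    png = render_png(viewport)
    return json.dumps({"image": base64.b64encode(png).decode()})

# The browser sends a pan/zoom event; the server replies with a frame.
reply = handle_event(json.dumps({"pan_x": 0.5, "zoom": 2.0}))
print(sorted(json.loads(reply).keys()))  # ['image']
```

The payoff of this design is that the browser stays completely generic: it only captures events and displays images, while all the heavy lifting happens server-side where the GPU and the data live.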

The future

In summary, the technology for hardware-accelerated interactive visualization in Python and in the browser is there. Different Python libraries have been developed independently, have different code bases, offer different programming interfaces, and are not mutually compatible. For these reasons, the developers of these libraries are currently working on a new project whose goal is to allow more code sharing between these libraries, and to provide interfaces to OpenGL at different levels of abstraction and generality. The long-term goal of this project is to offer a single library for high-performance visualization, be it 2D or 3D, for scientific purposes or not. But for now, we're focusing on building high-quality modules operating at different levels of abstraction, from low-level OpenGL to high-level plotting and interaction commands. We'll start with the low-level blocks, so that we can rewrite our different libraries on top of them.

That's an ambitious and difficult project that will take time, but the core technology and code are already there. The hard part is to create a coherent, flexible, high-quality architecture that supports different levels of abstraction in a seamless and efficient way. I'm confident that we'll come up with a solid piece of software at some point. And of course, if you're interested, you're welcome to contribute.