A tutorial on OpenGL/OpenCL interoperability in Python
In the last two posts, I've shown how to use OpenCL for GPGPU, and OpenGL for graphics rendering, with Python. Here I'll show how both OpenCL and OpenGL can be used at the same time with Python. It's called OpenCL-OpenGL interoperability. What is it about?
- OpenGL gives low-level access to the graphics card to do real-time graphics rendering with hardware acceleration in an hardware-independent way.
- OpenCL gives low-level access to the graphics card to do general-purpose intensive vectorized computations with hardware acceleration in an hardware-independent way.
- OpenCL/OpenGL interoperability allows to do intensive arbitrary computations for real-time graphics rendering.
Broadly speaking, the GPU is natively designed to perform highly-efficient vectorized linear computations (matrix transforms, rasterization...), for up to four dimensions (3D + 1D for homogeneous coordinates). 2D and 3D graphics rendering was indeed the only use for a GPU originally. GPGPU languages such as OpenCL give the developer access to a C-like language for doing hardware-accelerated high-dimensional or non-linear computations on the GPU, generally for non-graphical purposes. However, such general computations may actually be necessary for some specific graphical purposes: physical simulations, mathematical rendering, etc. In those cases, the most efficient way of using OpenCL for graphical data is to let the data on the GPU at all times. Data transfers between the CPU and the GPU are known to be slow and form a major bottleneck in those situations.
With OpenCL-OpenGL interoperability, one can execute OpenCL code on the same data buffers than those used by OpenGL for graphics rendering. The GPU is responsible for both OpenCL computations and rendering, and the data stays in GPU memory at all times.
If one does not have an OpenCL-compatible graphics card, then the code still works as expected. However, data transfers between the CPU and the GPU are not avoidable since the OpenCL kernel actually executes on the CPU in this case. This happens automatically and transparently. Hence, as for standard OpenCL code, a program making use of OpenCL-OpenGL interoperability can still work on a computer that does not include a compatible GPU. This is a very interesting point regarding software portability.
I will now describe a simple script illustrating OpenCL-OpenGL interoperability. This script initializes an empty OpenGL VBO and an OpenCL buffer containing points on an horizontal line. An OpenCL kernel then copies the points from the OpenCL buffer to the OpenGL one and transform the \(y\)-coordinate according to a sine function. This function is then displayed on the screen.
Installation
This script requires Numpy, PyOpenGL, PyOpenCL, and an OpenCL SDK with OpenGL interoperability support. Also, on some platforms (like Windows 8 apparently), the Python process needs to be run as an administrator so that this script can work.
OpenCL initialization
OpenCL needs to be initialized with OpenGL interoperability. This code snippet does just that:
def clinit():
"""Initialize OpenCL with GL-CL interop.
"""
plats = cl.get_platforms()
# handling OSX
if sys.platform == "darwin":
ctx = cl.Context(properties=get_gl_sharing_context_properties(),
devices=[])
else:
ctx = cl.Context(properties=[
(cl.context_properties.PLATFORM, plats[0])]
+ get_gl_sharing_context_properties())
queue = cl.CommandQueue(ctx)
return ctx, queue
This code comes from this blog post.
This function returns an OpenCL context object, and a command queue used for compiling and executing kernels, and for initializing OpenCL buffers.
Buffers initialization
The trickiest part concerns the initialization of the buffers. First, such
initialization needs to occur after OpenGL initialization, but before
any OpenGL rendering. Placing the following code in the initializeGL()
function does the trick.
Next, we need here three buffer objects.
-
A standard OpenGL VBO: we tell OpenGL that we may write to this buffer several times (since the OpenCL kernel has access to it) with the
usage=GL_DYNAMIC_DRAW
keyword argument. This buffer initially contains only zeros.# empty OpenGL VBO self.glbuf = glvbo.VBO(data=np.zeros(self.data.shape), usage=gl.GL_DYNAMIC_DRAW, target=gl.GL_ARRAY_BUFFER) self.glbuf.bind()
-
A standard OpenCL buffer: in our example, this buffer contains the "source" data, a N*2 Numpy array containing points of coordinates \((x,0)\) with \(x \in [-1,1]\). It is read-only since we just need access to this buffer to copy data from it to the OpenGL VBO. Also, we initialize OpenCL right after the OpenGL VBO creation, and just before the OpenCL buffer creation.
# initialize the CL context self.ctx, self.queue = clinit() # create a pure read-only OpenCL buffer self.clbuf = cl.Buffer(self.ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=self.data)
-
An interop object to access an OpenGL VBO from OpenCL: this object is passed to the OpenCL kernel and allows direct access to the OpenGL VBO.
# create an interop object to access to GL VBO from OpenCL self.glclbuf = cl.GLBuffer(self.ctx, cl.mem_flags.READ_WRITE, int(self.glbuf.buffers[0]))
Finally, once these buffers have been created, we can compile the OpenCL kernel.
# build the OpenCL program
self.program = cl.Program(self.ctx, clkernel).build()
# release the PyOpenCL queue
self.queue.finish()
Kernel code
The OpenCL kernel accepts two arguments: pointers to the OpenCL buffer (with source data), and to the OpenGL VBO. We first get the array index in the current thread, then we copy the data from the OpenCL buffer to the OpenGL VBO, and transform the y-coordinate through a sine function.
# OpenCL kernel that generates a sine function.
clkernel = """
__kernel void clkernel(__global float2* clpos, __global float2* glpos)
{
//get our index in the array
unsigned int i = get_global_id(0);
// copy the x coordinate from the CL buffer to the GL buffer
glpos[i].x = clpos[i].x;
// calculate the y coordinate and copy it on the GL buffer
glpos[i].y = 0.5 * sin(10.0 * clpos[i].x);
}
"""
Kernel execution
The second trickiest part is the kernel execution. Indeed, OpenCL needs to get
a secure access to the OpenGL VBO in order to avoid problems of concurrency
(since OpenGL also needs to access to this resource). Secure access is obtained
and released with the functions enqueue_acquire_gl_objects()
and
enqueue_release_gl_objects()
. The parameters are the queue returned by
our function clinit()
, and a list of interop objects to access (here, just
the glclbuf
object).
Then, the actual kernel execution is in PyOpenCL similar to PyCUDA: the
program
object has a method with the same name as the kernel name. The
parameters include the OpenCL queue, the global and local worksizes, and
the arguments to the kernel.
def execute(self):
"""Execute the OpenCL kernel.
"""
# get secure access to GL-CL interop objects
cl.enqueue_acquire_gl_objects(self.queue, [self.glclbuf])
# arguments to the OpenCL kernel
kernelargs = (self.clbuf,
self.glclbuf)
# execute the kernel
self.program.clkernel(self.queue, (self.count,), None, *kernelargs)
# release access to the GL-CL interop objects
cl.enqueue_release_gl_objects(self.queue, [self.glclbuf])
self.queue.finish()
Here, we call this function in initializeGL()
, after the buffers creation.
This way, we update the OpenGL VBO through the OpenCL kernel only at
initialization time.
But it would be more useful in a real application to execute the kernel
in the paintGL()
method.
OpenGL rendering
The paintGL()
function is very similar to the previous OpenGL tutorial, and
has nothing to do with OpenCL. We activate the VBO and tell OpenGL to draw
consecutive segments of lines (GL_LINE_STRIP
).
Note: apparently, in OpenGL, using single precision floating point numbers is better than using double precision float point numbers. The graphics card may not indeed support the latter format. I used doubles in an early version of this post and I had some nasty memory access violation crashes in particular cases. They disappeared when I switched to floats. If this is helpful to anyone...
def paintGL(self):
"""Paint the scene.
"""
# clear the GL scene
gl.glClear(gl.GL_COLOR_BUFFER_BIT)
# set yellow color for subsequent drawing rendering calls
gl.glColor(1,1,0)
# bind the VBO
self.glbuf.bind()
# tell OpenGL that the VBO contains an array of vertices
gl.glEnableClientState(gl.GL_VERTEX_ARRAY)
# these vertices contain 2 simple precision coordinates
gl.glVertexPointer(2, gl.GL_FLOAT, 0, self.glbuf)
# draw "count" points from the VBO
gl.glDrawArrays(gl.GL_LINE_STRIP, 0, self.count)
Full script
Here is the full script.
# PyQt4 imports
from PyQt4 import QtGui, QtCore, QtOpenGL
from PyQt4.QtOpenGL import QGLWidget
# PyOpenGL imports
import OpenGL.GL as gl
import OpenGL.arrays.vbo as glvbo
# PyOpenCL imports
import pyopencl as cl
from pyopencl.tools import get_gl_sharing_context_properties
# OpenCL kernel that generates a sine function.
clkernel = """
__kernel void clkernel(__global float2* clpos, __global float2* glpos)
{
//get our index in the array
unsigned int i = get_global_id(0);
// copy the x coordinate from the CL buffer to the GL buffer
glpos[i].x = clpos[i].x;
// calculate the y coordinate and copy it on the GL buffer
glpos[i].y = 0.5 * sin(10.0 * clpos[i].x);
}
"""
def clinit():
"""Initialize OpenCL with GL-CL interop.
"""
plats = cl.get_platforms()
# handling OSX
if sys.platform == "darwin":
ctx = cl.Context(properties=get_gl_sharing_context_properties(),
devices=[])
else:
ctx = cl.Context(properties=[
(cl.context_properties.PLATFORM, plats[0])]
+ get_gl_sharing_context_properties())
queue = cl.CommandQueue(ctx)
return ctx, queue
class GLPlotWidget(QGLWidget):
# default window size
width, height = 600, 600
def set_data(self, data):
"""Load 2D data as a Nx2 Numpy array.
"""
self.data = data
self.count = data.shape[0]
def initialize_buffers(self):
"""Initialize OpenGL and OpenCL buffers and interop objects,
and compile the OpenCL kernel.
"""
# empty OpenGL VBO
self.glbuf = glvbo.VBO(data=np.zeros(self.data.shape),
usage=gl.GL_DYNAMIC_DRAW,
target=gl.GL_ARRAY_BUFFER)
self.glbuf.bind()
# initialize the CL context
self.ctx, self.queue = clinit()
# create a pure read-only OpenCL buffer
self.clbuf = cl.Buffer(self.ctx,
cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
hostbuf=self.data)
# create an interop object to access to GL VBO from OpenCL
self.glclbuf = cl.GLBuffer(self.ctx, cl.mem_flags.READ_WRITE,
int(self.glbuf.buffers[0]))
# build the OpenCL program
self.program = cl.Program(self.ctx, clkernel).build()
# release the PyOpenCL queue
self.queue.finish()
def execute(self):
"""Execute the OpenCL kernel.
"""
# get secure access to GL-CL interop objects
cl.enqueue_acquire_gl_objects(self.queue, [self.glclbuf])
# arguments to the OpenCL kernel
kernelargs = (self.clbuf,
self.glclbuf)
# execute the kernel
self.program.clkernel(self.queue, (self.count,), None, *kernelargs)
# release access to the GL-CL interop objects
cl.enqueue_release_gl_objects(self.queue, [self.glclbuf])
self.queue.finish()
def update_buffer(self):
"""Update the GL buffer from the CL buffer
"""
# execute the kernel before rendering
self.execute()
gl.glFlush()
def initializeGL(self):
"""Initialize OpenGL, VBOs, upload data on the GPU, etc.
"""
# initialize OpenCL first
self.initialize_buffers()
# set background color
gl.glClearColor(0,0,0,0)
# update the GL buffer from the CL buffer
self.update_buffer()
def paintGL(self):
"""Paint the scene.
"""
# clear the GL scene
gl.glClear(gl.GL_COLOR_BUFFER_BIT)
# set yellow color for subsequent drawing rendering calls
gl.glColor(1,1,0)
# bind the VBO
self.glbuf.bind()
# tell OpenGL that the VBO contains an array of vertices
gl.glEnableClientState(gl.GL_VERTEX_ARRAY)
# these vertices contain 2 simple precision coordinates
gl.glVertexPointer(2, gl.GL_FLOAT, 0, self.glbuf)
# draw "count" points from the VBO
gl.glDrawArrays(gl.GL_LINE_STRIP, 0, self.count)
def resizeGL(self, width, height):
"""Called upon window resizing: reinitialize the viewport.
"""
# update the window size
self.width, self.height = width, height
# paint within the whole window
gl.glViewport(0, 0, width, height)
# set orthographic projection (2D only)
gl.glMatrixMode(gl.GL_PROJECTION)
gl.glLoadIdentity()
# the window corner OpenGL coordinates are (-+1, -+1)
gl.glOrtho(-1, 1, 1, -1, -1, 1)
if __name__ == '__main__':
import sys
import numpy as np
# define a Qt window with an OpenGL widget inside it
class TestWindow(QtGui.QMainWindow):
def __init__(self):
super(TestWindow, self).__init__()
# generate random data points
self.data = np.zeros((10000,2))
self.data[:,0] = np.linspace(-1.,1.,len(self.data))
self.data = np.array(self.data, dtype=np.float32)
# initialize the GL widget
self.widget = GLPlotWidget()
self.widget.set_data(self.data)
# put the window at the screen position (100, 100)
self.setGeometry(100, 100, self.widget.width, self.widget.height)
self.setCentralWidget(self.widget)
self.show()
# create the Qt App and window
app = QtGui.QApplication(sys.argv)
window = TestWindow()
window.show()
app.exec_()
Final notes
Here are some interesting related links.