Reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. For
interventional image reconstruction, hardware optimization is mandatory. Manufacturers of medical equipment
use a variety of high-performance computing (HPC) platforms, such as FPGAs, graphics cards, or multi-core CPUs.
A drawback of this diversity is that many different frameworks and (vendor-specific) programming languages are
used. Furthermore, switching platforms is costly, since the code has to be rewritten, verified, and optimized.
OpenCL, a relatively new industry standard for HPC, promises to enable portable code. Its key idea is to
abstract hardware in a way that allows an efficient mapping onto real CPUs, GPUs, and other hardware. The
code is compiled for the actual target by the device driver.
In this work we investigated the suitability of OpenCL as a tool to write portable code that runs efficiently
across different hardware. The problems chosen are back- and forward-projection, the most time-consuming
parts of (iterative) reconstruction. We present results on three platforms, a multi-core CPU system and two
GPUs, and compare them against manually optimized native implementations.
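
To make the two operations concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a simplified 2-D parallel-beam geometry with nearest-neighbour detector sampling instead of the cone-beam C-arm geometry, and the names detector_bin, forward_project, and back_project are hypothetical. The point it illustrates is that every pixel can be processed independently, which is the kind of loop that data-parallel models such as OpenCL distribute across work-items.

```python
import math


def detector_bin(x, y, theta, n, n_det):
    """Nearest detector bin for pixel (x, y) at view angle theta (assumed geometry)."""
    c = (n - 1) / 2.0        # image centre
    dc = (n_det - 1) / 2.0   # detector centre
    t = (x - c) * math.cos(theta) + (y - c) * math.sin(theta) + dc
    return math.floor(t + 0.5)


def forward_project(image, angles, n_det):
    """Forward projection: each pixel scatters its value into one bin per view."""
    n = len(image)
    sino = [[0.0] * n_det for _ in angles]
    for a, theta in enumerate(angles):
        for y in range(n):
            for x in range(n):
                k = detector_bin(x, y, theta, n, n_det)
                if 0 <= k < n_det:
                    sino[a][k] += image[y][x]
    return sino


def back_project(sino, angles, n):
    """Back-projection: each pixel gathers one detector value per view."""
    n_det = len(sino[0])
    image = [[0.0] * n for _ in range(n)]
    for a, theta in enumerate(angles):
        for y in range(n):
            for x in range(n):  # iterations are independent of each other
                k = detector_bin(x, y, theta, n, n_det)
                if 0 <= k < n_det:
                    image[y][x] += sino[a][k]
    return image
```

Because both functions use the same pixel-to-bin mapping, the back-projector in this sketch is the exact transpose of the forward projector, a property that iterative reconstruction schemes commonly rely on.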
We found that OpenCL makes it possible to share a common framework in one language across platforms. However,
given the differences in the underlying architectures, a hardware-oblivious implementation cannot be expected
to deliver maximal performance. By optimizing the OpenCL code for the specific hardware we reached over 90%
of native performance for both problems, back- and forward-projection, on all platforms.