Reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. For
interventional image reconstruction, hardware optimization is mandatory. Manufacturers of medical equipment
use a variety of high-performance computing (HPC) platforms, such as FPGAs, graphics cards, or multi-core CPUs.
A drawback of this diversity is that many different frameworks and (vendor-specific) programming languages are
in use. Furthermore, switching platforms is costly, since the code has to be rewritten, verified, and optimized again.
OpenCL, a relatively new industry standard for HPC, promises to enable portable code. Its key idea is to
abstract hardware in a way that allows an efficient mapping onto real CPUs, GPUs, and other hardware. The
code is compiled for the actual target by the device driver.
In this work we investigated the suitability of OpenCL as a tool to write portable code that runs efficiently
across different hardware. The problems chosen are back- and forward-projection, the most time-consuming
parts of (iterative) reconstruction. We present results on three platforms, a multi-core CPU system and two
GPUs, and compare them against manually optimized native implementations.
We found that OpenCL makes it possible to share a common framework in one language across platforms. However,
because the underlying architectures differ, a hardware-oblivious implementation cannot be expected
to deliver maximal performance. By optimizing the OpenCL code for the specific hardware we reached over 90%
of native performance for both problems, back- and forward-projection, on all platforms.
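The relationship between the two operations can be illustrated with a minimal sketch in plain Python. It is not the OpenCL code evaluated in this work: the 2x2 image, the five rays, and their 0/1 weights are hypothetical stand-ins for a real C-arm geometry. Forward-projection is modelled as applying a system matrix A to the image, and back-projection as applying its transpose.

```python
# Toy model: a 2x2 image flattened to 4 pixels, probed by 5 rays.
# Each ray is a list of (pixel_index, weight) pairs. The weights are
# hypothetical 0/1 intersection flags, not a real C-arm geometry.
RAYS = [
    [(0, 1.0), (1, 1.0)],  # top row
    [(2, 1.0), (3, 1.0)],  # bottom row
    [(0, 1.0), (2, 1.0)],  # left column
    [(1, 1.0), (3, 1.0)],  # right column
    [(0, 1.0), (3, 1.0)],  # main diagonal
]


def forward_project(image, rays=RAYS):
    """Each ray sums the weighted pixels it crosses (computes A x)."""
    return [sum(w * image[j] for j, w in ray) for ray in rays]


def back_project(sinogram, n_pixels, rays=RAYS):
    """Smear each ray value back over the pixels it crossed (computes A^T y)."""
    image = [0.0] * n_pixels
    for value, ray in zip(sinogram, rays):
        for j, w in ray:
            image[j] += w * value
    return image


def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


image = [1.0, 2.0, 3.0, 4.0]
sinogram = forward_project(image)
smeared = back_project(sinogram, len(image))

# Adjoint check: <A x, y> must equal <x, A^T y> for a matched pair.
y = [1.0, 0.0, 2.0, 0.0, 1.0]
lhs = dot(forward_project(image), y)
rhs = dot(image, back_project(y, len(image)))
```

In this form the forward-projector parallelizes naturally over rays, whereas the scatter-style back-projector writes to shared pixels. That is one reason why a single hardware-oblivious mapping may not suit every device equally well.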
The Compute Unified Device Architecture (CUDA), introduced by NVIDIA in 2007, is a recent programming
model that exploits the unified shader design of the most recent graphics processing units (GPUs). Its
programming interface lets algorithms be implemented in standard C with a few extensions, and requires
no knowledge of graphics programming with OpenGL, DirectX, or shading languages.
We apply this novel technology to the Simultaneous Algebraic Reconstruction Technique (SART), which is
an advanced iterative image reconstruction method in cone-beam CT. So far, the computational complexity of
this algorithm has prohibited its use in most medical applications. However, since today's GPUs provide a high
level of parallelism and are highly cost-efficient processors, they are well suited to performing iterative
reconstruction that meets medical requirements.
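For orientation, the SART update can be sketched in plain Python on a toy problem. This is an illustrative sketch under stated assumptions, not the CUDA implementation described here: the ray list and the function name `sart` are hypothetical, and all rays are treated as a single projection view, whereas a practical SART loops over views. Each iteration forward-projects the current estimate, normalizes the residual by the ray weight sums, back-projects it, and normalizes by the per-pixel weight sums.

```python
def sart(rays, measured, n_pixels, n_iter=200, relax=1.0):
    """Simultaneous update over all rays (single-view simplification).

    rays: list of rays, each a list of (pixel_index, weight) pairs.
    measured: one measured line integral per ray.
    """
    # Sum of weights along each ray (row sums of the system matrix).
    row_sums = [sum(w for _, w in ray) for ray in rays]
    # Sum of weights touching each pixel (column sums).
    col_sums = [0.0] * n_pixels
    for ray in rays:
        for j, w in ray:
            col_sums[j] += w

    x = [0.0] * n_pixels
    for _ in range(n_iter):
        # Forward-project the estimate and form the normalized residual.
        residual = [
            (p - sum(w * x[j] for j, w in ray)) / rs
            for p, ray, rs in zip(measured, rays, row_sums)
        ]
        # Back-project the residual into image space.
        correction = [0.0] * n_pixels
        for r, ray in zip(residual, rays):
            for j, w in ray:
                correction[j] += w * r
        # Relaxed, column-normalized update.
        x = [xj + relax * cj / cs for xj, cj, cs in zip(x, correction, col_sums)]
    return x


# Hypothetical 2x2 image probed by two row sums, two column sums, one diagonal.
rays = [
    [(0, 1.0), (1, 1.0)],
    [(2, 1.0), (3, 1.0)],
    [(0, 1.0), (2, 1.0)],
    [(1, 1.0), (3, 1.0)],
    [(0, 1.0), (3, 1.0)],
]
truth = [1.0, 2.0, 3.0, 4.0]
measured = [sum(w * truth[j] for j, w in ray) for ray in rays]
estimate = sart(rays, measured, n_pixels=4)
```

The sketch makes the cost structure visible: every iteration requires one forward-projection and one back-projection.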
In this paper we present an efficient implementation of the most time-consuming parts of the iterative reconstruction
algorithm: forward- and back-projection. We also explain the required strategy to parallelize the
algorithm for the CUDA 1.1 and CUDA 2.0 architectures. Furthermore, our implementation introduces a technique
that accelerates the reconstruction relative to a standard CUDA-based SART implementation on the GPU.
Thus, we present an implementation that can be used in a time-critical clinical environment.
Finally, we compare our results to current applications on multi-core workstations, with respect to both
reconstruction speed and advantages and disadvantages. Using hardware-accelerated texture interpolation,
our implementation achieves a speed-up factor of more than 64 over a state-of-the-art CPU.
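As a reference for what texture interpolation computes, bilinear sampling of a 2-D projection image can be written out in software as below. This is a sketch only: the function name `bilinear` and the border-clamping behaviour are assumptions made for illustration, and GPU texture units may use different coordinate conventions and reduced-precision weights.

```python
import math


def bilinear(image, x, y):
    """Sample a 2-D image (list of rows) at fractional position (x, y).

    Coordinates are in pixel units with (0, 0) at the first sample;
    positions outside the image are clamped to the border.
    """
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)

    # Integer corners surrounding the sample position.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)

    # Fractional offsets act as interpolation weights.
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1.0 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1.0 - fy) * top + fy * bottom


# Hypothetical 2x2 detector patch.
detector = [[0.0, 10.0],
            [20.0, 30.0]]
center_value = bilinear(detector, 0.5, 0.5)
```

In a voxel-driven back-projector, each voxel is projected onto the detector and the detector image is sampled at the resulting fractional position. On a CPU, these four reads and three blends per sample are carried out in software.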
A high-resolution (198 μm) C-arm CT imaging system (Axiom Artis dTA, Siemens Medical Solutions, Forchheim, Germany) was optimized for imaging superficial femoral artery (SFA) stents in humans. The SFA is susceptible to the development of atherosclerotic lesions, which are typically treated with angioplasty and stent deployment. However, these stents can have a fracture rate as high as 35%, and fracture is usually accompanied by restenosis and reocclusion. The exact cause of fracture is unknown; it is hypothesized to result from deforming forces due to hip and knee flexion. Imaging was performed with the leg placed in both straight and bent positions. Projection images obtained during 20 s scans with ~200° of C-arm rotation were back-projected to obtain 3D volumes. Using a semi-automatic software algorithm developed in-house, the stent centerlines were found and ellipses were fitted to the slice normals. Image quality was adequate for calculations in 11 of 13 subjects. Bending the leg shortened the stent in 10 of 11 cases, with a maximum change of 9% (12 mm in a 133 mm stent), and lengthened it in one case by 1.6%. The maximum eccentricity change was 36%, at a bend angle of 72° in a case where the stent extended behind the knee.
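The reported length change can be reproduced with simple arithmetic, sketched below. The helper names are hypothetical. The abstract does not state how eccentricity was defined, so the standard geometric definition for an ellipse with semi-axes a >= b is assumed here and may differ from the measure actually used.

```python
import math


def percent_length_change(straight_mm, bent_mm):
    """Relative length change from straight to bent leg, in percent."""
    return 100.0 * (bent_mm - straight_mm) / straight_mm


def ellipse_eccentricity(semi_major, semi_minor):
    """Standard geometric eccentricity sqrt(1 - (b/a)^2).

    Assumed definition: the source does not specify its eccentricity measure.
    """
    return math.sqrt(1.0 - (semi_minor / semi_major) ** 2)


# Figures from the text: a 133 mm stent that shortens by 12 mm.
shortening = percent_length_change(133.0, 133.0 - 12.0)

# Hypothetical fitted cross-section with semi-axes 5 and 3 (arbitrary units).
example_eccentricity = ellipse_eccentricity(5.0, 3.0)
```

A negative value of `shortening` denotes a shorter stent in the bent position; its magnitude rounds to the 9% quoted above.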