The Compute Unified Device Architecture (CUDA), introduced by NVIDIA in 2007, is a recent programming
model that makes use of the unified shader design of the most recent graphics processing units (GPUs). The
programming interface allows algorithms to be implemented in standard C with a few extensions,
without any knowledge of graphics programming using OpenGL, DirectX, or shading languages.
We apply this novel technology to the Simultaneous Algebraic Reconstruction Technique (SART), which is
an advanced iterative image reconstruction method in cone-beam CT. So far, the computational complexity of
this algorithm has prohibited its use in most medical applications. However, since today's GPUs provide a high
level of parallelism and are highly cost-efficient processors, they are well suited to performing the iterative
reconstruction in accordance with medical requirements.
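For orientation, the SART update is commonly written in the form below. The notation is an assumption introduced here and is not taken from this paper: $x_j$ is the value of voxel $j$, $p_i$ the measured value of ray $i$, $a_{ij}$ the weight of voxel $j$ on ray $i$, $P_\theta$ the set of rays belonging to the projection at angle $\theta$, and $\lambda$ a relaxation factor.

```latex
x_j^{(k+1)} = x_j^{(k)}
  + \frac{\lambda}{\sum_{i \in P_\theta} a_{ij}}
    \sum_{i \in P_\theta} a_{ij}\,
    \frac{p_i - \sum_{l} a_{il}\, x_l^{(k)}}{\sum_{l} a_{il}}
```

The inner sum over $l$ is the forward projection of the current estimate, and the outer sum over $i$ is the back-projection of the normalized residual.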
In this paper, we present an efficient implementation of the most time-consuming parts of the iterative reconstruction
algorithm: forward- and back-projection. We also explain the strategy required to parallelize the
algorithm for the CUDA 1.1 and CUDA 2.0 architectures. Furthermore, our implementation introduces a
technique that accelerates the reconstruction relative to a standard SART implementation on the GPU using
CUDA. Thus, we present an implementation that can be used in a time-critical clinical environment.
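The two steps named above, forward projection and back-projection, can be illustrated with a minimal CPU sketch in plain Python. It is not the paper's CUDA implementation and does not include its acceleration technique: the dense per-ray weight lists, the function names, and the 2x2 toy image are all assumptions made for illustration.

```python
def forward_project(rays, volume):
    """Line integrals: one weighted sum of voxel values per ray."""
    return [sum(w * v for w, v in zip(ray, volume)) for ray in rays]


def sart_update(rays, measured, volume, relax=1.0):
    """Apply one SART correction using all rays of a single projection."""
    simulated = forward_project(rays, volume)
    # Residual of each ray, normalized by the ray's total weight.
    residual = []
    for ray, p, q in zip(rays, measured, simulated):
        ray_weight = sum(ray)
        residual.append((p - q) / ray_weight if ray_weight > 0 else 0.0)
    # Back-project the residuals, normalized by each voxel's total weight.
    updated = []
    for j, value in enumerate(volume):
        voxel_weight = sum(ray[j] for ray in rays)
        if voxel_weight > 0:
            correction = sum(ray[j] * r for ray, r in zip(rays, residual))
            value = value + relax * correction / voxel_weight
        updated.append(value)
    return updated


def reconstruct(projections, n_voxels, sweeps=1, relax=1.0):
    """Start from an empty volume and cycle through the projections."""
    volume = [0.0] * n_voxels
    for _ in range(sweeps):
        for rays, measured in projections:
            volume = sart_update(rays, measured, volume, relax)
    return volume


# Hypothetical toy data: the 2x2 image [1, 2, 3, 4], observed once as
# row sums and once as column sums.
projections = [
    ([[1, 1, 0, 0], [0, 0, 1, 1]], [3.0, 7.0]),
    ([[1, 0, 1, 0], [0, 1, 0, 1]], [4.0, 6.0]),
]
volume = reconstruct(projections, n_voxels=4, sweeps=1)
```

In this formulation each ray of the forward projection and each voxel of the back-projection is computed independently of the others, which is the property a GPU mapping can exploit.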
Finally, we compare our results to current applications on multi-core workstations, with respect to both
reconstruction speed and (dis-)advantages. Using hardware-accelerated texture interpolation, our implementation
achieves a speed-up factor of more than 64 compared to a state-of-the-art CPU.
Proc. SPIE. 6510, Medical Imaging 2007: Physics of Medical Imaging
KEYWORDS: Digital signal processing, Surface plasmons, Surgery, Computer programming, Data acquisition, Computed tomography, Convolution, Reconstruction algorithms, Chemical elements, Personal protective equipment
In most of today's commercially available cone-beam CT scanners, the well-known FDK method is used for solving
the 3D reconstruction task. The computational complexity of this algorithm prohibits its use for many medical
applications without hardware acceleration. The brand-new Cell Broadband Engine Architecture (CBEA) with
its high level of parallelism is a cost-efficient processor for performing the FDK reconstruction according to the
medical requirements. The programming scheme, however, is quite different from that of standard personal computer
hardware. In this paper, we present an innovative implementation of the most time-consuming parts of the
FDK algorithm: filtering and back-projection. We also explain the transformations required to parallelize the
algorithm for the CBEA. Our software framework allows filtering and back-projection to be computed in parallel,
making on-the-fly reconstruction possible. The results demonstrate that a complete FDK
reconstruction is computed with the CBEA in less than seven seconds for a standard clinical scenario. Given
that scan times are usually much longer, we conclude that reconstruction finishes right after the end of
data acquisition. This enables us to present the reconstructed volume to the physician in real time, immediately
after the last projection image has been acquired by the scanning device.
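The idea of running filtering and back-projection in parallel while projections are still arriving can be sketched as a two-stage pipeline. The following is a minimal illustration in plain Python with threads, not the CBEA implementation described above. The names are hypothetical, the 3-tap kernel is only a stand-in for the FDK filtering step, and the per-index accumulation is only a stand-in for geometric back-projection.

```python
import queue
import threading


def filter_projection(row):
    """Stand-in for FDK filtering: 3-tap high-pass kernel with zero padding."""
    padded = [0.0] + list(row) + [0.0]
    return [2.0 * padded[i] - padded[i - 1] - padded[i + 1]
            for i in range(1, len(padded) - 1)]


def back_project(volume, filtered):
    """Stand-in for back-projection: accumulate the filtered row in place."""
    for i, value in enumerate(filtered):
        volume[i] += value


def reconstruct_on_the_fly(projection_stream, n):
    """Filter and back-project concurrently as projections arrive."""
    volume = [0.0] * n
    filtered_queue = queue.Queue(maxsize=4)  # small buffer between the stages

    def filter_stage():
        for row in projection_stream:
            filtered_queue.put(filter_projection(row))
        filtered_queue.put(None)  # sentinel: acquisition has ended

    def back_projection_stage():
        while True:
            filtered = filtered_queue.get()
            if filtered is None:
                break
            back_project(volume, filtered)

    stages = [threading.Thread(target=filter_stage),
              threading.Thread(target=back_projection_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return volume


# Hypothetical input: two tiny "projections" delivered one after the other.
result = reconstruct_on_the_fly(iter([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]), 3)
```

Because each projection is consumed as soon as it has been filtered, only the work for the final projection remains once acquisition ends, which is the property behind the on-the-fly behaviour described in the abstract.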