Tomographic image reconstruction is computationally very demanding. In all cases the backprojection is the performance bottleneck because of its high operation count and the resulting high demand on the memory subsystem. In this study, we present the implementation of a cone-beam reconstruction algorithm on the Cell Broadband Engine (CBE) processor aimed at real-time applications. The cone-beam backprojection performance was assessed by backprojecting a half-circle scan of 512 projections of 1024<sup>2</sup> pixels into a volume of 512<sup>3</sup> voxels. The projections are acquired on a C-arm scanner and streamed to a CBE-based platform for real-time reconstruction. The acquisition speed typically ranges between 17 and 35 projections per second. On a CBE processor clocked at 3.2 GHz, our implementation performs this task in ~13 seconds, allowing for real-time reconstruction.
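The real-time claim can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the abstract; the criterion it applies (reconstruction time no longer than the shortest acquisition time) is an illustrative assumption, not a definition taken from the paper.

```python
# Figures quoted in the abstract; the "keeps up" criterion is an assumption.
N_PROJECTIONS = 512
RATE_MIN, RATE_MAX = 17.0, 35.0   # acquisition speed, projections per second
RECON_TIME_S = 13.0               # approximate backprojection time on the CBE


def acquisition_time(n_projections, rate):
    """Seconds needed to acquire n_projections at a given rate."""
    return n_projections / rate


slowest = acquisition_time(N_PROJECTIONS, RATE_MIN)   # about 30.1 s
fastest = acquisition_time(N_PROJECTIONS, RATE_MAX)   # about 14.6 s

# Reconstruction keeps pace if it takes no longer than the fastest scan.
keeps_up = RECON_TIME_S <= fastest
sustained_rate = N_PROJECTIONS / RECON_TIME_S         # about 39.4 projections/s

print(f"acquisition: {fastest:.1f} s to {slowest:.1f} s, "
      f"reconstruction: {RECON_TIME_S:.1f} s, keeps up: {keeps_up}")
```

Under this assumption the backprojection sustains roughly 39 projections per second, above the fastest quoted acquisition speed of 35.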
Adaptive filtering is a compute-intensive algorithm aimed at effectively reducing noise without blurring the structures
contained in a set of digital images. In this study, we take a generalized approach for adaptive filtering based on seven
oriented filters, each individual filter implemented by a two-dimensional (2D) convolution with a mask size of 11 by 11
pixels. Digital radiology workflow imposes severe real-time constraints that require the use of hardware acceleration
such as that provided by multicore processors. Implementing complex algorithms on heterogeneous multicore architectures is
difficult, especially when it comes to taking advantage of the DMA engines. We have implemented the algorithm on a Cell
Broadband Engine (CBE) processor clocked at 3.2 GHz using a generic framework for multicore processors. This
implementation is capable of filtering images of 512<sup>2</sup> pixels at a throughput of 40 frames per second while allowing
the parameters to be changed in real time. The resulting images are directed to the DR monitor or to the real-time computed
tomography (CT) reconstruction engine.
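To give a sense of the arithmetic load behind these figures, the following sketch estimates the multiply-accumulate (MAC) rate of the filter bank. It assumes direct (non-separable, non-FFT) convolution evaluated at every pixel, and it ignores border handling and the step that combines the seven filter outputs; none of these details are stated in the abstract.

```python
# Figures from the abstract; the direct-convolution cost model is an assumption.
NUM_FILTERS = 7
MASK_SIDE = 11
IMAGE_SIDE = 512
FRAMES_PER_SECOND = 40

macs_per_pixel = NUM_FILTERS * MASK_SIDE * MASK_SIDE        # 847
macs_per_frame = macs_per_pixel * IMAGE_SIDE * IMAGE_SIDE   # 222,035,968
macs_per_second = macs_per_frame * FRAMES_PER_SECOND        # 8,881,438,720
flops_per_second = 2 * macs_per_second                      # one multiply + one add per MAC

print(f"{macs_per_second / 1e9:.2f} GMAC/s, {flops_per_second / 1e9:.2f} GFlop/s")
```

Under these assumptions the convolutions alone require close to 9 billion MACs per second.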
Tomographic image reconstruction is computationally very demanding. In all cases the backprojection represents the performance bottleneck due to the high operation count and the high demand put on the memory subsystem. In the past, solving this problem has led to the implementation of specific architectures, connecting Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) to memory through dedicated high-speed buses. More recently, there have also been attempts to use Graphics Processing Units (GPUs) to perform the backprojection step.
The Cell Broadband Engine (CBE) processor, introduced by IBM, Toshiba and Sony and originally aimed at the gaming market, is often considered a multicomputer on a chip. Clocked at 3 GHz, the Cell allows for a theoretical performance of 192 GFlops and a peak data transfer rate over the internal bus of 200 GB/s. This performance makes the Cell a very attractive architecture for implementing tomographic image reconstruction algorithms.
In this study, we investigate the relative performance of a perspective backprojection algorithm when implemented on a standard PC and on the Cell processor. We compare these results to the performance achievable with FPGA-based boards and high-end GPUs.
The cone-beam backprojection performance was assessed by backprojecting a full-circle scan of 512 projections of 1024x1024 pixels into a volume of 512x512x512 voxels. This took 3.2 minutes on the PC (single CPU) and as little as 13.6 seconds on the Cell.
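The two timings can be turned into a speedup and an approximate update rate. The sketch below assumes a voxel-driven backprojection that touches every voxel once per projection; the abstract does not say whether voxels outside the field of view are skipped, so the rates are rough estimates.

```python
# Timings and sizes from the abstract; one update per voxel per projection is assumed.
N_PROJECTIONS = 512
VOLUME_SIDE = 512
PC_TIME_S = 3.2 * 60      # 3.2 minutes
CELL_TIME_S = 13.6

voxel_updates = N_PROJECTIONS * VOLUME_SIDE ** 3   # 2**36, about 6.9e10
speedup = PC_TIME_S / CELL_TIME_S                  # about 14.1
cell_rate = voxel_updates / CELL_TIME_S            # about 5.05e9 updates/s
pc_rate = voxel_updates / PC_TIME_S                # about 3.58e8 updates/s

print(f"speedup: {speedup:.1f}x, Cell: {cell_rate / 1e9:.2f} G updates/s, "
      f"PC: {pc_rate / 1e9:.2f} G updates/s")
```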
Tomographic image reconstruction, such as the reconstruction of CT projection values, of tomosynthesis data,
PET or SPECT events, is computationally very demanding. In filtered backprojection as well as in iterative
reconstruction schemes, the most time-consuming steps are forward- and backprojection which are often limited
by the memory bandwidth.
Recently, a novel general purpose architecture optimized for distributed computing became available: the
Cell Broadband Engine (CBE). Its eight synergistic processing elements (SPEs) currently allow for a theoretical
performance of 192 GFlops (3 GHz, 8 units, 4 floats per vector, 2 instructions, multiply and add, per clock).
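The peak figure follows directly from the factors listed in parentheses. A minimal sketch of that product, with the clock frequency left as a parameter (the function name and constants are illustrative, not taken from the paper):

```python
NUM_SPES = 8                  # synergistic processing elements
FLOATS_PER_VECTOR = 4         # single-precision lanes per vector
FLOPS_PER_LANE_PER_CLOCK = 2  # multiply and add issued together


def peak_gflops(clock_ghz):
    """Theoretical single-precision peak of the SPEs, in GFlops."""
    return clock_ghz * NUM_SPES * FLOATS_PER_VECTOR * FLOPS_PER_LANE_PER_CLOCK


print(peak_gflops(3.0))   # 192.0
```

The same product scales linearly with the clock, so a part clocked at 3.2 GHz would reach about 204.8 GFlops by this measure.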
To maximize image reconstruction speed we modified our parallel-beam and perspective backprojection
algorithms which are highly optimized for standard PCs, and optimized the code for the CBE processor.<sup>1-3</sup> In
addition, we implemented an optimized perspective forwardprojection on the CBE which allows us to perform
statistical image reconstructions like the ordered subset convex (OSC) algorithm.<sup>4</sup>
Performance was measured using simulated data with 512 projections per rotation and 512<sup>2</sup> detector elements.
The data were backprojected into an image of 512<sup>3</sup> voxels using our PC-based approaches and the new CBE-based
algorithms. Both the PC and the CBE timings were scaled to a 3 GHz clock frequency.
On the CBE, we obtain total reconstruction times of 4.04 s for the parallel backprojection, 13.6 s for the
perspective backprojection and 192 s for a complete OSC reconstruction, consisting of one initial Feldkamp
reconstruction, followed by 4 OSC iterations.
Cone-beam reconstruction (CBR) is useful for producing volume images from projections in many fields including
medicine, biomedical research, baggage scanning, paleontology, and nondestructive manufacturing inspection. CBR
converts a set of two-dimensional (2-D) projections into a three-dimensional (3-D) image of the projected object. The
most common algorithm used for CBR is referred to as the Feldkamp-Davis-Kress (FDK) algorithm; this involves
filtering and cone-beam backprojection steps for each projection of the set. Over the past decade we have observed or
studied FDK on platforms based on many different processor types, both single-processor and parallel-multiprocessor
architectures. In this paper we review the different platforms, in terms of design considerations that include speed,
scalability, ease of programming, and cost. In the past few years, the availability of programmable special processors
(i.e., graphics processing units [GPUs] and the Cell Broadband Engine [BE]) has resulted in platforms that meet all the
desirable considerations simultaneously.
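As an illustration of the cone-beam backprojection step named above, here is a minimal voxel-driven sketch for a single projection. It is not the implementation used on any of the reviewed platforms. It assumes a circular source orbit of radius `R`, a flat virtual detector passing through the rotation axis, nearest-neighbour detector lookup, and a projection that has already been weighted and filtered; the function name and all parameters are hypothetical.

```python
import math


def backproject_view(volume, projection, beta, R, voxel_size=1.0, pixel_size=1.0):
    """Accumulate one filtered projection into volume[z][y][x].

    beta is the source angle in radians; the source sits at
    (R*cos(beta), R*sin(beta), 0) and the detector is centred on the axis.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    nv, nu = len(projection), len(projection[0])
    cos_b, sin_b = math.cos(beta), math.sin(beta)
    for iz in range(nz):
        z = (iz - (nz - 1) / 2) * voxel_size
        for iy in range(ny):
            y = (iy - (ny - 1) / 2) * voxel_size
            for ix in range(nx):
                x = (ix - (nx - 1) / 2) * voxel_size
                # Distance from the source to the voxel along the central ray.
                depth = R - (x * cos_b + y * sin_b)
                if depth <= 0:
                    continue  # voxel lies behind the source
                mag = R / depth                      # perspective magnification
                u = (-x * sin_b + y * cos_b) * mag   # detector column coordinate
                v = z * mag                          # detector row coordinate
                iu = round(u / pixel_size + (nu - 1) / 2)
                iv = round(v / pixel_size + (nv - 1) / 2)
                if 0 <= iu < nu and 0 <= iv < nv:
                    # FDK distance weight (R / depth)**2
                    volume[iz][iy][ix] += mag * mag * projection[iv][iu]
    return volume


# Usage: a 3x3x3 volume and a constant 5x5 projection at angle 0.
vol = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
proj = [[1.0] * 5 for _ in range(5)]
backproject_view(vol, proj, beta=0.0, R=10.0)
```

The innermost statement runs once per voxel per projection, which is why the backprojection dominates the cost of FDK; optimized implementations typically replace the per-voxel divisions with incremental updates and use bilinear rather than nearest-neighbour interpolation.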
Proc. SPIE. 6142, Medical Imaging 2006: Physics of Medical Imaging
KEYWORDS: Digital signal processing, Surface plasmons, Detection and tracking algorithms, Sensors, Mercury, Computing systems, Signal processing, Computed tomography, Reconstruction algorithms, Personal protective equipment
Over the last few decades, the medical imaging community has passionately debated different approaches to implementing reconstruction algorithms for spiral CT. Numerous alternatives have been proposed. Whether they are approximate, exact, or iterative, those implementations generally include a backprojection step. Specialized compute platforms have been designed to perform this compute-intensive algorithm within a timeframe compatible with hospital-workflow requirements. Solving the performance problem in a cost-effective way has driven designers to use a combination of digital signal processor (DSP) chips, general-purpose processors, application-specific integrated circuits (ASICs) and field programmable gate arrays (FPGAs). The Cell processor by IBM offers an interesting alternative for implementing the backprojection, especially since it offers a good level of parallelism and vast I/O capabilities. In this paper, we consider the implementation of a straight backprojection algorithm on the Cell processor to design a cost-effective system that matches the performance requirements of clinically deployed systems. The effects on performance of system parameters such as pitch and detector size are also analyzed to determine the ideal system size for modern CT scanners.