CT scanner simulation models the projection process of CT virtually, without performing an actual scan. It is very
useful for designing, evaluating, and developing CT systems, which are evolving in several directions. However,
simulating multiple detector rows, multiple x-ray energies, and other dimensions simultaneously is time consuming
because of the large amount of computation involved. In this paper, we present a solution to this problem using the
CUDA architecture on a GPU.
Our solution consists of three steps. First, the CPU prepares the data that will be used by the GPU. Then, a GPU
kernel is launched to calculate the projections of all rays through the phantom data in parallel. To maximize memory
bandwidth, we optimized the data storage by padding 2D arrays so that global memory accesses are coalesced. Finally,
post-processing is done on the CPU. Our experimental environment includes a dual-core CPU and an NVIDIA Quadro FX 1800
GPU with CUDA compute capability 1.1. We used three kinds of phantom data to test the performance. We found that our
solution achieves the same image quality in double precision while running more than 10 times faster than a CPU-only
implementation.
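
The padding step mentioned above can be illustrated with a minimal host-side sketch in C++. It is not the implementation used in this work: the helper names `padded_pitch` and `pad_rows`, the `float` element type, and the 64-byte alignment in the usage note are assumptions chosen for illustration. The idea is only that each row of a 2D array is extended so that every row starts on an aligned boundary, which is a precondition for coalesced global memory access when neighboring threads read neighboring elements of a row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements per padded row, chosen so that
// each row begins on an align_bytes boundary (assuming the base address of
// the buffer is itself aligned to align_bytes).
std::size_t padded_pitch(std::size_t width, std::size_t elem_size,
                         std::size_t align_bytes) {
    assert(elem_size > 0 && align_bytes % elem_size == 0);
    std::size_t row_bytes = width * elem_size;
    // Round the row size up to the next multiple of align_bytes.
    std::size_t padded_bytes =
        (row_bytes + align_bytes - 1) / align_bytes * align_bytes;
    return padded_bytes / elem_size;
}

// Hypothetical helper: copy a tightly packed rows x width array into a
// padded layout with `pitch` elements per row. Padding elements are zero.
std::vector<float> pad_rows(const std::vector<float>& src, std::size_t rows,
                            std::size_t width, std::size_t pitch) {
    assert(src.size() == rows * width && pitch >= width);
    std::vector<float> dst(rows * pitch, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < width; ++c) {
            // Element (r, c) is addressed with the pitch, not the width.
            dst[r * pitch + c] = src[r * width + c];
        }
    }
    return dst;
}
```

For example, with 100 `float` values per row and an assumed 64-byte alignment, `padded_pitch(100, sizeof(float), 64)` yields 112 elements per row, and element (r, c) is then addressed as `r * 112 + c`. On the device side, the CUDA runtime function `cudaMallocPitch` allocates 2D arrays with this kind of row padding and returns the pitch in bytes.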