Optical coherence tomography (OCT) is a noninvasive, noncontact imaging modality used to obtain high-resolution cross-sectional images of tissue structures. Broadly, OCT is classified into two categories: time-domain (TD)-OCT and Fourier-domain (FD)-OCT. Conventional TD-OCT detects the echo time delays of light by measuring the interference signal as a function of time during a depth scan (A-scan) of the reference arm at each lateral position of the probe beam. In FD-OCT, instead of a mechanical A-scan, depth information is retrieved by detecting the interference signal as a function of wavelength. In spectral-domain (SD) OCT, this is achieved with a broadband source and a spectrometer in the detector arm.^{1} Optical frequency domain imaging (OFDI) uses a swept source and a point detector to acquire the same information.^{2} The main advantage of these schemes over TD-OCT is a marked increase in sensitivity and imaging speed.^{3} Spectrometers with a linear array detector, and swept sources, normally operate at tens of kHz, which allows data acquisition at video rate [$30\,\text{frames/s}$ (fps)]. FD-OCT images require resampling of the axial data from wavelength $(\lambda)$ space to wavenumber $(k=2\pi/\lambda)$ space, followed by an inverse Fourier transform. Therefore, although high-speed FD-OCT can acquire data faster than video rate, processed OCT images are often displayed at a much lower rate than they are acquired, because this heavy signal processing must be performed by a computer.
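The resampling step described above can be sketched as follows. This is a minimal illustration in pure Python, not the processing used in any particular system: the function names `resample_to_uniform_k` and `depth_profile`, the use of linear interpolation, and the naive $O(N^2)$ inverse transform are all assumptions chosen for brevity.

```python
import bisect
import cmath
import math


def resample_to_uniform_k(wavelengths, spectrum):
    """Linearly interpolate a spectrum sampled at increasing wavelengths
    onto a grid that is evenly spaced in wavenumber k = 2*pi/lambda."""
    # k falls as wavelength rises, so reverse both lists to get ascending k.
    k = [2.0 * math.pi / w for w in reversed(wavelengths)]
    s = list(reversed(spectrum))
    n = len(k)
    step = (k[-1] - k[0]) / (n - 1)
    k_uniform = [k[0] + i * step for i in range(n)]
    resampled = []
    for kq in k_uniform:
        # Index of the right-hand neighbour, clamped so both neighbours exist.
        j = min(max(bisect.bisect_left(k, kq), 1), n - 1)
        t = (kq - k[j - 1]) / (k[j] - k[j - 1])
        resampled.append(s[j - 1] + t * (s[j] - s[j - 1]))
    return k_uniform, resampled


def depth_profile(samples):
    """Magnitude of the inverse discrete Fourier transform (naive O(N^2)).
    Only the first half is returned because the input is real-valued."""
    n = len(samples)
    profile = []
    for q in range(n // 2):
        acc = sum(samples[m] * cmath.exp(2j * math.pi * m * q / n)
                  for m in range(n))
        profile.append(abs(acc) / n)
    return profile
```

For a single reflector the fringe is periodic in $k$ but chirped in $\lambda$, so the transform of the resampled data concentrates into one peak. In practice a higher-order interpolation and an FFT would usually replace the simple versions shown here.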

One example of real-time display has been demonstrated with digital signal processing (DSP) hardware using a single field-programmable gate array (FPGA) integrated circuit (IC) and a custom electronics board.^{4} The display frame rate for processed OCT images (1024 axial pixels $\times\,512$ lateral A-scans) was $27\,\mathrm{fps}$ in the SD-OCT system. However, this type of equipment is expensive and must be custom built around FPGA technology.

To avoid numerical $k$-space resampling prior to the Fourier transform, linear-in-wavenumber (linear-$k$) spectrometers and $k$-space linear swept sources have been designed.^{5, 6, 7, 8} A linear-$k$ spectrometer consisting of a grating and an optical glass prism in the $1.3\text{-}\mu\mathrm{m}$ region not only saved computing time but also improved the SNR falloff.^{5} FD-OCT systems based on linear-$k$ techniques nevertheless still require high-speed fast Fourier transform (FFT) processing to display OCT images in real time.

Recently, one approach to accelerating numerical calculations has been to use a graphics processing unit (GPU) instead of a central processing unit (CPU). A GPU contains many stream processors, which makes highly parallel computation possible. The advantages of GPU computing are high-speed computation at low cost and simple programming from the host computer. In the field of optics, GPU techniques have been applied to the reconstruction of digital holograms.^{9, 10}

In this work, we demonstrated real-time display on a linear-$k$ SD-OCT system using GPU programming. We estimated the optimal combination of a diffraction grating and a prism for the linear-$k$ spectrometer in the $840\text{-}\mathrm{nm}$ spectral region. The computing time on the GPU was $6.1\,\mathrm{ms}$ for a data size of 2048 FFT points $\times\,1000$ lateral A-scans, which is shorter than the frame interval $(35.8\,\mathrm{ms})$ of a line scan camera running at $27.9\,\mathrm{kHz}$. A display rate of $27.9\,\mathrm{fps}$ for processed images was achieved using a low-cost GPU.
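The timing figures quoted above follow from simple arithmetic. As a worked check, assuming one frame consists of exactly 1000 camera lines with no dead time between frames (the symbols $T_{\text{frame}}$, $N_{\text{A-scan}}$, $f_{\text{line}}$, and $T_{\text{GPU}}$ are introduced here only for illustration):

$T_{\text{frame}}=\frac{N_{\text{A-scan}}}{f_{\text{line}}}=\frac{1000}{27.9\,\mathrm{kHz}}\approx 35.8\,\mathrm{ms},\qquad \frac{1}{T_{\text{frame}}}\approx 27.9\,\mathrm{fps},$

$\frac{T_{\text{GPU}}}{T_{\text{frame}}}=\frac{6.1\,\mathrm{ms}}{35.8\,\mathrm{ms}}\approx 0.17 .$

Under these assumptions the GPU processing occupies roughly one sixth of each frame interval.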

Figure 1 shows a schematic of our SD-OCT system. The output light of a superluminescent diode [SLD-370-HP, Superlum (County Cork, Ireland); center wavelength $\lambda_0=840.8\,\mathrm{nm}$, full-width at half-maximum spectral width $\Delta\lambda=48.7\,\mathrm{nm}$] was split into a sample arm and a reference arm, with the latter terminated by a mirror. A probe at the end of the sample arm delivered light to the sample and collected the light backscattered from within it. Achromatic lenses $(f=100\,\mathrm{mm})$ were inserted in both arms. The predicted lateral resolution was $21.4\,\mu\mathrm{m}$. The light returning from the two interferometer arms was recombined and directed to a linear-$k$ spectrometer consisting of a diffraction grating (Wasatch Photonics, volume phase holographic grating, $1200\,\text{lines/mm}$) and an optical glass prism. When the incident angle is the blaze angle, $\theta_{\mathrm{in}}=\sin^{-1}(m\lambda_0/2)$, which gives the best diffraction efficiency, the first-order diffraction angle of light at a wavelength $\lambda$ is

## Eq. 1

$\theta_d(\lambda)=\sin^{-1}(m\lambda-\sin\theta_{\mathrm{in}})=\sin^{-1}(m\lambda-m\lambda_0/2),$

## Eq. 2

$\sin\theta_{\mathrm{out}}(\lambda)=n(\lambda)\sin\left\{\alpha-\sin^{-1}\left[\frac{\sin\left(\theta_d(\lambda)+\beta\right)}{n(\lambda)}\right]\right\}$

[…]^{11} which could be programmed using only a C language environment to exploit the processing power of the GPUs. We developed software that included image acquisition, GPU programming, and a graphical user interface in Microsoft Visual C++ 2008 Express Edition.

First, we calculated the location of each spectral component at the focal plane over the wavenumber range from 7 to $8\,\mu\mathrm{m}^{-1}$ for each angle $\beta$ between the prism and the grating, in order to optimize the linearity of the spectrometer. The prism materials considered were BK7, F2, and SF10, with the angle $\alpha=60\,\mathrm{deg}$. A comparison of the derivatives of the locations with respect to wavenumber at the optimal angle is shown in Fig. 2a. The optimal angle was estimated from the standard deviation of the derivatives, as shown in Fig. 2b. From these results, the combination of the $1200\,\text{lines/mm}$ grating and the F2 equilateral prism was found suitable for the linear-$k$ spectrometer.
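The linearity search described above can be sketched as follows, under several assumptions that are not taken from the text: the incidence angle on the prism is taken as $\theta_d+\beta$ with Snell's law applied at both faces, the focal-plane position is modeled as proportional to $\tan(\theta_{\mathrm{out}}-\theta_{\mathrm{center}})$, the merit function is the relative standard deviation of the finite-difference derivative, and BK7 is used with its commonly published Sellmeier coefficients (substitute F2 or SF10 coefficients to compare materials). All function names are hypothetical.

```python
import math

GROOVES_PER_UM = 1.2          # 1200 lines/mm grating (from the text)
LAMBDA0_UM = 0.8408           # source center wavelength (from the text)
ALPHA = math.radians(60.0)    # prism apex angle (from the text)


def n_bk7(lam_um):
    """Sellmeier dispersion for BK7 (commonly published coefficients)."""
    b = (1.03961212, 0.231792344, 1.01046945)
    c = (0.00600069867, 0.0200179144, 103.560653)
    l2 = lam_um * lam_um
    return math.sqrt(1.0 + sum(bi * l2 / (l2 - ci) for bi, ci in zip(b, c)))


def theta_d(lam_um):
    """Eq. 1: first-order diffraction angle for incidence at asin(m*lambda0/2)."""
    return math.asin(GROOVES_PER_UM * lam_um - GROOVES_PER_UM * LAMBDA0_UM / 2.0)


def theta_out(lam_um, beta, n_of_lambda=n_bk7):
    """Eq. 2: prism exit angle; None if the ray is totally internally reflected."""
    n = n_of_lambda(lam_um)
    inner = ALPHA - math.asin(math.sin(theta_d(lam_um) + beta) / n)
    s = n * math.sin(inner)
    return math.asin(s) if abs(s) <= 1.0 else None


def nonlinearity(beta, k_min=7.0, k_max=8.0, samples=101):
    """Relative standard deviation of d(position)/dk on a uniform k grid."""
    ks = [k_min + (k_max - k_min) * i / (samples - 1) for i in range(samples)]
    angles = [theta_out(2.0 * math.pi / k, beta) for k in ks]
    if any(a is None for a in angles):
        return math.inf
    # Position on the detector relative to the central ray, in units of the
    # (unknown) focal length; the relative deviation does not depend on it.
    center = angles[len(angles) // 2]
    pos = [math.tan(a - center) for a in angles]
    dk = ks[1] - ks[0]
    slopes = [(b - a) / dk for a, b in zip(pos, pos[1:])]
    mean = sum(slopes) / len(slopes)
    var = sum((s - mean) ** 2 for s in slopes) / len(slopes)
    return math.sqrt(var) / abs(mean)


def best_beta(beta_degrees):
    """Scan candidate grating-to-prism angles and return the most linear one."""
    return min(beta_degrees, key=lambda d: nonlinearity(math.radians(d)))
```

A call such as `best_beta(range(5, 50))` scans integer angles in degrees. A finer grid, and repeating the scan for each candidate glass, would be needed to reproduce a comparison like the one in Fig. 2.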

Next, we estimated the computing time using a GPU operating in $32\text{-bit}$ floating-point (single-precision) mode. Figure 3 shows the flowchart of real-time OCT imaging. Initially, the reference intensity is measured for DC removal and stored in the GPU memory. In Fig. 3, the dashed block shows the routine procedure in our system. First, a spectral interference image (2048 axial pixels $\times\,1000$ lateral pixels, $16\text{-bit}$ resolution) is captured on the host computer and then transferred to the GPU memory. Second, the data are converted from $16\text{-bit}$ integers to $32\text{-bit}$ floating-point numbers, and the DC component is removed using the stored reference intensity. Here, the real part of each complex input value is set to the processed data and the imaginary part to zero. Third, a 2048-point FFT is performed for each of the 1000 A-scans, followed by log scaling to obtain an OCT image. We performed the FFT using Nvidia’s CUDA FFT library (CUFFT).^{11} Finally, after the data are converted from $32\text{-bit}$ floating-point numbers to $16\text{-bit}$ integers, the result is transferred to the host memory and displayed on the monitor. The estimated computing time between the data transfer to the GPU memory and the data transfer from the GPU memory was $6.1\,\mathrm{ms}$, which is shorter than the frame interval $(35.8\,\mathrm{ms})$ of the line scan camera running at $27.9\,\mathrm{kHz}$. Consequently, real-time display of the processed OCT images could be achieved using the GPU in our system.
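The per-frame steps above can be sketched on the CPU as follows. This is only an illustration of the data flow in pure Python, not the CUDA/CUFFT implementation: the function names, the display range in dB, and the choice to keep only the first half of the FFT output are assumptions.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for q in range(n // 2):
        tw = cmath.exp(-2j * math.pi * q / n) * odd[q]
        out[q] = even[q] + tw
        out[q + n // 2] = even[q] - tw
    return out


def process_frame(raw_ascans, reference, floor_db=0.0, ceil_db=100.0):
    """Turn raw integer spectra into a log-scaled 16-bit OCT image.

    raw_ascans: list of A-scans, each a list of integer camera counts.
    reference:  reference spectrum measured once beforehand (same length).
    """
    image = []
    for spectrum in raw_ascans:
        # Integer -> float conversion and DC removal; imaginary part is zero.
        fringe = [complex(float(v) - r, 0.0) for v, r in zip(spectrum, reference)]
        depth = fft(fringe)[: len(fringe) // 2]   # keep one side of the spectrum
        row = []
        for c in depth:
            db = 20.0 * math.log10(abs(c) + 1e-12)           # log scaling
            frac = min(max((db - floor_db) / (ceil_db - floor_db), 0.0), 1.0)
            row.append(int(round(frac * 65535)))             # back to 16-bit range
        image.append(row)
    return image
```

On a GPU the loop over A-scans would typically be replaced by a single batched transform, which is what makes the 1000 independent 2048-point FFTs well suited to parallel hardware.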

High-performance computing on a CPU can be achieved with Intel’s Math Kernel Library (MKL), a library of highly optimized, thread-safe mathematical functions for Intel CPUs. Govindaraju et al. compared their novel discrete Fourier transform algorithms with CUFFT on the GPU and MKL on the CPU.^{12} Their algorithm was two times faster than CUFFT on the GPU (Nvidia GTX280) and 12 times faster than MKL on the CPU (Intel QX9650 $3.0\text{-}\mathrm{GHz}$ quad-core processor with $4\text{-}\mathrm{GB}$ memory) for a data size of 2048 FFT points $\times\,4096$ FFTs. From this comparison, we infer that CUFFT on the GPU was about six times $(12/2)$ faster than MKL on the CPU.

Finally, we measured OCT images of a human finger pad *in vivo*. We used a commercially available F2 prism (Thorlabs Incorporated, Newton, New Jersey) for the linear-$k$ spectrometer. The spectrometer settings provided a spectral resolution of $0.049\,\mathrm{nm}$ and a depth range of $3.6\,\mathrm{mm}$. With a probing power of $5.0\,\mathrm{mW}$ and an integration time of $34\,\mu\mathrm{s}$, a sensitivity of $99\,\mathrm{dB}$ was measured close to the zero delay, dropping to $94\,\mathrm{dB}$ at a depth of $1\,\mathrm{mm}$. The probe beam was scanned at $27.9\,\mathrm{Hz}$ using a sawtooth waveform with a duty cycle of 90%, which was modified to reduce mechanical vibrations. Figure 4 and Video 1 show OCT images with an imaging range of $4.0\times 3.6\,\mathrm{mm}^{2}$ $(\text{lateral}\times\text{axial})$. We could obtain these OCT images without the resampling process because the linear-$k$ spectrometer provides better spectral linearity in wavenumber than a conventional spectrometer. Because this linearity is not perfect, however, high-precision OCT imaging with our system would still require a resampling process using calibrated spectral data. Since only the FFT was performed, the computing time on the GPU left a wide margin within the frame interval of our SD-OCT system. Therefore, GPU programming has the potential to implement further necessary signal processing, such as resampling, spectral shaping of non-Gaussian spectral data, and dispersion compensation. Real-time display performance is very important for clinical applications that need immediate diagnosis for screening or biopsy/surgery. The GPU is an attractive tool for clinical and commercial systems because of its high computing performance and low cost.
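As a consistency check on the spectrometer figures quoted above, the depth range of an SD-OCT system is commonly estimated from the spectral sampling interval $\delta\lambda$ as $z_{\max}=\lambda_0^{2}/(4\,\delta\lambda)$. Assuming this standard relation applies, that the quoted spectral resolution equals the sampling interval, and that the depth is measured in air (none of which is stated explicitly in the text):

$z_{\max}=\frac{\lambda_0^{2}}{4\,\delta\lambda}=\frac{(840.8\,\mathrm{nm})^{2}}{4\times 0.049\,\mathrm{nm}}\approx 3.6\times 10^{6}\,\mathrm{nm}=3.6\,\mathrm{mm},$

which agrees with the stated depth range.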

In conclusion, we demonstrated real-time display on a linear-$k$ SD-OCT system using GPU programming. We used a linear-$k$ spectrometer combining a diffraction grating $(1200\,\text{lines/mm})$ and an F2 equilateral prism at $840\,\mathrm{nm}$ to avoid resampling the axial data from wavelength to wavenumber. The FFT calculation is accelerated by the GPU. The computing time is $6.1\,\mathrm{ms}$ for a data size of $2048\,\text{pixels}\times 1000$ lateral A-scans, which is shorter than the frame interval of the interference images. Our system can display processed OCT images in real time at $27.9\,\mathrm{fps}$. Because GPUs are cost-effective for real-time display of FD-OCT images, this technique has a wide range of potential applications.

## Acknowledgments

This study was partially supported by a Grant-in-Aid for Scientific Research (20700375) from the Japan Society for the Promotion of Science (JSPS) and by the Industrial Technology Research Grant Program in 2005 from the New Energy and Industrial Technology Development Organization (NEDO) of Japan.