Fourier domain optical coherence tomography (FD-OCT), with either spectrometer-based systems (spectral domain OCT, SDOCT) or frequency-swept laser-based systems (swept source OCT, SSOCT), has enabled OCT imaging with high sensitivities.^{1}^{,}^{2} OCT images of biological tissues can be obtained with a sufficient signal-to-noise ratio at an increased imaging speed. High-speed imaging is particularly important for *in vivo* imaging applications as it can minimize motion artifacts. Linear array detectors in SDOCT and swept sources normally operate at tens of kHz, allowing video rate data acquisition ($>30$ frames per second, fps). FD-OCT images require a fixed-pattern noise removal and a resampling process of the axial data from the wavelength ($\lambda $)-space to wave number ($k=2\pi /\lambda $)-space, and the subsequent application of the inverse Fourier transform. Therefore, FD-OCT requires high-speed signal processing techniques to obtain real-time images required in clinical applications, such as endoscopy and ophthalmology.

To display the processed OCT images in real-time, we demonstrated the FD-OCT system using a linear-in-wave number spectrometer to reduce the re-sampling process and a graphics processing unit (GPU).^{3} The GPU has stream processors that provide highly parallel computing at a low cost, plus the advantage of allowing simple programming on the host computer. The fast Fourier transform (FFT) calculations were accelerated by the GPU for 2048-point $\mathrm{FFT}\times 1000$ lateral A-line data at 27.9 fps. Subsequently, the real-time re-sampling processes in standard FD-OCT were demonstrated using GPU programming.^{4}^{,}^{5} The simplest re-sampling procedure is to perform linear interpolation, however, this leads to errors at greater depths due to the increased background or shoulders around the peaks. However, these were addressed with a zero-filling technique that involves a forward Fourier transform, zero padding to increase the data array length four-fold, and an inverse Fourier transform back to the spectral data. This spectral data, which was four times larger than the original array, was resampled in the $k$-space by linear interpolation and FFT was performed to obtain depth information. By accelerating the zero-filling technique using a GPU, our system achieved a real-time processed-image display rate of 27.23 fps for $2048\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{axial points}\times 1024\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{lateral}$ A-line data.^{6} Recently, a non-uniform fast Fourier transform was demonstrated to offer better performance than conventional interpolation methods^{7} and was accelerated using GPU programming.^{8}

The fixed-pattern noise associated with constant modulations of the spectra can be removed by generating a reference spectrum, either by recording the reference arm spectrum or by averaging many spectra of an actual measurement. Although a mean spectrum can be obtained without additional measurements, it causes a statistical error which appears as residual lateral lines in the OCT image, by the small number of highly reflective points because of a high refractive index mismatch between the tissue and air. S. Moon et al. have proposed a signal-processing method of median-line subtraction to avoid the drawback of the conventional mean-spectrum subtraction method.^{9} A median is defined as the numerical value separating the greater half of the sorted data from the lower half. The sorting process is known to consume a large amount of computational power and time.

In this study, the GPU-accelerated processing for FD-OCT with the fixed-pattern noise removal by using partial median subtraction was demonstrated. Based on the fact that artifacts having high reflectivity appear only on the surface of a sample, this surface was determined from the standard deviation of Fourier-transformed data in the lateral direction, after which the median calculation was performed only at the observed surface. The medians and means were subtracted from the Fourier-transformed data at the observed surface regions and other regions such as backgrounds, respectively. When the number of median calculations was less than 256 points for the OCT image having $512\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{depth points}\times 1024$ A-lines observed by the line scan camera operated at 46.9 kHz, the processed OCT images was displayed in real time.

Figure 1 shows a schematic of the SD-OCT system. A super-luminescent diode (SLD) with a central wavelength of 1330 nm and spectral bandwidth of 58 nm was used to illuminate the system. A polarization-independent fiber isolator was placed immediately after the light source to avoid light reflection into the light source. The light was then split into the sample and reference arms, with the latter being terminated by a mirror using a $90/10$ fiber coupler. Achromatic lenses ($f=45\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{mm}$) were inserted into both arms to compensate for the dispersion mismatch. Light returning from the reference and sample arms was recombined in the fiber coupler and the output interference spectra was detected by a custom-built spectrometer comprising a 35-mm focal length achromatic collimating lens, a 1145-lines/mm transmission grating (Wasatch Photonics), and a 100-mm focal length achromatic lens. The spectra were focused onto a line-scan InGaAs camera (Goodrich-Sensors Unlimited, 1024 pixels, 14-bit resolution) operated at 46.9 kHz. The camera output was fed into a personal computer (PC) via a camera-link board (National Instruments, PCI-1426, 16-bit resolution). The signal processing was performed using a graphics card (Nvidia GeForce GTX 580, 1544 MHz processor clock, 2004 MHz memory clock, 512 stream processors, and 1536 Mbytes GDDR5 memory). The customized GPU program was written in Nvidia’s dedicated software environment, the compute unified device architecture _(CUDA),^{10} which was compiled using Microsoft Visual C++ 2008 Express Edition.

Figure 2 shows the flow chart for the GPU programming process. On the host PC, the captured spectral interference image ($\mathrm{Nz}=1024$: $\text{spectral size}\times \mathrm{Nx}$: lateral points, 16-bit) is transferred to the memory on the GPU. The data transfer rate between the host PC and the GPU device is increased by using GPU paged memory. On the GPU, the data type of a spectral interference signal is first converted from a 16-bit integer to a 32-bit complex floating-point and the spectrum is re-sampled in the $k$-space. This resampling used a simple linear interpolation as it is executed three times faster compared to zero-filling interpolation.^{6} Here, the constant values of linear interpolation are only transferred to the GPU memory once before image acquisition. The real and imaginary parts of the complex data are set to the processed data and zero, respectively. Next, the $\mathrm{Nz}$-point inverse FFT is executed for $\mathrm{Nx}$ lateral points.

In actual measurements, the artifacts appeared in the lateral direction, not in the background, at tilted surfaces of a sample due to a large difference of the reflected intensities. Therefore, the median calculation is sufficient for performing partial depth positions. After the FFT process, the mean and standard deviation values were calculated for both real and imaginary parts at each depth. If the standard deviation is smaller than a given threshold value, the mean value is set to the DC component. If not, the median values are obtained from the real and imaginary parts of Fourier-transformed data by bitonic sort, which is a sorting algorithm designed especially for parallel machines. The alues are then set to the DC components. The bitonic sort used the sample code included in the Nvidia CUDA software development kit.^{10} The DC components that comprised the median and mean values were subtracted from the Fourier- transformed data and then an OCT image with $\mathrm{Nz}/2\times \mathrm{Nx}$ pixels using the log-scaling process was obtained. To save data-transfer time, the data were then converted from 32-bit floating point numbers to 8-bit gray scale. The results were transferred to the host computer memory and displayed on the monitor.

Figure 3 shows the GPU processing line rates of the FD-OCT with the partial median subtraction for a given number of median calculations. This rate was calculated using the number of A-lines divided by the processing time, which included the data transfers to and from the GPU memory. Since the processing time of the median in lateral direction is not a linear relationship for the number of A-lines $\mathrm{Nx}$, the data size used 512, 1024 and 2048 A-lines of 1024-spectrum, respectively. When the median calculations of both real and imaginary parts result in fewer than 256 depth points for 1024 A-lines, the processed OCT images can be obtained in real-time because the processing rate exceeds the line rate of the camera used. The median calculation time of 256 points took 87.7% of total processing time for the OCT image having $512\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{depth points}\times 1024$ A-lines.

In order to confirm the elimination of artifacts by partially subtracting median values, the tilted sample was measured to generate the artifact. Figure 4(a) and 4(b) show the OCT images ($512\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{axial}\times 1024\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{later pixels}$) of a human finger pad obtained by the mean-spectrum subtraction and partial median subtraction, respectively. Here, the imaging area was $3.4\times 3.1\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\mathrm{mm}}^{2}$ ($\text{axial}\times \text{lateral}$). In Fig. 4(a), artifacts resulting from strong reflection appeared as residual lateral lines around the surface. Figure 4(c) shows the standard deviations of the real part and imaginary part of the Fourier transformed data. The 60 depth points were larger than the threshold, which was set to 10 times the standard deviation of the backgrounds. The median calculations were performed at the observed 60 points and the values obtained were smaller than those obtained under conditions of real-time processing (i.e., 256 points) in Fig. 3. By subtracting the DC components that consist of medians at 60 points and means at the other 452 points, the artifact-free OCT image was obtained, as shown in Fig. 4(b). Here the noise levels of backgrounds decreased about 14 dB. Therefore, the artifacts of the OCT images can be eliminated in real-time by partial medians. Since the median calculation took up a lot of computing time (even in GPU programming), the real-time processing is difficult if the highly reflective tilted surface regions exist in the entire depth range. When a sample has to be carefully located without tilting the surface, the reflected lights are almost constant in the lateral direction and then the artifacts will not appear in the OCT images processed by the conventional mean-spectrum subtraction. If the sample position cannot be adjusted in diagnostic applications, artifacts will interfere with the diagnosis. Hence, the proposed GPU processing for partial median subtraction effectively performs real-time artifact-free OCT imaging in diagnosis applications.

In conclusion, we demonstrated GPU-accelerated real-time FD-OCT with a fixed-pattern noise removal by subtracting mean and median. The highly reflective surface regions were obtained by calculating the lateral standard deviation of Fourier- transformed data, after which the medians and means were subtracted from the Fourier transformed data at the observed surface and another region. When the number of median calculations was less than 256 points for the OCT image having $512\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{axial points}\times 1024$ A-lines, which was observed by a line scan camera operating at 46.9 kHz, the processed OCT images could be displayed in real-time.

## Acknowledgments

This study was partially supported by Grant-in-Aid for Scientific Research (20700375) from the Japan Society for the Promotion of Science (JSPS).

## References

J. F. de Boeret al., “Improved signal-to-noise ratio in spectral-domain compared with time-domain optical coherence tomography,” Opt. Lett. 28(21), 2067–2069 (2003).OPLEDP0146-9592http://dx.doi.org/10.1364/OL.28.002067Google Scholar

R. LeitgebC. K. HitzenbergerA. F. Fercher, “Performance of Fourier domain vs. time domain optical coherence tomography,” Opt. Express 11(8), 889–894 (2003).OPEXFF1094-4087http://dx.doi.org/10.1364/OE.11.000889Google Scholar

K. Wanget al., “Development of a non-uniform discrete Fourier transform based high speed spectral domain optical coherence tomography system,” Opt. Express 17(14), 12121–12131 (2009).OPEXFF1094-4087http://dx.doi.org/10.1364/OE.17.012121Google Scholar

K. ZhangJ. U. Kang, “Graphics processing unit accelerated non-uniform fast Fourier transform for ultrahigh-speed, real-time Fourier-domain OCT,” Opt. Express 18(22), 23472–23487 (2010).OPEXFF1094-4087http://dx.doi.org/10.1364/OE.18.023472Google Scholar

Y. WatanabeT. Itagaki, “Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit,” J. Biomed. Opt. 14(6), 060506–060508 (2009).JBOPFO1083-3668http://dx.doi.org/10.1117/1.3275463Google Scholar

K. ZhangJ. U. Kang, “Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system,” Opt. Express 18(11), 11772–11784 (2010).OPEXFF1094-4087http://dx.doi.org/10.1364/OE.18.011772Google Scholar

S. Van der JeughtA. BraduA. G. Podoleanu, “Real-time resampling in Fourier domain optical coherence tomography using a graphics processing unit,” J. Biomed. Opt. 15(3), 030511–030513 (2010).JBOPFO1083-3668http://dx.doi.org/10.1117/1.3437078Google Scholar

Y. Watanabeet al., “Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units,” Appl. Opt. 49(25), 4756–4762 (2010).APOPAI0003-6935http://dx.doi.org/10.1364/AO.49.004756Google Scholar

S. MoonS. LeeZ. Chen, “Reference spectrum extraction and fixed-pattern noise removal in optical coherence tomography,” Opt. Express 18(24), 24395–24404 (2010).OPEXFF1094-4087http://dx.doi.org/10.1364/OE.18.024395Google Scholar

NVIDIA CUDA Zone, http://www.nvidia.com/object/cuda_home.htm.Google Scholar