Optical coherence tomography (OCT) is a method of noncontact, noninvasive imaging of the internal structure of objects that are semitransparent to infrared light. In 2002, Fourier-domain optical coherence tomography (FdOCT) was used for the first time for in vivo examination of the human eye [1]. Specialized modalities of the FdOCT technique have since been developed to enable functional imaging of biological tissue; one of them, Doppler OCT, enables blood-flow imaging [2].
Generally, the quality of in vivo imaging depends strongly on the ergonomics of the OCT system, especially on a fast, real-time preview that helps the operator adjust the instrument and obtain the best possible results. Doppler measurements are more demanding still, because the measurable flow-velocity range depends on the parameters of the scanning protocol [3]. We therefore believe that a real-time preview of the Doppler signal is necessary to obtain a reliable quantitative signature of retinal blood flow.
In biomedical imaging, parallel processing on GPUs has been used in computed tomography [4], ultrasonography [5], and optical coherence tomography for structural imaging [6]. GPU-based Doppler OCT flow imaging has also been demonstrated, but only for a single B-scan in a flow phantom [7]. In this report, the application of GPUs to real-time OCT data processing is revisited, and its use for blood-flow examination in the human eye in vivo is demonstrated. The dynamic changes of the object are visualized as a 3-D “point cloud” (a finite set of points in a geometric space) that evolves in time.
The results presented were obtained with a computer workstation equipped with an Intel® Core™ i7 920 (2.67 GHz) CPU, 6 GB of RAM, and a low-cost graphics card, an NVIDIA® GeForce® GTX 580 with 3 GB of device memory. The software was tested with two laboratory-made, high-resolution spectral-domain OCT systems. The first, designed for material examination, used a Superlum D-series Broadlighter D-855 light source (comprising two coupled superluminescent diodes), whose central wavelength and full spectral width at half maximum determine the measured axial resolution (in air); its spectra were collected with a line-scan CCD camera (e2v AViiVA® SM2). The second OCT system was designed to examine the human retina. A femtosecond laser (Femtolasers Fusion) was used as its light source, and spectra were collected with a CMOS camera (Basler Sprint spL4096-140km). The data from both cameras were buffered in frame grabbers (National Instruments PCIe-1429).
The software was written in the C++ programming language in Microsoft® Visual Studio 2010. All procedures for parallel processing on the GPU were compiled with the NVIDIA® CUDA™ compiler, version 4.0. The OpenGL® 3.0 library was used for visualization of the results.
The developed software performs the full numerical processing of the OCT data, so the resulting tomograms are equivalent to those produced by the standard procedures implemented on the CPU. Specifically, the processing includes background subtraction, remapping of the spectra to wave-number space, numerical dispersion compensation, spectral shaping, and fast Fourier transformation (FFT). Finally, the logarithm of the modulus of the resulting complex signal $\tilde{A}_j(z)$ is calculated to obtain an A-scan, Eq. (1):

$$A_j(z) = \log\bigl|\tilde{A}_j(z)\bigr|,$$

where $z$ is the depth coordinate and $j$ numbers the consecutive A-scans.
In Doppler OCT analysis, phase differences for each corresponding depth between each two subsequent A-scans are calculated. The phase difference between the $j$'th and the consecutive A-scan is given by Eq. (2):

$$\Delta\varphi_j(z) = \arg\bigl[\tilde{A}_{j+1}(z)\,\tilde{A}_j^{*}(z)\bigr].$$

The change of the phase between $\tilde{A}_j(z)$ and $\tilde{A}_{j+1}(z)$ is caused by a very small (less than the axial resolution) movement of the reflecting particles. In the case of oversampling, and for a known time $T$ between consecutive acquisitions, the axial component (parallel to the light beam) of the flow velocity can be calculated [2] as

$$v_z(z) = \frac{\lambda_0\,\Delta\varphi_j(z)}{4\pi n T},$$

where $\lambda_0$ is the central wavelength and $n$ is the refractive index of the medium.
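The two Doppler relations above can be sketched as host-side C++ (a minimal illustration; the function names are ours, and the sample parameter values below are illustrative, not taken from the paper):

```cpp
#include <cmath>
#include <complex>

// Eq. (2): phase difference between the same depth pixel in two
// subsequent complex A-scans, delta_phi = arg(A_{j+1} * conj(A_j)).
double phase_difference(std::complex<double> a_j, std::complex<double> a_j1) {
    return std::arg(a_j1 * std::conj(a_j));
}

// Eq. (3): axial velocity component from the phase difference,
// v_z = lambda0 * delta_phi / (4 * pi * n * T),
// with central wavelength lambda0, refractive index n, and
// time T between consecutive acquisitions.
double axial_velocity(double delta_phi, double lambda0, double n, double T) {
    const double pi = std::acos(-1.0);
    return lambda0 * delta_phi / (4.0 * pi * n * T);
}
```

On the GPU the same arithmetic runs per depth pixel inside a kernel; the host version only makes the formulas concrete.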
The data acquisition, processing, and visualization software utilizes two main CPU threads running on the host computer (Fig. 1). These two threads are described in detail below.
The acquisition thread is responsible for collecting the spectra and therefore works synchronously with the spectrometer camera. Its main task is to pass data from the frame grabber to two alternating buffers in the PC RAM. The size of these buffers depends on the amount of data defined by the scanning protocol. Our software supports three protocols: “cross,” “four slices,” and “3-D preview.” The “cross” protocol is used mostly for a fast preview and consists of two cross-sectional images (B-scans) collected in perpendicular directions. The “four slices” protocol consists of four B-scans: three collected in parallel and a fourth perpendicular to them. The “3-D preview” protocol is a raster scan generating volume data, and allows for real-time visualization of the object in 3-D as a “point cloud.”
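The two alternating RAM buffers can be sketched as a minimal double-buffer structure (hypothetical names; the paper does not show its implementation): while the processing thread reads one buffer, the camera fills the other, and the roles swap once per frame.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Minimal double-buffer sketch for raw spectra. frame_size would be
// (number of A-scans) x (spectrum length), as set by the scanning protocol.
struct DoubleBuffer {
    std::array<std::vector<unsigned short>, 2> buffers;
    std::size_t write_index = 0;  // buffer currently being filled

    explicit DoubleBuffer(std::size_t frame_size) {
        buffers[0].resize(frame_size);
        buffers[1].resize(frame_size);
    }
    std::vector<unsigned short>& write_buffer() { return buffers[write_index]; }
    const std::vector<unsigned short>& read_buffer() const {
        return buffers[1 - write_index];
    }
    void swap() { write_index = 1 - write_index; }  // called once per frame
};
```

In the real software the swap must of course be synchronized between the acquisition and processing threads; that synchronization is omitted here.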
The main task of the visualization and processing thread is to invoke procedures for parallel data processing on the GPU. All custom-designed procedures are implemented in kernels, which are functions running in parallel in many GPU threads. The first kernel performs background subtraction, spectral remapping, numerical dispersion compensation, and spectral shaping. Then the fast Fourier transformation is performed with the aid of the CUFFT library delivered by NVIDIA. Finally, the second kernel is launched to finalize the calculation (all procedures after the FFT) and to generate the textures presented on the screen. Several versions of this kernel have been prepared for the different scanning protocols and imaging modes [structural, Eq. (1), or Doppler, Eq. (2)].
All GPU threads are organized in vectors (blocks). Threads within a block can exchange data with one another through fast shared memory. Each block of threads works on a single A-scan, because sharing data is necessary for remapping spectra to wave-number space. For each A-scan, 512 GPU threads are started, so each thread works on four data points of the spectrum. The blocks are organized in a grid (a two-dimensional matrix) whose dimensions equal the number of A-scans by the number of B-scans.
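The thread-to-data mapping described above can be made concrete with a host-side index function (a sketch; the constants follow the text, 2048-point spectra with 512 threads per block, while the function and constant names are ours):

```cpp
#include <cstddef>

constexpr std::size_t kSpectrumSize     = 2048;
constexpr std::size_t kThreadsPerBlock  = 512;
constexpr std::size_t kSamplesPerThread = kSpectrumSize / kThreadsPerBlock;  // 4

// Global sample index handled by thread `tid` of the block processing
// A-scan `ascan` of B-scan `bscan`, for the s-th of its four samples.
std::size_t sample_index(std::size_t bscan, std::size_t ascan,
                         std::size_t ascans_per_bscan,
                         std::size_t tid, std::size_t s) {
    std::size_t spectrum = bscan * ascans_per_bscan + ascan;
    // Strided layout: consecutive threads touch consecutive samples,
    // so global-memory accesses can be coalesced.
    return spectrum * kSpectrumSize + s * kThreadsPerBlock + tid;
}
```

In the CUDA kernel itself, `bscan`/`ascan` would come from `blockIdx` and `tid` from `threadIdx`; the function above only shows the arithmetic.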
Proper design of the program’s parallelism structure is necessary for effective use of the GPU. To reduce each thread’s demand for registers, the same variables have been reused for several different purposes. Additionally, calling single-precision functions instead of their double-precision counterparts, and replacing the time-consuming modulo operation with a bitwise “and” for power-of-two operands, significantly reduces the calculation time. Another important optimization is the proper organization of the output matrices (the textures used for displaying the results) to ensure coalesced memory access, which reduces the processing time by 37%.
It must be admitted that data processing on the GPU is necessary but not sufficient for real-time 3-D imaging of OCT data; fast visualization of the results is also required. Our earlier method [8] utilized 2-D textures presented in three orthogonal directions with transparency enabled. However, without parallel processing this method was noticeably slow. We managed to reduce the visualization time significantly by using a pixel buffer object (PBO) to map textures to kernels within the CUDA architecture.
The overall imaging rate depends on the execution times of three major processes: signal acquisition; data transfer from the PC host memory to the GPU (Table 1, col. 3); and processing and visualization (Table 1, col. 4). In the most time-effective situation, the data acquisition rate would match the rate of processing and visualization. In that case, the frame rate would depend on the transfer time (Table 1, col. 3) and the processing time (Table 1, col. 4), and the overall frame rate would be given by the numbers in Table 1, col. 6. However, at the present level of technology, data acquisition is slower than data processing, and new data are rarely available to the processing software. Therefore, the same data set can be processed and displayed several times, at the frame rates given in Table 1, col. 5. This feature is used in our software for online adjustment of data-processing parameters (e.g., dispersion compensation) and display parameters (e.g., rotation angle, zoom). When only the numerical analysis is taken into account (without transfers), the processing rate on the GPU is about 800,000 spectra (2048 points each) per second for structural imaging and 550,000 spectra per second for Doppler OCT analysis (Fig. 2).
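The arithmetic behind the two frame-rate columns of Table 1 is a simple reciprocal of the per-frame time (the helper name below is ours):

```cpp
// Frame rate from per-frame costs in milliseconds: the total rate
// includes the host-to-GPU transfer, while reprocessing data already
// resident on the GPU costs only the processing/visualization time
// (pass transfer_ms = 0 for that case).
double frames_per_second(double transfer_ms, double processing_ms) {
    return 1000.0 / (transfer_ms + processing_ms);
}
```

For the 2×1600 “cross” protocol, 1000/(12 + 8.3) ≈ 49 fps total, while reprocessing alone gives 1000/8.3 ≈ 120 fps, in practice capped by the screen refresh rate.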
Table 1. Performance of the software for structural and Doppler OCT imaging.
| Protocol | Number of A-scans | Data transfer to the GPU | Processing and visualization | Reprocessing and visualization only | Total frame rate |
|---|---|---|---|---|---|
| 2×1600 | 3,200 | 12 ms | 8.3 ms | 120 fps^a | 49 fps |
| 2×2000 | 4,000 | 13 ms | 9.6 ms | 103 fps | 44 fps |
| 4×2000 | 8,000 | 29 ms | 18 ms | 55 fps | 21 fps |
| 4×5000 | 20,000 | 70 ms | 43 ms | 23 fps | 9 fps |
| 100×100 | 10,000 | 33 ms | 75 ms | 13 fps | 9 fps |
| 140×140 | 19,600 | 70 ms | 117 ms | 9 fps | 5 fps |

^a Limited by the screen refresh rate.
To demonstrate the capability of our software to perform real-time structural and Doppler OCT imaging, we measured a flow phantom [Fig. 3(a), Video 1] and the retina of a healthy volunteer [Fig. 3(b), Video 2]. In both videos, the grayscale corresponds to the OCT structural information, and the color scale codes the axial component of the flow. The Doppler signal was calibrated using a syringe pump with a predetermined flow rate. Video 1 demonstrates real-time 3-D imaging of the bidirectional flow in the phantom (a 200 μm diameter glass capillary with a 0.5% aqueous solution of Intralipid®). The data were processed and displayed as a “point cloud” at a frame rate of 6 fps. The Intralipid® flow was controlled manually with a syringe. Video 2 demonstrates an example of a Doppler measurement of blood flow in the human eye. To demonstrate the possibility of real-time targeting for Doppler measurements of the human retina, the “four slices” protocol was adopted; it combines a high frame rate with the high sampling density necessary for sensitive flow visualization over the scanned area.
In conclusion, parallel GPU computations are very well suited for OCT data processing. Optimization of the code makes it possible to visualize structural and Doppler information in real time. Moreover, the surplus of processing speed over data-acquisition speed allows the same data set to be processed several times in order to optimize processing and/or visualization conditions in real time. All of this seems to open a gate to new applications of the OCT technique, especially if GPU technology continues to develop as rapidly as at present.
Marcin Sylwestrzak and Danuta Bukowska acknowledge support from the “Step into the Future IV” program, co-financed by the European Social Fund and the Polish Government. Iwona Gorczynska and Maciej Szkulmowski acknowledge support by the Polish Ministry of Science and Higher Education (years 2009 to 2014). Maciej Wojtkowski acknowledges a EURYI grant/award funded by the European Heads of Research Councils (EuroHORCs) together with the European Science Foundation (ESF—EURYI 01/2007PL), operated by the Foundation for Polish Science.