Graphics processing unit accelerated intensity-based optical coherence tomography angiography using differential frames with real-time motion correction

Abstract. We demonstrate intensity-based optical coherence tomography (OCT) angiography using the squared difference of two sequential frames with bulk-tissue-motion (BTM) correction. This motion correction was performed by minimization of the sum of the pixel values using axial- and lateral-pixel–shifted structural OCT images. We extract the BTM-corrected image from a total of 25 calculated OCT angiographic images. Image processing was accelerated by a graphics processing unit (GPU) with many stream processors to optimize the parallel processing procedure. The GPU processing rate was faster than that of a line scan camera (46.9 kHz). Our OCT system provides the means of displaying structural OCT images and BTM-corrected OCT angiographic images in real time.


Introduction
Optical coherence tomography (OCT) is a noncontact, noninvasive imaging modality that can provide high-resolution, depthresolved imaging of internal microstructures within the tissue. 1 The development of Fourier domain optical coherence tomography, which provides markedly increased imaging speeds and high sensitivity, 2,3 has enabled in vivo imaging of blood flow.Recent technological advances in OCT-based flow imaging techniques can be grouped into two categories: Doppler OCT and OCT angiography.5][6][7][8] The power of Doppler shift images and phase variance images can be used to construct OCT angiograms for vasculature visualization. 9,10For bulk-tissuemotion (BTM) on the order of a wavelength, which is much smaller than the axial resolution, the effect of motion appears as a phase shift.A number of methods for bulk-phase correction have been proposed. 7,102][13][14][15][16][17] These methods are insensitive to bulk-phase changes, and therefore, do not require bulk-motion phase correction. 146][17] IF speckle variance images were calculated across three structural OCT images of nude mice. 12The vasculature network in human skin was imaged using a four-frame IF-IB Doppler variance (DV) method. 16As the number of frames increases, the signal-tonoise ratio (SNR) improves; however, this approach also increases the sensitivity of these methods to bulk motion (on the order of a pixel).Therefore, IF methods are suitable for imaging fixed samples, or for use with higher frame rates; otherwise, the method requires BTM correction on the order of a pixel.
Real-time display of tissue structure and flow information is always desirable for clinical applications that require immediate diagnosis for screening or biopsy/surgery.][20][21][22][23] The GPU has many processors that provide highly parallel computing at a low cost, with the advantage of simple programming on a host computer.GPU processing is capable of displaying both structural OCT and Doppler OCT images in real time (e.g., Doppler OCT using the Kasai autocorrelation algorithm, 24 speckle variance with BTM correction, 25 and phase-resolved four-dimensional Doppler OCT 26 ).
In this paper, we demonstrated GPU-accelerated IB-OCT angiography, using two sequential frames with a simple BTM correction.The squared difference between two sequential amplitudes of the OCT images was used to minimize motion artifacts.We chose the BTM-corrected image that minimized the sum of the pixel values, from a total of 25 calculated OCT angiographic images, using axial-and lateral-pixel-shifted structural OCT images.A GPU processing rate of more than 100;000 lines∕s exceeded the line rate of the camera (46.9 kHz).Thus, our OCT system could display structural OCT images and BTM-corrected OCT angiography in real time.

Experimental Setup
Figure 1 shows a schematic diagram of our FD-OCT system.A superluminescent diode (central wavelength: 1330 nm; spectral bandwidth: 58 nm) was used to illuminate the system.A polarization-independent fiber isolator was placed immediately after the light source to avoid light reflection into the source.The light was split into sample and reference beams, with the latter being terminated by a mirror using a 90∕10 fiber coupler.
Address all correspondence to: Yuuki Watanabe, Yamagata University, Bio-systems Engineering, Graduate School of Science and Engineering, 4-3-16 Johnan, Yonezawa, Yamagata 992-8510, Japan.Tel: 81 238 26 3292; Fax: 81 238 26 3292; E-mail: ywata@yz.yamagata-u.ac.jpAchromatic lenses (f ¼ 45 mm) were inserted into both beam trajectories to compensate for the dispersion mismatch.Light returning from the reference mirror and sample were recombined in the fiber coupler.The output interference spectra were detected by a custom-built spectrometer, comprising a 35-mm focal-length achromatic collimating lens, a 1145-lines/mm transmission grating (Wasatch Photonics, Logan, Utah), and a 100-mm focal-length achromatic lens.The spectra were focused onto a line-scan InGaAs camera (Goodrich-Sensors Unlimited, 1024 pixels, 14-bit resolution) operated at 46.9 kHz.The camera output was fed to a personal computer (PC) via a low-cost camera-link board (National Instruments, Austin, Texas, PCI-1426, 16-bit resolution).
Signal processing was performed using a graphics card (NVidia, Santa Clara, California, GeForce GTX680, 1006-MHz processor clock, 1536 stream processors, and 2048 Mbytes GDDR5 memory).This graphics card was connected to the host computer via a PCI-Express 3.0 × 16 interface, which had twice the data transfer rate compared with the previous work. 20We used NVidia's compute unified device architecture (CUDA) Version 4.2, 27 which could be programmed in C language to exploit the processing power of the GPUs.We developed software that included image acquisition, GPU programming, and a graphical user interface environment in the Microsoft Visual C++2008 Express Edition.

OCT Angiography Using a GPU
Multiple frames-based OCT angiography is susceptible to BTM and requires an increase in the number of acquired frames to improve the SNR.To minimize motion artifacts, we used two sequential frames and averaged the amplitudes in both the lateral and depth directions.The algorithm that we developed for this purpose is given below by Eq. (1): where A i;j;k is the amplitude of the OCT image and i, j, and k are indices for the frame, transverse, and depth pixels, respectively.K is the number of averaged depth pixels and J is the number of averaged A-lines.Pixel-shifted images were used to correct for BTM.This correction is accounted for in the modified version of Eq. ( 1), given below in Eq. ( 2): where p and q are the pixel translations of the lateral and depth directions, respectively.Because we could not predict the direction of tissue motion, we calculated the squared difference of two OCT images by changing p and q, to find the shift values that minimized the sum of the pixel values.Generally, larger J and K values for the averaging will increase image SNR; however, it also increases the computation time and results in a "blurring" and loss of smaller vessels.The larger p and q values for the image registration will increase the computation time.To display BTM-corrected OCT angiographic images in real time, the averaging procedure used J ¼ 3 and K ¼ 3; the parameters for image registration were p ¼ AE2 and q ¼ AE2 in this work.
Figure 2 shows the flow chart for the GPU programming process.On the host PC, the captured spectral interference image (Nz ¼ 1024: spectral size × Nx: lateral points, 16 bit) was transferred to the memory on the GPU.The data transfer rate between the host PC and the GPU device was increased using GPU paged memory.On the GPU, the data type of the spectral interference signal was first converted from a 16-bit integer to a 32-bit complex floating-point and then the spectrum was subtracted by the averaged spectrum.Next, zero-filling interpolation was performed with a forward Fourier transform, zero padded to increase the data-array length 2-fold (M ¼ 2). 19he data were then inverse Fourier transformed back to the spectral domain and linear interpolated into k-space.Here, only the constant values of linear interpolation were transferred to the GPU memory.The real and imaginary parts of the complex data were set to the processed data and zero, respectively.Next, the MNz-point FFT was executed for Nx lateral points.An amplitude image with Nz∕2 × Nx pixels was obtained.If this amplitude was larger than the threshold value that is 1 dB above the OCT intensity noise floor, then OCT angiography processing was performed using the previous amplitude of the OCT image.Because p ¼ AE2 and q ¼ AE2 were used for the BTM correction, a total of 25 OCT angiographic images were processed.The BTM-corrected OCT angiographic image had  the smallest total of pixel values.This image and the log-scaled OCT image were transferred to the host computer memory and displayed on the monitor.To save data transfer time, the data were then converted from 32-bit floating-point numbers to an 8-bit gray scale.The amplitude OCT image used was then copied to the GPU memory for the next processing cycle.

GPU Processing Rate
We measured the 100 times average GPU time, which included data transfers to and from the GPU memory for a given number of A-lines.The GPU processing line rates were then calculated using the number of A-lines divided by the measured processing time, as shown in Fig. 3. Here, the processing of OCT angiography was carried out on only a 25% portion of the amplitude OCT image by a threshold.A GPU processing rate of 100 K lines∕s exceeded the line rate of the camera used.Therefore, structural OCT and OCT angiography can be obtained in real time.Because the processing time on the GPU had a wide margin in terms of the frame interval time in our OCT system, it is possible to perform BTM correction with subpixel structural image registration.
Figure 4 shows the processing time and data transfer time for input data of 1024 spectral size × 1024 A-lines, using a CUDA visual profiler. 27The sum of the data transfer time and the processing time associated with OCT angiography was 3.8 ms, which accounted for 73% of the total processing time (5.2 ms). Lee et.al. performed image registration using a partial region (256 × 256 pixels) of a structural OCT image (512 × 512 pixels). 25If the position of the sample is known in the image, then the GPU processing time can be reduced by calculating the partial image registration and then the angiographic image in all regions.The data transfer time of one processed image (512 × 1024 pixels, 8 bit) was 42 μs, and accounted for only 1.9% of the frame acquisition time (21.8 ms ¼ 21.3 μs line interval × 1024 A-lines).Recent developments in GPU-connected PCI-Express 3.0 interface technology allowed several images to be displayed in real time for multifunctional OCT techniques. 28

In vivo imaging
First, we obtained the OCT angiographic images without the BTM correction using two frames and multiple frames to clarify the influences of the BTM.As one of the multiple frames-based OCT angiography algorithms, we chose the IF-IBDV method 16 in this work.It would be similar in results to other algorithms such as speckle variance.We measured the nail-fold region of a human finger in vivo at the same location using a scanning galvano mirror at 44 Hz. Figure 5(a) shows the resulting structural OCT images (1024 × 512 pixels).Here, the imaging range was 4.8 × 3.4 mm 2 .We calculated OCT angiographic images without BTM correction using Eq. ( 1), as shown in Fig. 5(b), 5(d), and 5(f).For comparison, Fig. 5(c), 5(e), and 5(g) show IF-IBDV images using four frames. 16When the sample was stable, both algorithms could be used to view the blood vessels [Fig.5 Next, we performed the correction on the order of a pixel to compensate the BTM. Figure 6 and Video 1 show the processed 25 OCT angiographic images from Eq. ( 2) to clarify the availability of the BTM correction.The solid square in Fig. 6 corresponds to the OCT angiogram in Fig. 5(d).When the lateral and depth pixel shifts were p ¼ −1 and q ¼ þ1, respectively, the total pixel values were minimized, and the blood vessels were identified (see the dashed square in Fig. 6).Therefore, the squared difference between two frames with BTM correction (on the order of a pixel) is effective for IF-OCT angiography.The video data (Video 1), which consist of 50 images, were shown at the display rate of 44 fps.In real-time processing     The small blood vessels could not be identified due to frame averaging.To perform OCT angiography using multiple frames, the 3-D data set must be obtained by driving the slow-axis scanner with a multiple-step function, which allows several OCT images to be averaged at the same position. 17Moreover, all the frames required BTM correction on the order of a pixel.The speckle variance OCT angiography with a different BTM correction method using a different GPU has already been demonstrated. 25In our next work, we are planning to compare various IB-OCT angiography algorithms and BTM correction methods on a GPU.

Conclusion
We demonstrated GPU-accelerated IB-OCT angiography by calculating the squared difference between two sequential frames with the simple BTM correction on the order of a pixel.Although the influence of a motion appeared in subsequent OCT angiographic images using multiple frames because of frame averaging, the two frames-based OCT angiography can reduce this influence.Our correction was performed by minimizing the sum of the pixel values using axial-and lateralpixel-shifted structural OCT images.The BTM-corrected image was extracted from a total of 25 calculated OCT angiographic images.A GPU processing rate of more than 100 K lines∕s was achieved, which exceeded the line rate of the camera (46.9 kHz).We demonstrated that our OCT system was capable of displaying structural OCT images and BTMcorrected OCT angiography in real time.

Fig. 2
Fig. 2 Flow chart of GPU programming of FD-OCT and intensity-based OCT angiography using two frames.
(b) and 5(c)].When the sample moved, both images had high pixel values in the stationary tissue region [Fig.5(d) and 5(e)].Although the influence of this motion appeared in subsequent IF-IBDV images [Fig.5(g)] because of frame averaging, we were able to minimize this motion in our OCT angiographic images [Fig.5(e)].

Fig. 3
Fig.3GPU processing rates of intensity-based OCT angiography using two frames with bulk-tissue-motion correction.

Fig. 4
Fig.4Each processing time and data transfer time for input data of 1024 spectral size×1024 A-lines.memcpyHtoD and memcpyDtoH are data transfer to and from the GPU, respectively.OCTA is the processing of OCT angiography using Eq.(2).Min_kernel performs the selection of the image having the smallest total of pixel values.Copy_kernel is the copy procedure of the amplitude OCT image to use the next processing cycle.

Figure 7 (
c) and Video 2 show the maximum intensity projection (MIP) view and en face images of the 3D-OCT angiographic data set, respectively.The vasculature network can be clearly seen.Figure7(d)shows the MIP image obtained from 3-D IF-IBDV images.16