Real-time acquisition and display of flow contrast using speckle variance optical coherence tomography in a graphics processing unit

Abstract. In this report, we describe a graphics processing unit (GPU)-accelerated processing platform for real-time acquisition and display of flow contrast images with Fourier domain optical coherence tomography (FDOCT) in mouse and human eyes in vivo. Motion contrast from blood flow is processed using the speckle variance OCT (svOCT) technique, which relies on the acquisition of multiple B-scan frames at the same location and tracking the change of the speckle pattern. Real-time mouse and human retinal imaging using two different custom-built OCT systems with processing and display performed on GPU are presented with an in-depth analysis of performance metrics. The display output included structural OCT data, en face projections of the intensity data, and the svOCT en face projections of retinal microvasculature; these results compare projections with and without speckle variance in the different retinal layers to reveal significant contrast improvements. As a demonstration, videos of real-time svOCT for in vivo human and mouse retinal imaging are included in our results. The capability of performing real-time svOCT imaging of the retinal vasculature may be a useful tool in a clinical environment for monitoring disease-related pathological changes in the microcirculation such as diabetic retinopathy.


Introduction
Optical coherence tomography (OCT) is a well-established noninvasive imaging modality for micrometer-resolution cross-sectional visualization of retinal structures. 1 Specialized extensions, such as visualization of blood flow, have been developed to enable functional imaging of biological tissues. 2 In addition to traditional Doppler OCT imaging, which is sensitive to flow rate, 3 techniques have recently been proposed that highlight tissue in motion but are insensitive to rate; these include speckle variance OCT (svOCT), 4 phase variance OCT (pvOCT), 5 and optical microangiography. 6 Please see the detailed review article on this topic for more information. 7 Predominantly, the flow contrast work has been performed in postprocessing. A few notable exceptions have been presented in Refs. 8 and 9, where real-time flow contrast was demonstrated during acquisition in two dimensions as well as in three dimensions. For effective volume acquisition of flow contrast data, real-time visualizations of capillary networks via en face projections of vasculature are highly desirable.
In this report, we present GPU-accelerated processing for real-time svOCT acquisition with cross-sectional (B-scan) and en face displays of flow contrast in the retina. We integrated real-time svOCT processing for visualization of mouse retinal vasculature using an 800-nm-range spectral domain OCT (SDOCT) operating at up to 200 kHz. We also incorporated the svOCT processing with a 1060-nm, 100-kHz swept-source OCT (SSOCT) for human retinal imaging, demonstrating the potential for clinical applications in ophthalmology. Last, we introduce the GPU implementation of svOCT in our ongoing open source project. 10

Methods
Real-time flow contrast imaging requires high-speed acquisition and processing. For the SDOCT, linear interpolation was used to perform wavelength to wavenumber resampling, as detailed in Ref. 11. For the SSOCT, the data was acquired linearly in wavenumber using an external k-clock provided by the Axsun source. Physical dispersion compensation was used to match the optical path length for both systems. Only the SDOCT required numerical dispersion compensation, which was implemented on the GPU using the algorithm described in Ref. 12. The software implementations are mostly common to both acquisition systems, with only minor differences accounting for the system-level controls. For clarification, the term "BM-scan" is used in this article to indicate a set of multiple B-scans acquired at the same location. For speckle variance imaging, each BM-scan consists of three B-scan frames. The algorithm used to compute each speckle variance frame ($sv_{jk}$) from the OCT intensity BM-scans ($I_{ijk}$) is
$$sv_{jk} = \frac{1}{N} \sum_{i=1}^{N} \left( I_{ijk} - \frac{1}{N} \sum_{i=1}^{N} I_{ijk} \right)^{2},$$
where i, j, and k are the indices of the frame, width, and axial position within the B-scan, respectively, and N is the number of frames per BM-scan. 7 In our case, the volume acquisition size was 1024 pixels per A-scan, 300 A-scans per B-scan, and N = 3 for a total of 900 B-scans (300 BM-scans) per volume.
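As an illustration of the per-pixel computation described above, the following is a minimal CPU sketch in pure Python, not the GPU kernel used in this work. It assumes the usual population-variance form (normalization by N); the function name `speckle_variance` and the toy intensity values are hypothetical.

```python
def speckle_variance(bm_scan):
    """Return the speckle variance frame sv[j][k] for one BM-scan.

    bm_scan is a list of N intensity frames. Each frame is indexed as
    frame[j][k], with j the lateral (A-scan) index and k the axial index.
    """
    n_frames = len(bm_scan)
    width = len(bm_scan[0])
    depth = len(bm_scan[0][0])
    sv = [[0.0] * depth for _ in range(width)]
    for j in range(width):
        for k in range(depth):
            values = [frame[j][k] for frame in bm_scan]
            mean = sum(values) / n_frames
            # Population variance (divide by N); this normalization is an
            # assumption of the sketch.
            sv[j][k] = sum((v - mean) ** 2 for v in values) / n_frames
    return sv


# Toy BM-scan: N = 3 frames, 2 A-scans wide, 2 pixels deep.
# Pixel (0, 0) is static; pixel (1, 1) fluctuates between frames.
frames = [
    [[5.0, 1.0], [2.0, 1.0]],
    [[5.0, 1.0], [2.0, 4.0]],
    [[5.0, 1.0], [2.0, 7.0]],
]
sv = speckle_variance(frames)
```

Static pixels yield zero variance, whereas pixels whose intensity changes between the repeated frames yield a positive value, which is the source of the flow contrast.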

Mouse Imaging
The mouse retinal data was acquired using a custom-built SDOCT system with a superluminescent diode (Superlum Inc., Moscow, Russia) centered at 810 nm and full width at half maximum of 100 nm. The spectrometer (Bioptigen Inc., Durham, North Carolina) was operated at an adjustable line rate of up to 200 kHz for 1024-point A-scans using a CMOS detector (Basler AG, Ahrensburg, Germany). The volume acquisition size for this system was 1024 × 300 × 900 pixels and was acquired in ∼1.5 s. The axial resolution was ∼4.2 μm in tissue, and the lateral resolution was ∼6 μm in the retina using a beam diameter of 0.5 mm at the pupil. Mouse imaging was performed with ethics approval of the University Animal Care Committee at SFU.

Human Imaging
The human imaging was performed with a custom-built 1060-nm SSOCT with a line rate of 100 kHz and using a 500 MSPS digitizer (AlazarTech Inc., Pointe-Claire, Québec), with 1024 points per A-scan. The details of this system have previously been reported. 13 The axial resolution was ∼6 μm in tissue, and the lateral resolution was ∼17 μm in the retina using a beam diameter of 1.3 mm at the pupil. Retinal images in the foveal region were acquired from a healthy volunteer. The total acquisition time for an entire volume (1024 × 300 × 900) was ∼2.7 s.
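The quoted volume acquisition times can be checked with simple arithmetic. The sketch below is illustrative only: it ignores scanner flyback and any other dead time (an assumption), and the function names are hypothetical.

```python
def ascans_per_volume(ascans_per_bscan, bscans_per_volume):
    """Total number of A-scans in one volume."""
    return ascans_per_bscan * bscans_per_volume


def acquisition_time_s(n_ascans, line_rate_hz):
    """Ideal acquisition time, ignoring flyback and dead time."""
    return n_ascans / line_rate_hz


def effective_line_rate_hz(n_ascans, acquisition_time):
    """Line rate implied by a measured acquisition time."""
    return n_ascans / acquisition_time


# 300 A-scans per B-scan and 900 B-scans per volume, as in the text.
n_ascans = ascans_per_volume(300, 900)

# Human SSOCT system: 100-kHz line rate.
human_time_s = acquisition_time_s(n_ascans, 100_000)

# Mouse SDOCT system: line rate implied by a ~1.5 s volume.
mouse_rate_hz = effective_line_rate_hz(n_ascans, 1.5)
```

Under this simplification, 270,000 A-scans at 100 kHz take 2.7 s, and a 1.5-s volume corresponds to an effective line rate of 180 kHz, which is within the stated maximum of 200 kHz for the SDOCT system.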

Processing
We used our custom GPU program, previously presented in Ref. 11, as the basis for implementing svOCT. For development, we used CUDA Toolkit 5.0 and Microsoft Visual C++ 2008 on a 64-bit Windows 7 operating system. For human imaging, we used a GeForce GTX-680 GPU and an Intel Core i7-3820 CPU. For mouse imaging, we used a GeForce GTX-Titan and an Intel Core i7-2600K CPU. The difference in hardware was solely due to the component availability at the time the systems were constructed.
Our previous report described the structure of the program and GPU-processing steps for SDOCT and SSOCT, including our approach for batch processing. 11 For svOCT processing, we selected a batch size of 30 frames of raw data (10 BM-scans), transferred the batch to the GPU, executed the Fourier domain OCT (FDOCT) batch-processing pipeline, and launched the speckle variance kernel, which batch processed the speckle intensity variance for all of the transferred OCT data. 14 An en face projection image was extracted from the selected region for visualizing the svOCT data.
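The grouping of frames into batches and BM-scans can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes that the frames arrive in acquisition order, so that each run of three consecutive frames belongs to the same location, and the function name `split_into_batches` is hypothetical.

```python
def split_into_batches(frames, frames_per_bm_scan=3, bm_scans_per_batch=10):
    """Group a flat list of B-scan frames into batches of BM-scans.

    Returns a list of batches; each batch is a list of BM-scans, and each
    BM-scan is a list of consecutive frames from the same location.
    """
    batch_size = frames_per_bm_scan * bm_scans_per_batch
    if len(frames) % batch_size != 0:
        raise ValueError("frame count must be a multiple of the batch size")
    batches = []
    for start in range(0, len(frames), batch_size):
        batch = frames[start:start + batch_size]
        bm_scans = [
            batch[i:i + frames_per_bm_scan]
            for i in range(0, batch_size, frames_per_bm_scan)
        ]
        batches.append(bm_scans)
    return batches


# Frame indices stand in for raw frames: 900 B-scans per volume.
frame_ids = list(range(900))
batches = split_into_batches(frame_ids)
```

With the acquisition parameters used here, one volume of 900 frames yields 30 batches of 10 BM-scans each, so the display can be refreshed 30 times per volume.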
Additionally, the GPU was used for real-time display of the images, including the svOCT B-scan, the original en face image, and the svOCT en face image. To enhance the visualization quality of the blood vessels, a Gaussian filter was implemented to smooth the en face svOCT image. For the target application of retinal imaging in both human and mouse, the program extracts flow contrast data from up to three user-selected depth regions, processes an en face projection for each region, and combines all three projections into a superimposed and R/G/B color-coded en face projection. A notch filter and single-pixel rigid registration of the BM-scans were implemented on the GPU to reduce motion artifact in the svOCT image, but were used only for human retinal imaging with a larger field of view (5 × 5 mm²). Details of the complete svOCT implementation can be found in the source code. 10
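The color-coded superposition of three depth regions can be illustrated with the short sketch below. It is not taken from the released source code; the per-channel min-max normalization and the function names are assumptions made for the example, and the assignment of layers to channels is a free choice.

```python
def normalize(image):
    """Scale a 2-D image to the range [0, 1]; a flat image maps to zeros."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / span for v in row] for row in image]


def combine_rgb(red_layer, green_layer, blue_layer):
    """Superimpose three en face projections as one R/G/B image."""
    r = normalize(red_layer)
    g = normalize(green_layer)
    b = normalize(blue_layer)
    return [
        [(r[y][x], g[y][x], b[y][x]) for x in range(len(r[0]))]
        for y in range(len(r))
    ]


# Toy 2 x 2 en face svOCT projections from three depth regions.
deep = [[0.0, 2.0], [4.0, 0.0]]
middle = [[1.0, 1.0], [1.0, 1.0]]
superficial = [[0.0, 10.0], [0.0, 5.0]]
rgb = combine_rgb(deep, middle, superficial)
```

Because each depth region occupies its own color channel, a vessel that moves from one region to another changes color in the combined image instead of disappearing.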

Results and Discussion
In our previous report, we presented processing rates for the SDOCT-processing pipeline using the GeForce GTX-680 GPU. 11 In this article, the later-generation GTX-Titan was used for benchmarking and provided an increase in the SDOCT-processing rate from 1.1 to 1.9 MHz and the SSOCT-processing rate from 2.2 to 3.2 MHz.
The entire processing pipeline for the spectral domain svOCT was captured in NVIDIA Visual Profiler for a single batch and is shown in Fig. 1. This profiler timeline includes the standard SDOCT kernels [i.e., linear interpolation, DC subtraction, dispersion compensation, fast Fourier transform (FFT), and post-FFT], the speckle variance kernel, and the Gaussian filter kernels. For a B-scan size of 1024 × 300 and a batch size of 30 B-scans, the overall processing and display take ∼9 ms per batch, which equates to an overall processing and display rate of ∼1 MHz. Note from Fig. 1 that approximately one third of the overall processing timeline is used for display; this represents the required time to upload OpenGL textures from the GPU onto the monitor. For matching ultrahigh acquisition rates >1 MHz, one possible way to mitigate this delay is a multi-GPU solution, in which one GPU is dedicated to rendering and display only, whereas the other(s) are dedicated to heavy computational kernels. 15 For the SSOCT pipeline, neither k-resampling nor dispersion compensation was implemented; the overall svOCT processing and display rate on the GTX-Titan was ∼1.1 MHz. For image acquisition with a larger field of view, the added processing steps (predominantly the registration) decreased the overall processing and display rate to 460 kHz, which is still beyond the acquisition speed of most OCT systems.
Fig. 1 Representative NVIDIA Visual Profiler timeline for a single svOCT iteration of the SDOCT pipeline (GTX-Titan GPU), separated into three components: a series of kernels for SDOCT processing, the speckle variance kernel with en face rendering kernels, and idle time for display.
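The relationship between the batch time and the A-scan processing rate quoted in the text can be verified with a short calculation. The sketch below is illustrative; the function name is hypothetical, and the comparison against the 200-kHz maximum SDOCT line rate is an added worked example rather than a reported measurement.

```python
def processing_rate_hz(ascans_per_bscan, bscans_per_batch, batch_time_s):
    """A-scan processing rate implied by the time taken for one batch."""
    return ascans_per_bscan * bscans_per_batch / batch_time_s


# 300 A-scans per B-scan, 30 B-scans per batch, ~9 ms per batch.
rate_hz = processing_rate_hz(300, 30, 9e-3)

# Time needed to acquire the same batch at a 200-kHz line rate.
batch_acquisition_time_s = 300 * 30 / 200e3

# Ratio of processing rate to acquisition rate (values above 1 mean the
# processing keeps up with the acquisition).
headroom = rate_hz / 200e3
```

A batch of 9000 A-scans processed in about 9 ms corresponds to about 1 MHz, whereas acquiring the same batch at 200 kHz takes 45 ms, so under these numbers the processing runs roughly five times faster than the acquisition.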

Mouse Imaging
The representative images in Fig. 2 were acquired over an area of 1 × 1 mm² in a mouse retina with an acquisition time of ∼1.5 s. For each row, the left column displays the cross-sectional scan at the location of the red line selected on the en face intensity image in the middle column. The user dynamically selects the regions of interest, and the corresponding intensity and svOCT en face images are generated using sum-voxel or maximum intensity projection. This approach assumes a flat field of view throughout the volume (i.e., negligible curvature). In Figs. 2(a), 2(d), and 2(g), the regions of interest were selected from the nerve fiber layer (NFL)/ganglion cell layer (GCL), the inner plexiform layer (IPL), and the outer plexiform layer (OPL), respectively. In Figs. 2(j)-2(l), three user-selected regions of interest on the svOCT B-scan are distinguished using color-coded lines, and the color-coded image represents the superimposed svOCT en face projections of all three vascular layers. Comparison of the intensity en face (center column) with the svOCT en face (right column) images reveals a significant contrast improvement for blood vessels with svOCT; for example, capillaries from the NFL/GCL are barely visible in Fig. 2(b) but are clearly distinguishable in Fig. 2(c). To demonstrate the real-time acquisition and display capabilities for spectral domain svOCT in mouse retina, the mouse alignment was adjusted during acquisition in Video 1. The color of the vascular layers in the en face images changes as they pass through the user-selected depth regions.
Fig. 2 The svOCT B-scan images in (a, d, g) illustrate the selected depths of interest for an area of 1 × 1 mm²; the intensity en face images are presented in (b, e, h); and the svOCT images are presented in (c, f, i). The B-scan (j) presents a color-coded combination of (a, d, g), where each color represents: NFL/GCL-blue, IPL-green, and OPL-red. The combined intensity en face image of all depth regions of interest is presented in (k), and the superimposed color-coded svOCT en face image is presented in (l).
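The two projection modes mentioned in the text (sum-voxel and maximum intensity) over a user-selected depth range can be sketched as follows. This is a minimal pure-Python illustration under the same flat-field assumption, with a fixed depth range for every A-scan; the function name, the index layout `volume[b][j][k]`, and the toy values are assumptions.

```python
def en_face_projection(volume, k_start, k_stop, mode="sum"):
    """Project the depth range [k_start, k_stop) of a volume to a 2-D image.

    volume[b][j][k]: b is the B-scan (slow axis) index, j is the A-scan
    (fast axis) index, and k is the axial (depth) index.
    """
    if mode == "sum":
        reduce = sum
    elif mode == "max":
        reduce = max
    else:
        raise ValueError("mode must be 'sum' or 'max'")
    depth = len(volume[0][0])
    if not 0 <= k_start < k_stop <= depth:
        raise ValueError("invalid depth range")
    return [
        [reduce(ascan[k_start:k_stop]) for ascan in bscan]
        for bscan in volume
    ]


# Toy volume: 2 B-scans x 2 A-scans x 4 depth pixels.
volume = [
    [[1, 2, 3, 4], [0, 0, 5, 1]],
    [[2, 2, 2, 2], [9, 1, 1, 1]],
]
sum_image = en_face_projection(volume, 1, 3, mode="sum")
max_image = en_face_projection(volume, 1, 3, mode="max")
```

Because the same depth indices are applied to every A-scan, retinal curvature across the field of view would cause a selected region to cut through different layers, which is why the flat-field assumption matters.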

Human Imaging
Representative flow contrast images of retina acquired on human volunteers are presented in Fig. 3. The svOCT en face images show well-defined capillary networks in the retina with depth encoded via color-coding. Video 2 is a demonstration of real-time svOCT from a healthy human subject using a four-panel display to show only a single region of interest for clarity.

Conclusion
We have demonstrated real-time flow contrast imaging on two separate systems for human and mouse retinal imaging. This technology has high potential for clinical applications, including retinal angiography. Blood vessels, such as those in the NFL and IPL, are often difficult to see in OCT intensity images; therefore, implementing svOCT enhances the contrast for visualizing both large and small vessels. The simplicity of the svOCT processing lends itself to real-time imaging, where flow contrast is important to study various diseases affecting the retinal vasculature, e.g., diabetic retinopathy and ischemia. Another possible application of this code is to use svOCT to facilitate the alignment to the same location on the retina based on visualization of the blood vessels in longitudinal studies. In addition, more computationally intensive algorithms, such as the pvOCT technique, can be used in postprocessing for retrieving potentially higher-contrast images.
We demonstrated the visualization of the capillary network in human retina with our svOCT over an area of ∼2 × 2 mm². In order to increase the field of view, more A-scans need to be acquired at a faster rate to maintain the same resolution while mitigating motion artifact. For our current system, a simple extension of tiled acquisition and mosaicking of adjacent volumes would also permit acquisition over larger areas. 16 Other simple techniques could also be used to limit subject motion, such as incorporating a fixation target and a bite bar.
In conclusion, we have demonstrated the overall svOCT processing and en face display rates, using 1024-point A-scans, at up to 1 MHz for SDOCT and 1.1 MHz for SSOCT with GTX-Titan. The microvasculature in the retina is clearly distinguishable with the addition of the speckle variance kernel. The ultrahigh speed processing rates that we have demonstrated provide opportunities to implement GPU-based image-processing algorithms to further enhance the visualization quality for the blood vessels in the en face images. The applications of real-time svOCT are numerous, such as monitoring progressive changes to retinal vessels in diabetic retinopathy in ophthalmology and visualizing blood vessel networks in cancer research. 17 The GPU used in this research is inexpensive, and the complete svOCT pipeline can easily be integrated into practical FDOCT systems for use in a clinic. The source code, which includes transferring interferometric data from the host to the GPU, processing, and displaying of svOCT, is available as a part of our open source project. 10
Video 1 presents real-time svOCT for mouse (Video 1, QuickTime, 10.3 MB) [URL: http://dx.doi.org/10.1117/1.JBO.19.2.026001.1]. The window title bar indicates the batch processing and display rates in frames per second (FPS).