Optical coherence tomography (OCT) was introduced more than ago for imaging retinal structures.1 Since then, imaging depth has increased only marginally and resolution has improved by less than a factor of 10, but imaging speed has been boosted by more than three orders of magnitude, from fewer than 100 to more than 300,000 A-scans per second.2 Instead of taking only still images of anatomical structures, the increased speed of OCT now allows volumes to be imaged nearly in real time. This not only enables scans of larger tissue surfaces such as the esophagus,3 colon,4,5 or vessels,6 but also opens new applications beyond simple diagnosis. Noncontact volumetric imaging with resolution better than can guide microsurgery at the eye,7,8 in otolaryngology,9 and in other medical disciplines.10 First attempts at intrasurgical use of OCT failed mainly because of low imaging speed.7,8 Full use of OCT during surgical procedures can only be made with rapid 3-D imaging, since otherwise it is difficult to bring the image field into coincidence with the relevant but dynamically changing anatomical structures. High-speed OCT imaging with spectrometer-based systems11 and swept-source OCT has been demonstrated,12,13 but processing and visualization of the data had to be done off-line. However, the potential of OCT for intrasurgical use can only be fully exploited if the measured tissue volumes are displayed to the physician on-line. Real-time display of volumetric OCT data also solves the problem of storing the vast amount of data generated by high-speed OCT systems.
We present an ultrahigh-speed OCT system integrated into a surgical microscope that is capable of processing, rendering, and displaying more than seven volumes with 12 million pixels per second using a PC with a high-performance graphics accelerator card. Best performance was reached by distributing the calculation of the A-scans over the four cores of the PC, while preprocessing and rendering were done in real time with dedicated software on a graphics processing unit (GPU).
Material and Methods
OCT data were acquired at a wavelength of through a surgical microscope (MÖLLER Hi-R 1000, Möller-Wedel GmbH, Wedel, Germany) using a fast two-axis galvanometric scanning unit (6210, Cambridge Technology, USA) coupled to the camera port [Fig. 1]. The lateral resolution was between 15 and , depending on the magnification used with the surgical microscope. The output of the fiber interferometer was interfaced to a modified commercially available spectral-domain OCT system (Hyperion, Thorlabs HL AG, Lübeck, Germany), which uses a linear-k spectrometer14 with a fast complementary metal-oxide semiconductor (CMOS) camera (Sprint spL 4096-140k, Basler Vision Technologies, Ahrensburg, Germany) and an external light source (Broadlighter BS840-B-I-20, Superlum, County Cork, Ireland). With a spectral bandwidth of , an axial resolution of approximately was possible in air. The camera has two parallel lines of 4096 rectangular pixels each, which were binned to use the full height of the spectrum. With the full number of pixels, only 70,000 spectra per second could be measured; by reading out a smaller number of pixels, the readout speed was increased. With , an A-scan rate of was achieved. Due to the low full-well capacity of the camera pixels, the sensitivity was only , and a roll-off of was measured at half of the depth range. When binning two pixels along the spectral axis (effectively for data evaluation), image quality was significantly improved (sensitivity , roll-off ) due to the higher effective full-well capacity, reduced cross talk between the spectral channels, and a slightly improved depth resolution. With the same grating and superluminescent diode (SLD), the imaging depth and the readout speed (which was not improved by binning) were both halved. Preprocessing, fast Fourier transform (FFTW library15), and postprocessing of the A-scans were done in parallel on all four cores of an Intel Core 2 Quad or Xeon processor.
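The per-volume CPU work can be sketched as follows. This is a minimal illustration, not the authors' code: the spectra of a volume are split into four contiguous chunks, and one worker thread per core transforms its chunk into A-scans. A naive DFT magnitude stands in here for the real preprocessing and FFTW-based FFT, and all names (`processVolume`, `transformSpectrum`) are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

using Spectrum = std::vector<float>; // one camera readout (spectral samples)
using AScan    = std::vector<float>; // one depth profile (magnitudes)

// Placeholder for preprocessing + FFT: a naive DFT magnitude of one spectrum.
// The real system used the FFTW library; this only illustrates the data flow.
static AScan transformSpectrum(const Spectrum& s)
{
    const double kPi = 3.14159265358979323846;
    const std::size_t n = s.size();
    AScan out(n / 2); // only half of the transform carries unique information
    for (std::size_t k = 0; k < out.size(); ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double phi = -2.0 * kPi * double(k) * double(j) / double(n);
            sum += std::complex<double>(s[j] * std::cos(phi), s[j] * std::sin(phi));
        }
        out[k] = float(std::abs(sum));
    }
    return out;
}

// Distribute the spectra of one volume over (by default) four worker threads,
// one per CPU core, each transforming a contiguous chunk.
std::vector<AScan> processVolume(const std::vector<Spectrum>& spectra,
                                 unsigned numThreads = 4)
{
    std::vector<AScan> ascans(spectra.size());
    std::vector<std::thread> workers;
    const std::size_t chunk = (spectra.size() + numThreads - 1) / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end   = std::min(begin + chunk, spectra.size());
            for (std::size_t i = begin; i < end; ++i)
                ascans[i] = transformSpectrum(spectra[i]);
        });
    }
    for (auto& w : workers) w.join();
    return ascans;
}
```

Chunked static partitioning is sufficient here because all spectra have the same length, so the per-core workload is balanced without a work-stealing scheduler.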
For 3-D rendering and display, the volumetric OCT data were transferred to a high-performance video card (NVidia GTX280 with of memory), where dedicated software was used for 3-D visualization of the data. All timing-critical software was written in C++ on the Microsoft Windows operating system using DirectX and the NVidia CUDA framework.
Ray tracing (volume ray casting) and 3-D texture mapping are the two options for rendering volumetric data.16,17 Due to the better general performance on modern video cards, texture mapping was used here; it projects a 2-D image onto the surface of a 2- or 3-D object, and modern graphics accelerators are highly optimized for work with surfaces and textures in three dimensions. Since texture mapping works only with surfaces, whereas here volume elements (voxels) have to be displayed, stacks of planes were defined that slice through the volumetric data [Fig. 1]. Transparency was assigned to the textures on the planes to allow looking into the volume. The opacity of the pixels on the textures was calculated from the intensity of the OCT voxels crossed by the texture planes using a simple windowing algorithm: outside a user-chosen intensity range, all pixels were set either completely transparent or opaque white, depending on whether the corresponding OCT value lay below or above the windowing range. Within this range, the opacity was scaled linearly with the OCT signal.
Rendering by the GPU was considerably faster than rendering by the CPU on the main board: a comparison showed a more than 30-fold speedup compared to one CPU core; the data throughput was instead of . For the sake of faster processing, only stacks of planes aligned perpendicular to the x, y, or z axis were calculated. The stack most nearly aligned with the viewing direction was chosen to prevent the user from looking through the space between the planes. With this approach, the texture-carrying planes were not aligned perpendicular to the viewing direction at all times, which produces small artifacts, especially when the chosen stack changes abruptly with a change of the viewing direction. With typical OCT data, however, this simple algorithm produced no disturbing artifacts.
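Selecting the stack most nearly aligned with the viewing direction reduces to finding the largest absolute component of the view vector; the result is the same whether or not the vector is normalized. A minimal sketch (the interface names are assumptions, not the authors' API):

```cpp
#include <cmath>

// The three precomputed stacks of texture planes, perpendicular to the
// x, y, and z axes of the volume, respectively.
enum Axis { X = 0, Y = 1, Z = 2 };

// Choose the stack whose plane normal is closest to the viewing direction,
// i.e. the axis with the largest absolute view-vector component. Ties are
// broken in the order x, y, z.
Axis nearestStack(float vx, float vy, float vz)
{
    const float ax = std::fabs(vx), ay = std::fabs(vy), az = std::fabs(vz);
    if (ax >= ay && ax >= az) return X;
    if (ay >= az)             return Y;
    return Z;
}
```

The abrupt artifacts mentioned in the text occur exactly when this argmax flips from one axis to another as the camera rotates.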
For on-line visualization, the CMOS camera was continuously read out at 215,000 spectra per second. Complete volumes of 80 B-scans, each B-scan consisting of 380 A-scans with each, were continuously acquired in (Fig. 2). Due to the fly-back time of the galvanometric scanners, only 300 A-scans per B-scan could be used for further processing. The readout of one volume and the calculation of the A-scans from the spectra took . Since each pixel was represented as a floating-point number, 12 million bytes had to be transferred to the video card, which was accomplished in less than . Rendering itself took less than , and a “clean-up” in less than completed the cycle. The throughput for the 3-D rendering alone was around , including the data transfer between the main board and the video card. Complete serial processing of one volume, including the calculation of the FFTs and the rendering, was done in , which was slightly shorter than the needed to read out the camera. The acquisition speed of the camera was therefore the speed-limiting factor.
With the current size of the spectra (1024 elements), the FFT was calculated faster on the CPU than on the GPU; starting at per spectrum, a higher performance of the NVidia CuFFT library running on the GPU is expected.18 If necessary, a further reduction of the cycle time and an increase of the data throughput could be achieved by parallelizing the calculation of the A-scans on the CPU with the rendering on the GPU, which in this work were done sequentially.
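The proposed overlap of CPU A-scan calculation and GPU rendering amounts to a two-stage pipeline with double buffering: while volume N is rendered, the A-scans of volume N+1 are already being computed. A minimal host-side sketch using `std::async`, with `process` and `render` as placeholders for the real FFT and rendering stages (all names are illustrative):

```cpp
#include <future>
#include <vector>

// Two-stage pipeline: the CPU stage (process) of the next volume runs
// concurrently with the GPU stage (render) of the current one. With
// balanced stage times, this roughly halves the per-volume cycle time
// compared to fully serial processing.
template <typename Volume, typename Process, typename Render>
void pipelinedLoop(const std::vector<Volume>& raw, Process process, Render render)
{
    if (raw.empty()) return;
    auto pending = std::async(std::launch::async, process, raw[0]);
    for (std::size_t i = 0; i < raw.size(); ++i) {
        auto current = pending.get();          // wait for CPU stage of volume i
        if (i + 1 < raw.size())                // launch CPU stage of volume i+1
            pending = std::async(std::launch::async, process, raw[i + 1]);
        render(current);                       // GPU stage overlaps the next CPU stage
    }
}
```

Double buffering keeps the memory footprint at two volumes; deeper pipelines would add latency without improving throughput once one stage saturates.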
Image quality was sufficient for the visualization of structures in scattering tissues, such as sweat glands in the fingertip [Fig. 3]. It was limited by the OCT setup, including the CMOS camera, rather than by the processing of the data. Binning considerably improved the visual impression, the signal-to-noise ratio (SNR), and the roll-off [Fig. 3], but reduced the pixel rate of the camera to approximately 50%. The possibility of tissue manipulation under OCT guidance was tested by cutting an onion and removing different onion skin layers (Video 1; 10.1117/1.3314898.1).
Discussion and Conclusion
Transfer of the OCT data to the GPU and volumetric texture mapping contributed only a small fraction of the overall processing time, and only 20% of the cycle time was actually used for 3-D rendering. There is therefore still processing capacity for additional steps in the rendering process, such as averaging, color coding, or the calculation of virtual B-scans. The speed of the OCT system was already compromised by the insufficient speed of the galvanometric scanners: more than 20% of the pixels are lost during the fly-back. Even faster OCT systems will require a new architecture for the lateral beam deflection, e.g., bidirectional scanning, resonant scanners, or microelectromechanical systems (MEMS) scanners.
For intrasurgical work, image quality and speed seem to be sufficient. However, optimization of the optical setup and binning of pixels will further increase image quality. During a surgical procedure, tissue movements in the x, y, and z directions will still be a challenge for our system due to the unavoidable roll-off and the limited number of lateral voxels. Automatic z-tracking of the reference delay line and tracking of lateral movements of the tissue or the instruments will be implemented in future versions of the OCT surgical microscope to improve applicability.
This work was supported by the local government of Schleswig-Holstein, Germany (HWT 2007-14 H) and the European Union research program FP7-HEALTH-2007-A (201880 FUN OCT).