Design and characterization of a handheld multimodal imaging device for the assessment of oral epithelial lesions

Abstract. A compact handpiece combining high resolution fluorescence (HRF) imaging with optical coherence tomography (OCT) was developed to provide real-time assessment of oral lesions. This multimodal imaging device simultaneously captures coregistered en face images with subcellular detail alongside cross-sectional images of tissue microstructure. The HRF imaging acquires a 712 × 594 μm² field-of-view at the sample with a spatial resolution of 3.5 μm. The OCT images were acquired to a depth of 1.5 mm with axial and lateral resolutions of 9.3 and 8.0 μm, respectively. HRF and OCT images are simultaneously displayed at 25 fps. The handheld device was used to image a healthy volunteer, demonstrating the potential for in vivo assessment of the epithelial surface for dysplastic and neoplastic changes at the cellular level, while simultaneously evaluating submucosal involvement. We anticipate potential applications in real-time assessment of oral lesions for improved surveillance and surgical guidance.


Introduction
Oral cancer is currently one of the top 10 most prevalent cancers worldwide and it is estimated that the incidence will reach 15 million new cases annually by the year 2020. 1,2 Although treatment methods have improved in recent years, 5-year survival rates have remained fairly constant, at only 62% in the US, largely due to high rates of late stage diagnoses 3 and local relapses. 4,5 Current screening methods involve visual inspection and manual palpation by a trained clinician. However, the qualitative nature of these methods leads to poor diagnostic accuracy. 6 Biopsy and histology of tissue from regions identified during screening is the gold standard for diagnosis, but this procedure is time consuming and painful for the patient. 6 High rates of multiple primary lesions make adequate screening crucial so that all potential lesions are identified while also minimizing the number of unnecessary biopsies. 6,7 With local recurrence estimated to account for up to 20% of cases, the ability to obtain negative surgical margins while minimizing the removal of surrounding normal tissue is also critical. 4 Frozen section histopathology allows for intraoperative guidance on the margin status, but increases the cost and the duration of procedures and provides diagnostic information at only a few discrete locations. The long-term goal of this work is to develop an instrument for noninvasive, real-time assessment of the oral mucosa and to provide a more targeted approach for biopsy collection as well as real-time margin assessment during surgery, potentially decreasing rates of both missed diagnoses and local relapses.
Several optical imaging modalities, such as confocal microscopy, [8][9][10][11][12] high resolution microendoscopy (HRME), [13][14][15] and optical coherence tomography (OCT), [16][17][18][19][20][21][22] have been proposed as methods for noninvasive "optical biopsy" to improve the accuracy of oral cancer screening. Reflectance and fluorescence confocal microscopy have demonstrated the ability to provide subcellular resolution of optically sectioned images within the epithelial layer due to native tissue contrast or when used with topically applied or intravenous contrast agents. [8][9][10][11][12] However, these systems are expensive and typically involve custom-designed high numerical aperture optics, which are challenging to miniaturize. HRME is a simple, low-cost approach which can also provide subcellular resolution en face images of epithelial tissue, albeit without intrinsic optical sectioning. The distal end of a coherent fiber-optic bundle is placed in contact with the tissue in order to capture fluorescent emissions from an exogenous contrast agent. 14 HRME then images the proximal face of the bundle onto a charge-coupled device (CCD) camera as in epi-fluorescence microscopy. When used with proflavine, a topically applied contrast agent that preferentially stains cell nuclei, morphologic parameters including the nuclear-to-cytoplasm ratio can be quantified and used to determine the disease state of oral epithelial tissues with good sensitivity and specificity. 14,15 One limitation of HRME is that its spatial resolution is determined by the size of the individual cores within the fiber bundle. Additionally, HRME can only image the first few cell layers within the epithelium and is, therefore, unable to assess abnormal lesions for any potential submucosal involvement. 23,24
OCT, an optical imaging modality analogous to ultrasound, uses the interference of backscattered near-infrared (NIR) light to image tissue microstructures in cross section at depths of up to 1 to 2 mm. 25,26 Contrast in OCT arises from the native optical properties of the tissue, providing the ability to distinguish tissue layers and potential disease states. By measuring factors such as the epithelial thickness [16][17][18][19][20][21] and vasculature, 22 OCT has demonstrated potential as a useful aid in the diagnosis of oral cancer. However, while OCT can provide assessment of the entire epithelium and superficial stroma, spatial resolution at the 10 to 20-μm scale means that individual cells cannot be resolved. [16][17][18][19][20][21][22] Although both HRME and OCT have been individually used to distinguish dysplastic from healthy oral mucosa, [13][14][15][16][17][18][19][20][21][22] the combination of these complementary imaging modalities may overcome the practical limitations of each system alone. Used together, the two modalities may aid the clinician in the evaluation of oral lesions during both diagnosis and resection. Potential lesions initially identified through visual examination can be evaluated by HRF imaging to determine the degree of dysplasia. If classified as abnormal by HRF, submucosal involvement can then be assessed using OCT. A method combining OCT with HRME, a simple high resolution en face imaging modality, was first described by Wall and Barton 27 for the diagnosis of colon cancer. This system uses multiple optical fibers evenly spaced around a central fiber bundle in order to capture sequential OCT and HRME images. This design requires the use of complex distal end optics and limits HRME resolution to the size of the fiber bundle cores. 27
The use of similar systems, combining OCT with complementary high-resolution imaging modalities such as confocal, two-photon, fluorescence lifetime, and photoacoustic imaging, has recently shown potential for enhanced minimally invasive assessment of epithelial and endothelial lesions. [28][29][30][31][32][33][34][35][36][37] In this paper, we report the optical design and the experimental performance of a compact, handheld instrument for multimodal optical imaging which combines cellular resolution en face imaging at the epithelial surface with cross-sectional imaging of the tissue microstructure to the submucosal layer. The high resolution fluorescence (HRF) imaging component described in this paper builds on the previously described HRME system. 13 Here, we image the tissue in free-space rather than through a fiber bundle, which (i) allows coregistered, simultaneous imaging with OCT, and (ii) eliminates the fiber bundle pixelation effect, increasing spatial resolution without sacrificing field-of-view. The design uses only six off-the-shelf lenses in a compact and cost-effective approach that makes it well suited for clinical imaging.

Optical Design
A schematic of the multimodal imaging system can be seen in Fig. 1(a). This handpiece is located within the sample arm of a spectral domain OCT (SD-OCT) system. The SD-OCT component was based on a system previously described by Yun et al. 38 Briefly, the OCT system uses a 100-nm bandwidth (FWHM) superluminescent diode source centered at 1325 nm (Thorlabs, Newton, New Jersey). Light from the source is split into the sample and the reference arms by a fiber-optic beamsplitter, so that 90% of the light travels to the sample arm, as described in detail below, illuminating the sample with 1.8 mW. The remaining 10% of the light is directed to a stationary reference arm. Recombined light is directed to a custom built spectrometer where it is dispersed by a 1200-lines/mm transmission diffraction grating (Wasatch, Logan, Utah) and focused onto a 1024 element InGaAs line scan camera (SUI Goodrich, Princeton, New Jersey). The parameters of the OCT light source and the spectrometer provide a theoretical axial resolution and an imaging depth of 8.0 μm and 3.0 mm, respectively. This system acquires data at a rate of 25,500 A-lines/s, which corresponds to 25.3 fps with a sample overlap of 6.0 μm. Using ZEMAX optical system design software, the handheld OCT sample arm was designed for clinical imaging specifically within the oral cavity. The HRF system was incorporated within the handheld OCT sample arm, providing inherently coregistered and simultaneous imaging. The HRF component of the system is designed for use with proflavine, a fluorescent dye with peak excitation and emission at 445 and 515 nm, respectively. 13 Excitation wavelengths are isolated from a 455-nm LED (Thorlabs) using an excitation filter with a 45-nm bandwidth centered at 445 nm (Semrock, Rochester, New York). This light is collected using a 10-mm focal length aspheric condenser lens (L2, Thorlabs). 
A long-pass dichroic mirror (D2) with a 495-nm cut-on (Chroma, Bellows Falls, Vermont) reflects the excitation beam toward the sample. This beam passes through a 950-nm cut-off short-pass dichroic (D1, Semrock), where it is combined with the collimated OCT beam. This short-pass dichroic, mounted on a galvanometer, scans the reflected NIR OCT beam while transmitting the unaffected visible HRF excitation and emission light. The coregistered OCT and HRF beams then travel to the sample along a common path through a pair of 12.7-mm diameter, 30-mm focal length achromatic relay lenses (L3 and L4, Thorlabs). The final lens in the sequence is a 5-mm diameter, 10-mm focal length achromatic doublet (L5, Thorlabs). The lens sequence L3-L5 is designed to simultaneously focus the OCT beam onto the sample, while providing uniform (Köhler) illumination of the region with HRF excitation light. A glass window is fixed at the sample plane to ensure the tissue is flat and to increase imaging stability. The backscattered OCT signal and fluorescent emissions from the proflavine travel back along their original paths and are separated by the short-pass dichroic mirror (D1), which de-scans the OCT beam. Proflavine emissions are isolated from the HRF excitation light by the long-pass dichroic (D2) and a separate emission filter with an 88-nm bandwidth centered at 550 nm (Semrock). The fluorescent emission is focused onto a compact (29 × 29 × 30 mm³), 1288 × 728 pixel CCD camera (Point Grey, BFLY-PGE-09S2M-CS, Richmond, British Columbia, Canada) using a 50-mm focal length achromatic doublet (L6, Thorlabs). The CCD camera's exposure time was set to match the OCT frame rate for simultaneous image acquisition in order to minimize HRF image flicker due to the scanning OCT dichroic mirror (D1).
A rendering of the OCT-HRF handpiece, drawn to scale, can be seen in Fig. 1(b). This model shows the HRF component of the system integrated within the sample arm of the OCT system. The pair of relay lenses previously mentioned (L3 and L4) extends the imaging arm in order to improve ease of clinical imaging. The segment containing L3-L5 was designed to be positioned within the oral cavity. It is 12.6-cm long and tapers to a 5-mm tip for access to confined tissue sites. The maximum dimensions of the handpiece are 29-cm long × 7-cm wide × 6-cm high and the instrument weighs 320 g.
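As a rough cross-check of the OCT figures quoted in this section, the sketch below applies the standard Gaussian-spectrum expression for axial resolution, Δz = (2 ln 2/π)·λ0²/Δλ, to the stated 1325-nm center wavelength and 100-nm FWHM bandwidth, and divides the A-line rate by the frame rate to estimate the number of A-lines in each displayed frame. The Gaussian spectral shape and the function names are assumptions made for illustration only; the actual source spectrum and spectrometer response are not given in the text, which may be why this estimate (about 7.7 μm) falls slightly below the quoted 8.0 μm.

```python
import math


def axial_resolution_um(center_nm, fwhm_nm):
    """Axial resolution in air (micrometers), assuming a Gaussian source spectrum."""
    dz_nm = (2.0 * math.log(2.0) / math.pi) * center_nm ** 2 / fwhm_nm
    return dz_nm / 1000.0


def a_lines_per_frame(a_line_rate_hz, frame_rate_hz):
    """Average number of A-lines acquired during one displayed frame."""
    return a_line_rate_hz / frame_rate_hz


if __name__ == "__main__":
    # Values taken from the text: 1325-nm center, 100-nm FWHM, 25,500 A-lines/s, 25.3 fps.
    print("Axial resolution (um):", round(axial_resolution_um(1325.0, 100.0), 2))
    print("A-lines per frame:", round(a_lines_per_frame(25500.0, 25.3)))
```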

Performance Characterization
Standard targets were imaged in order to evaluate the imaging performance of both OCT and HRF imaging with the handpiece.
The axial resolution of the OCT system was determined by measuring the intensity profile for a silver mirror scanned through the focus. The OCT lateral resolution was measured by scanning across the reflective elements of a high resolution USAF target (Edmund Optics, Barrington, New Jersey). The HRF imaging parameters were evaluated at the center (on-axis) and at the edge of the field (full-field). For these measurements, "full-field" was defined as the point on the x-axis at the edge of the field-of-view (356 μm from the center of the object). A high resolution USAF target (Edmund Optics) was imaged to determine the experimental resolution of the HRF imaging system across the entire field-of-view. Distortion was evaluated using a 100-μm-square grid target (Thorlabs). A target with Ronchi rulings at varying spatial frequencies (Thorlabs) was used to measure the experimental modulation transfer function (MTF) of the HRF system.
One healthy volunteer was imaged as a proof-of-concept experiment for obtaining simultaneous, coregistered OCT and HRF images of oral mucosa in vivo. Imaging was performed under an Institutional Review Board-approved protocol. As in previous studies, 14,15 a 0.01% w/v solution of proflavine in sterile phosphate buffered saline was topically applied to a small region of tissue within the oral cavity. Immediately following application, the distal tip of the handpiece was placed on the tissue for imaging. Simultaneous, coregistered OCT and HRF images were acquired, processed, and displayed in real time using a custom LabVIEW program.

Results
In comparison to the theoretical OCT axial resolution of 8.0 μm (based on source spectral properties), an axial resolution of 9.3 μm was measured in air. Assuming a refractive index of 1.4 in tissue, this corresponds to an axial resolution of 6.6 μm in tissue. The sample arm optics provide an OCT numerical aperture of 0.105, which results in a theoretical lateral resolution of 8.0 μm and a confocal parameter (twice the Rayleigh range) of 76 μm. The lateral resolution was measured to be 12.4 μm.
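The theoretical OCT values in this paragraph can be reproduced with paraxial Gaussian-beam formulas. The sketch below assumes that the lateral resolution is the 1/e² spot diameter, 2w0 = 2λ/(π·NA), and that the 76-μm figure is twice the Rayleigh range, 2·π·w0²/λ. These definitions are assumptions (the text does not state which convention was used), but with λ = 1325 nm and NA = 0.105 they give about 8.0 μm and 76.5 μm. The function names are illustrative.

```python
import math


def gaussian_focus(wavelength_um, na):
    """Return (1/e^2 spot diameter, twice the Rayleigh range), both in micrometers."""
    w0 = wavelength_um / (math.pi * na)  # beam waist radius
    spot_diameter = 2.0 * w0
    rayleigh_range = math.pi * w0 ** 2 / wavelength_um
    return spot_diameter, 2.0 * rayleigh_range


def axial_resolution_in_tissue(res_air_um, refractive_index):
    """Scale an axial resolution measured in air by the tissue refractive index."""
    return res_air_um / refractive_index


if __name__ == "__main__":
    spot, depth_of_focus = gaussian_focus(1.325, 0.105)
    print("Lateral spot diameter (um):", round(spot, 2))
    print("Twice the Rayleigh range (um):", round(depth_of_focus, 1))
    print("Axial resolution in tissue (um):",
          round(axial_resolution_in_tissue(9.3, 1.4), 2))
```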

Figures 2(a) and 2(b) show the HRF geometrical spot diagrams at the image (camera) plane (IMA) for the on-axis and full-field points, respectively, displayed with the diffraction-limited spot size shown as a black circle. The on-axis imaging point is diffraction limited (8.7 μm at the camera) while the rays at the edge of the field focus to a root mean square (RMS) spot size of 27.8 μm. Accounting for the 5× magnification from object to image, this corresponds to a resolution at the sample of 1.7 μm on-axis and 5.6 μm at the edge of the field. These theoretical predictions are supported by the experimental results seen in Fig. 2(c). The image of the high resolution USAF resolution target [Fig. 2(c)] shows that the system is able to clearly resolve group 8 element 2 (with 26.4% contrast), corresponding to an on-axis resolution of 3.5 μm. Figure 3 shows plots of the theoretical (a) field curvature and (b) distortion from the center of the image to the edge of the field. The maximum field curvature is 1.1 mm. The distortion of the image, which increases on moving toward the edge of the field, does not exceed 0.20%. Corresponding experimental results are demonstrated in an image of a 100-μm-square grid target [Fig. 3(c)]. The image is well focused at the center, but slightly out of focus toward the edges. The system distortion calculated from this image is 0.29% at the edge of the field, again in good agreement with theoretical predictions. Figure 4 compares the theoretical and the experimental MTF curves for points (a) on-axis and (b) at the edge of the field (356 μm from the center of the object). The 26.4% MTF contrast value, indicating a limiting resolution based on the Rayleigh criterion, occurs at 462.4 cycles/mm. Loss of resolution was calculated for the experimental results by interpolating between the measured spatial frequencies. Experimentally, the on-axis tangential and the sagittal components differ slightly. 
The tangential component shows a resolution of 317.1 cycles/mm and the sagittal component shows a resolution of 308.0 cycles/mm, both at the 26.4% contrast level. At the edge of the field [Fig. 4(b)], the theoretical tangential and the sagittal components are distinct. The theoretical resolution at Rayleigh's criterion occurs at 258.4 cycles/mm for the tangential component and 75.4 cycles/mm for the sagittal component. Experimentally, resolution figures at 26.4% contrast for the tangential and the sagittal components were measured to be 231.8 and 157.4 cycles/mm, respectively. The Strehl ratio, calculated by taking the ratio of the areas under the MTF and the diffraction limited MTF, is a single metric used to evaluate imaging performance. In comparison to the theoretical on-axis Strehl ratio (0.81), a Strehl ratio of 0.63 was calculated for the on-axis experimental data. Figure 5 shows a video capture of real-time in vivo (a) OCT and (b) HRF imaging of a volunteer's oral mucosa, in the XZ and XY planes, respectively. The epithelium, the basement membrane, and the lamina propria are visible in the OCT image. In the HRF image, the fluorescently labeled nuclei can be distinguished from the unstained cytoplasm. The dashed line in (b) shows the intersection of the OCT scan within the HRF image.
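The link between the USAF target reading, the quoted 3.5-μm resolution, and the MTF frequencies can be made explicit with a short calculation. The sketch below assumes the target follows the standard USAF 1951 layout, in which the spatial frequency of a group and element is 2^(group + (element − 1)/6) line pairs per millimeter. Under that assumption, group 8 element 2 is about 287 lp/mm, whose full line-pair period is about 3.5 μm. The same period conversion is applied to the MTF frequencies reported above. Function names are illustrative.

```python
def usaf_line_pairs_per_mm(group, element):
    """Spatial frequency (line pairs/mm) of a USAF 1951 group and element."""
    return 2.0 ** (group + (element - 1) / 6.0)


def period_um(cycles_per_mm):
    """Length of one full cycle (one bar plus one space) in micrometers."""
    return 1000.0 / cycles_per_mm


if __name__ == "__main__":
    lp_mm = usaf_line_pairs_per_mm(8, 2)
    print("Group 8, element 2 (lp/mm):", round(lp_mm, 1))
    print("Line-pair period (um):", round(period_um(lp_mm), 2))
    # Frequencies quoted in the text for the on-axis MTF at 26.4% contrast.
    for label, freq in (("theoretical", 462.4),
                        ("tangential", 317.1),
                        ("sagittal", 308.0)):
        print(label, "period (um):", round(period_um(freq), 2))
```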

Discussion
A multimodal system capable of simultaneously capturing images of cellular level detail and subsurface tissue microstructure has been developed. The combination of OCT and HRF beams at a scanning dichroic mirror provides a common optical path and inherent spatial coregistration. The dual beam-shaping requirements for high-resolution wide field imaging for HRF in tandem with focused point scanning for OCT were satisfied by the use of only six off-the-shelf lenses, allowing a compact, low-cost instrument to be assembled.
The complementary nature of these modalities makes them well suited for combination in clinical applications. The HRF imaging provides en face images of cellular level detail at the epithelial surface, but can only assess the most superficial cell layers. 23,24 In contrast, OCT allows for cross-sectional imaging of tissue microstructure at depths up to a few millimeters. 38 Isolation of each signal for measurement is feasible because the visible HRF wavelengths are spectrally distinct from the NIR light used for OCT.
The HRF component described here builds upon recent work on HRME in the oral cavity, which showed that dysplastic lesions can be accurately identified in images of proflavine-stained tissue acquired through a fiber-optic bundle. 14,15 Here, by eliminating the pixelating fiber bundle, the HRF system achieves higher spatial resolution than HRME (and most confocal platforms) while maintaining the field-of-view. Wall and Barton 27 report a resolution of 8 μm for their HRME component. In contrast, the free-space HRF system described here can resolve 3.5-μm structures. Images of standard targets indicated there is some falloff of resolution toward the edge of the HRF field [Fig. 3(c)]. Since distortion in the system is minimal, the falloff is likely due to field curvature, a result of illuminating the sample using a single lens rather than a multielement objective. As noted by other developers of in vivo high resolution imaging systems, a slightly curved field is not a major concern when imaging a thick tissue sample. 39 Even with this falloff, a 5.7-μm resolution was achieved at the field edge.
With a 5× HRF system magnification, the 5.26 × 2.97-mm² CCD sensor views a 1051 × 594-μm² field at the sample. However, the use of a single achromatic doublet (Fig. 1, L5) instead of a well-corrected objective lens affects illumination uniformity and image quality, especially at the field edges. Because the illumination across the entire field-of-view was not perfectly uniform, the imaged field was cropped on the left and the right sides at the location where the illumination was halfway between the maximum and the minimum pixel values, corresponding to a 712 × 594-μm² field-of-view.
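The field-of-view arithmetic in this paragraph, and the resulting sampling density at the tissue, can be checked with the sketch below. It assumes that the full 1288 × 728 pixel array spans the 5.26 × 2.97-mm sensor area and that the magnification is exactly 5×; both are simplifications, and the helper names are illustrative. With the rounded sensor dimensions the computed width is 1052 μm, in line with the 1051 μm quoted. Each pixel then covers roughly 0.8 μm at the sample, so a 3.5-μm period is sampled by more than four pixels, comfortably above the two-pixel Nyquist minimum.

```python
def field_of_view_um(sensor_mm, magnification):
    """Field viewed at the sample (um) for sensor dimensions given in mm."""
    return tuple(1000.0 * side / magnification for side in sensor_mm)


def sample_pixel_pitch_um(sensor_mm, pixels, magnification):
    """Size of one camera pixel projected back onto the sample (um)."""
    return tuple(1000.0 * side / count / magnification
                 for side, count in zip(sensor_mm, pixels))


if __name__ == "__main__":
    sensor = (5.26, 2.97)   # mm, from the text
    pixels = (1288, 728)    # camera format, from the text
    fov = field_of_view_um(sensor, 5.0)
    pitch = sample_pixel_pitch_um(sensor, pixels, 5.0)
    print("Uncropped field (um):", [round(v) for v in fov])
    print("Pixel pitch at sample (um):", [round(v, 3) for v in pitch])
    print("Pixels per 3.5-um period:", round(3.5 / max(pitch), 1))
```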
Although high numerical aperture is desirable for HRF imaging, a low numerical aperture is required for OCT, where it is important to match the optical depth of focus to the imaging depth range provided by the spectrometer. By filling the clear aperture of the focusing lens (Fig. 1, L5) with the HRF excitation beam and underfilling with the OCT beam, we were able to achieve a HRF numerical aperture of 0.200 and an OCT numerical aperture of 0.105. Compared to traditional OCT systems, a numerical aperture of 0.105 is fairly high and results in a shorter depth of focus. In air, this produces a useful imaging depth of about 1.5 mm instead of the theoretical 3.0 mm. A smaller numerical aperture and larger depth of focus could be achieved by increasing the focal length of the last lens (Fig. 1, L5) in the sample arm. This, however, would decrease the numerical aperture available for HRF imaging, resulting in lower magnification and reduced resolution across the HRF image. The lens selected (L5) provides a compromise between HRF imaging resolution and OCT imaging depth. The epithelium in oral mucosa is generally 90 to 300-μm thick. 40 Therefore, a 1.5-mm OCT imaging depth remains capable of imaging the entire epithelium, basement membrane, and into the lamina propria, sufficient for initial assessment of submucosal involvement.
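The trade-off described above can be illustrated numerically. The sketch below assumes that, for fixed beam diameters, both numerical apertures scale inversely with the focal length of L5 (a small-angle approximation). It uses the Rayleigh criterion 0.61λ/NA for HRF resolution at an assumed 550-nm emission wavelength (the emission filter center) and the Gaussian-beam confocal parameter for the OCT depth of focus. The scaling factor `k` and all function names are hypothetical. Under these assumptions, doubling the focal length doubles the smallest resolvable HRF feature while quadrupling the OCT depth of focus.

```python
import math


def rayleigh_resolution_um(wavelength_um, na):
    """Rayleigh-criterion resolution for incoherent wide-field imaging."""
    return 0.61 * wavelength_um / na


def confocal_parameter_um(wavelength_um, na):
    """Twice the Rayleigh range of a focused Gaussian beam."""
    w0 = wavelength_um / (math.pi * na)
    return 2.0 * math.pi * w0 ** 2 / wavelength_um


def scale_focal_length(k, hrf_na=0.200, oct_na=0.105,
                       hrf_wavelength_um=0.550, oct_wavelength_um=1.325):
    """Return (HRF resolution, OCT depth of focus) when the L5 focal length is scaled by k.

    Assumes NA is inversely proportional to focal length for fixed beam diameters.
    """
    return (rayleigh_resolution_um(hrf_wavelength_um, hrf_na / k),
            confocal_parameter_um(oct_wavelength_um, oct_na / k))


if __name__ == "__main__":
    for k in (1.0, 1.5, 2.0):
        res, dof = scale_focal_length(k)
        print("k =", k, "HRF resolution (um):", round(res, 2),
              "OCT depth of focus (um):", round(dof, 1))
```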
The HRF imaging component described here was designed for use with the contrast agent proflavine because of its ability to rapidly stain cell nuclei and, therefore, allow assessment of nuclear morphology, which is a well-established marker of dysplasia and neoplasia. Proflavine has also been previously used in imaging epithelial tissues in the oral cavity and GI tract. [12][13][14][15]41 The use of 1325-nm OCT and a short-pass dichroic with a 950-nm cut-off allows the system to be easily modified for use with different dyes within the visible and the NIR spectrum for HRF imaging. By changing only the excitation source, and the excitation and the emission filters, the HRF system has the potential to image with other fluorescent contrast agents, including fluorescein, indocyanine green, or 5-aminolevulinic acid. 11,12,41 These dyes, which stain different tissue features, could potentially be used to assess the disease state based on indicators other than nuclear morphology. 41
In summary, we describe a multimodal optical imaging instrument capable of simultaneously capturing en face images of subcellular level detail within the epithelium, alongside cross-sectional images of submucosal tissue microstructure. This work reports the design principles, evaluates the imaging performance of the instrument, and demonstrates that real-time, high resolution imaging can be achieved in vivo in a compact design, using only off-the-shelf lenses. Future work will evaluate the ability of the instrument to distinguish dysplastic from healthy oral mucosa in the clinical setting.