Photometric stereo endoscopy

Abstract. While color video endoscopy has enabled wide-field examination of the gastrointestinal tract, it often misses or incorrectly classifies lesions. Many of these missed lesions exhibit characteristic three-dimensional surface topographies. An endoscopic system that adds topographical measurements to conventional color imagery could therefore increase lesion detection and improve classification accuracy. We introduce photometric stereo endoscopy (PSE), a technique which allows high spatial frequency components of surface topography to be acquired simultaneously with conventional two-dimensional color imagery. We implement this technique in an endoscopic form factor and demonstrate that it can acquire the topography of small features with complex geometries and heterogeneous optical properties. PSE imaging of ex vivo human gastrointestinal tissue shows that surface topography measurements enable differentiation of abnormal shapes from surrounding normal tissue. Together, these results confirm that the topographical measurements can be obtained with relatively simple hardware in an endoscopic form factor, and suggest the potential of PSE to improve lesion detection and classification in gastrointestinal imaging.


Introduction
Many clinically important lesions of the gastrointestinal tract have unique surface topographies. For some lesions, this topographical contrast is more pronounced than their coloration. However, tissue coloration is the primary source of contrast in conventional endoscopic cameras. In fact, endoscope camera lighting strategies minimize rather than accentuate topographical contrast. For lesions with pronounced topographical features, it may be possible to improve detectability and classification accuracy by modifying the imaging hardware to enable acquisition of both conventional color-based images and surface topographical data.
Colon cancer screening by colonoscopy is an obvious candidate for topographical imaging. Precancerous lesions of the colon (polyps) can be difficult to detect using conventional color imagery. These lesions, however, can protrude (or be recessed) significantly relative to the surrounding mucosa. 1,2 Conventional colonoscopy relying primarily on coloration is estimated to miss one out of every four lesions. 3,4 A method to provide endoscopic measurement of surface topography paired with conventional color imagery could reduce this high missed lesion rate. Specific evidence for the value of topographical contrast in colon cancer screening comes from both computed tomography colonoscopy (CTC) and chromoendoscopy studies. In CTC, which relies entirely on topographical contrast, multiple studies have reported high lesion detection sensitivity. 5,6 In chromoendoscopy, surface dyes that highlight topography are used, and lesion detection rates have been documented to increase. 7,8 Despite their recognized advantages, CTC and chromoendoscopy are infrequently used due to the increased cost associated with follow-up colonoscopy for CTC and the significant time required for chromoendoscopy.
Here, we introduce photometric stereo endoscopy (PSE), an optical approach to topographical imaging that can be implemented in a straightforward manner with conventional endoscopic cameras. PSE is based on photometric stereo (PS) concepts established in computer vision. In PS, surface normals of an imaged object are calculated by comparing a set of images acquired from the same camera but with differing illumination conditions. 9,10 If one assumes a Lambertian remission of the light, i.e., isotropic scattering, the surface normal of each pixel can be calculated by solving a system of linear equations relating the intensities of that pixel from each source. By integrating the associated gradients, the three-dimensional (3-D) topography of the field of view (FOV) can be reconstructed. Most endoscopes already include multiple light sources, but these light sources are used simultaneously to reduce shadowing and increase ambient luminosity and color contrast. In principle, with modifications only to light source cycling and camera frame synchronization, the traditional endoscope could be modified to acquire images with unique illumination conditions, and PS algorithms could be applied to these images to extract surface topography.
Unfortunately, conventional PS algorithms assume a narrow-angle FOV wherein the directional vectors from each object pixel to each light source are known (a vector field which changes with every movement of the object relative to the sources). In PS, this is achieved by placing the light sources far away from the sample so that the directional vectors are approximately constant, a solution that is incompatible with endoscopic imaging. Recent work toward 3-D monocular laparoscopy 11 has addressed this problem by performing an off-line calibration of the illumination system, but still requires a calibration target to be placed on the object at the time of imaging. Here, we demonstrate PSE, an endoscopic implementation of PS that employs additional algorithms to mitigate errors induced by using more compactly positioned light sources, and does not require a calibration target in the FOV. Specifically, in PSE, low spatial frequency components of measured gradients are rejected, and surface orientation maps are derived based on higher spatial frequency topographical measurements that are less sensitive to light placement.
In a series of experiments evaluating PSE, we demonstrate a qualitative agreement between the acquired high-frequency morphology and the shape of the imaged object. For these purposes, we have used a controlled benchtop prototype, with which we show how the morphology reconstruction performs with complex geometries present in an anatomical colon phantom. In another test with the same prototype, we assess the robustness of the technique when the object has heterogeneous optical properties, imaging a variety of ex vivo human gastrointestinal lesions. Finally, to confirm the scalability of PSE to an endoscopic size system, we demonstrate reconstructing the topography of a simple phantom using a modified commercial endoscope (with some implementation limitations) and compare it with the benchtop prototype imaging of the same phantom.

Methods
PSE calculates the surface normal for each pixel in an image from a set of images acquired with different lighting conditions. In PSE, the direction normal to the surface is represented by $\hat{n}$, the vector $\hat{s}_i$ represents the direction to light source $i$, and the image intensity is proportional to $\cos\theta = \hat{n}\cdot\hat{s}_i$ [Fig. 1(a)]. We implemented PSE in two systems: a benchtop prototype and a modified commercial endoscope. The benchtop prototype [Fig. 1(b)] allowed more flexible control over illumination and image capture conditions. In this system, we were able to access raw image data from our sensor, synchronize source illumination with the frame rate, and introduce cross-polarizers to reduce specular reflections. However, the minimum source separation we could achieve with this system was 35 mm. By comparison, the distal tip of typical commercial colonoscopes ranges in diameter from 11 to 14 mm. To address this limitation, we implemented a second PSE setup using a commercial gastroscope. External light sources were attached around the scope housing [Fig. 1(c)]. In this system, we were able to reduce source separation to 12 mm. However, because the interface between the commercial camera sensor and the processing console was inaccessible, images were obtained after secondary processing by the console. In addition, the miniaturized size of the endoscope tip, combined with its wide-angle FOV and flat front design, made it challenging to add polarizer layers. For this reason, we did not include cross-polarizers to reduce specular reflections. With the combination of these two setups, we were able to first analyze PSE deployments using optimized imaging hardware and transparent data processing but unrealistic size scales (benchtop prototype), and second confirm that these results are maintained qualitatively as the device scales to a more realistic size (modified endoscope).

Benchtop Prototype
Our benchtop PSE system was constructed with four light sources mounted around a camera with a fish-eye lens [Fig. 1(b)]. The size of the housing was 30 × 30 mm², and the four sources were oriented at equal angles about a circle with a 35 mm diameter. We used a Dragonfly®2 remote head camera with a 1/3 in. color, 12-bit, 1032 × 776 pixel CCD (Point Grey Research, Inc., Richmond, BC, Canada). The image was created with a 145 deg FOV board lens (PT-02120, M12 Lenses). The FOV of this system was partially occluded by the imaging head. We used white LEDs for illumination (Mightex FCS-0000-000), coupled to 1 mm diameter, 0.48 NA multimode fibers. Sources were synchronized to the camera frame rate of 15 Hz. A holographic light shaping diffuser was placed at the end of each source to efficiently spread illumination light (80 deg Circular Diffuser, Luminit). Linear polarizers were placed in front of the sources and objective lens in a cross-configuration to minimize specular reflection. Images in raw data format were processed with a de-mosaicing interpolation algorithm to provide full resolution RGB images from Bayer-patterned raw images. The pixel intensities were then estimated by a weighted average of the three color channels.

Modified Commercial Endoscope
To generate images with smaller light, camera, and object separations, we modified a commercial endoscope to use added light sources [Fig. 1(c)]. We used a Pentax EG-2990K gastroscope with a Pentax EPK-1000 video processor. For illumination, we used fibers with an integrated light diffuser (NA = 0.5, D = 1 mm, Doric Lenses Inc.), and no polarization filters. The four fibers were secured at equal angles in a 12-mm diameter circle around the endoscope tip, resulting in an external diameter of 14 mm. Components were held together with tape wrapping. We acquired uncompressed video in NTSC format at 8-bit, 720 × 486 pixel resolution, and 29.97 interlaced frames per second using a video capturing device (Blackmagic Intensity Shuttle). Light sources were alternated at 60 Hz, synchronized with the video signal to deinterlace a sequence of RGB frames captured with only one light source active at a time. Frames were then interpolated in every other horizontal line to obtain full resolution images. The image intensity was estimated as the weighted average of the three color channels.
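The deinterlacing and intensity steps can be sketched as follows. This is a minimal illustration rather than the processing used in the study: it assumes that each field (even or odd rows) of an interlaced frame was exposed under a single light source, fills the missing rows by averaging the nearest rows of the same field, and uses Rec. 601 luma weights for the channel average because the text does not state the weights. The names `fill_field`, `to_intensity`, and `LUMA_WEIGHTS` are hypothetical.

```python
import numpy as np

# Assumed channel weights (Rec. 601 luma); the source text does not give them.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def fill_field(frame, parity):
    """Build a full-height image from one field of an interlaced frame.

    frame  : (H, W, 3) array holding one interlaced RGB frame, H >= 2
    parity : 0 keeps the even rows, 1 keeps the odd rows
    Rows of the other field are replaced by the mean of the nearest kept
    rows above and below (the single nearest kept row at the borders).
    """
    src = np.asarray(frame, dtype=float)
    h = src.shape[0]
    full = src.copy()
    missing = np.arange(1 - parity, h, 2)        # rows belonging to the other field
    above = np.clip(missing - 1, 0, h - 1)
    below = np.clip(missing + 1, 0, h - 1)
    # At the image border a clipped index falls on a missing row;
    # fall back to the valid neighbour on the other side.
    above = np.where(above % 2 == parity, above, below)
    below = np.where(below % 2 == parity, below, above)
    full[missing] = 0.5 * (src[above] + src[below])
    return full


def to_intensity(rgb, weights=LUMA_WEIGHTS):
    """Weighted average of the three color channels, shape (H, W)."""
    return np.asarray(rgb, dtype=float) @ weights
```

Calling `to_intensity(fill_field(frame, 0))` and `to_intensity(fill_field(frame, 1))` on one captured frame would then give two single-source intensity images, under the field-to-source assumption stated above.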

PSE Algorithm
Our algorithm relies on the approximation that the light remitted from our sample surface follows the Lambertian reflectance model. For each pixel, the apparent brightness or luminous intensity ($I$) is determined by the surface irradiance ($I_0$), the cosine of the angle between the unit vector normal to the surface ($\hat{n}$) and the unit vector indicating the direction to the illumination source ($\hat{s}$), and a constant amplification factor ($A$) that includes the reflection coefficient or albedo:

$$I = A\,I_0\cos\theta = A\,I_0\,(\hat{n}\cdot\hat{s}).$$

Neglecting cast shadows and specular reflections, with the light source $i$ coming from the direction $\hat{s}_i$, the measured intensity $m_i$ at one pixel can then be represented as

$$m_i = \hat{s}_i\cdot\vec{n} = s_{xi}\,n_x + s_{yi}\,n_y + s_{zi}\,n_z,$$

where $\vec{n} = [n_x, n_y, n_z]^T$ is a nonunitary vector with magnitude $A\cdot I_0$ and direction $\hat{n}$. An example sequence of images under different illumination is shown in Fig. 2(a)-2(d).
Under PS assumptions, a sequence of three measurements of the same sample can be written as

$$\begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} = \begin{bmatrix} s_{x1} & s_{y1} & s_{z1} \\ s_{x2} & s_{y2} & s_{z2} \\ s_{x3} & s_{y3} & s_{z3} \end{bmatrix} \begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix}.$$

This is a linear system of equations that can be solved for $\vec{n}$ if the light sources matrix is nonsingular. This condition is equivalent to requiring that the three light sources and the sample do not lie in the same plane. If more than three measurements are acquired, the normal vector can be estimated by minimizing the residual error given the measurements and the source directions:

$$\vec{n} = \arg\min_{\vec{n}} \sum_i \left(m_i - \hat{s}_i\cdot\vec{n}\right)^2.$$

Our PSE approach uses the conventional framework of PS imaging as described above, but includes an additional filtering of lower spatial frequency components of the surface normal data. This is necessitated by close placement of the sources, camera, and imaging object. Specifically, the traditional PS assumption that $\hat{s}_i$ is constant for all pixels in the image becomes especially inaccurate when the FOV is large relative to the distance from the object to the light sources. We empirically found this effect to induce only a low frequency error on the reconstructed normals, on the order of 1 cycle/FOV. This result is expected, given that the source directions change slowly across the FOV. As the motivation for PSE is to obtain useful information about the lesions and texture present in an endoscopic setting, which are often high frequency topographies, we can preserve these features while filtering out the inaccurate information by applying a high-pass filter to our reconstructed normal map.
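The per-pixel least-squares estimate can be sketched with NumPy as below. This is an illustrative sketch under the constant-source-direction assumption of conventional PS, not the authors' code. The function names and the 20 deg source tilt used by `ring_light_directions` are assumptions chosen only for the example.

```python
import numpy as np


def ring_light_directions(count=4, tilt_deg=20.0):
    """Hypothetical source layout: unit vectors at equal angles around the
    optical axis (z), each tilted by tilt_deg away from it."""
    tilt = np.deg2rad(tilt_deg)
    phi = 2.0 * np.pi * np.arange(count) / count
    return np.stack([np.sin(tilt) * np.cos(phi),
                     np.sin(tilt) * np.sin(phi),
                     np.full(count, np.cos(tilt))], axis=1)


def estimate_normals(images, light_dirs):
    """Least-squares photometric stereo.

    images     : (k, H, W) intensities, one image per light source, k >= 3
    light_dirs : (k, 3) unit vectors toward each source (assumed constant
                 over the image, as in conventional PS)
    Returns the scaled normals (H, W, 3), the unit normals (H, W, 3), and
    the magnitude map (H, W), which corresponds to A * I0.
    """
    images = np.asarray(images, dtype=float)
    S = np.asarray(light_dirs, dtype=float)
    k, h, w = images.shape
    M = images.reshape(k, -1)                     # one column per pixel
    # Solves min ||S n - m||^2 for every pixel column at once.
    N, *_ = np.linalg.lstsq(S, M, rcond=None)     # shape (3, H*W)
    scaled = N.T.reshape(h, w, 3)
    magnitude = np.linalg.norm(scaled, axis=2)
    unit = scaled / np.maximum(magnitude, 1e-12)[..., None]
    return scaled, unit, magnitude
```

With exactly three sources this reduces to solving the 3 × 3 system; with four sources, as in both prototypes, the fourth measurement adds redundancy against noise.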
Assuming a continuous sample that can be described as $z = f(u, v)$, with $z$ the distance from the objective to the sample and $(u, v)$ the pixel coordinates, its directional gradients can be obtained by scaling the direction normal to the surface:

$$-\frac{\partial f}{\partial u} = \frac{n_x}{n_z}, \qquad -\frac{\partial f}{\partial v} = \frac{n_y}{n_z}.$$

Because both the spatial frequency filter and the differentiation are linear operations on $(u, v)$, these operations are interchangeable, and the high-pass filter of $-\partial f/\partial u$ is equivalent to the gradient in direction $u$ of the high-passed surface. Thus, by high-pass filtering each of the directional gradients, we can obtain the gradients of the high frequencies of the shape. For each directional gradient, we applied a high-pass filter by subtracting the low frequency component of the signal, which was calculated as a convolution of the original gradient with a Gaussian kernel with $\sigma = 40$ pixels in image space. In the frequency domain, this kernel's full width at half maximum was approximately 8 cycles/FOV. Applying this filter to the normal map in Fig. 2(e) results in the normal map shown in Fig. 2(f). In this work, we represent normal maps using a color coded RGB image, as is commonly used in computer vision, 12 where the surface normal vector is normalized and its x, y, and z components are mapped to values of red, green, and blue, respectively.
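The gradient high-pass step can be sketched as below: the directional gradients are formed from the normal components and a Gaussian-blurred copy is subtracted from each. This is a minimal sketch, not the authors' implementation. The kernel truncation at 3σ, the edge-replicating border handling, and the function names are assumptions; only σ = 40 pixels comes from the text.

```python
import numpy as np


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalized 1-D Gaussian, truncated at truncate * sigma (assumed)."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(image, sigma):
    """Separable Gaussian low-pass with edge-replicated borders."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    smooth = lambda line: np.convolve(line, kernel, mode="valid")
    rows_done = np.apply_along_axis(smooth, 0, padded)   # filter along rows
    return np.apply_along_axis(smooth, 1, rows_done)     # then along columns


def highpass_gradients(unit_normals, sigma=40.0, eps=1e-6):
    """Return the high-passed n_x/n_z and n_y/n_z maps.

    unit_normals : (H, W, 3) array of normals with components (x, y, z)
    The two outputs equal the high-passed -df/du and -df/dv.
    """
    nx, ny, nz = np.moveaxis(np.asarray(unit_normals, dtype=float), 2, 0)
    nz = np.where(np.abs(nz) > eps, nz, eps)   # guard against division by ~0
    p, q = nx / nz, ny / nz
    return p - gaussian_blur(p, sigma), q - gaussian_blur(q, sigma)
```

A uniformly tilted surface, which is the kind of slowly varying component the filter is meant to reject, maps to zero in both outputs, while a feature much narrower than σ passes almost unchanged.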
To calculate height maps, the filtered gradients were integrated using a multigrid solver 13 for the Poisson equation 14 that minimizes integration inconsistency errors [ Fig. 2(g)]. To visualize both the color information and the acquired topography, we can overlay the color image on the calculated height map [ Fig. 2(h)].
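As a rough illustration of integrating a gradient field into a height map, the sketch below solves the same least-squares (Poisson) problem in the Fourier domain, in the style of Frankot and Chellappa. This is a swapped-in technique chosen for brevity, not the multigrid solver cited above, and it assumes periodic image boundaries. The function name and the convention that u runs along columns and v along rows are assumptions.

```python
import numpy as np


def integrate_gradients_fft(df_du, df_dv):
    """Least-squares integration of a gradient field via the FFT.

    df_du, df_dv : (H, W) arrays with the derivatives of the height f
                   along u (columns) and v (rows), in units per pixel
    Returns f as an (H, W) array with zero mean. The mean height cannot
    be recovered from gradients, and periodic boundaries are assumed.
    """
    df_du = np.asarray(df_du, dtype=float)
    df_dv = np.asarray(df_dv, dtype=float)
    h, w = df_du.shape
    wu = 2.0 * np.pi * np.fft.fftfreq(w)          # rad/pixel along u
    wv = 2.0 * np.pi * np.fft.fftfreq(h)          # rad/pixel along v
    WU, WV = np.meshgrid(wu, wv)                  # both of shape (H, W)
    GU = np.fft.fft2(df_du)
    GV = np.fft.fft2(df_dv)
    denom = WU ** 2 + WV ** 2
    denom[0, 0] = 1.0                             # avoid 0/0 at the DC term
    F = (-1j * WU * GU - 1j * WV * GV) / denom
    F[0, 0] = 0.0                                 # fix the mean height to zero
    return np.real(np.fft.ifft2(F))
```

Because the filtered gradients in the text are expressed as −∂f/∂u and −∂f/∂v, their sign would need to be flipped before being passed to a routine with this convention.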

Anatomical phantom
We tested the performance of PSE imaging in a tubular environment using a silicon colon phantom (Colonoscopy Trainer, The Chamberlain Group). This phantom has been used in a previous study for investigating lesion detection rates in colonoscopy. 15 The overall shape of the colon including curvature and haustra is represented in the phantom. Fabrication artifacts provide features comparable in size to subtle colon lesions. The material has a homogeneous color, and the surface is smooth and shiny. This model emulates the geometry of the colonoscopy environment to evaluate effects such as the tubular shape, wide FOV, cast shadows, varying working distance, and nonuniform illumination.

Ex vivo human tissue
Specimens from gastrointestinal tissue resections (for any indication) were identified in the Department of Pathology, Massachusetts General Hospital. Specimens with abnormalities were selected for imaging. All tissue samples were imaged within 24 h of resection, either fresh or after preservation in formalin for less than 24 h. The protocol was reviewed and approved by the Partners Institutional Review Board.

3-D printed phantom
PSE signal relies on intensity variation due to illumination from different light source positions. Intuitively, if the sources are moved closer together, there will be less intensity variation between images taken with different sources, and the signal-to-noise ratio (SNR) of the surface normal estimation will decrease. To evaluate the performance of PSE with a light source separation and working distance realistic for endoscopic purposes, we tested the ability of both the benchtop prototype and the modified endoscope to capture the morphology of a diminutive bump. We used a 3-D printed phantom with a 1-mm tall, 0.5-mm radius bump that was created using stereolithography (Quickparts.com). This phantom represents a challenging lesion topology: flat lesions, which exhibit a particularly subtle height change compared to the surrounding tissue, have typical elevations of 2.2 mm. 16 The sample was painted with pink tempera paint to reduce specular reflection. The working distance was set to 35 mm in the modified endoscope and adjusted to 87 mm in direct proportion to the light source separation for the benchtop prototype.

PSE Imaging of Anatomic Shapes: Phantom Imaged by Benchtop Prototype
We performed PSE imaging on several regions of the silicon anatomical phantom with the benchtop prototype. The expected surface normals were recovered across the FOV, as shown in the frontal view of the cecal wall presented in Fig. 3(a). The flat regions of the cecum generated regions with normal components pointing primarily in the z-direction, and bumps and ridges yielded normals such that they were correctly reconstructed after integration. We emphasize that the topographical data presented in the surface normal map and the 3-D rendering are complementary to the color information in the conventional image; this topography cannot be reconstructed from the conventional image alone. Three diminutive bumps that range in height from 0.5 to 1 mm were correctly identified as elevations in our reconstruction [white arrows, Fig. 3(a)], though they are difficult to appreciate based on the conventional color image alone.
The illumination intensity that reached the sample from the light sources was strongly affected by the working distance, which varied significantly within the FOV. For example, when imaged down a tubular shape, the pixels in the center of the image received much less light than those at the periphery. However, accurate normal reconstruction in PSE relies on intensity differences for each pixel in a series of images, and lighting changes that are consistent across the PSE image series should primarily affect the SNR of the measurements rather than introduce topographical bias. This was demonstrated in a PSE image of the transverse colon presented in Fig. 3(b). Though the light intensity reaching the surface down the tube was much lower than that illuminating the adjacent wall, the high frequency details in surface orientations of the object were still acquired.
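A short worked argument for this claim, under the Lambertian model and as an idealization: let $S$ denote the matrix whose rows are the source directions $\hat{s}_i$ and $\mathbf{m}$ the vector of intensities measured at one pixel (notation introduced here for illustration). If a change in working distance multiplies every measurement at that pixel by the same factor $c > 0$, the least-squares estimate is scaled by $c$ and its direction is unchanged:

```latex
\vec{n}' = (S^{T}S)^{-1}S^{T}(c\,\mathbf{m})
         = c\,(S^{T}S)^{-1}S^{T}\mathbf{m}
         = c\,\vec{n},
\qquad
\hat{n}' = \frac{\vec{n}'}{\lVert \vec{n}' \rVert}
         = \frac{\vec{n}}{\lVert \vec{n} \rVert}
         = \hat{n}.
```

A smaller $c$ reduces the measured signal relative to sensor noise, so the effect appears in the SNR rather than in the estimated orientation. The argument assumes the same factor for every source; in practice the factor may differ somewhat between sources, which would add a small orientation error.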
There are several sources of topographical error in PSE that become amplified when imaging in a tubular geometry. The assumption that the source vectors are constant across the FOV becomes worse as the distance between each point in the object and the light source changes. Furthermore, any portion of the object that casts a shadow creates a nonlinear artifact; the region that is cast in shadow is reconstructed to have a surface normal that points more perpendicularly than it should to the direction of the light source that shadows the region. This artifact exaggerates slopes facing away from the lights. Qualitatively, this effect emphasizes ridges and sharp features, which may actually be helpful for the purpose of increasing lesion contrast. This effect was observed in the shadows cast by the muscular features and haustra of the colon phantom, shown in Fig. 3(b).

PSE Imaging of Tissue Surfaces: Ex Vivo Human Tissue Imaged by Benchtop Prototype
We performed PSE on ex vivo human gastrointestinal tissue in order to evaluate performance on samples with heterogeneous optical properties, reflective surfaces, and clinically relevant lesions. Figure 4(a) presents the topography obtained from a right colectomy with a tattoo applied next to a benign ulcer that resulted from a polypectomy. Here, our normal map correlated to the visible folds in the conventional image. The ulcer, identified by a gastroenterologist at the time of the imaging, was reconstructed as a prominent indentation in the tissue, indicated with a dashed arrow in Fig. 4(a). However, the tattoo, which left a concentrated point of indigo color at the site of the injection, did not register as a topographical change (solid arrow). This demonstrated that PSE was able to separate a pixel's surface normal vector from its albedo. Next, we imaged a benign precancerous sessile serrated polyp that was identified after a right colectomy [Fig. 4(b)]. In this experiment, the light source in the bottom right of the FOV did not diffuse as well as the other three sources. As a result, the image with this light source saturated the bottom right of the FOV and the topographies were poorly reconstructed in that region. Nonetheless, the sessile lesion encircled with a dashed oval in Fig. 4(b) clearly influenced the normal map. In the surface rendering that was generated from the normal map, the lesion had a plateau-like topography that is characteristic of a sessile lesion, and that was observed during this experiment.
Finally, we imaged a metastatic melanoma that was present in fresh ex vivo human small bowel tissue [ Fig. 4(c)]. This feature was also identifiable in the normal map and reconstructed height profile. Note again that here PSE was able to distinguish between color changes of the tissue and actual folds that were present in the sample.
Because the ex vivo human tissue was wet, specular reflection was more prominent than what was observed in the silicon phantom. This led to artifacts in our surface normal reconstructions. Specifically, pixels that had specular reflections were reconstructed to have surface normals that pointed more toward the source that generated the specular reflection than they actually should.

Translation of PSE Imaging to Endoscopic Size: 3-D Printed Phantom Imaged by Modified Endoscope
These previous results show the ability of PSE to obtain high frequency topography in geometrical settings where traditional PS assumptions do not hold, and demonstrate that PSE topography records changes in surface orientation of tissues with heterogeneous optical properties. However, these results have been obtained using a light source separation distance larger than what fits in an endoscope.
To qualitatively confirm the scalability of PSE to endoscopic size, we compared results of imaging a simple object with the benchtop prototype and the modified endoscope. Because our modified endoscope setup was sensitive to specular reflections and had an inaccessible video digitizing system, we used a nonspecular 3-D printed phantom with a single bump and compared the PSE reconstructions from our two prototypes. PSE consistently estimated the morphology of bumps and depressions with both systems. In all estimations, the surface normal directions correctly showed the elevation or depression as a region in which border surfaces were oriented outwards for elevations and inwards for depressions.
When using a small source separation of 12 mm and a large working distance of 35 mm with the modified endoscope, PSE was still able to recover surface orientations of a small bump. Figure 5 shows results comparing the performance of PSE in the benchtop prototype at an 87 mm working distance  Fig. 5(c), the topography imaged at 87 mm distance resulted in an increased level of noise, possibly due to the low illumination intensity at that distance and a lower number of pixels covered by the 5 × 5 mm 2 area.
Noticeable artifacts present in these estimations included measurement noise, gradient amplitude scaling, discretization of the shape, and albedo variations. The albedo nonuniformities were caused by an uneven layer of paint and differences in surface orientation. The amplitude of the gradients decreased with the working distance. Discretization of the object was noticeable and is attributable to the small portion of the FOV that the sample covered. For example, a 1-mm wide feature imaged at a 35 mm working distance covered approximately 9 pixels across the images acquired with the modified endoscope.

Conclusions
Here, we introduced PSE as a new endoscopic technique to simultaneously acquire both a color image and the associated high frequency topography of a FOV. We demonstrated the utility of PSE in samples with complex geometries, including tubular environments. PSE is also able to reconstruct normal maps that are correlated to color images in ex vivo human tissues with heterogeneous optical properties. This demonstrates the power of the technique to separate a pixel's surface normal vector from its albedo. We observed that very fine folds [such as those