Deep-learning-based image super-resolution of an end-expandable optical fiber probe for application in esophageal cancer diagnostics

Abstract. Significance: Endoscopic screening for esophageal cancer (EC) may enable early cancer diagnosis and treatment. While optical microendoscopic technology has shown promise in improving specificity, the limited field of view (<1 mm) significantly reduces the ability to survey large areas efficiently in EC screening. Aim: To improve the efficiency of endoscopic screening, we propose a novel concept of an end-expandable endoscopic optical fiber probe for a larger field of visualization and, for the first time, evaluate a deep-learning-based image super-resolution (DL-SR) method to overcome the issue of limited sampling capability. Approach: To demonstrate the feasibility of the end-expandable optical fiber probe, DL-SR was applied to simulated low-resolution microendoscopic images to generate super-resolved (SR) ones. By varying the degradation model of image data acquisition, we identified the optimal parameters for optical fiber probe prototyping. The proposed screening method was validated with a human pathology reading study. Results: For the various degradation parameters considered, the DL-SR method demonstrated different levels of improvement in traditional measures of image quality. The endoscopists’ interpretations of the SR images were comparable to those performed on the high-resolution ones. Conclusions: This work suggests avenues for development of DL-SR-enabled sparse image reconstruction to improve high-yield EC screening and similar clinical applications.


Introduction
With about 600,000 new cases and more than 500,000 deaths annually, esophageal cancer (EC) is one of the deadliest cancers worldwide. The two main histological subtypes of EC are esophageal adenocarcinoma (EAC) 1 and esophageal squamous cell carcinoma (ESCC) 2. Screening and early detection of ESCC are critically important to help reduce the incidence and mortality associated with EC. However, the extensive surface area of the esophagus, the limited field of view (FOV) offered by conventional microendoscopic screening probes (100-1000 µm) [3][4][5], and the time constraints of endoscopic procedures (6-8 minutes per patient) 1 necessitate an innovative approach that enables imaging of larger fields and improves diagnostic yield.
In recent years, in vivo optical microscopy techniques such as confocal laser endomicroscopy (CLE) and high-resolution microendoscopy (HRME) have been applied to visualize the nuclear morphology of the esophageal epithelium and assist in differentiating neoplasia from benign tissue 3,[5][6][7]. In contrast to conventional tissue biopsies and histopathologic analysis, these optical techniques are non-invasive and able to provide real-time results. However, microendoscopy is currently limited by its small FOV (the average diameter of a probe is 0.3 mm), which hinders it from imaging larger tissue areas.
To circumvent this limitation, we propose a novel approach that uses an end-expandable optical probe to increase the FOV by means of sparse-data imaging 8,9. A sleeve mechanism expands the end of the fiber bundle, which consists of unfused microfibers, so that a larger tissue FOV can be achieved for real-time diagnosis (Figure 1). However, one major challenge for successful application of such an optical probe is the development of image processing procedures that act on the acquired sparse image data and allow formation of a microendoscopic image with enhanced quality. Motivated by recent advances in image super-resolution using deep learning approaches 8,[10][11][12], we propose a deep learning-based image super-resolution (DL-SR) method that estimates an HRME image from its acquired low-resolution (LR) counterpart. In this way, sparse images can be acquired by the end-expandable optical fiber probe to increase the FOV at the cost of spatial resolution, which can subsequently be restored by a DL-SR method to enhance image quality (IQ) for diagnostic purposes. The goal of our study is to demonstrate the feasibility of the end-expandable optical fiber probe and the proposed sparse imaging methodology to achieve a wider FOV with image quality comparable to that achieved by a conventional fused fiber probe.
As a surrogate for clinical trials, computer simulation studies, also known as virtual imaging trials, provide an economical and convenient route to explore imaging system designs 13. Here, estimates of the LR microendoscopic images of esophageal mucosa that would be acquired with the novel optical fiber probe were computationally simulated. This was accomplished by degrading HRME images with a degradation model that incorporates the physical factors of the optical fiber probe. Subsequently, a DL-SR model was employed to generate super-resolved (SR) images from the LR ones. The impact of various degradation parameters of the probe on DL-SR performance was investigated to identify optimal parameters for prototyping. Additionally, a clinically relevant detection task of esophageal neoplasia was conducted by endoscopists to study the impact of DL-SR on task performance. The results will provide valuable guidance for future prototyping and the advancement of this novel imaging technique.

Methods
To demonstrate the applicability of the proposed end-expandable optical fiber probe model, DL-SR models were trained in an end-to-end manner to learn a mapping from LR images, simulated with degradation models that incorporate different optical probe parameters, to the HRME images. Traditional IQ metrics were computed on simulated LR and SR images to assess the perceptual image quality and identify the optimal parameters for probe prototyping. In addition, a human pathology reading study was carried out to evaluate the utility of the images enhanced by the DL-SR model within a screening context.

HRME image acquisition and pathology interpretation
An HRME image dataset, acquired with a low-cost, point-of-care HRME device that imaged the esophageal epithelium of patients undergoing endoscopy for ESCC screening 14, was used in this study. The details of image acquisition have been described in previous studies 14,15. The HRME imaging system consisted of a compact epi-fluorescence fiber optic microscope that provides 1000x magnification views of epithelial tissue and subcellular features to distinguish cancerous from benign tissue after staining with a topical fluorescent dye, 0.01% proflavine hemisulfate. It has been evaluated in various anatomical sites, such as the cervix, anus, mouth, throat, and esophagus 5,15. Due to its low cost and reusability, HRME can have a high impact in resource-limited global settings. The HRME principles and schematics have been described previously 14. Typically, the HRME device conducts imaging through a fused optical fiber bundle with a cross-sectional diameter of 800 μm and 4.4 μm lateral spatial resolution (an individual fiber strand diameter can be as little as 4 μm). For a conventional microscopic probe, the input and the output at the distal and proximal ends of the fibers show the same pattern. The fiber strands are positioned as close to each other as possible, and the size of the fiber strands usually defines the image resolution when a high-resolution camera is used for registration. Light is collected from a contiguous region of tissue and registered on the camera in the corresponding pattern. All HRME images were standardized to a size of 960 × 1280 pixels. A Gaussian filter with a standard deviation of 2 pixels was applied to remove the comb pattern introduced by the sparse data of optical fiber bundles. Contrast-limited adaptive histogram equalization (CLAHE) with a clip limit of 0.005 was employed to enhance the image contrast.

Quality control and image selection
In addition to image quality control conducted as described in the previous study 15

Simulated image data for end-expandable optical fiber probe
A virtual imaging trial was performed in which the acquired HRME images were used to simulate the LR images that would be acquired with the proposed novel optical fiber probe. Figure 2b shows a schematic of the end-expandable optical fiber probe concept. Unlike conventional image acquisition through an optical fiber bundle (Figure 2a), the end-expandable optical fiber bundle can form a "brush" at the input end (Figure 2b). The input elements of such a lightguide are fiber strands placed at a distance from each other, and together they produce a sparse data set. Thus, the optical fiber "brush" collects fluorescent emission from a surface partially and discontinuously. To create LR images, we processed the simulated sparse image data acquired from the end-expandable endoscopic optical fiber probe (Figure 2b) by filling in the non-populated pixels of the sparse images (Figure 3b). Diagrams that depict the conventional and sparse image data acquisition and processing are shown in Figure 3.
To emulate the LR images that would be acquired from the novel end-expandable optical fiber probe, a degradation model that incorporates fiber knock-out downsampling was applied to the HRME images. Assuming a single fiber strand with a diameter of 4 μm, we considered a pixel size of 2 μm in this study. As shown in Figure 3, a fiber strand is represented by a range-of-interest (ROI) block of m pixels × m pixels that resides in the center of a FOV block of s pixels × s pixels. A random deformation offset with a maximum of d_x pixels and d_y pixels in the horizontal and vertical directions, respectively, was applied to model the off-center deviation of the ROI block. The degradation model calculated the mean value over the pixels within the ROI block and filled the pixels of the FOV block with that value. In this study, three parameters of the degradation model were investigated to evaluate the DL-SR 8 performance: the fiber diameter m, the inter-fiber distance s, and the deformation offset d. The schematic of the proposed degradation model is shown in Figure 4.
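The fiber knock-out downsampling described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in the study: the function name `degrade`, the symmetric offset range of [-d, d] pixels on each axis, and the clipping of the ROI to the image border are all assumptions introduced here.

```python
import numpy as np

def degrade(hr, m=2, s=4, d=0, rng=None):
    """Simulate an LR image by fiber knock-out downsampling (illustrative sketch).

    hr : 2D array, the high-resolution image
    m  : side length of the ROI block (fiber diameter) in pixels
    s  : side length of the FOV block (inter-fiber distance) in pixels
    d  : maximum random offset of the ROI from the block center, in pixels
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = hr.shape
    lr = np.empty_like(hr, dtype=float)
    for top in range(0, h, s):
        for left in range(0, w, s):
            bottom, right = min(top + s, h), min(left + s, w)
            # Assumption: offsets are drawn uniformly from [-d, d] on each axis.
            dy, dx = rng.integers(-d, d + 1, size=2)
            # Centered ROI corner plus offset, clipped so the ROI stays in the image.
            r0 = int(np.clip(top + (s - m) // 2 + dy, 0, h - m))
            c0 = int(np.clip(left + (s - m) // 2 + dx, 0, w - m))
            # Fill the whole FOV block with the mean intensity seen by the fiber.
            lr[top:bottom, left:right] = hr[r0:r0 + m, c0:c0 + m].mean()
    return lr
```

With the 2 μm pixel size stated above, a 4 μm fiber and an 8 μm inter-fiber distance correspond to `m=2` and `s=4`, so a call such as `degrade(hr, m=2, s=4, d=0)` would mimic the no-offset configuration.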

Deep learning-based image super-resolution
Given an LR image $I_{LR}$ virtually acquired with the end-expandable optical probe, image super-resolution methods seek to produce a super-resolved image $I_{SR}$ as an estimate of the HRME image that would be obtained from a conventional fused endoscopic fiber probe. However, this is a challenging ill-posed inverse problem. In recent years, deep learning-based super-resolution methods have been widely applied in various applications [10][11][12][16][17][18][19][20]. Here, the well-studied super-resolution convolutional neural network (SRCNN) 10 was employed to investigate the feasibility of DL-SR methods for improving the quality of simulated LR images from the novel optical probe. Such analysis can be readily repeated with other more recent DL-SR approaches 11,21. The SRCNN seeks to establish a mapping from the space of simulated LR images to the space of HR images:

$$I_{SR} = \mathcal{F}_{\theta}(I_{LR}),$$

where $\mathcal{F}$ is the network parameterized by $\theta$. The SRCNN was trained by minimizing the mean squared error (MSE) between the generated SR images and the original HRME images. The architecture of the SRCNN is shown in Figure 6; it consists of three feedforward convolutional layers interspersed with leaky rectified linear unit (LReLU) nonlinearities. The filter sizes of the three convolutional layers were 9×9, 1×1, and 5×5, and the corresponding numbers of filters were 64, 32, and 1, respectively. The training and validation data for the SRCNN consisted of 206 and 50 paired HRME and corresponding simulated LR images, respectively. During training and validation, 10 patches of size 512 × 512 were randomly cropped from each image. In the testing stage, the full-size simulated LR images were used. The SRCNN was trained with the Adam optimizer 22 with a learning rate of 0.0001. The network was trained for 300 epochs with a batch size of 8, and the model with the best validation performance was used for downstream task evaluation. The SRCNN was retrained and evaluated for each set of degradation parameters mentioned in Section 2.3. The SRCNN models were implemented with TensorFlow 2.0 and trained on NVIDIA GPUs.
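To give a sense of how compact the three-layer network is, the layer sizes quoted above can be turned into a parameter count and a receptive field size. The sketch below is plain arithmetic rather than the training code; it assumes a single-channel (grayscale) input, bias terms in every layer, and stride-1 convolutions, none of which are stated explicitly in the text.

```python
# (kernel size, input channels, output channels) for the three SRCNN layers.
# Assumption: the input has one channel (grayscale HRME images).
LAYERS = [(9, 1, 64), (1, 64, 32), (5, 32, 1)]

def conv_params(k, c_in, c_out):
    """Number of weights plus biases in one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out

def receptive_field(layers):
    """Side length of the receptive field for stacked stride-1 convolutions."""
    rf = 1
    for k, _, _ in layers:
        rf += k - 1
    return rf

params = [conv_params(*layer) for layer in LAYERS]
print(params, sum(params))       # [5248, 2080, 801] 8129
print(receptive_field(LAYERS))   # 13
```

Under these assumptions each output pixel depends on a 13 × 13 pixel neighborhood of the LR input, which corresponds to 26 μm × 26 μm at the 2 μm pixel size considered in this study.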

Image quality assessment and statistical analysis
To assess the DL-SR performance under the various degradation parameters used to simulate the LR images, traditional IQ metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index metric (SSIM) were computed on the simulated LR and SR images. In addition, a binary detection task to determine whether esophageal neoplasia was present was performed by endoscopists on both the SR images obtained from the SRCNN and the original HRME ones. The accuracy and confidence level were evaluated to assess the task-based performance of the employed DL-SR method.
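For reference, PSNR with respect to the HR image can be computed as in the following sketch. The helper name `psnr` and the assumption that intensities are scaled to [0, 1] (hence `data_range=1.0`) are introduced here for illustration; the scaling used in the study is not stated.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

A uniform error of 0.1 on a [0, 1] intensity scale gives an MSE of 0.01 and therefore a PSNR of 20 dB.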
A total of 4 endoscopists (3 experts, 1 novice) underwent standardized training in HRME image interpretation.Expert endoscopists were defined as having previously performed >50 HRME cases, whereas novices were new to the technology.All endoscopists viewed a set of training slides that demonstrated the features of neoplastic and non-neoplastic classification of HRME images, including nuclear size, crowding, and pleomorphism.All endoscopists were asked to interpret a series of original HRME images and generated SR images as neoplastic (high-grade dysplasia, ESCC) or non-neoplastic (normal squamous, esophagitis, low-grade dysplasia) along with their confidence level in their interpretation (high or low).
Assuming 80% power, α = 0.05, and a 2-sided test to establish equivalence between HR and SR images with an equivalence limit of 0.15, a sample size of 120 images was calculated using sample-based variance estimates without continuity correction for binary data. The sensitivity, specificity, and accuracy of HRME and SR image interpretation by individual endoscopists were computed with histopathology as the gold standard. An unpaired t-test was used to assess differences in the sensitivity, specificity, and accuracy of the endoscopists' interpretation of HR and generated SR images.
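The three reader metrics follow directly from the confusion-matrix counts, as the short sketch below illustrates. The function name and the example counts are hypothetical and chosen only to sum to 120 images; they are not results from this study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # neoplastic images correctly called
        "specificity": tn / (tn + fp),            # non-neoplastic images correctly called
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for one reader (not study data): 120 images in total.
example = diagnostic_metrics(tp=45, fp=10, fn=5, tn=60)
print(example)
```

For these made-up counts the sensitivity is 0.90, the specificity is about 0.86, and the accuracy is 0.875.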

Performance evaluation of DL-SR to super-resolve simulated LR images of end-expandable optical fiber probe
The DL-SR model was first trained and tested on HRME images and corresponding LR images simulated with a degradation model with fiber diameter m = 4 μm, inter-fiber distance s = 8 μm, and no offset imposed. Figure 7 shows two examples of test images in which the perceptual image quality of the simulated LR images appeared to be improved by DL-SR. Figure 8 shows a zoomed-in, cropped image patch of the original HR, simulated LR, and generated SR images and the corresponding cross-sectional profiles of optical intensity. Note that, as expected, the line profiles of the SR image aligned more closely with those of the HR image than did the line profiles of the simulated LR image. Visual inspection confirms that details of subcellular features could be restored by the SRCNN to obtain improved perceptual image quality.

Impact of offset, inter-fiber distances and fiber diameter on traditional image quality metrics
In this study, the impact of the different parameters of the degradation model on traditional IQ metrics was evaluated. The offset d was varied from 0 μm to 10 μm, the inter-fiber distance s from 4 μm to 24 μm, and the fiber strand diameter m from 4 μm to 12 μm. All SRCNN models were trained on pairs of HRME and LR images simulated with the various degradation parameter values. Traditional measures of image quality were assessed by computing PSNR and SSIM values on a test set consisting of 300 images, and these quantities are plotted in Figure 9. In most cases, the SR images generated by the SRCNN showed improvements in the traditional IQ metrics over their LR counterparts across the various offsets, inter-fiber distances, and fiber diameters. Moreover, the degradation parameters differed in how strongly they affected the traditional IQ metrics of the simulated LR and SR images.
When the offset of the fibers was increased, the traditional IQ metrics of both LR and SR images decreased. Similar behavior was observed when the inter-fiber distance and fiber diameter were increased. This is expected: a more severe degradation model discards more information, and DL-SR performs worse on such images when the network capacity or the number of training images is limited.

Discussion
In this study, an endoscopic end-expandable optical fiber probe was proposed and evaluated for its ability to enlarge the field of imaging in esophageal cancer screening. The novel concept of an end-expandable optical fiber probe was simulated with a degradation model incorporating various probe parameters and was tested using HRME images of esophageal squamous tissue. We performed a virtual imaging trial to simulate an LR microendoscopic image dataset from clinical HRME images and demonstrated the effectiveness of a deep learning-based super-resolution algorithm in improving the perceptual image quality measures of LR images. Furthermore, the generated SR images were comparable to the original HRME images when interpreted by endoscopists in a diagnostic task.
The capability of the employed DL-SR algorithm to super-resolve an LR image from an end-expandable optical fiber probe was demonstrated for the first time here and should be highlighted because of its potential for further improvement and application. As expected, the quality of the SR images deteriorated when severe degradation models were considered. This indicates the importance of choosing tolerable parameters for future physical prototypes of the end-expandable optical fiber probe. We found that the diameter of a single fiber strand should be no more than 6 µm. The inter-fiber distance can be increased up to 14-16 µm, which should be taken into account in the design and development of an optical fiber probe sleeve and a mechanism for controlling the inter-fiber distance. The fiber flexibility can permit an offset of up to 4-6 µm. As a result, the diameter of the endoscopic sensor could be increased to 4-5 mm instantly (an FOV 10- to 20-fold larger than that of HRME), which could provide a dramatic improvement in esophageal screening efficiency.
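A back-of-the-envelope check of the FOV gain is sketched below. It rests on simplifying assumptions that are not taken from the study: the number of fibers is fixed, the fibers sit on a regular square grid, and in the fused bundle the fibers touch, so that the fused pitch equals the fiber diameter. Under these assumptions the imaged area scales with the square of the ratio of inter-fiber distance to fiber diameter.

```python
def fov_area_gain(pitch_um, fiber_diameter_um=4.0):
    """Area gain when touching fibers are spread out to a given pitch.

    Assumes a fixed fiber count on a regular square grid (illustrative only).
    """
    return (pitch_um / fiber_diameter_um) ** 2

for s in (14, 16):
    print(s, fov_area_gain(s))
# 14 12.25
# 16 16.0
```

For 4 µm fibers and inter-fiber distances of 14-16 µm, this simple model gives an area gain of roughly 12- to 16-fold, which falls within the 10- to 20-fold range quoted above.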
Endoscopists had comparable diagnostic accuracy when interpreting the original HRME images and the SR images generated by DL-SR. While specificity could be improved, sensitivity was higher for SR images than for the original HR images. Our metric of success was comparable performance of human reads on SR images and on HR images, which was achieved. Moreover, endoscopists had a comparable level of confidence when interpreting SR images and the original HR images, which is a quantitative task-based assessment of the SR image quality. DL-SR methods that effectively estimate SR images resembling the original HR images are a promising strategy to increase the field of visualization during esophageal cancer screening while maintaining accurate detection of esophageal neoplasia by clinicians. More advanced deep learning-based image super-resolution methods that employ generative adversarial networks 11,19 and diffusion models 21 should also be investigated for this application in future studies.
This study evaluated the feasibility, imaging capability, and clinical performance of a hypothetical end-expandable optical fiber probe based on simulation studies using existing HRME images. Future in vivo studies will need to develop and test the fiber probe and its effectiveness in esophageal cancer imaging. Using a larger dataset of clinical microendoscopic images could further improve the DL-SR performance. Another limitation of this study is the rectangular geometry of the degradation model. A radial model of the optical fiber probe expansion will be applied in future studies.

Conclusion
We proposed an end-expandable optical fiber probe that would increase the field of visualization by up to 20-fold compared to a traditional fused HRME probe. We further validated the SR images generated by DL-SR from the simulated end-expandable optical fiber probe data and found endoscopists' interpretation of the SR images to be comparable to that of conventional HRME images. The proposed novel end-expandable optical fiber probe has the potential to enable high-yield endoscopic microscopy screening and even facilitate a screen-and-treat protocol for early esophageal cancer treatment. Moreover, the proposed sparse imaging methodology will provide valuable guidance for future prototyping and the advancement of optical biopsy techniques over larger surface areas.

Figure 1. Conceptual 3D rendering of a settled optical fiber probe (a) and an end-expanded optical fiber probe (b), respectively. Here, the diameter of a single fiber strand can vary from 4 to 6 µm.
was used in the study. The HRME images were obtained sequentially from patients enrolled in a clinical trial comparing standard of care Lugol's chromoendoscopy (LCE) to LCE+HRME at 3 sites: First Hospital of Jilin University (Changchun, China), the Cancer Institute at The Chinese Academy of Medical Sciences (Beijing, China), and Baylor College of Medicine (Houston, Texas, USA) from December 2014 to November 2016, approved by the Institutional Review Board at Baylor College of Medicine [IRB# H-34973].

Figure 2a demonstrates the optical fiber pattern generated by the fiber strands of the fused optical

Figure 2. Schematics of image acquisition by means of (a) a conventional fused endoscopic fiber probe; (b) an end-expandable, unfused endoscopic fiber probe.
, bioengineers further categorized the acquired HRME images into three levels of perceptual image quality: good, intermediate, and poor. Images of good quality showed a FOV in which nuclei could be clearly visualized. Images of intermediate quality showed mild motion blur or defocus aberration that could affect approximately a quarter of the imaged area. Images of poor quality showed severe motion blur, obstructed vision, or image corruption over half of the image area or more. Only images of good and intermediate perceptual quality were selected and used in this study.

Figure 3. (a) 1D schematic of the conventional image data acquisition leading to the HR images; (b) the sparse image data acquisition and reconstitution from the compressed data set into the LR images. Red elements represent a fiber strand collecting light emitted from the tissue surface, and green elements are a set of pixels displayed with the average light intensity of the adjacent fiber. (c) 2D illustration of the simulated sparse image data; s and m are the side lengths of the FOV and ROI blocks, respectively; d_x and d_y are the offsets in the x and y directions, respectively.

Figure 4. Illustration of degradation models that incorporated various optical fiber probe parameters including (a) offset, (b) inter-fiber distance and (c) fiber diameter.

Figure 5 shows examples of an HRME image, an intermediate sparse image and the LR

Figure 5. Example of an original HRME image, the same imaging area as sparse data captured by the proposed end-expandable optical fiber probe, and the LR image restored from the sparse data.

Figure 6. Schematic of the SRCNN architecture used in the study.

Figure 7. Examples of original HRME image, simulated LR image and corresponding SR image (top row) generated by the DL-SR method with magnification (bottom row).

Figure 8. Cross-sectional line profile on selected images allows comparison of profiles of optical intensity associated with stained nuclei on original HR images, simulated LR images, and reconstructed SR images.

Figure 9. Traditional IQ metrics show degradation of LR and limitation of improvement of SR images with reference to HR images: PSNR and SSIM values of LR (red) and SR (blue). The gray dashed line denotes an SSIM value of 0.95.

Figure 10. Diagnostic performance of endoscopists' readings of HR and SR microendoscopy images shown as box-and-whisker plots. The cross and the horizontal line in each box represent the mean and median values of the endoscopists' responses. The bottom and top ends of the whiskers are the minimum and maximum values, respectively.

Figure 11. (a) Sensitivity and specificity plot of individual endoscopist diagnosis on HR and SR microendoscopic images with high and low confidence, respectively. (b) Confusion matrix of diagnostic performance of readers on HR and SR images with low and high confidence, respectively. TP, true positive; FP, false positive; FN, false negative; TN, true negative.