Review of endomicroscopic imaging with coherent manipulation of light through an ultrathin probe

Abstract. Endomicroscopy is a technique to visualize microscopic structures of internal tissues through tubular instruments that can be inserted through a small cut or an opening in the body. There has been a growing demand for miniaturizing endoscopic instruments while preserving a high resolution to achieve a real-time histopathologic diagnosis. Meanwhile, there has recently been tremendous progress in the coherent manipulation of light in which an optical wave is deterministically manipulated through a linear, disordered medium for optical focusing and imaging. Here, we review recent research efforts in developing new endomicroscopic imaging schemes based on the coherent formulation and manipulation of optical fibers. In contrast to the conventional schemes using optical fibers as incoherent channels for optical power, these approaches provide a route to fully exploiting useful information transmitted through an ultrathin probe, thereby potentially achieving practical endomicroscopic imaging through a submillimeter thick probe.


Introduction
Endoscopy is an indispensable tool for observing objects that cannot be accessed using conventional free-space imaging techniques. In modern medical practices, endoscopic imaging is routinely performed on the gastrointestinal tract, respiratory tract, urinary tract, and reproductive tract through natural orifices or small incisions to access the body cavity. 1 In the area of biomedical imaging, endoscopic imaging techniques have continuously evolved with the shared goal of achieving high-resolution and high-contrast imaging with a small-sized probe for a minimal footprint. There have been some notable technological advances, including a photographic film-based gastrocamera and a glass fiber-based fiberscope in the 1950s and 1960s and charge-coupled device (CCD)-based video endoscopy in the 1980s (see Fig. 1). [2][3][4] To date, video endoscopy has served as the standard diagnostic technique because of its high information content (i.e., high pixel count), real-time features, and compatibility with useful imaging modalities, such as narrow-band imaging and autofluorescence imaging. 5 Nevertheless, the resolution of conventional video endoscopy 3 is limited to the range of 10 to 100 μm, and the size of the probe is typically larger than 1 mm. Therefore, a conventional endoscope can only be used for gross examination of internal tissues, and thus precise histopathologic diagnosis still requires additional biopsy sampling and careful examination under a conventional microscope.
For the past decades, researchers have explored ways to achieve a lateral resolution comparable to 1 μm to enable visualization of cellular morphology through an endoscopic probe. Such techniques are termed "endomicroscopic" imaging or alternatively "microendoscopic" imaging. [6][7][8][9] Previous approaches can be classified into two categories depending on their focus scanning schemes: proximal scanning techniques and distal scanning techniques. Proximal scanning techniques [i.e., probe-based confocal laser endomicroscopy (pCLE)] use a multicore fiber (MCF), comprising ≳10,000 cores, in which the cores in effect serve as virtual confocal pinholes for a focused illumination and detection. 8,10 The scanning unit on the proximal side sequentially directs a focused beam to individual cores and a two-dimensional (2D) image is retrieved by stitching the optical signals acquired from each core. In contrast to the proximal scanning scheme, distal scanning techniques (i.e., endoscope-based CLE, eCLE) utilize a single-mode fiber (SMF), and focus scanning is performed on the distal side using a microelectromechanical systems scanning unit (e.g., piezoelectric actuator, electrothermal, electrostatic, and electromagnetic actuators) or a micromotor. 8 Optical coherence tomography (OCT), 11 combined with distal scanning techniques, provides a three-dimensional (3D) volumetric imaging capability with an enhanced imaging depth by virtue of coherence gating. 12 However, the lateral resolution of endoscopic OCT is typically limited to tens of microns. The use of an endocytoscope is another notable approach that integrates high magnification and high numerical aperture (NA) objective lenses on the distal end to acquire a high-resolution image without any scanning. However, the depth sectioning capability of the endocytoscope is substantially worse than that of scanning-based approaches as it indiscriminately collects the scattered light from different tissue depths. 13 Table 1 summarizes the key characteristics of previous representative endomicroscopic imaging systems. 20
The previous endomicroscopic imaging approaches have commonly relied on incoherent image formation with the optical power of back-reflected or backscattered light being directly taken as image information. In this incoherent scheme, an optical fiber or an MCF is only used as a channel to transmit the optical power from one end to the other. Interestingly, optical fibers, as a linear optical material, can be thought of as a channel to transmit coherent information (i.e., both amplitude and phase information), and the high-content information can be fully utilized for endomicroscopic imaging and focusing purposes based on the coherent manipulation of light. The recent development in the coherent manipulation of light stems from an original aim of achieving a long-sought goal of optical imaging through a scattering medium. In 2007, it was shown that coherent light can be focused through a highly scattering medium by iteratively optimizing an incident wavefront through a feedback loop. 21 In general, any linear optical medium, including a scattering medium and optical fibers, can be formulated as a transmission matrix with elements that describe a linear, coherent relation between input and output optical modes. 22,23 Based on this transmission matrix formalism and associated linear algebra operations, it has been demonstrated that an object behind a scattering medium can be optically retrieved. [24][25][26][27][28] In the present paper, we review recent research efforts in developing coherent light manipulation techniques for endomicroscopic imaging through an ultra-thin probe with a diameter that is <1 mm. These new approaches provide an ideal diffraction-limited resolution for a given optical aperture of a probe without any bulky distal optics, as well as an additional capability to freely set an imaging plane without any moving parts. Figure 1 presents an overview of various endoscopic imaging approaches and the advantages of the coherent approach in comparison with the conventional incoherent techniques: (1) reduced probe size, (2) improved resolution, and (3) flexibility in setting the imaging plane. In Secs. 2 and 3, we review the general properties of optical fibers and present an overview of the coherent formulation and associated linear algebra operations for linear optical media. Then, in Secs. 4 and 5, we review recent developments in exploiting the coherent nature of light through multimode fibers (MMFs) and MCFs, respectively, for endomicroscopic imaging, as well as their applications. In addition, in Sec. 6, we review some recent developments based on unique spatial and temporal properties of optical fibers that can potentially extend the usability and functionality of the techniques introduced in Secs. 4 and 5. Section 7 concludes the review with some remarks on the clinical practicality of coherent control of optical waves for endomicroscopic imaging.
Fig. 1 Overview of endoscopic imaging methods. A timeline and schematics of key technological milestones are presented. eCLE, endoscope-based confocal laser endomicroscopy; pCLE, probe-based confocal laser endomicroscopy; endoscopic OCT, endoscopic optical coherence tomography; CCD, charge-coupled device; and SLM, spatial light modulator.

Wave Propagation in Optical Fibers
Here, we review the basic properties of a step-index optical fiber, a flexible cylindrical waveguide consisting of two different dielectric materials.
Step-index optical fibers are composed of an inner "core" part made of a higher refractive index material surrounded by an outer "cladding" part made of a lower refractive index material. In this structure, the light rays with an incidence angle smaller than the acceptance angle, defined as $\sin^{-1}\left(\sqrt{n_{\mathrm{core}}^{2} - n_{\mathrm{clad}}^{2}}\right) = \sin^{-1}\mathrm{NA}_{\mathrm{acc}}$, are transported through the core nearly without any loss due to the total internal reflection at the interface of the two dielectric materials. Here, $n_{\mathrm{core}}$ and $n_{\mathrm{clad}}$ are the refractive indices of the core and cladding materials, respectively, and $\mathrm{NA}_{\mathrm{acc}}$ is the NA of an optical fiber. Conventional endoscopes typically rely only on this lossless optical energy transport property (i.e., incoherent property) of optical fibers. 29 In the coherent manipulation of light transport through optical fibers, discrete optical modes (i.e., guided modes) play an important role in determining both spatial and temporal properties of the light transmitted through optical fibers. Considering the boundary conditions at the interface of the core and cladding materials, only optical waves with certain longitudinal wavevectors and associated transverse field patterns can propagate through step-index optical fibers without loss. Those features of the guided modes are typically calculated by solving Maxwell's equations with a set of boundary conditions in cylindrical coordinates. The guided modes may also be thought of as a collection of linearly polarized optical waves that satisfy the two following conditions at the same time: (1) the transverse resonant condition in which the round trip of the optical waves in the transverse direction through internal reflections results in phase delay of integer multiples of $2\pi$ (i.e., constructive interference) and (2) the condition for total internal reflection (i.e., transverse wavevector $< k_0 \mathrm{NA}_{\mathrm{acc}}$, where $k_0$ is the vacuum wavenumber).
The number of discrete optical modes (i.e., supported modes) of a given optical fiber, $M$, is estimated as 29
$$M \approx \frac{4}{\pi^{2}} V^{2},$$
where $V = a \cdot \left(\frac{2\pi}{\lambda}\right) \cdot \mathrm{NA}_{\mathrm{acc}}$ is the dimensionless parameter determined by the radius of the core $a$, the wavelength of light $\lambda$, and the NA of the optical fiber $\mathrm{NA}_{\mathrm{acc}}$. Therefore, when the ratio of the core's radius to the wavelength and the $\mathrm{NA}_{\mathrm{acc}}$ increase, the number of supported modes increases quadratically. Depending on the number of supported modes, the step-index optical fiber can be classified into two types: 29 SMF, in which $V$ is smaller than or comparable to 2.405, and MMF, in which $V$ is larger than 2.405. In fact, a set of discrete guided modes constitutes an orthogonal basis (i.e., distinctive optical channels) for light transmission through optical fibers. Therefore, the number of modes available inside the waveguide is equal to the information transmission capacity of the fiber, which has a density (= $M$ divided by the size of the entire probe) that determines the efficiency of the imaging system as it is fundamentally related to the spatial bandwidth product of a given imaging system [i.e., the number of resolvable spots in a field of view (FOV)]. Here we introduce three types of optical fibers as endoscope probes and describe how the image information on the distal end is delivered to the proximal end depending on the fiber type.
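As a rough numerical illustration of the quantities defined above, the following sketch evaluates the NA, the V parameter, and an approximate mode count. It assumes the large-V step-index estimate M ≈ 4V²/π², a wavelength of 1064 nm for the MMF, and illustrative refractive indices; these values and all function names are assumptions made for this example only.

```python
import math

def numerical_aperture(n_core, n_clad):
    """NA_acc = sqrt(n_core^2 - n_clad^2) for a step-index fiber."""
    return math.sqrt(n_core ** 2 - n_clad ** 2)

def v_parameter(core_radius, wavelength, na):
    """V = a * (2*pi / lambda) * NA_acc (dimensionless)."""
    return core_radius * (2.0 * math.pi / wavelength) * na

def approx_mode_count(v):
    """Assumed large-V estimate M ~ 4 V^2 / pi^2 (not meaningful for small V)."""
    return 4.0 * v ** 2 / math.pi ** 2

def is_single_mode(v):
    """Single-mode operation when V is below the cutoff value 2.405."""
    return v < 2.405

# Illustrative indices (assumed) giving an NA close to 0.22.
na_example = numerical_aperture(1.4630, 1.4465)

# MMF with a 105 um core diameter and NA 0.22; the 1064 nm wavelength is assumed.
v_mmf = v_parameter(52.5e-6, 1.064e-6, 0.22)
m_mmf = approx_mode_count(v_mmf)

# SMF-like case (assumed values): 2 um core radius, NA 0.1, 633 nm.
v_smf = v_parameter(2.0e-6, 0.633e-6, 0.1)

print(f"NA = {na_example:.3f}")
print(f"MMF: V = {v_mmf:.1f}, M ~ {m_mmf:.0f}, single mode: {is_single_mode(v_mmf)}")
print(f"SMF: V = {v_smf:.2f}, single mode: {is_single_mode(v_smf)}")
```

Because M scales with V², doubling either the core radius or the NA roughly quadruples the estimated number of modes.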

Single-mode fiber
The SMF transmits most of the optical energy through a single optical mode with a Gaussian-like transverse mode profile with a small core diameter of typically less than 10 μm, a cladding diameter of 80 to 125 μm, and a NA acc of about 0.1 [see Fig. 2(a)]. 30 Therefore, when the light is coupled in or out to the SMFs, the light intensity should be highly confined around the central axis of the fiber with the mode field diameter comparable to the core size. Although SMF can only transmit image information of a single object point at once, this property of SMF can be used to implement a bidirectional pinhole for focused illumination and detection in a confocal imaging configuration. In fact, based on this principle, numerous types of eCLE and endoscopic OCT techniques have been proposed and used for clinical demonstration with a miniaturized beam scanner at the distal tip of the fiber for laser-scanning imaging 8,12 (see Fig. 1). However, the miniaturized beam scanning unit has a relatively slow scanning speed compared with that of external scanning units used for pCLE, such as galvanometer mirror scanners and resonant scanners, and it often makes the size of the entire probe larger than a millimeter, albeit miniaturized.

Multicore fiber
The MCF consists of ≳10,000 cores embedded in a common cladding material with an overall diameter of hundreds of microns. Each core size and acceptance NA are similar to those of the SMF; hence it is equivalent to an array of SMFs with an inter-core distance of a few microns to ensure that the guided light is transported through each core without significant inter-core couplings. The MCF directly transports the intensity pattern from one end to the other with the spatial sampling rate of the inter-core distance; however, the phase information is randomized as slight length differences among cores cause different phase delays on each core [see Fig. 2(b)]. Using each core as a bidirectional pinhole, the pCLE technique achieves confocal imaging with a conventional beam scanning unit on the proximal side. In this process, light is successively focused into each core at the proximal end and the optical power emerging from an object point is coupled back into the corresponding core through a lens 10,14 (see Fig. 1). However, the finite inter-core distance for preventing crosstalk (>3 μm) results in inherent pixelation artifacts and reduced lateral resolution [see Fig. 2(b)]. 30

Multimode fiber
The MMF has a core diameter ranging from tens to hundreds of micrometers and supports a large number of guided modes [see Fig. 2(c)]. For example, a typical MMF with a diameter of 105 μm and a $\mathrm{NA}_{\mathrm{acc}}$ of 0.22 supports approximately 1870 different modes. 31 As it supports a larger number of modes per area compared with an SMF or an MCF, it is fundamentally more efficient in delivering the image information from one end to the other. In comparison, the SMF inevitably requires an additional bulky scanning unit for distal-end scanning, and the MCF has a facet mainly composed of cladding material, resulting in a significantly lower information transmission capacity.
Fig. 2 (a) SMF has a small diameter and supports nearly a single guided mode and thus can only transport an optical field at one point on the distal end with a fixed amplitude and phase. (b) MCF consists of multiple cores, each of which permits nearly a single guided mode. Each core transports the optical fields from the corresponding points with a uniform transmittance, but with a random phase delay due to the fabrication imperfections. (c) MMF (usually $V \gg 2.405$) supports many modes such that the incident light is decomposed into the different guided modes, and individual modes propagate with their corresponding propagation constants. The point spread function of an MMF appears as a speckle pattern, a random fluctuation in both amplitude and phase, and thus the image is significantly distorted on the other end.
However, MMFs have not been conventionally used as endoscopic imaging probes because both the amplitude and phase information and even the polarization are distorted after being transmitted through the fiber, 32 as shown in Fig. 2(c). This property can be understood based on the discrete guided modes in an MMF, with the incident light being decomposed into discrete optical modes that independently propagate through an MMF with corresponding propagation constants and transverse mode patterns. From a geometrical viewpoint, discrete guided modes propagate through the fiber at different angles to the axis of the waveguide and consequently over different propagation distances, resulting in different propagation constants. In this process, the propagation constants of different modes vary significantly, and thus the phase delay of each mode is effectively random after being transmitted through a fiber. Furthermore, bending and fabrication imperfections of the fiber cause perturbations and coupling between the linearly polarized modes, and consequently the light transmission through an MMF can be considered to be a process of mode mixing with random phases. As a result, along with the mode-dependent loss and the modal noise, when monochromatic light is incident on one end of the MMF, randomly fluctuating amplitude and phase patterns appear at the other end regardless of the incident object field. Furthermore, the different group velocities of each optical mode and of each wavelength lead to modal dispersion and chromatic dispersion, respectively, such that a short optical pulse is temporally broadened during fiber transmission. 29,33,34 This combined distortion effect of spatiotemporal properties of the MMF precludes its use in endoscopic imaging.

Transmission Matrix: Formalism for the Coherent Manipulation of Light
An input-output relationship of a linear optical system, including optical fibers, can be described based on a transmission matrix formalism. A transmission matrix is typically defined on a position basis or a transverse wavevector basis (i.e., plane wave basis). For instance, when defined on a position basis, the transmission matrix element, $t_{m,n}$, represents the transmittance and phase delay of the wave component from an input point ($n$) to an output point ($m$), where $n$ and $m$ represent the indices for the spatially discretized modes of incoming and outgoing waves. Thus a column vector of the transmission matrix represents a point spread function on the output plane for a certain input mode (e.g., a point source on the input plane), as shown in Fig. 3. The overall characteristics of the transmission matrix vary significantly depending on the configuration of the linear optical system between the input plane and the output plane. For an ideal imaging system, the transmission matrix is given as an identity matrix, whereas for a highly scattering medium, the matrix elements can be described as independent and identically distributed complex Gaussian random variables.
In the past few decades, researchers have demonstrated tremendous success in measuring the transmission matrix and demonstrating various coherent light manipulation techniques-optical focusing, imaging, and energy transport-through optically disordered media. 22,[35][36][37] In recent developments, spatial light modulators (SLMs) and interferometric measurement techniques have played a critical role in that they allow for direct access to the transmission matrix with many input and output entries. 26 More specifically, to characterize a transmission matrix, output optical fields are interferometrically measured, and more than hundreds of different incoming wavefronts are incident on the input plane via SLMs such as liquid crystal on silicon (LCoS) and digital micromirror devices (DMDs). For the coherent manipulation of light, the measured transmission matrix can largely be used in two ways: wavefront shaping and computation. Wavefront shaping techniques are physically applied on the input side of the optical system using SLMs to display an optimized wavefront for a desired optical output, such as optical focusing, patterning, and enhanced energy transport. 27,[38][39][40] In contrast, computational techniques are digitally applied on the measured output field to reconstruct optical fields on the input side through various linear algebra operations, such as matrix inversion and singular value decomposition. 36

Phase Conjugation and Matrix Inversion
To outline the mathematical frameworks of coherent manipulation techniques based on a transmission matrix formalism, let us assume that a transmission matrix $T$ is given for a disordered optical medium on a position basis. The relation between the input and output fields is then described as
$$E^{\mathrm{out}} = T E^{\mathrm{in}},$$
where $E^{\mathrm{in}}$ and $E^{\mathrm{out}}$ are the input and output field vectors with $N$ and $M$ entries, respectively. Then, the phase conjugation operation on the input plane allows for optical focusing on a specific output position. When the input field is given as the complex conjugate of the $k$'th row of the transmission matrix, $E^{\mathrm{in}} = [t^{*}_{k,1}, t^{*}_{k,2}, \cdots, t^{*}_{k,N}]^{T}$, the corresponding output components constructively interfere at the target output position of index $k$, whereas they randomly interfere at the background modes with random relative phases. Mathematically,
$$E^{\mathrm{out}}_{m} = \sum_{n=1}^{N} t_{m,n} t^{*}_{k,n} = \sum_{n=1}^{N} |t_{m,n}||t^{*}_{k,n}| e^{i\Delta\phi_{m,n-k,n}},$$
where $\Delta\phi_{m,n-k,n}$ is the phase difference between two transmission matrix elements, $t_{m,n}$ and $t_{k,n}$. This phase difference randomly fluctuates when $m \neq k$, assuming that each matrix element independently follows the complex Gaussian random statistics. Therefore, the ensemble-averaged field amplitude at the targeted position, $\langle|E^{\mathrm{out}}_{k}|\rangle$, is given as $\langle|t_{m,n}|^{2}\rangle N$, and the background field, $\langle|E^{\mathrm{out}}_{m\neq k}|\rangle$, is given as $\langle|t_{m,n}|^{2}\rangle\sqrt{N}$. Therefore, on average, the optical focus can be created with an intensity contrast of $N$ with respect to the background. Practically, the number of controllable input modes $N$ is limited to the effective number of pixels on the SLM, which varies within $10^{3}$ to $10^{6}$.
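The scaling of the focal contrast with N can be checked with a small simulation. The sketch below is a toy model, not a description of any experiment: it assumes a transmission matrix whose elements are independent circular complex Gaussian variables with unit mean-square amplitude, and the function names and matrix sizes are hypothetical.

```python
import random

def random_tm(m_out, n_in, rng):
    """Transmission matrix with i.i.d. circular complex Gaussian elements, <|t|^2> = 1."""
    s = 0.5 ** 0.5
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_in)]
            for _ in range(m_out)]

def propagate(tm, e_in):
    """E_out = T E_in."""
    return [sum(t * e for t, e in zip(row, e_in)) for row in tm]

def phase_conjugate_input(tm, k):
    """Input field equal to the complex conjugate of row k of T."""
    return [t.conjugate() for t in tm[k]]

def focus_contrast(tm, k):
    """Intensity at output k divided by the mean intensity of the other outputs."""
    e_out = propagate(tm, phase_conjugate_input(tm, k))
    intensity = [abs(e) ** 2 for e in e_out]
    background = [v for m, v in enumerate(intensity) if m != k]
    return intensity[k] / (sum(background) / len(background))

rng = random.Random(0)
N, M = 256, 64          # numbers of input and output modes (arbitrary choices)
T = random_tm(M, N, rng)
contrast = focus_contrast(T, k=10)
print(f"N = {N}, simulated intensity contrast = {contrast:.0f}")
```

For a single random realization the simulated contrast fluctuates around N rather than matching it exactly, since both the peak and the background are random quantities.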
Along with the phase conjugation operation, the inversion operation also serves as a powerful tool for coherent manipulation. Often, the inversion operation is applied to the output field in the form of matrix multiplication to retrieve the input field. Mathematically, the estimated input field $\hat{E}^{\mathrm{in}}$ is given as $T^{-1}E^{\mathrm{out}}$. However, it should be noted that the inverse operation of the transmission matrix is considerably unstable in a practical sense. Given that the transmission matrix $T$ is represented as $T = U\Sigma V^{*}$ by singular value decomposition, where $U$ and $V$ are unitary matrices and $\Sigma$ is a diagonal matrix with non-negative singular values, the inverse matrix is given as
$$T^{-1} = V\Sigma^{-1}U^{*},$$
where the diagonal elements of $\Sigma^{-1}$ are the inverses of the singular values. When the columns of the transmission matrix are highly correlated or the matrix is rank deficient, a large portion of the singular values are close to zero, so their inverses are large and the retrieved field is easily corrupted by small experimental noise. To overcome this issue, a method that retains only the singular values above a certain threshold, called truncated singular value decomposition, or a nonlinear optimization algorithm is often used to reduce experimental errors. This approximated inversion operation often leads to fluctuating background fields (i.e., inaccurate retrieval of the input field), similar to the fluctuating background intensity for optical focusing in the phase conjugation operation.
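The truncation step can be written compactly. In the sketch below, $\tau$ is a hypothetical symbol for the threshold (the text does not name one), $\sigma_i$ are the singular values, and $u_i$ and $v_i$ are the corresponding columns of $U$ and $V$:

```latex
T = \sum_{i} \sigma_i\, u_i v_i^{*}, \qquad
T^{-1}_{\tau} = \sum_{i:\ \sigma_i > \tau} \frac{1}{\sigma_i}\, v_i u_i^{*}, \qquad
\hat{E}^{\mathrm{in}} = T^{-1}_{\tau} E^{\mathrm{out}} .
```

If the measured output contains a noise term, full inversion multiplies its component along $u_i$ by $1/\sigma_i$, which is why near-zero singular values dominate the error. Truncation removes that amplification at the cost of discarding the input components along the omitted $v_i$.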

Controlling Coherent Waves through Multimode Fibers
When one attempts to implement endoscopic imaging through an MMF, light travels twice through an MMF in opposing directions: (1) from the proximal end to the distal end to illuminate an object and (2) from the distal end to the proximal end to convey the back-reflected or fluorescence light (i.e., image information) from an object. Therefore, to reconstruct an image of the object behind the MMF, it is necessary to cancel out the effects of wavefront distortion on the way in and out. In this section, we review novel MMF-based endomicroscopic imaging techniques that compensate for the effects of wavefront distortions based on the light manipulation techniques introduced in Sec. 3.

Optical Focusing
First, the wavefront distortion on the illumination path (i.e., proximal to distal end) can be corrected by physically implementing the phase conjugation operation based on wavefront shaping techniques. [41][42][43] The light transmission through MMFs via an SLM can be modeled based on the transmission matrix formalism
$$E^{\mathrm{out}}_{m} = \sum_{n=1}^{N} t_{m,n} e^{i\theta_{n}},$$
where $t_{m,n}$ is an element of the transmission matrix relating the plane of the SLM on the proximal end to the target plane on the distal end and $\theta_{n}$ is the phase of the outgoing light at the $n$'th element of the SLM. Given that the transmission matrix is characterized, based on the principle of phase conjugation, light can be arbitrarily focused on the desired position (i.e., the $m$'th mode) by displaying the phase profile of $t^{*}_{m,n}/|t_{m,n}|$ $(n = 1, \ldots, N)$ on the SLM. Alternatively, without transmission matrix measurements, one may determine the optimal phase of each SLM segment for light focusing by cycling its phase from 0 to $2\pi$ and iteratively optimizing each SLM segment to maximize the intensity at the target point 21,44-48 (as illustrated in Fig. 4). It should be noted here that the result of light focusing through iterative feedback is ideally the same as the phase conjugation approach because both methods essentially lead to constructive interference of the transmitted light at the desired position behind an MMF. Figure 4(b) shows the resultant light intensity distribution on the distal end when applying the phase-conjugated wavefront on the proximal end. 31 By virtue of constructive interference, the ideal diffraction-limited resolution could be achieved; however, the focal contrast (i.e., the peak-to-background ratio, PBR) was as low as 920.
In the demonstrated experiment, the number of propagating modes in the MMF was estimated to be 1870, and thus the expected value of the focal contrast was about $\frac{\pi}{4}(N - 1) + 1 \cong 1470$ based on the assumption that each transmission matrix element follows the independent complex Gaussian distribution and only the phase of the input field is modulated. 51 This discrepancy could be attributed to the following factors: (1) only a fraction of the propagating modes effectively contributes to the target spot due to the cylindrical structure of MMFs, 46 (2) no polarization control was applied, 52 and (3) the column vectors of the transmission matrix may be highly correlated in the MMFs. 53,54
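The iterative feedback procedure and the phase-only contrast estimate can both be sketched numerically. The toy model below assumes a single row of the transmission matrix with random complex Gaussian elements, 8 phase steps per segment, and 3 passes over the segments; these settings and the function names are illustrative assumptions, not parameters of the cited experiments.

```python
import cmath
import math
import random

def expected_phase_only_enhancement(n_modes):
    """pi/4 * (N - 1) + 1, the phase-only estimate quoted in the text."""
    return math.pi / 4.0 * (n_modes - 1) + 1.0

def target_intensity(t_row, phases):
    """|sum_n t_{m,n} exp(i theta_n)|^2 at the target output mode m."""
    return abs(sum(t * cmath.exp(1j * th) for t, th in zip(t_row, phases))) ** 2

def iterative_focus(t_row, n_steps=8, n_passes=3):
    """Cycle each segment through n_steps phases in [0, 2*pi) and keep the best one."""
    phases = [0.0] * len(t_row)
    candidates = [2.0 * math.pi * s / n_steps for s in range(n_steps)]
    for _ in range(n_passes):
        for n in range(len(t_row)):
            # Feedback signal: target intensity with only segment n changed.
            phases[n] = max(candidates, key=lambda th: target_intensity(
                t_row, phases[:n] + [th] + phases[n + 1:]))
    return phases

rng = random.Random(1)
s = 0.5 ** 0.5
t_row = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(64)]

i_before = target_intensity(t_row, [0.0] * len(t_row))
i_after = target_intensity(t_row, iterative_focus(t_row))
# Phase-only phase conjugation: theta_n = -arg(t_{m,n}).
i_conj = target_intensity(t_row, [-cmath.phase(t) for t in t_row])

print(f"unshaped: {i_before:.1f}, iterative: {i_after:.1f}, conjugation: {i_conj:.1f}")
print(f"phase-only estimate for N = 1870: {expected_phase_only_enhancement(1870):.0f}")
```

With a finite number of phase steps the iterative result stays slightly below the phase conjugation value, which is consistent with the statement that the two approaches are ideally equivalent.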

Wide-Field Imaging
To compensate for the wavefront distortion on the collection path, it is generally preferable to computationally apply linear algebra operations on the measured output field, instead of applying the physical wavefront shaping techniques. This is partly due to the fact that the effect of wavefront shaping can be exactly computed for a given output field without the use of an additional SLM. More specifically, based on the inversion operation, the target field $E_{\mathrm{distal}}$ on the distal end is reconstructed from the measured field $E_{\mathrm{proximal}}$ on the proximal side by
$$\hat{E}_{\mathrm{distal}} = T^{-1} E_{\mathrm{proximal}}.$$
In conjunction with the inversion operation, Choi et al. 49 proposed a novel approach for wide-field microendoscopic imaging through MMFs, which is composed of two steps: an inversion operation to compensate for the wavefront distortion on the collection path and incoherent speckle averaging to correct for the effect of the wavefront distortion on the illumination path. When the illumination is provided through an MMF, a randomly fluctuating speckle field is developed on the distal side. Therefore, the reconstructed intensity ($|\hat{E}_{\mathrm{distal}}|^{2}$) is given as the product of the illumination speckle intensity pattern ($|S|^{2}$) and the object reflectance ($|O|^{2}$):
$$|\hat{E}_{\mathrm{distal}}|^{2} = |S|^{2}|O|^{2},$$
where the speckle field $S$ is a granular pattern that looks like random noise. Choi et al. then exploited the fact that the contrast of the incoherent sum of different speckles decreases as $1/\sqrt{N}$ (where $N$ is the number of independent speckles). 55 In the demonstrated experiments, the different speckle patterns on the distal end were generated by injecting light on the proximal end with different incident angles $(\theta_{\xi}, \theta_{\eta})$. This process is mathematically described as
$$\sum_{\theta_{\xi},\theta_{\eta}} |\hat{E}_{\mathrm{distal}}(\theta_{\xi},\theta_{\eta})|^{2} = |O|^{2}\sum_{\theta_{\xi},\theta_{\eta}} |S(\theta_{\xi},\theta_{\eta})|^{2},$$
in which the summed speckle intensity approaches a uniform illumination as the number of incident angles grows. Figure 4(d) shows the reconstructed image of the USAF target with a FOV of 200 μm, which is about the same as the core size of the MMF used. To demonstrate the practicality in biomedical application, a wide FOV image of biological tissue (the villus in a rat intestine tissue) was also acquired by stitching the reconstructed images while scanning the distal end of the probe across the specimen. Although the transmission matrix is very vulnerable to any change in the configuration of the MMF, the authors succeeded in reconstructing objects centimeters apart without any additional calibration of the transmission matrix.
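The reduction of speckle contrast by incoherent averaging can be illustrated with a short simulation. The sketch assumes fully developed speckle, for which the intensity at each pixel follows an exponential distribution, and statistically independent patterns; the pixel count and function names are arbitrary choices for this example.

```python
import math
import random

def speckle_intensity(n_pixels, rng):
    """Fully developed speckle: exponentially distributed intensity with unit mean."""
    return [rng.expovariate(1.0) for _ in range(n_pixels)]

def contrast(values):
    """Speckle contrast = standard deviation / mean of the intensity."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean

def averaged_speckle_contrast(n_patterns, n_pixels, rng):
    """Contrast of the incoherent (intensity) sum of n_patterns independent speckles."""
    total = [0.0] * n_pixels
    for _ in range(n_patterns):
        for i, v in enumerate(speckle_intensity(n_pixels, rng)):
            total[i] += v
    return contrast(total)

rng = random.Random(2)
results = {n: averaged_speckle_contrast(n, 4000, rng) for n in (1, 4, 16, 64)}
for n, c in results.items():
    print(f"N = {n:3d}: contrast = {c:.3f}, 1/sqrt(N) = {1 / math.sqrt(n):.3f}")
```

In this model, reducing the residual speckle contrast by a factor of two requires four times as many independent illumination patterns.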

Confocal Imaging
Confocal microscopy is an important tool in biological imaging by virtue of its high contrast and depth-sectioning capability. 56 Commercial confocal microscopy requires a high-quality objective lens and scanning mirror to make a focused spot across the object and a pinhole to reject the out-of-focus signals. In 2015, Loterie et al. 50 achieved endoscopic confocal imaging through MMFs by combining the two operations of phase conjugation and matrix inversion, respectively, to compensate for the effect of wavefront distortion on the way in and out.
First, the MMF can replace the light-focusing function of the objective lens and the rotating mirror. Once the transmission matrix is measured, the phase conjugate of the transmission matrix at the proximal end can make a focus on the distal end. The reflected light from the target focus is then collected back through the fiber and recorded at the proximal end. Here, the transmission matrix can be used to implement a digital pinhole in two ways: a digital confocal method and a correlation method. The digital confocal method virtually propagates the collected light back to the focal plane by an inverse transmission matrix operation ($T^{-1}$), and then a digital pinhole mask is applied at the target point. The light intensity inside the virtual pinhole constitutes a single pixel value in the final confocal image. Alternatively, the inverse operation can be estimated with a complex conjugate operation of the transmission matrix when the column vectors of the transmission matrix are nearly orthogonal, $TT^{*} \sim I$, which is called the correlation method. In particular, in the case of single-mode reconstruction, the correlation method performed comparably to the inversion method, whereas its computation cost is much lower than that of the inversion method. Figure 4(f) shows a diagram of endoscopic confocal imaging, and the right panels show three different images of a human epithelial cell; the first image is the result without the digital pinhole operation, the second is the result of the digital confocal method, and the third is from the correlation method. Compared with the first image, the right two panels show higher contrast to the extent that the walls and the nucleus of an epithelial cell are clearly distinguished by virtue of an additional confocal filtering process on each focal spot. However, the point scanning rate was limited to 20 to 100 Hz because of the slow operational speed of the SLM, which is not adequate for practical video-rate endoscopic imaging applications.
The point scanning rate can be improved up to 40 kHz by combining an SLM with an acousto-optic deflector, but this comes at the cost of a complex beam shaping configuration and reduction in the effective pixel number of the SLM. 57
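The combination of phase-conjugation focusing and a correlation-based digital pinhole can be sketched with a toy one-dimensional model. The example assumes a random complex Gaussian transmission matrix from the proximal modes to the distal pixels, a return path given by the transpose of that matrix (reciprocity), and a single reflective point as the object; all names and sizes are hypothetical and the model ignores noise and polarization.

```python
import random

def random_tm(n_distal, n_proximal, rng):
    """Toy transmission matrix (proximal -> distal), i.i.d. complex Gaussian elements."""
    s = 0.5 ** 0.5
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_proximal)]
            for _ in range(n_distal)]

def scan_images(tm, reflectivity):
    """Return (confocal, no_pinhole) pixel values for every focus position k."""
    n_distal, n_prox = len(tm), len(tm[0])
    confocal, no_pinhole = [], []
    for k in range(n_distal):
        # Illumination: phase conjugation of row k focuses light on distal pixel k.
        e_in = [t.conjugate() for t in tm[k]]
        focus = [sum(t * e for t, e in zip(row, e_in)) for row in tm]
        reflected = [r * f for r, f in zip(reflectivity, focus)]
        # Return trip, assuming reciprocity: E_back = T^T * reflected.
        e_back = [sum(tm[d][n] * reflected[d] for d in range(n_distal))
                  for n in range(n_prox)]
        # Correlation method: conj(T) * E_back estimates the distal field;
        # the digital pinhole keeps only the value at the focus position k.
        pinhole_field = sum(tm[k][n].conjugate() * e_back[n] for n in range(n_prox))
        confocal.append(abs(pinhole_field) ** 2)
        # Without a pinhole, all of the returned power is summed.
        no_pinhole.append(sum(abs(e) ** 2 for e in e_back))
    return confocal, no_pinhole

def peak_to_background(image, peak_index):
    background = [v for i, v in enumerate(image) if i != peak_index]
    return image[peak_index] / (sum(background) / len(background))

rng = random.Random(3)
T = random_tm(24, 96, rng)
obj = [0.0] * 24
obj[5] = 1.0                      # single reflective point at distal pixel 5
confocal, no_pinhole = scan_images(T, obj)
pbr_confocal = peak_to_background(confocal, 5)
pbr_no_pinhole = peak_to_background(no_pinhole, 5)
print(f"contrast with pinhole: {pbr_confocal:.0f}, without: {pbr_no_pinhole:.0f}")
```

In this toy model the pinhole rejects light that returns from the speckle background of the illumination, so the point object stands out more strongly than in the image formed from the total returned power.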

Scanning Fluorescence Imaging
From Secs. 4.1 to 4.3, we reviewed coherent manipulation techniques and associated label-free endomicroscopic imaging techniques in the reflection geometry. As label-free imaging based on tissue reflectance generally suffers from low image contrast, fluorescence imaging serves as an important endoscopic imaging modality for diagnosis. In this case, the optical signal from the distal end is the fluorescence signal, which is temporally incoherent, and thus the coherent manipulation techniques cannot be applied on the collection path. In this section, we discuss the use of the wavefront shaping technique to implement scanning fluorescence microscopy through MMFs.

Fluorescence endomicroscopy
In scanning fluorescence microscopy, light is focused on the object and the focal spot is scanned across the imaging area or volume. As in the diagram of Fig. 5(a), a DMD was used instead of an SLM to achieve fast wavefront modulation of the input wave. However, as DMDs are intrinsically capable of only binary amplitude modulation, the arbitrary phase pattern $\phi(x, y)$ was encoded in a binary off-axis amplitude hologram (or Lee hologram) 60 $h(x, y)$, in which the carrier frequencies $u$ and $v$ separate the first-order diffracted beam from the zeroth-order beam of the hologram. Only the first order of the diffracted beam is passed by the first lens and the spatial filter at the Fourier plane of that lens, such that the desired wavefront $e^{j\phi(x,y)}$ is generated at the Fourier plane of another lens, as in Fig. 5(a). Based on this operation, it is possible to measure the transmission matrix and make a focus on the distal end by using the phase conjugate operation on the proximal end. The excited fluorescence signal from the target point is collected through the fiber and simply summed at the proximal end. 58 In fact, it is not possible to apply a digital pinhole technique, as shown in Sec. 4.3, because the optical field of incoherent light cannot be measured in a holographic manner. Therefore, the fluorescence signal excited by the fluctuating background intensity could not be filtered out, resulting in a significant decrease in the image contrast. Figure 5(b) shows images of fluorescent beads (top row) and a monkey brain slice (bottom row) acquired with a widefield fluorescence microscope (left column) and an MMF-based scanning endoscope (right column). The comparison clearly shows the degradation in image contrast caused by the low PBR value. Importantly, the DMD's fast refresh rate of around 20 kHz could allow for video frame rates, as required for in vivo applications.
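The encoding of a phase pattern into a binary DMD pattern can be sketched as follows. This is a minimal illustration that assumes one commonly used form of the binary Lee hologram, $h = \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}[\cos(2\pi(ux + vy) - \phi)]$; the exact expression and sign convention used in the cited work may differ. The function names, the carrier period, and the crude averaging that stands in for the spatial filter are all hypothetical choices.

```python
import numpy as np


def lee_hologram(phi, u, v):
    """Binary (0/1) DMD pattern encoding phase map phi on a carrier (u, v) in cycles/pixel."""
    ny, nx = phi.shape
    y, x = np.mgrid[0:ny, 0:nx]
    return (np.cos(2 * np.pi * (u * x + v * y) - phi) > 0).astype(np.uint8)


def first_order(h, u, v):
    """Crude stand-in for the spatial filter: remove the carrier, then average."""
    ny, nx = h.shape
    y, x = np.mgrid[0:ny, 0:nx]
    return np.mean(h * np.exp(2j * np.pi * (u * x + v * y)))


phi = np.full((8, 256), 1.0)          # uniform target phase of 1 rad (example)
h = lee_hologram(phi, 1 / 32, 0.0)    # carrier period of 32 pixels along x
field = first_order(h, 1 / 32, 0.0)
print(np.angle(field), np.abs(field))
```

In this convention, the demodulated field carries the target phase, up to a quantization error set by the number of pixels per carrier period, with an amplitude close to $1/\pi$. Most of the incident power therefore goes to other diffraction orders, which is the price of binary amplitude encoding.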

Two-photon endomicroscopy
The imaging principle of two-photon excitation microscopy through an MMF is similar to that of single-photon fluorescence imaging through an MMF, except that a high peak flux of the excitation light (typically produced by a femtosecond pulsed laser) is required at the distal end. The reason for this is that the probability of two-photon absorption is extremely low and has a quadratic dependence on the instantaneous optical power. Here, the major issue is that, when a short laser pulse is incident on the proximal side of the MMF, the modal dispersion in the MMF spreads the incident power over time, and consequently the phase conjugation operation based on the ordinary time-averaged transmission matrix (i.e., a transmission matrix measured with a continuous-wave laser) does not result in a temporally focused spot.
In 2015, Morales-Delgado et al. 59,61 proposed a method to characterize a time-gated transmission matrix using an ultrafast reference light pulse to selectively measure and utilize only the optical modes with similar group velocities for the wavefront shaping process. To further reduce the pulse broadening due to modal dispersion and group velocity dispersion through a 20-cm-long MMF, a graded-index MMF was selected over a step-index MMF, and two prism pairs were introduced in the path of the input beam incident on the proximal side in the reconstruction step. Using the combined techniques, a diffraction-limited focal spot could be made on the distal end with a pulse width of 120 fs, which was nearly six times shorter than the result obtained without applying wavefront shaping (i.e., ∼745 fs).
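The benefit of the shorter pulse follows from the quadratic dependence of two-photon absorption on the instantaneous power. The sketch below is an idealized estimate that assumes Gaussian pulses of equal energy and considers only the temporal profile; the function name `two_photon_signal` is hypothetical, and spatial effects are ignored.

```python
import numpy as np


def two_photon_signal(fwhm_fs, energy=1.0):
    """Integral of I(t)^2 for a Gaussian pulse with the given FWHM and pulse energy."""
    sigma = fwhm_fs / (2 * np.sqrt(2 * np.log(2)))
    t = np.linspace(-10 * sigma, 10 * sigma, 20001)
    dt = t[1] - t[0]
    intensity = energy * np.exp(-t**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return np.sum(intensity**2) * dt


# Pulse widths quoted in the text: 120 fs with and ~745 fs without wavefront shaping.
gain = two_photon_signal(120.0) / two_photon_signal(745.0)
print(round(gain, 2))
```

Under these assumptions, the two-photon signal per pulse scales inversely with the pulse width, so the gain equals the ratio of the two widths.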
Because two-photon fluorescence excitation is nonlinear, its excitation volume is much smaller than that of single-photon excitation. In particular, in contrast to single-photon excitation, the two-photon fluorescence excited at out-of-focus planes and by the fluctuating background intensity at the focal plane is significantly suppressed [see Fig. 5(c)]. Therefore, it shares the same advantages as the confocal microscope (higher contrast and depth-sectioning capability) even without the confocal gating operation. Figure 5(d) shows two-photon images of fluorescent beads at various depths ($z$ = 4, 14, 32, and 50 μm).
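The suppression of the background contribution can be illustrated with a simple speckle model. The sketch below rests on several assumptions: a uniform fluorophore distribution, a background of fully developed speckle with exponentially distributed intensity, and hypothetical values for the number of speckle grains and the PBR. It compares the fraction of the total signal that originates from the focus when the signal scales linearly or quadratically with intensity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_grains = 1000      # background speckle grains in the field of view (assumed)
pbr = 100.0          # peak-to-background ratio of the focus (assumed)

background = rng.exponential(1.0, n_grains)   # fully developed speckle, mean 1
peak = pbr * background.mean()


def focus_fraction(order):
    """Fraction of the total signal from the focus when signal ~ intensity**order."""
    return peak**order / (peak**order + np.sum(background**order))


one_photon = focus_fraction(1)   # linear (single-photon) excitation
two_photon = focus_fraction(2)   # quadratic (two-photon) excitation
print(one_photon, two_photon)
```

With these numbers, the focus contributes only about one eleventh of the single-photon signal, whereas it dominates the two-photon signal.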

Biological applications
A few successful demonstrations have been reported to show the applicability of the laser-scanning imaging technique through MMFs, with a major emphasis on the capability to visualize cellular or subcellular details of biological tissues. Here, we present representative results from tissue culture cells, neuronal imaging, and cochlea imaging. Figure 6(a) shows two types of tissue culture cells: 62 baby hamster kidney (BHK) cells expressing GFP and a hippocampal tissue culture expressing a genetically encoded calcium indicator (GCaMP6f). Through a comparison with epi-fluorescence imaging, the capability to resolve individual cells was validated. The reported frame rate to acquire the images in Fig. 6(a) was 7 to 15 Hz for a circular FOV with a diameter of 100 μm and a spatial resolution of ∼2 μm with the use of a DMD. Furthermore, with the aid of the genetically encoded calcium indicator GCaMP6f, the ability to observe spontaneous neuronal activity was demonstrated 62 [see Fig. 6(b)]. In a separate study, 63
Despite the mechanical movements associated with living animals, MMF-based fluorescence imaging techniques have also proven to be effective for in vivo neuronal imaging. The first and second rows of Fig. 6(e) show in vivo fluorescence images of a subpopulation of inhibitory neurons, somatostatin-expressing (SST) neurons, 65 labeled with a red fluorescent marker (tdTomato) in different regions of a mouse brain: the visual cortex (V1, 0.5 to 0.8 mm in depth) and the cornu ammonis 1 (CA1) region and dentate gyrus of the hippocampus (1.5 and 2 mm in depth). The reported frame rate to acquire the images in Fig. 6(e) was 3.5 Hz for a circular FOV with a diameter of 50 μm and a spatial resolution of about 1.2 μm with the use of a DMD.
Additionally, it was shown that the dynamics of hemorrhage in the primary visual cortex 65 could be visualized in a label-free manner based on the intrinsic optical contrast from red blood cells [appearing as dark clusters in the last row of Fig. 6(e)]. Compared with the ex vivo experiments, in vivo experiments generally result in degradation of the image quality because the optical transmission matrix of the MMF deviates from the precalibrated one owing to the fiber bending that accompanies the insertion and extraction of the MMF in biological tissue. However, the characteristic structures of the target tissues are recognizable at the neuronal scale, and the object planes at different depths are accessible by digital refocusing without mechanically moving the distal tip of the fiber. More importantly, the reported results show that structural and functional in vivo microscopic imaging is possible in a minimally invasive way with a fiber tract width of about 50 μm [as shown in the postmortem histological image presented in Fig. 6(f)]. 65 To summarize the various imaging approaches based on MMFs, Table 2 lists the key results from the representative studies based on each approach.

Controlling Coherent Waves through Multicore Fibers
The MCF has long been used for clinical endoscopes as well as commercialized confocal endomicroscopic imaging systems (e.g., Mauna Kea Technologies); however, the finite intercore distance on the facets of the MCF fundamentally limits the resolution with a pixelation issue. In this section, we review two techniques to remove the pixelation artifacts and achieve diffraction-limited imaging through an MCF based on a lensless configuration on the distal end.

Using Transmission Matrix Formalism
Kim et al. demonstrated the binarized transmission matrix approach on MCFs using a DMD located on the proximal side, as in the diagram of Fig. 7(a). Here, instead of implementing phase modulation based on the Lee hologram method, as in Sec. 4.4.1, an optical focus on the distal side of the MCF is created based on the binarized phase conjugation operation, in which the output fields from each input mode interfere semi-constructively at the target position (the $m$-th mode) 66
$$\sum_n a_n t_{m,n}, \qquad a_n = \begin{cases} 1, & 0 \le \arg(t_{m,n}) \le \pi \\ 0, & \text{otherwise.} \end{cases}$$
The total intensity of the excited fluorescence signal from the focus was collected by the same fiber bundle and then measured by a photomultiplier tube. Figure 7(a) shows the experimental setup based on a DMD and an MCF, and Fig. 7(b) shows images of 2-μm fluorescent beads (top row) and cancer cells (bottom row) obtained through the conventional transmission imaging technique (left column) and the endoscopic fluorescence imaging technique based on the transmission matrix (right column) through the same MCF. The pixelation artifact was completely removed, and the resolution was improved to 1.07 μm, which is smaller than the diameter of the cores of the MCF, 66 typically about 3.45 μm. Previously, conventional MCF-based microendoscopy (pCLE) captured the light intensity from object points that are optically conjugate to each core of the MCF, limiting the lateral resolution to the size of the core. In contrast, the transmission matrix formalism and associated wavefront shaping techniques can achieve the diffraction-limited resolution supported by the entire facet of the MCF and remove the pixelation artifacts by compensating for the effect of mode mixing and phase delays through the cores.
However, assuming that each transmission matrix coefficient follows an independent complex Gaussian distribution, the theoretical PBR value in light focusing via binary amplitude modulation is $N/(2\pi)$ for $N$ spatial input modes, 68 which significantly degrades the image contrast.
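The PBR scaling quoted above can be checked numerically. The sketch below is a toy simulation, not the experimental procedure: it assumes a transmission matrix with independent complex Gaussian entries, and the numbers of input and output modes, the number of averaged target positions, and the function name `binary_focus_pbr` are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 1024, 500   # input modes and sampled output modes (assumed)

# Toy transmission matrix with independent complex Gaussian entries.
T = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(2)


def binary_focus_pbr(T, m):
    """PBR when only inputs with 0 <= arg(t_mn) <= pi are switched on."""
    a = (np.angle(T[m]) >= 0).astype(float)   # binary amplitude mask a_n
    out = np.abs(T @ a) ** 2                  # output intensity pattern
    background = np.delete(out, m).mean()     # mean intensity away from the focus
    return out[m] / background


# Average the PBR over a few target positions to reduce statistical scatter.
pbr = np.mean([binary_focus_pbr(T, m) for m in range(10)])
print(pbr, n_in / (2 * np.pi))
```

The simulated value should lie close to $N/(2\pi)$, which is smaller than the roughly $\pi N/4$ obtained with ideal phase-only modulation.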

Using Speckle Correlation
Noninvasive imaging via speckle correlation is one of the representative computational imaging techniques for a thin scattering layer that presents the characteristic input-output relation called the "memory effect." 69,70 The memory effect is the result of the short-range correlation within a transmission matrix of a thin scattering medium in which a lateral shift of the point source on the input side results in a lateral shift of the speckle pattern on the output side, assuming that the point source and the measurement plane on the input and output sides are placed at the far-field distance from the scattering medium. This shift-shift memory effect in the far field manifests when the angular spectrum of the scattered field on the output surface of the scattering medium is nearly conserved for different incidence angles, which is only guaranteed when a diffusive scattering medium has a thickness comparable to the wavelength. 28,71,72 An MCF with a small core diameter can be considered to be an infinitesimally thin random phase plate that exhibits an ideal memory effect 67 [see Fig. 7(c)]. Thus, with a spatially incoherent object on the distal side, the intensity image captured on the proximal side through the MCF is simply given as the superposition of the laterally shifted speckle patterns that are generated from each object point. This relation is represented by a simple convolution between the object's intensity distribution $O(r)$ and the speckle intensity pattern $S(r)$ associated with a random phase delay map of the cores of an MCF as
$$I(r) = O(r) * S(r).$$
Then, taking the autocorrelation, denoted as $\otimes$, of the measured intensity image $I(r)$, the convolution theorem yields the following relation:
$$I(r) \otimes I(r) = [O(r) * S(r)] \otimes [O(r) * S(r)] = [O(r) \otimes O(r)] * [S(r) \otimes S(r)].$$
The autocorrelation of the speckle pattern is a sharply peaked function with a diffraction-limited width, $S(r) \otimes S(r) \sim \delta(r)$. Therefore, the autocorrelation of the intensity image approximates the autocorrelation of the object's intensity distribution as
$$I(r) \otimes I(r) \sim O(r) \otimes O(r).$$
From the estimated autocorrelation function of the object, one may directly reconstruct the original object function using a phase retrieval algorithm. 73 The results associated with each computational process are presented in Fig. 7(d).
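The autocorrelation relation above can be verified numerically. The sketch below is a simplified illustration under stated assumptions: it uses circular (FFT-based) convolution and autocorrelation, a hypothetical object made of three point emitters, and a synthetic speckle pattern generated from random phases inside a circular aperture rather than a real MCF measurement. The phase retrieval step is not included.

```python
import numpy as np


def autocorr(img):
    """Circular autocorrelation computed through the Fourier transform."""
    return np.real(np.fft.ifft2(np.abs(np.fft.fft2(img)) ** 2))


def circ_conv(a, b):
    """Circular convolution of two real images."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))


rng = np.random.default_rng(0)
n = 128

# Hypothetical incoherent object: three point emitters.
obj = np.zeros((n, n))
obj[60, 60], obj[60, 70], obj[72, 64] = 1.0, 0.8, 0.6

# Speckle intensity pattern S(r) from random phases inside a circular aperture.
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
aperture = (xx**2 + yy**2) <= (n // 4) ** 2
speckle = np.abs(np.fft.fft2(aperture * np.exp(2j * np.pi * rng.random((n, n))))) ** 2
speckle /= speckle.sum()

image = circ_conv(obj, speckle)                     # I = O * S
lhs = autocorr(image)                               # autocorrelation of I
rhs = circ_conv(autocorr(obj), autocorr(speckle))   # [O (x) O] * [S (x) S]
peak = np.unravel_index(np.argmax(autocorr(speckle)), speckle.shape)
print(np.allclose(lhs, rhs), peak)
```

The two sides agree to numerical precision, and the speckle autocorrelation peaks at zero shift. In practice, the speckle autocorrelation also contains a smooth background pedestal, which is usually subtracted before the phase retrieval step.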
Compared with the transmission matrix approach, the speckle correlation approach can achieve a diffraction-limited resolution with only a single-shot image. Also, because this method does not depend on any premeasured data, it is free from the issue of fiber bending and translation. However, this approach has two fundamental limitations. First, the optical field at the object points must be temporally coherent and spatially incoherent, and the object needs to be uniformly illuminated; otherwise, the mathematical formulation for the correlation imaging breaks down. There have been approaches to reconstruct the fluorescence image even with a nonuniform illumination by utilizing the memory effect on the illumination path, albeit without the single-shot advantage. 69,74 Second, it does not have depth scanning capability, which restricts the axial extent of the object to less than the axial decorrelation length of the speckle point spread function.
To summarize the various imaging approaches based on MCFs, Table 3 lists the key results from each approach.

Advanced Techniques
The MMF- or MCF-based endoscopic imaging techniques introduced in Secs. 4 and 5 have several intrinsic limitations, such as the vulnerability of the precalibrated transmission matrix to mechanical and thermal perturbations and the entangled spatio-temporal distortion in each guided mode. Here, we review some in situ transmission matrix calibration techniques to overcome the external perturbations and some novel techniques to exploit the spatio-temporal characteristics of the optical fiber.

In Situ Transmission Matrix Calibration
All of the methods except those relying on speckle correlation (Sec. 5.2) are based on the assumption that the transmission matrix is precalibrated and remains unchanged after insertion. However, the transmission matrix is extremely vulnerable to any changes in the fiber's configuration (e.g., bending and deformation of the fiber) and temperature distribution. For example, endoscopic imaging based on the precalibrated transmission matrix remains valid only up to a few millimeters of fiber-tip translation 49 or a few degrees of temperature change. 50 Moreover, recalibration of the transmission matrix is not possible in practical endoscopic imaging scenarios in which the distal end of the fiber is embedded deep inside the body. To overcome this limitation, numerous approaches have been proposed to calibrate the transmission matrix without any physical access to the distal tip.
A straightforward approach for in situ calibration is to place additional elements at the distal end. One may use point-like sources, such as a nonlinear guide star 75,76 and a virtual beacon source, 77 to directly create an optical focus by feedback-based wavefront shaping 45 or optical phase conjugation. Alternatively, a transmission matrix can be calibrated by illuminating the proximal end and measuring the backward-propagating wavefronts from partial reflectors at the distal tip. [78][79][80] Recently, it has also been shown that a target object itself can serve as a guide star to identify the core-to-core phase retardations of the MCF and reconstruct a diffraction-limited and pixelation-free image. 81 With the reflection matrix of the MCF measured with core-to-core illumination, the singly reflected waves from a target object can be coherently accumulated based on an adaptive algorithm called closed-loop accumulation of single scattering. 82 Another notable approach is to estimate the transmission matrix from partial information on the matrix elements, which is possible due to the cylindrically symmetric behavior of light within conventional optical fibers with a circular cross-section, called the "rotational memory effect." 53,54 The rotational memory effect enables optical access to a large facet area (i.e., an isoplanatic patch) from a point-like source at the distal tip. 54,83 More remarkably, it has also been shown that the transmission matrix of an MMF can be estimated without any calibration by decomposing the transmission matrix into the guided modes on a circular polarization basis. The circularly polarized modes are less affected by polarization coupling owing to the cylindrical structure of the fiber, thus allowing for the prediction of the transmission matrix of a deformed fiber by estimating only the phase delay values of each mode from the given bending status. 84
Other computational imaging approaches, such as compressive sensing and deep learning methods, have also recently been explored for calibration and imaging through MMFs or MCFs. [85][86][87][88] Rather than repeatedly recalibrating the transmission matrix, there have also been continuous efforts to design bending-invariant optical fiber structures, 89 such as an MMF with a parabolic refractive index profile, 90 a twisted MCF, 91 and even a hybrid structure of the MMF and MCF for both high-resolution imaging and motion-insensitive low-resolution imaging. 92 Alternatively, to circumvent the calibration issue in using an MMF, some approaches have been proposed to encode the spatial information into spectral information by placing a spatio-spectral encoder at the distal tip of a fiber 93,94 and to reconstruct an image from spectrally resolved detection at the proximal tip.

Using Spatio-Temporal Characteristics of Optical Fibers
Coherent light manipulation methods, introduced in Secs. 4 and 5, mostly deal with only the spatial distortion of an optical field transmitted through fibers. However, as each optical mode of a unique spatial field profile has an associated propagation velocity, spatial and temporal distortions through fibers are entangled with each other, resulting in time-varying speckle-like patterns within a broadened pulse duration for an incident optical pulse. Here, we review some approaches that utilize these spatio-temporal characteristics of fibers.
Recently, microendoscopic imaging capability through an MMF has been successfully extended to include depth information and achieve 3D imaging. High-speed wavefront shaping using a DMD, along with time-of-flight light detection, enabled recovery of depth information alongside 2D reflectance images at a near video rate. 95 Based on spatio-temporal mixing through a fiber, the temporal dynamics of the pulse can be manipulated by only controlling the spatial profile of the incident wavefront, such as in conserving a pulse shape [96][97][98] and enhancing the transmittance at a chosen delay time (i.e., temporal focusing). 99 Furthermore, an ultrafast wavefront shaping device was developed using a 5-m-long MMF as a spatiotemporal mixer to generate a desired vector spatiotemporal field. 100 Finally, relying on the reciprocal relation between time and frequency, the spatio-temporal characteristic further leads to the spatio-spectral mixing of an optical field through a fiber, enabling novel all-fiber spectroscopic techniques. 101

Conclusion
Here, we reviewed both physical and computational approaches to exploit the coherent properties of MMFs and MCFs for endomicroscopic imaging. In particular, although MMFs intrinsically have a high information transport capacity, they have not been considered to be image-carrying probes due to their random mode mixing property upon transmission. However, recent investigations on measuring and controlling such randomness inside disordered optical media have led to wavefront shaping techniques and computational methods to deterministically use MMFs based on the transmission matrix formalism. These approaches have been demonstrated to provide a diffraction-limited resolution with an ultrathin probe, which is a critical feature for observing cellular and subcellular morphological changes associated with various diseases.
However, MMF-based endomicroscopy techniques have several intrinsic limitations for widespread use in practical clinical applications. First, in their current form, they rely heavily on precalibration of the transmission matrix. Although some in situ calibration techniques have been developed, their performance is still insufficient for real clinical practice, in which fast and accurate calibration is required and bending states are not exactly known. This limitation significantly degrades the performance of wavefront shaping and computational image retrieval and restricts the probe movement that is required to observe different parts inside the body. Second, the coherent manipulation techniques for MMFs implicitly assume the use of a coherent light source and label-free imaging of internal tissues. Although fluorescence staining can be used for laboratory experiments, as described in Sec. 4.4, fluorescent dyes compatible with the human body are, in general, limited. In label-free reflectance imaging with a single wavelength, biological cells and organelles typically provide poor contrast. Finally, the limited refresh rate of SLMs practically results in a small FOV. With relatively fast SLMs, such as DMDs, the refresh rate of tens of kHz results in an imaging rate of ∼10 fps with ∼1,000 imaging pixels, which is around 100 to 1,000 times smaller than the pixel counts of conventional SD or HD video endoscopes. Thus, the FOV is currently limited to tens of micrometers for a resolution of ∼1 μm, indicating that an MMF-based imaging technique alone would not provide proper diagnostic capabilities.
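The pixel budget behind the last limitation can be worked out explicitly. The short calculation below is a rough estimate under stated assumptions: one DMD pattern per image pixel, one pixel per resolution element, a square FOV, an assumed refresh rate of 10 kHz, and assumed video formats of 640 × 480 (SD) and 1280 × 720 (HD).

```python
import math

refresh_rate_hz = 10_000   # DMD pattern rate, "tens of kHz" scale (assumed value)
frame_rate_fps = 10        # imaging rate quoted in the text
resolution_um = 1.0        # spatial resolution quoted in the text

# One wavefront pattern is displayed per scanned focus, i.e., per image pixel.
pixels_per_frame = refresh_rate_hz // frame_rate_fps
fov_side_um = math.sqrt(pixels_per_frame) * resolution_um

# Assumed video formats for comparison: SD 640x480 and HD 1280x720.
sd_ratio = (640 * 480) / pixels_per_frame
hd_ratio = (1280 * 720) / pixels_per_frame
print(pixels_per_frame, round(fov_side_um, 1), round(sd_ratio), round(hd_ratio))
```

With these inputs, the estimate gives about 1,000 pixels per frame, a FOV of roughly 32 μm on a side, and pixel counts a few hundred to about one thousand times smaller than those of the assumed video formats.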
Based on the developments in the past decade, researchers have already shown some potential solutions to the aforementioned limitations. In particular, with the efforts to utilize unique wave phenomena in MMFs based on computational methods, we anticipate that the coherent manipulation of optical fibers could realize the transformative capability of real-time histology diagnosis.