Live-subject microscopies, including microendoscopy and other related technologies, offer promise for basic biology research as well as the optical biopsy of disease in the clinic. However, cellular resolution generally comes with the trade-off of a microscopic field-of-view. Microimage mosaicking enables stitching many small scenes together to aid visualization, quantitative interpretation, and mapping of microscale features, for example, to guide surgical intervention. The development of hyperspectral and multispectral systems for biomedical applications provides motivation for adapting mosaicking algorithms to process a number of simultaneous spectral channels. We present an algorithm that mosaics multichannel video by correlating channels of consecutive frames as a basis for efficiently calculating image alignments. We characterize the noise tolerance of the algorithm by using simulated video with known ground-truth alignments to quantify mosaicking accuracy and speed, showing that multiplexed molecular imaging enhances mosaic accuracy by leveraging observations of distinct molecular constituents to inform frame alignment. A simple mathematical model is introduced to characterize the noise suppression provided by a given group of spectral channels, thus predicting the performance of selected subsets of data channels in order to balance mosaic computation accuracy and speed. The characteristic noise tolerance of a given number of channels is shown to improve through selection of an optimal subset of channels that maximizes this model. We also demonstrate that the multichannel algorithm produces higher quality mosaics than the analogous single-channel methods in an empirical test case. To compensate for the increased data rate of hyperspectral video compared to single-channel systems, we employ parallel processing via GPUs to alleviate computational bottlenecks and to achieve real-time mosaicking even for video-rate multichannel systems anticipated in the future. This implementation paves the way for real-time multichannel mosaicking to accompany next-generation hyperspectral and multispectral video microscopy.
The development of miniaturized devices for live subject microscopy has potential for broad impact on a number of biological research questions and biomedical applications, including the optical biopsy of cancer at the cellular and subcellular levels.1 Although these devices are typically able to resolve cellular objects, their relatively small field-of-view limits scalability, as clinicians may need to survey large (macroscopic) areas.2 This limitation stands in the way of practical utility and translation to the clinic. Consequently, achieving both microscopic resolution and macroscopic field-of-view optical imaging remains a central challenge in the field, especially in fluorescence-guided surgery.2
Microimage mosaicking offers an approach for expansion of the field-of-view without loss of resolution, aiding visualization and interpretation of microscale features across macroscopic areas of tissue. Mosaicking is an image analysis technique in which sequential frames from a sequence of images (i.e., video) are examined for common spatial features and then stitched together by overlapping the images’ shared regions pairwise.3 Previous elegant mosaicking algorithms were developed to accompany monochrome (single-channel) microscopy, endomicroscopy, and other related modalities.4,5 These methods rely only on spatial information to connect frames. These algorithms have since been optimized to address the unique imaging artifacts associated with live-subject microscopy, using creative image alignment and stitching methods.6,7
The advent of multidimensional imaging techniques, such as multispectral and hyperspectral imaging,8–10 and their translation to endoscopy and other multiplexed molecular imaging techniques,11–13 provides novel information about frame-to-frame motion that single-mode techniques are not equipped to analyze. For example, the existence of strong narrow-band signals from multiple fluorophores can be leveraged to solve the matching problem in a higher signal-to-noise ratio (SNR) domain after unmixing,14 simultaneously improving accuracy and computational efficiency. Thus, existing mosaicking algorithms should be adapted to leverage spatial and spectral data and enable these techniques to image micro- and macroscopic environments with greater fidelity than their single-channel predecessors.
This work presents a first step in the development of multichannel mosaicking algorithms. We focus on characterizing and measuring the improvements to mosaic image registration and alignment using spatial information across two or more channels and comparing them to analogous single-channel methods. Here, we use multichannel in reference to either the set of raw spectral bands produced by hyperspectral imaging or the set of endmember abundance maps calculated via spectral unmixing. We present an application of this work using hyperspectral imaging in a biological context. However, the algorithm may be useful in a broad range of applications where multichannel methods are employed, such as remote geological sensing and astronomical imaging, if used in a setting where the dominant component of motion is translational and any nontranslational motion is small in comparison (see Sec. 3.3).
When processing multiple spectral (spatially coinciding) channels, it is clear that a single unique spatial alignment should be applied across all channels in a set of hyperspectral frames (provided achromatic optical performance). Independent alignment of different channels could obscure colocalization information, which is often of primary interest in hyperspectral data analysis.10,15 By calculating alignments independently for several spectral channels and leveraging this extra information, we demonstrate that it is possible to improve frame alignment accuracy and tolerance to noise beyond the performance metrics set by their single-channel counterparts.
The inclusion of spectral information into the image alignment calculations adds nuance to the method and is explored in this work. Specifically, there is a trade-off between the addition of spectral channels for increased accuracy and computational speed, from which the user should be able to determine the optimal configuration for their data set. First, some channels within a multichannel data set may be more valuable to the mosaicking accuracy than others depending on their degree of correlation with each other. That is, using totally uncorrelated channels (e.g., two spectrally distinct fluorescence channels) increases the noise tolerance of the mosaic more than the addition of correlated channels. Therefore, a subset of the higher priority channels may be selected to reach a target accuracy and speed. Informed by this, we present a method to choose the optimal subset of channels that maximizes the mosaicking performance. We show that this new metric, called the dimensionality score (), reliably predicts the noise tolerance of both hyperspectral and unmixed multichannel data sets. Therefore, the highest priority channels are predicted to be those that maximize this metric.
Previous reports simulate video registrations from a single image to characterize mosaicking accuracy against a known ground truth.16 Here, we employ a similar method to characterize the noise tolerance and computational efficiency of the presented algorithm to enable quantification of mosaicking performance relative to single-channel analogues. Simulated hyperspectral video frames were generated with a constant, high percentage (94%) area overlap between consecutive frames to approximate conditions for video-rate imaging. The current algorithm is primarily intended for handheld probes, for which probe motion during acquisition is unavoidable. To combat motion artifacts, these probes are operated at fast imaging rates (video-rate speeds) that outpace the probe motion during image acquisition. The nonoverlap area percentage is simply the ratio of probe displacement (i.e., in pixels traveled) to image size, which must be kept small as a direct consequence of this method of minimizing motion artifacts. Therefore, while a low amount of overlap is common for microscopy tilings or similar methods, this algorithm is intended for use with systems that produce sequential images with large overlapping areas.
Noise was then introduced to the simulated frames to stress the algorithm and introduce inaccuracies. Additive Gaussian, multiplicative Gaussian, and Poisson noise types were investigated, covering common models that approximate dominant noise sources in both confocal microscopy and hyperspectral imaging.17,18 We apply the simulated noise tests to quantify the improvements made by the present multichannel mosaicking algorithm as well as to measure the predictive power of the aforementioned dimensionality score. These hyperspectral images were also linearly unmixed into their molecular constituents, creating unmixed channels for which was calculated. However, note that unmixed channels are generally denoised compared to spectral channels,14 which is intrinsically superior to using raw spectral channels but relies on faster than real-time unmixing for online applications.
Developing a precise quantification of mosaicking error to measure the noise tolerance requires objective quantification that may be beyond human perceptions of mosaicking errors and artifacts. To ensure that the improvements found for the noise tolerance translate into improvements in image quality as perceived by the eye, we developed a separate empirical test. We collected a second set of hyperspectral frames using a commercial microscope and compared mosaics made from our algorithm to gold standard tiling of the same area assembled using the microscope’s precision stage and software.
Furthermore, we leverage improvements in computational throughput available with parallelized computing and graphics processing units (GPUs),19 allowing computationally expensive multichannel analysis at video-rate (15 frames-per-second, fps) speeds for up to 10-channel mosaicking. Thus, image computation speeds can be comparable to associated data acquisition rates,20 enabling future online applications in the clinic. These improvements may be applied to mosaicking, and pairing these algorithms with the development of faster-than-real-time spectral decomposition20 would pave the way for real-time construction of macroscopic maps of multiple molecular constituents (endmembers) of tissue with microscopic resolution.
Multichannel Mosaicking Algorithm
The presented algorithm applies pairwise normalized cross-correlation21 (NCC) as a basis for frame registration. Following Bedard et al.,5 cross-correlation maps are calculated between consecutive image frames [Figs. 1(a) and 1(b)] and the maps are normalized to an in-place autocorrelation (i.e., the theoretical maximum correlation). Correlation space then maps the probability of alignment across the spatial domain spanned by the intersection of two images being aligned, from which the location of maximum likelihood is selected as the “true” alignment location for the subsequent stitching step. We calculate the spatial cross-correlation map separately for each additional channel and average the maps in correlation space, producing a single map [Figs. 1(c) and 1(d)].
The code enabling this algorithm was developed in MATLAB R2019a (The MathWorks) in a Linux environment (Ubuntu 18.04LTS) with an Intel i7 6800K 3.40 GHz CPU and 32 GB DDR4 2400 MT/s RAM. The substantial computational cost of calculating multiple cross-correlations per frame-pair was alleviated by using MATLAB’s Parallel Computing Toolbox v7.0 for GPU interfacing. Speed benchmarks were carried out using a GeForce GTX 1080 Ti graphics card (NVIDIA). The code implementing this algorithm is available at the GitHub repository https://github.com/springlabnu/Multi-Channel-Mosaicking.
Simulated Data Generation and Mosaicking Performance
Quantitative analysis and comparison to equivalent single-channel methods was carried out using a simulated video-rate data set derived from images obtained by empirical microscopy. We acquired a hyperspectral image cube (26 spectral bands, images shown in Fig S1 in the Supplemental Materials) of fixed Madin–Darby canine kidney cells stained with three dyes for visualization of the nucleus (Hoechst 33342), cytoskeleton (Alexa Fluor 488-Phalloidin), and mitochondria (MitoTracker Red CMXRos) using a laser scanning confocal microscope (Olympus FV3000). An artificial hyperspectral video stream was then generated from this data by cropping the field-of-view and traversing the image in a continuous raster scan with 94% area overlap between pairs of images [raster scan pattern illustrated in Fig. S2(a) in the Supplemental Materials]. This created a sequence of 65 hyperspectral image cubes () with a known ground truth alignment for quantitative algorithm characterization. The image cubes were also linearly unmixed into their three molecular constituents, creating three unmixed channels that could also be used for comparative analysis.
Having the ground truth location for each frame means that, for each trial, the alignments may be checked against the lookup table of correct locations without the need to visualize the full mosaic. From here, we quantified mosaicking accuracy as the fraction of the 64 frame pairs (one less than the number of frames) that were correctly aligned compared to the ground truth (i.e., the original image). The alignment was counted as correct only for an exact match. Zero-mean additive Gaussian noise was added to each image in every frame, in order to degrade image quality and to mimic challenging experiments that could potentially cause alignment mistakes. In order to quantify noise power of the simulated images, we use the peak SNR, defined as
The mosaicking accuracy is approximately equal to the probability that a single pair of frames are correctly aligned ; that is, correct alignment is a Bernoulli variable with probability for which the amount of noise is a predictor. The probability of Bernoulli variables is given by a logistic function;22 this allows us to connect the fraction of correct alignments () to the noise strength causing the misalignments (PSNR) by
Testing of the present algorithm was carried out using raw spectral channels. The unmixed channels were only analyzed by calculating their score to compare their predicted performance. For each value, was calculated for every permutation of channels, from which the permutation with the maximal value was selected as the optimal subset. As a control, a permutation was selected subjectively by an operator based on prioritizing channels with the brightest signal corresponding to each molecular constituent, intended to mimic selection of channels without knowledge of . Histograms of all possible values (all possible permutations) are shown for these cases in Fig. S3 in the Supplemental Materials to demonstrate the optimization versus the user-selected permutations. Additive Gaussian noise with was added to each channel in each image within the configuration, after which the algorithm was run five times, with being saved after each trial. The time for each trial to complete the full mosaic was also saved. The applied variance was translated to PSNR using Eq. (1), fit to Eq. (2), from which the was extracted. Independent two-sample -tests (Welch’s -test) were calculated to find the significance of the improvements in through optimization of subset permutation, as well as other comparisons.
These experiments were repeated for multiplicative Gaussian and Poisson noise. For these noise types, the amount of noise power delivered depends on signal, so the value extracted is signal dependent as well. The characteristic performance is thus amended to (i.e., normalized by the intensity) accordingly for these noise types.
Empirical Data Generation
While the simulated data analysis thoroughly examines and characterizes the present methods, the ultimate test of the multichannel algorithm and the predictive power of is to verify that they produce useful images from an empirical data set subject to real-data constraints (Fig. 2). To create an empirical data set, ovarian cancer cells (POWDER cell line, Cellaria Biosciences) were plated and stained with two fluorescent dye stains (Hoechst 33342 and DiI) and two fluorescent dye–antibody conjugate markers (PerCP-CD44 and AF647-EGFR). Using a confocal fluorescent microscope with a 32-channel hyperspectral detector (Zeiss LSM880, channel bands shown in Table S1 in the Supplemental Materials) 30 frames were generated by taking separate images after manual movements of the stage, generating a set of frames with a wide range of overlap amounts (estimated at 77% to 96% based on alignments reported by the algorithm). Mosaicking was carried out for a 10-channel configuration, whose channel constituents were chosen as the set that maximizes (i.e., the optimal 10-channel subset). This was compared to a single-channel control representing the analogous single-channel method. The same area was imaged using a tiling protocol available in the microscope’s software (ZEN Blue 2.6) to create a gold standard image for qualitative comparison of the mosaics.
Results and Discussion
In an effort to predict the optimal subset of channels for multichannel mosaicking, we developed a new metric termed “dimensionality score” that quantifies the degree of independence of a given grouping of data channels, and by extension, predicts the characteristic noise tolerance. The optimal channel set, then, is predicted to be that which maximizes this score (thereby maximizing noise tolerance). This metric must capture the primary means by which the present multichannel algorithm suppresses the effects of noise to retain mosaicking accuracy.
The multichannel integrated NCC map (the mean NCC map; Fig. 1) suggests that the present approach offers two means of noise suppression. Adding noise to the image channels generates similar noise levels but with uncorrelated patterns in the individual correlation maps, and comparison of Figs. 1(b) and 1(d) shows that the variance in NCC space is reduced by taking the mean of multiple NCC maps, since random noise fluctuations area averaged out and reduced over several spectral channels. However, comparison of Figs. 1(a) and 1(c) reveals that this process also populates (sharpens) the central true peak feature while reducing the magnitudes of false peaks. Hence, multichannel mosaicking is more immune to noise compared to the single-channel case not only through averaging out random noise in NCC space but also because this spectral averaging apparently suppresses the false NCC peaks due to separate biological objects that share similar structural features, and therefore enhances the strength of the true peak versus false peaks. The strength of this suppression of false peaks is dependent on averaging over diverse NCC landscapes. Averaging multiple NCC maps from weakly correlated or uncorrelated channels causes true peak enhancement by utilizing a set of independent NCC maps with the same true peak location while stochastic false peaks generally are not amplified and tend to average to zero.
By this reasoning, the noise tolerance of a given set of channels depends primarily on the number of channels and the degree of dependence between those channels. The coefficient of correlation is already a well-known descriptor of the degree of dependence between two samples,23 so it forms a natural basis for constructing such a descriptor. Since the samples here are images, we can calculate the two-dimensional (2-D) coefficient of correlation , which is defined for 2-D matrices (images) and as
The selection of has a significant effect on the performance of as a predictor for . An appropriate value of realizes the separation of values characteristic of channels that image nearly identical features. For a matrix of values, this effect results in a characteristic division of the matrix into distinct neighborhoods [Fig. 3(a)]. We demonstrate this effect for , leading to three such areas that likely correspond to the three molecular constituents imaged in this dataset. Without the knowledge of a ground truth for grading performance, an appropriate value of may be estimated (Fig. 2) through analysis of a histogram of [Fig. 3(b)] with the intention of separating the population of strongly correlated channel pairings from the rest. Here, with knowledge of the ground truth, the value that provides the greatest predictive strength may be found by maximizing the Pearson correlation between and (the characteristic noise tolerance), with a peak value at giving a squared Pearson correlation of [Fig. 3(c)]. However, our results indicate that a considerable range [shaded area in Figs. 3(b) and 3(c)] of values correspond to a central peak of Fig. 3(c), and therefore retain near-optimal predictive performance. Therefore, we expect that an estimation of using the aforementioned histogram method will still ensure that remains a reliable and useful predictor of mosaicking performance. Note that the limits of and both reduce to , which is an intuitive descriptor of dimensionality; simply the mean of one minus cross-channel correlation, scaled by . However, in this limit, there is only one operative term (). Therefore, it is apparent that both noise suppression modes are not captured in this limit, resulting in a weakened (lower Pearson’s correlation) prediction of noise tolerance.
Even with our parallelized implementation, calculating the cross-correlation remains the dominant computational bottleneck, and the mosaicking computation time thus linearly scales with the number of channels used. is employed to optimize the selection of a high-priority subset of channels in an effort to lessen this computational load. The optimal configuration is found by maximizing over channel selection choices that are sufficiently speedy (i.e., limiting the number of channels to reach a target mosaicking rate). This enables to be a practical tool for optimizing the selection of subsets of available channels. A brute-force calculation of scores for all possible permutations of the desired configuration (of which there may be hundreds or thousands) can be computed in a reasonable time frame, providing comparative values that predict each permutation’s noise tolerance. The problem of selecting the optimally performing subset of channels for mosaicking is, to the best of our knowledge, otherwise intractable, except by empirical trial-and-error by the operator combined with a heuristic quality criterion.
Quantitative Mosaicking Performance
Increasing the number of spectral channels improves the mosaicking accuracy under noise load, as indicated by plotting versus PSNR [Fig. 4(a); see also Eq. (2)]. For a given PSNR, more channels always perform better (higher ), and at a given , having more channels allows us to tolerate stronger noise levels (see Table S3 in the Supplemental Materials). These results are replicated for multiplicative Gaussian noise and Poisson noise, as shown in Fig. S5 in the Supplemental Materials. Therefore, the multichannel algorithm is expected to achieve greater accuracy than equivalent single-channel methods for noise models relevant in microscopy. Two permutations were tested for each configuration (number of channels): the optimal permutation predicted by and an operator-selected control (these permutations are shown in Table S2 in the Supplemental Materials). The control was assembled by selecting subsets of channels that evenly sampled the three molecular constituents, intended to represent an operator’s educated guess at the optimal configuration of channels (the alternative to using the dimensionality score for optimization). In all cases, the optimized permutation was superior (lower ) to the control [Fig. 4(b)].
The correlation and thus predictive strength remains high between the values and [Fig. 4(c), ] for the optimized subsets. This is very similar to the previous correlation value of 0.97 between values and for the operator control channel sets. Therefore, retains its prediction strength among the various subsets of optimized permutations. In all cases tested, the optimized permutations provide superior performance over the operator-selected ones [Fig. 4(d); at least ]. While the improvement is most striking for small numbers of channels, the effect is present for all cases, confirming an increase in noise tolerance by use of the dimensionality score to predict the optimally performing subset of channels.
The dimensionality score plotted against the mosaicking speed [Fig. 4(d)] demonstrates the trade-off between computational speed (increasing ) and dimensionality score (increasing ) for both control and optimized permutations of the raw channels, as well as the optimized permutations of unmixed channels. These plots were fit to power law curves, and two-tailed -tests were evaluated for the coefficients of these fits to show that: (1) unmixed channels outperform optimized raw channels () and (2) optimized raw channels outperform operator-selected channels (). Here, higher performance is denoted by a more elevated trendline, delivering faster performance and forecasting better noise tolerance.
Based on these results, the multichannel algorithm and optimization of channel selection should enable greater accuracy for real-data mosaicking. Section 2.3 outlines the generation of an empirical data set (without the privilege of ground truth) for this purpose. This empirical set consists of a series of images with varying interframe overlap of a primary tumor cell culture stained using four spectrally distinct fluorescent probes and captured by 32-channel microscopy using a precision stage. We followed the previously discussed steps to optimize the choice of channels for video-rate mosaicking (i.e., limited to up to 10 channels to retain video-rate computation). We estimated using a histogram [Fig. 5(a)] of correlation values () from the first video frame, finding that channels 1, 2, 12, 13, 21, 22, 26, 27, 28, and 32 maximized . The images were analyzed by examining the unmixed composite images created by unmixing the raw data into the four endmembers (dyes) and creating a color map to represent all four dyes in one image. The Zeiss tiling mentioned earlier [Fig. 5(b)] provides a gold standard for qualitative comparisons of mosaicking accuracy and for evaluating the presence of artifacts.
The single channel method produced an image [Fig. 5(c)] with many severe artifacts, including sharp edge artifacts and clear examples of feature duplication. These errors are presumed to result from alignment mistakes caused by noise distorting similar areas between frame pairs, ultimately hindering the NCC alignment step. The 10-channel optimized mosaicking [Fig. 5(d)] qualitatively outperformed the one-channel control, producing an image largely free of artifacts, substantially mimicking the gold standard. Additionally, step-by-step comparisons of the video feed and the associated intermediate mosaic can be visualized in Video 1.
This empirical test demonstrates that significant improvement was achieved in a real-data scenario compared to the single-channel analog. Cross-correlation-based algorithms have been implemented successfully for handheld microendoscopes in the literature,5 so the methods here should achieve similar results for multichannel versions of these probes. However, the primary limitation of the cross-correlation-based method implemented here is that alignments are limited to translational corrections, and therefore nontranslational motion must be negligible. To overcome this limitation, micromosaicking algorithms have been developed far beyond single-channel correlations and now typically utilize feature-based alignment strategies that allow for more complex frame-pair alignments such as affine transformations7 and local warping corrections.6 The success of the multichannel NCC method here motivates integration of these advanced methods with the present multichannel mosaicking algorithm in a manner that similarly takes advantage of the unique information available in multispectral and hyperspectral data. A limitation of this method is that smooth image layering such as graph-cut-based stitching7,24 or alpha blending of overlapping areas4 have not yet been integrated. The present algorithm also does not compensate for compounding errors (or drift error), where small misalignments may build up due to the pairwise image alignment method and cause a significant discrepancy after many alignments, as it cannot correct itself after misalignments, as demonstrated in Fig. S2(b) in the Supplemental Materials. This problem has been addressed in earlier publications6 by referencing a global alignment system and refining pairwise alignments to fix these issues. Here, the results motivate the integration of such methods with the present algorithm in future work.
In summary, we report a multichannel micromosaicking algorithm capable of processing up to 10 channels (spectral or unmixed molecular basis channels) at video rate (15 fps). This development is motivated by the anticipated need for real-time analysis of multiplexed microscopy and microendoscopy image cube data streams, as hyperspectral technologies continue to advance in speed and spectral resolution. We characterized the noise tolerance of a given grouping of synthetic spectral channels with a dimensionality score that captures two forms of noise tolerance enforced by the algorithm, and show that this dimensionality score is a reliable predictor of algorithm accuracy and speed performance. This enables the determination of a high-value subset of channels for computational bandwidth-limited situations. The predicted trends from the dimensionality score are matched by actual increases in performance in noisy conditions. We empirically demonstrate the improved performance with a second data set, showing higher mosaic fidelity and fewer artifacts than the single-channel method when compared to an accepted gold standard. We also demonstrate that unmixed endmember abundance maps offer a better optimization of mosaicking speed and dimensionality score than raw spectral channels. This result motivates the further development of accelerated real-time unmixing and its integration with multichannel micromosaicking. Future work includes development of a multichannel feature-based algorithm to unite hyperspectral micromosaicking methods with complex frame pair alignment strategies needed for in vivo microscopy using handheld probes.
We thank Dr. E. Cronin-Furman (Olympus Corporation of the Americas Scientific Solutions Group) for assistance in acquiring images described in Sec. 2.2 and shown in Fig. S1 in the Supplemental Materials. We also would like to thank the Institute for Chemical Imaging of Living Systems at Northeastern University for assistance in acquiring the images described in Sec. 2.3 and shown in Fig. 5. This work was supported by the U.S. National Institutes of Health Grant No. K22CA181611 (to B.Q.S.) and the Richard and Susan Smith Family Foundation (Newton, Massachusetts) Smith Family Award for Excellence in Biomedical Research (to B.Q.S.).