Hybrid random walk-linear discriminant analysis method for unwrapping quantitative phase microscopy images of biological samples

Abstract. Standard algorithms for phase unwrapping often fail for interferometric quantitative phase imaging (QPI) of biological samples due to the variable morphology of these samples and the requirement to image at low light intensities to avoid phototoxicity. We describe a new algorithm combining random walk-based image segmentation with linear discriminant analysis (LDA)-based feature detection, using assumptions about the morphology of biological samples to account for phase ambiguities when standard methods have failed. We present three versions of our method: first, a method for LDA image segmentation based on a manually compiled training dataset; second, a method using a random walker (RW) algorithm informed by the assumed properties of a biological phase image; and third, an algorithm which combines LDA-based edge detection with an efficient RW algorithm. We show that the combination of LDA plus the RW algorithm gives the best overall performance with little speed penalty compared to LDA alone, and that this algorithm can be further optimized using a genetic algorithm to yield superior performance for phase unwrapping of QPI data from biological samples.


Introduction
There is increasing interest in the application of quantitative phase imaging (QPI) as a high-sensitivity method to study the structural and dynamic behavior of biological samples. 1,23][14] In any interferometric QPI dataset, the phase ambiguities inherent to QPI have to be removed to achieve continuous phase distributions and precise, reproducible measurements of these samples. 15asic phase unwrapping of large jump discontinuities are easily handled by a variety of algorithms, including the Flynn or widely used Goldstein method. 15These algorithms can be classified into three broad groups: path-following algorithms, region algorithms, and global algorithms. 16In path-following algorithms, phase integration is performed in a step sequence.Goldstein's branch cut algorithm is an example of a pathdependent path-following algorithm that identifies inconsistency-causing structures called residues and connects and balances them. 17Region algorithms divide the data field into separate regions, which are individually processed.The regions are then unwrapped with respect to each other to process the entire image. 18Flynn's mask cut algorithm is an example of a region algorithm. 15,19Global algorithms formulate phase unwrapping in terms of minimization of a global function. 15,20More recent alternative algorithms have built on or modified these classic methods. 18,21owever, imaging biological samples presents a unique challenge because of the varying morphology of cells during cell culture, combined with the fact that the optical path length through cells relative to background levels routinely exceeds 2π rad.Due to the varying morphology of cells and the desire for high throughput measurements, these large phase differences can occasionally occur over distances spanning a single pixel in the resulting image.This is especially problematic for nonadherent mammalian cells or for adherent mammalian cells during mitosis, which typically have a large jump in phase from the edge of the cell to the surrounding background.These edges are liable to be phase unwrapped incorrectly by standard algorithms [Fig.1(a)].Additionally, to avoid phototoxicity when imaging live cells, 22 a low light intensity is preferable, which results in poor phase quality and high relative noise levels, 23 further complicating phase unwrapping.
There are a large variety of methods that have been proposed to address this problem for two-dimensional images of biological samples.Some methods bypass the unwrapping process itself through the modification of instrumentation, including heterodyne mixing, 24 optical path locking, 25 and the use of the transport of intensity equation. 26,27However, for many of the available QPI methods, phase unwrapping remains a general problem of interest.There are several alternative algorithms that modify the classic phase unwrapping methods or use specially developed software to improve the speed and throughput of phase unwrapping for QPI of biological specimens. 18,28However, these methods are largely based on the standard algorithms and may not adequately address all phase unwrapping problems faced by these methods.
Here, we present a new phase unwrapping method for correcting phase ambiguities in QPI of biological samples where the standard algorithms have failed.This approach, a hybrid random walk-linear discriminant (HRL) method, is intended to be applied to phase-shifting interferometry (PSI) data after first applying the Goldstein algorithm. 15,17The HRL approach combines two techniques: linear discriminant analysis (LDA), a widely used scheme for dimension reduction and feature extraction, and a random walker (RW) segmentation algorithm.LDA projects data onto a lower dimensional vector space such that the ratio of between-class difference to the within-class difference is maximized, achieving maximum discrimination between classes. 29In this application, it is used to emphasize the edges of phase-wrapped regions.In RW image segmentation, a seed mask specifies locations in the image with a predetermined label.Unseeded pixels are labeled using the probabilities that a RW agent starting at that location terminates at each label in the seed mask.The random walk can be biased (for example, to avoid sharp intensity gradients), and the final segmentation is derived by determining the most probable seed destination for each pixel. 30Here, the LDA class probability image is used as the weighting function for the RW algorithm to efficiently segment phase-wrapped regions.
The HRL algorithm combines the advantages of these two techniques based on two assumptions about QPI data from biological samples: (1) cell optical path length is never less than zero (n cell > n media ) and (2) uncorrected regions are predominantly 2π rad lower than the correct level, as is commonly observed in our QPI datasets of biological samples.This second assumption means that the phase unwrapping problem is reduced to a matter of classifying pixels as belonging to properly phase unwrapped segments ("good" regions) or improperly unwrapped segments ("bad" regions) and applying a one-wavelength correction to pixels in the "bad" regions.To achieve this, manually phase-unwrapped training data is used to train an LDA classifier to generate an image that highlights the edges of phase-wrapped regions.Then, a RW segmentation method is applied to this image, starting at known-good or known-bad points based on the assumption that the cell optical path length is never less than zero.This segmentation approach is highly efficient because the gradient of the LDA image is large at the boundaries of the phase-wrapped regions, and the RW method is, therefore, strongly biased to not cross these boundaries.As a final step, a genetic algorithm (GA) is applied to fine-tune the LDA coefficients.The result is phase unwrapping that is highly accurate relative to manually unwrapped results [area under curve ðAUCÞ ¼ 0.999] for effective phase unwrapping of biological phase interferometry data.

Quantitative Phase Imaging Data
Phase, intensity, and phase modulation magnitude 31 data were acquired on a Contour GT-X8 (Bruker) interference microscope in PSI mode.The Bruker system was modified to accommodate a custom live cell incubation system and a 20×, 0.28 numerical aperture microscope objective with attached Michelson interferometer.The Michelson interferometer consisted of a beam splitter, reference mirror, and compensation reference chamber filled with deionized water.Illumination was provided by a 530-nm fiber-coupled LED (Thorlabs).Mouse L-cell fibroblasts and M202 human melanoma cells 32 were cultured as described previously 7,10,33 in phenol-red free Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum (Omega Scientific), 1× nonessential amino acid solution (Invitrogen), 2 mM glutamine (Invitrogen) and antibiotics.
Raw interferograms were unwrapped using the Goldstein method, 17 which unwraps images by isolating and avoiding local-error-causing residues, 15 prior to the manual or automated correction methods described here.

Manual Phase Unwrapping
Manual phase unwrapping was performed on a set of 50 phase images of mouse L-cells and 50 phase images of M202 melanoma cells to serve as training datasets and 50 additional images of each cell type as test sets for evaluation of our final automatic phase unwrapping method.Regions to be corrected were selected by looking for cellular areas with sharp internal edges, indicating a poorly placed branch cut in the Goldsteinunwrapping process, or regions with phase shifts below the background level, which would indicate nonphysical refractive indices less than that of the cell culture media.A custom graphical user interface was written in MATLAB (Mathworks) with three main steps: (1) select an integer multiple of 2π rad correction to be applied, (2) select image regions to be corrected using the imfreehand function in the image processing toolbox, and (3) manually correct the phase unwrapping result at individual points.This process was repeated until all phase-wrapped regions were corrected.) a 3 × 3 mean filtered image, (10) a 3 × 3 median filtered image, (11) the phase derivative variance (PDV) of the image, defined as the variance of the partial derivative of the phase in a four-connected neighborhood at each pixel and computed using the method described in the phase quality guided path following method 15 and implemented in MATLAB by Spottiswoode, 35  (12) the natural logarithm of the PDV image to provide data with comparable dynamic range to the other measures, (13) the phase modulation magnitude, or amplitude of the intensity fluctuations in the phase-shifted fringes, computed by the Bruker optical profilometer, 31  (14) the natural logarithm of the phase modulation image, (15) the Laplacian filtered phase modulation image, (16) the intensity image, and ( 17) the Laplacian filtered intensity image.

Linear Discriminant Analysis
LDA class probabilities were calculated using the Fisher linear discriminant 36 with prior probabilities equal to the observed sample probabilities and all code implemented in MATLAB. 37he LDA-only phase unwrapping algorithm used LDA class probabilities as the input to a marker-based watershed algorithm 34 to define phase-wrapped image regions.

Random Walker Algorithm
The RW algorithms assign each pixel a probability that a RW agent released from that pixel will reach a predetermined pixel of a given label (e.g., known-good or known-bad).Pixel labels were assigned by first subtracting the background tilt from raw phase images.Then pixels were assigned a "bad" label if they were more than a threshold value below the background level of zero, or if they were on the low side of an edge with a first derivative greater than a jump threshold (computed in all four cardinal directions by first-order discretization).Pixels were assigned a "good" label if they were smaller than the mean phase value in their 100 × 100 pixel neighborhood or if they were on the high side of an edge with a first derivative greater than the jump threshold.
The faster random walk method was based on the algorithm presented and implemented in MATLAB by Grady. 30Briefly, this method bypasses a Monte Carlo-type simulation of individual RW agents by solving for a solution to the Dirichlet problem. 30,38,39Additionally, this method can be biased to obtain higher quality segmentation by adjusting the probability that a RW agent will move in a given direction based on the difference in image intensity between the two corresponding pixels.To bias this method for more accurate segmentation, we tested the phase image, the LDA class probability image, and the natural logarithm of the LDA class probability image, with the natural logarithm of the LDA class probability image giving superior results.

Genetic Algorithm
LDA images define the probability that a given pixel lies on the edge of a phase-wrapped region, but does not compute the optimal image statistic weighting required for follow-on processing to turn those probability images into correctly phase-unwrapped data.To optimize LDA coefficients for phase unwrapping by watershed transform or the RW algorithm, we searched for an optimal set of coefficients to the LDA terms using the GA from the global optimization toolbox in MATLAB.The GA is an iterative process that repeatedly modifies a population of individual solutions.Random individuals are selected at each step and used to produce the next iteration.Over successive generations, the optimal solution is reached [Fig.1(g)].We used a genetic optimization with a tolerance of 1 × 10 −10 , a migration fraction of 0.3 and an initial range of −10 to 10 for all LDA term coefficients.For RW methods, we also modified the background threshold parameter with an initial range of 0.06 to 1.57 rad.

Linear Discriminant Analysis Segmentation
We manually unwrapped a randomly chosen set of 50 images of mouse L-cells and a set of 50 images of M202 human melanoma cells, which contained phase unwrapping errors after phase unwrapping via the Goldstein method.In the mouse L-cell image set, 1.2% of pixels in this set were phase wrapped, and the vast majority of these errors (99.7%) were 2π rad lower than the expected phase value, with the remaining 0.3% of phase errors being 2π rad too high.We chose to focus solely on the 99.7% of phase errors which were 2π rad below the expected value.Images of M202 melanoma cells contained far fewer phase unwrapping errors after phase unwrapping via the Goldstein method, with only 0.06% of pixels being phase wrapped, and >99.9% being 2π rad below the expected value.This makes the phase unwrapping problem for these images a two-class classification problem, which is readily amenable to the use of statistical classifier techniques. 36e first segmented these images using LDA based on the manual classification results and 17 pixelwise statistics for each image, 12 based on the raw phase image itself, 3 based on the phase quality magnitude image, and 2 based on the intensity image [Figs.1(a)-1(e)].Most image statistics are based on linear image filters chosen to highlight the edges of cells and phase-wrapped regions in the raw phase image.We used the natural logarithm of the PDV and phase quality magnitude to provide an image that more closely matched the dynamic range of the others in the set.
We performed LDA to classify pixels into two classes: (1) pixels lying on the edge of phase-wrapped regions and (2) correctly unwrapped pixels and pixels in the interior of phase-wrapped regions.This gave a superior performance relative to LDA for separation of phase-wrapped interiors from correctly unwrapped regions, likely due to the presence of strong features at the edges of phase wrapped regions and because the interiors of phase-wrapped regions resemble correctly unwrapped pixels.LDA provides a set of coefficients to the 17 input images that maximizes the ratio of between-class difference to the withinclass difference. 29The final LDA image resulting from this analysis is, therefore, a linear combination of all 17 input images.We observed a fair agreement between manually marked edges and those with an LDA-determined probability of being in the edge class, p edge , of >0.5 [Fig.1(f)].
Phase unwrapping, however, requires determination of the interior, wrapped pixels as well as the edges, and most LDAdetermined edges did not define a continuous boundary.To complete the LDA boundaries, we used a marked watershed algorithm to segment the LDA class probability images.We performed genetic optimization of the input LDA coefficients, which improved the performance moderately and defines the receiver operator characteristic (ROC) with AUC ¼ 0.923 [Fig.1(g)].

Random Walker Segmentation
For most cells imaged in aqueous media n cell > n media (for example, Ref. 40) and so the phase shift relative to the background will be greater than zero.This assumption, plus the observation that, in our manually corrected datasets, most residual phase unwrapping errors involve phase being one wavelength (2π rad) too low, enable us to define boundaries of good and bad regions in the partially wrapped phase data [Figs.2(a) and 2(b)].We used a method based on simulating a random walk to use these assumptions to automatically mask phase-wrapped regions.
Two methods were used to perform the random walk.In the first, a RW agent method, 20 RW agents were released from each pixel with initially unknown status.Each agent then performed a random walk until it hit a known-good or known-bad pixel [Figs.2(a) and 2(b)].The status of the initially unknown pixel was determined by the ratio of "bad" paths to total paths, with a threshold of 0.3 to 0.9 typically used to phaseunwrap images.In the five example paths shown in Figs.2(a) and 2(b), four out of five RW agent paths terminate at a knownbad pixel, thus the probability that the start pixel is phase wrapped and this pixel would be classified as "bad" is 4∕5 ¼ 0.8.This method, however, is slow, typically requiring 40 to 90 s∕image with 20 RW agents on a single 2.4 GHz Intel Xeon core and potentially inaccurate as it relies on a limited number of random samples.
A faster and more accurate method is to use the random walk image segmentation method presented by Grady. 30This method uses the fact that the RW problem is exactly the solution to the Dirichlet problem with boundary conditions at the initially known pixels to solve the RW problem with a single matrix equation solution.This method can also be modified with weights at each pixel based on the intensity of an input image to effectively separate regions with very different intensity values. 30Including preclassification of known-good and knownbad regions, this approach requires ∼1.5 s∕image on a single 2.4-GHz Intel Xeon core relative to 40 to 90 s∕image with the RW agent approach.
The main controlling parameter for RW image segmentation is the threshold for how far below the background a pixel has to be to be considered a "bad" pixel [Fig.2(c)].If this value is set too low, the false positive rate increases rapidly.Best results (minimized misclassification errors) were achieved with a background threshold of 0.16 rad.The other parameter is the jump threshold, which determines how much of a jump from one pixel to the next is considered a phase unwrapping artifact [Fig.2(d)].This value was set to π rad, but moderately larger or smaller values do not have a large effect on the number of false positive errors.

Linear Discriminant Analysis + Random Walker
The Grady RW algorithm can be "biased" to follow natural edges in an image, with edge weights based on a Gaussian weighting function. 30We tested three inputs to the RW weighting function: (1) no input (unbiased random walk), (2) the raw phase image, and (3) the LDA class probability image.The edges of phase wrapped regions in the LDA image are highlighted much more clearly than in the raw phase image (Fig. 3), indicating that LDA class probabilities may serve as effective weights for image segmentation.We find that the raw phase image actually results in worse segmentation than the unbiased RW algorithm (Fig. 4) with an AUC of 0.945 for the ROC when the background threshold is varied, compared to 0.947 for the unbiased RW.As expected, the use of the LDA class probability image (Fig. 3) to bias the RW algorithm results in a significant improvement, with an AUC of 0.984.
The final refinement performed was to run a genetic optimization on the LDA coefficients and background threshold for the HRL method.This resulted in the best overall AUC (0.999) for the training data set, achieving a very high true positive rate with a very low false positive rate.The optimal operating point based on these results, which minimizes the total misclassification error, achieves a true positive rate of 0.965, a false positive rate of 0.006, and an overall misclassification rate of 0.6% on the training dataset.

Final Results
We applied the optimized HRL method to a set of 50 manually unwrapped test images separate from those used for classifier training (Fig. 5).The results show good overall correction of phase errors [Figs.5(a) and 5(b)].The RW algorithm returns the probability that a given pixel is phase wrapped [or, as in Fig. 5(c) the complement of this, the probability that a pixel is not phase wrapped].This conforms well with the expected manual results with a threshold probability of 0.5.Results for all images in the manually corrected training and test datasets show good overall performance with the optimally chosen parameters from the GA, including background threshold ¼ 0.16 rad.The test data results compare well with the results on the training dataset, with a true positive rate equal to 0.956, a false positive rate equal to 0.005, and an overall misclassification rate of 0.6%.
Finally, we applied the HRL method to a separate data set consisting of QPI data of M202 human melanoma cells.As shown in Fig. 6, performance with this cell type was comparable to performance on the L-cell fibroblast dataset, with AUC ¼ 0.998 after genetic optimization on a training data set of 50 images.Performance on the test dataset of an additional 50 images was also high, with a true positive rate of 0.947, a false positive rate of 0.014, and an overall misclassification rate of 1.4%.

Discussion
We have demonstrated that the HRL method after genetic optimization has the best phase unwrapping performance with a high final AUC of 0.999 (Fig. 4).The overall low error rate was reproduced well on a second test data set (Fig. 5) and on a second cell type (Fig. 6), showing that the HRL algorithm is a consistent, reliable method for phase unwrapping of biological QPI data.This method is primarily useful for biological samples on flat substrates, but the general approach would be applicable for any phase sample where the standard algorithms have failed and pixels can be automatically labeled based on a set of image features.Future work will examine how the HRL algorithm could be made more efficient by utilizing other classifiers and/or reducing the statistic set used for the LDA classifier to further reduce the computational time.
The general procedure of the HRL method can be adapted to phase unwrapping of other datasets.The general method is to (1) choose image statistics to use in the LDA, giving preference to those that will emphasize the edges of phase-wrapped regions (e.g., phase quality, 41 fringe modulation, 31 or edge detection filters 34 ), (2) perform manual phase unwrapping for use as a training dataset, (3) perform LDA to separate edges of marked regions from all other pixels, (4) use image features as premarked inputs to RW segmentation, 30 biased by the weights output by LDA, and (5) as an optional last step, tune LDA coefficients, for example, using a genetic search algorithm.We also note that different components of the HRL algorithm described here can be repurposed for other data processing tasks.For example, the combination of LDA plus a biased random walk would be an effective general method to segment images automatically, for example, to segment cells from background as an alternative to the widely used watershed algorithm. 34In this case, the LDA probability image would be used to combine multiple image features to enhance the precision of finding object or cell edges, eliminating the necessity for timeconsuming manual image segmentation.As in the present study, a GA could then be used to further refine the classifier performance for the system of interest.Therefore, we expect that the HRL approach presented here is generally applicable to a variety of image classification problems, beyond phase unwrapping of biological QPI data.

Fig. 1
Fig. 1 Linear discriminant analysis (LDA)-based segmentation for phase-unwrapping of biological sample quantitative phase imaging data.(a-e) selected inputs to LDA method (a) raw (unwrapped) phase data, (b) intensity image, (c) phase modulation magnitude, (d) natural logarithm of the phase derivative variance, (e) Laplace-filtered phase image, and (f) predicted edges from LDA composite image compared to manual phase unwrapping results.In (f), green shows the manually selected mask edge, red is the LDA detected edge, and yellow shows overlap of both edges.(g) True positive rate versus false positive rate for LDA-based image segmentation during successive optimizations by a genetic algorithm (GA).The ratio of false positives to true positives generally decreases with increased number of generations with some variation inherent to genetic optimization algorithms.

Fig. 2
Fig. 2 Random walker (RW)-based image segmentation for phase unwrapping.(a and b) marked phase image of mouse fibroblasts with sample unbiased RW agent paths.Green shows automatically determined known-good regions (background or falling edge), red shows automatically determined knownbad regions (phase level < background threshold, or rising edge).Green line shows a RW path originating from the center, gray marked pixel which terminates at a known-good pixel.Red lines show paths terminating as a known-bad pixel.Based on the five shown paths, the pixel in question would have a probability of 4∕5 of belonging to the "bad" (phase-wrapped) group.(c) Algorithm performance as a function of background threshold with jump threshold fixed to π rad [based on operating point from panel (d)], used to determine when a pixel is too far below the background level to represent data from a biological sample.(d) Algorithm performance as a function of jump threshold with background threshold fixed to 0.16 rad [based on operating point from panel (c)], used to determine when a given edge is rising or falling.

Fig. 3
Fig. 3 Combination of LDA and RW methods.(a) Phase image.(b) LDA image.(c) LDA image with superimposed, automatically determined known-good (green) and known-bad (red) regions.

Fig. 4
Fig.4Receiver-operator characteristics for phase unwrapping methods.The application of a GA to optimize the hybrid RW þ LDA method gives the best results, with an AUC of 0.999.AUC GA þ LDA ¼ 0.923, RWþphase¼0.945, unbiased RW ¼ 0.947, RWþLDAðHRLÞ¼0.984,GA of RW þ LDA ðHRLÞ ¼ 0.999.The chosen operating point, which minimizes total error rate, is shown in black.

Fig. 5
Fig. 5 Final results of hybrid random walk-linear (HRL) algorithm.(a) Sample original phase image and (b) final HRL output image.(c) Pixelwise probability, p, of an LDA-biased RW agent reaching a knowngood pixel (i.e., probability that a given pixel is not phase-wrapped).(d and e) sample original and final HRL output images showing low false positive rate when there are few errors in the input image.Inset in (e) shows the same pixelwise probability, p, presented in panel (c).(f) Error rates for all 50 individual images in the manually phase-unwrapped training dataset used to train the classifier and for 50 individual images used to test the classifier performance.GA results for aggregate of all training data set images are shown for reference.Final overall results for both test and training datasets are each marked with an x .Scale in (b), (d), and (e) is the same as shown in (a).

Fig. 6
Fig. 6 Results of HRL algorithm applied to M202 human melanoma cell training and test data sets.(a) Sample original phase image of M202 melanoma cells.(b) Final HRL output image.Scale in (b) is the same as shown in (a).(c) GA results for aggregate of all training data set images (orange) with error rates for all 50 individual images used to test the classifier performance.GA results for M202 melanoma cells show an AUC of 0.998.Final overall results for both GA optimization on the training dataset and the HRL method applied to the test dataset are each marked.