GPU-based deep convolutional neural network for tomographic phase microscopy with l 1 fitting and regularization

. Tomographic phase microscopy (TPM) is a unique imaging modality to measure the three-dimensional refractive index distribution of transparent and semitransparent samples. However, the requirement of the dense sampling in a large range of incident angles restricts its temporal resolution and prevents its application in dynamic scenes. Here, we propose a graphics processing unit-based implementation of a deep convolutional neural network to improve the performance of phase tomography, especially with much fewer incident angles. As a loss function for the regularized TPM, the l 1 -norm sparsity constraint is introduced for both data-fidelity term and gradient-domain regularizer in the multislice beam propagation model. We compare our method with several state-of-the-art algorithms and obtain at least 14 dB improvement in signal-to-noise ratio. Experimental results on HeLa cells are also shown with different levels of data reduction. © The Authors. Published by SPIE under a Creative Commons Attribution 3.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. [DOI: 10.1117/1.JBO.23.6.066003]


Introduction
Most biological samples such as live cells have low contrast in intensity but exhibit strong phase contrast.Phase contrast microscopy is then widely applied in various biomedical imaging applications. 15][6] The label-free and noninvasive character makes it attractive in biomedical imaging, especially for cultured cells. 7,8owever, most of the current methods require around 50 quantitative phase images acquired at different angles [9][10][11] or different depths 6 for optical tomography.This speed limitation greatly restricts its field of applications.For example, the difference of the refractive index may be blurred during the angular (or axial) scanning when observing fast-evolving cell dynamics or implementing high-throughput imaging cytometry. 11Another challenge for TPM is the missing cone problem, which limits its reconstruction performance, especially for limited axial resolution compared with the subnanometer optical-path-length sensitivity. 12o relieve the missing cone problem, many methods have been developed for better signal-to-noise ratio (SNR) with fewer images.Different regularizations such as the positivity of the refractive index differences 4,13 and the sparsity in some transform domain 14,15 are added to an iterative reconstruction framework based on the theory of diffraction tomography, 16,17 for reducing the artifacts induced by the missing cone problem and the limited sampling rates in Fourier domain.Both intensity-coded and phase-coded structured illumination methods further promote the performance by their better multiplexing ability compared with conventional plane-wave illumination. 18,19However, these methods suffer from the great degradation when the scattering effects become significant in the sample.The beam propagation method (BPM) 20 is then applied in phase tomography to provide a more accurate model by considering the nonlinear light propagation with scattering. 21,22And the multislice propagation modeling is definitely similar to the neural network in the field of machine learning. 23,24By combining the nonlinear modeling and the sparse constraint in the gradient domain, the Psaltis group has validated the competitive capability of this learning approach over conventional methods. 21,23Despite its success in modeling with l 2 -norm constraint, the current method is still a preliminary network, especially compared with the state-of-the-art deep learning frameworks, 25 and the iterative reconstruction is challenging to deploy in practice due to the high computational cost and the difficulty of the hyperparameter selection.More potential can be exploited in both optimization algorithms and better network architectures.
In this paper, we propose a graphics processing unit (GPU)based implementation of a deep convolutional neural network (CNN) to simulate the multislice beam propagation for TPM.A loss function consisting of an l 1 -norm data-fidelity term and an l 1 -norm gradient-domain regularizer is devised to achieve higher reconstruction quality even with fewer training data.To deal with the vast quantities of parameters and regularizers, we apply the adaptive moment estimation (Adam) algorithm 26 for optimization, which can also be regarded as the training process of the CNN.Compared with previous works using stochastic gradient descent, 23,24 our method ensures a faster convergence and a better robustness to the initial value.Both simulation and experimental results on polystyrene beads, and HeLa cells are shown to validate its reconstruction performance.We anticipate that our work can not only boost the performance of optical tomography, but also guide more applications of deep learning in the optics field.

Experimental Setup
Figure 1 shows the schematic diagram of the experimental setup.In our system, 23 the sample placed between two cover glasses is illuminated sequentially at multiple angles and the scattered light is holographically recorded.A laser beam (λ ¼ 561 nm) is split into sample and reference arms by the first beam splitter.In the sample arm, a galvo mirror varies the angle of illumination on the sample using the 4F system created by L1 and OB1.The light transmitted through the sample is imaged onto the CMOS camera via the 4F system created by OB2 and L2.The beam splitter (BS2) recombines the sample and reference laser beams, forming a hologram at the image plane.The numerical apertures (NAs) of OB1 and OB2 are 1.45 and 1.4, respectively.For data acquisition, we capture multiple tomographic phase images by near-plane-wave illumination (Gaussian beam) with equally spaced incident angles.We use a differential measurement between the phase on a portion of the field of view on the detector that does not include the cell and the cell itself to maintain phase stability.Accordingly, complex amplitudes extracted from the measurements constitute the training set of our proposed CNN.

Beam Propagation Method
We build the CNN, based on the forward model of light propagation, 21,23 to model the diffraction and propagation effects of light-waves.It is known that the scalar inhomogeneous Helmholtz equation completely characterizes the light field at all spatial positions in a time-independent form E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 6 3 ; 3 3 4 ½∇ 2 þ k 2 ðrÞuðrÞ ¼ 0; (1) where r ¼ ðx; y; zÞ denotes a spatial position, u is the total lightfield at r, is the Laplacian, and kðrÞ is the wave number of the light field at r.The wave number depends on the local refractive index distribution nðrÞ as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 3 2 6 ; 7 0 4 where k 0 ¼ 2π∕λ is the wave number in vacuum, n 0 is the refractive index of the medium, and the local variation δnðrÞ is caused by the sample inhomogeneities.By introducing the complex envelope aðrÞ of the paraxial wave uðrÞ ¼ aðrÞ expðjk 0 n 0 zÞ for BPM, we can obtain an evolution equation 21 in which z plays the role of evolution parameter ; t e m p : i n t r a l i n k -; e 0 0 3 ; 3 2 6 ; 6 0 7 where δz is a sufficiently small but a finite z step, ω x and ω y represent angular frequency coordinates in the Fourier domain, að•; •; zÞ expresses the two-dimensional (2-D) complex envelope at z depth, Ã refers to a convolution operator, and F −1 f•g means the 2-D inverse Fourier transform.

GPU-Based Implementation of CNN
A schematic architecture of our CNN is shown in Fig. 2. For constructing our neural network, we divide the computational sample space into thin slices with the sampling interval δz along the propagation direction z.One slice corresponds to one layer in CNN.Within each layer, neurons specify the discretized light-field with transverse sampling intervals δx and δy, respectively.The input layer is the incident field upon the sample.In terms of the Eq. ( 3), inputs are then passed from nodes of each layer to the next, with adjacent layers connected by alternating operations of convolution and multiplication.At the very last layer of our CNN, the output complex field amplitude is then bandlimited by the NA of the imaging system composed of lenses OB2 and L2 in Fig. 1.We implement the proposed network on the basis of TensorFlow framework.The connection weight δnðrÞ can be trained using the Adam algorithm for optimization on the following minimization problem: where M denotes the number of measured views, k •k 1 indicates the l 1 -norm, and ∇ ¼ ð ∂ ∂x ; ∂ ∂y ; ∂ ∂z Þ is the differential operator.For a given view m, Y m , and G m are the output of the last layer and the actual measurement acquired by the optical system, respectively.The design of our loss function will be specifically discussed in Sec.4.1.Compared with the l 2 -norm, the l 1 datafidelity term relaxes the intrinsic assumptions on the distribution of noise (symmetry and no heavy tails) and suits better for the measurements containing outliers.Hence, it can be effectively applied to the biomedical imaging especially when the noise model is heavy-tailed and undetermined. 27As a regularization term, RðδnÞ imposes the l 1 -norm sparsity constraint on a gradient domain according to its better characteristic for the reconstruction from higher incomplete frequency information than l 2 -norm, 28,29 whereas τ is the positive parameter controlling the influence of regularization.The positivity constraint takes advantage of the assumption that the index perturbation is real and positive when imaging weakly absorbing samples such as biological cells.The subgradient method 30 plays an important role in machine learning for solving the optimization framework under l 1 -norm and the Ref. 26 has verified the theoretical convergence properties of the Adam algorithm, which will be specifically discussed in Sec.4.3.We perform the neural network computations on 4 NVIDIA TITAN Xp graphics cards and the processing time to run the learning algorithm (100 iterations) on 256 × 256 × 160 nodes is nearly 9 min.Obviously, it is possible to make the optimization of hyperparameters, which have an important effect on results, a more reproducible and automated process and thus is beneficial for training the large-scale and often deep multilayer neural networks successfully and efficiently. 31The full implementation and the trained networks are available at https://github.com/HuiQiaoLightning/CNNforTPM.

Results
For demonstration, we evaluate the designed network by both simulation and experimental results of the TPM as described before.To make a reasonable comparison, selected hyperparameters have been declared for all the other reconstruction methods.The selection of hyperparameters will be specifically discussed in Sec.4.2.

Tomographic Reconstruction of Simulated Data
In simulation, we consider a situation of three 5 μm beads of refractive index n ¼ 1.548 immersed into oil of refractive index n 0 ¼ 1.518 shown in Fig. 3.The centers of the beads are placed at ð0; 0; −3Þ, (0, 0, 3), and (0, 5, 0), respectively, with the unit of micron.The training set of the framework is simulated as 81 complex amplitudes extracted from the digital-holography measurements with different angles of incidence evenly distributed in ½−π∕4; π∕4 by BPM, whereas the illumination is tilted perpendicular to the x-axis and the angle is specified with respect to the optical axis z.The size of the reconstructed volume is Þ and we take the δnðrÞ of a polystyrene bead as example.

Cross-sectional views
Fig. 3 Simulation geometry comprising three spherical beads with a refractive index difference of 0.03 compared with the background.
with the sampling steps of δx ¼ δy ¼ δz ¼ 144 nm.For the network hyperparameters, we choose 600 training iterations in our GPU-based implementation with the batch size of 20, the initial learning rate of 0.001, and the regularization coefficient of τ ¼ 1.5.The reconstructed results by our method and other reconstruction methods are shown in Fig. 4. The SNR defined in Ref. 21 of our result is 25.56 dB, 14 dB higher than the previous works.We can also observe much sharper edges of the reconstructed beads at the interface with less noise in the background from Fig. 5.The comparison between the proposed loss function and other regularized loss functions proves the higher reconstruction quality of the l 1 -norm constraint than the l 2 -case directly.
In addition, we analyze the performance of our method under different noise levels and reduced sampling angles.For the noise test, we add Gaussian noise of different power levels to the 81 simulated measured complex amplitudes, which are represented as different SNRs of the training data.From the curve of the reconstructed SNR versus the noise level, as shown in Fig. 6(a), we can find our method maintains more robustness to the noise than other methods.This is especially useful in the case of shorter exposure time for higher scanning speed, where the data are always readout-noise limited.For the test of reduced sampling angles, we keep the range of incident angles fixed from −π∕4 to π∕4.The total number of the incident angles for the network training decreases from 81.The curve of the reconstructed SNR versus the number of the incident angles is shown in Fig. 6(b).Even with as few as 11 incident angles, we can still achieve comparable performance as the previous methods with 81 angles.This nearly eight-time improvement facilitates the development of high-speed 3-D refractive index imaging.

Tomographic Reconstruction of a Biological Sample
To further validate the capability of the network, we display the experimental results on HeLa cells performed by our tomographic phase microscope as shown in Regularized loss function comprises one data-fidelity term and one regularization term.
To the best of our knowledge, the presented study is the first to employ l 1 fitting for the regularized TPM.Generally, the choice of the data-fidelity term depends on the specified noise distribution.However, it is particularly common for solving normal image restoration problems, as under various constraints, images are always degraded with mixed noise and it is impossible to identify what type of noise is involved.The l 2 -norm fitting relies on strong assumptions on the distribution of noise: there are no heavy tails and the distribution is symmetric.If either of these assumptions fails, then the use of l 2 -norm is not an optimal choice.On the other hand, for the so-called robust formulation based on l 1 -norm fitting, it has been shown that the corresponding statistics can tolerate up to 50% false observations and other inconsistencies. 27Hence, l 1 -norm data-fidelity term relaxes the underlying requirements for the l 2 -case and is well suited to biomedical imaging especially when the noise model is undetermined [as shown in Figs.4(a)-4(d)] and mixed [as shown in Fig. 6(a)].
As for the regularization term, we finally choose the anisotropic total variation (TV) regularizer in our method, which is an l 1 penalty directly on the image gradient.It is a very strong regularizer, which offers improvements on reconstruction quality to a great extent compared with the isotropic counterpart (l 2 penalty). 29Therefore, the edges are better preserved, which can be seen apparently from the comparison between Figs. 4(a) and 4(b).

Selection of Hyperparameters
Selection of hyperparameters has an important effect on tomographic reconstruction results.In practice, many learning algorithms involve hyperparameters (10 or more), such as initial learning rate, minibatch size, and regularization coefficient.Reference 31 introduces a large number of recommendations for training feed-forward neural networks and choosing the multiple hyperparameters, which can make a substantial difference (in terms of speed, ease of implementation, and accuracy) when it comes to putting algorithms to work on real problems.Unfortunately, optimal selection of hyperparameters is challenging due to the high computational cost when using traditional regularized iterative algorithms. 21,23n this study, our GPU-based implementation of CNN runs computation-intensive simulations at low cost and is possible to make the optimization of hyperparameters a more reproducible and automated process with modern computing facilities.Thus, we can gain better and more robust reconstruction performance with the GPU-based learning method.During the simulation and experiment, selection of hyperparameters varies with the biological sample and the range of incident angles.To make a convincing comparison, optimal hyperparameters have been selected for all the other reconstruction methods.The refractive index difference δnðrÞ is initialized with a constant value of 0 for all the methods, and different optimal regularization coefficients are chosen for different regularized loss functions due to the different combinations of data-fidelity term and regularization term.The number of iterations is set to guarantee the convergence of each method, as shown in Fig. 9.

Subgradient Method and Adam Algorithm
In convex analysis, 30 the subgradient generalizes the derivative to functions that are not differentiable.A vector g ∈ R n is a subgradient of a convex function at x if E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 6 3 ; 1 1 0 fðyÞ ≥ fðxÞ þ g T ðy − xÞ ∀y: (5) If f is convex and differentiable, then its gradient at x is a subgradient.But a subgradient can exist even when f is not differentiable at x.There can be more than one subgradient of a function f at a point x.The set of all subgradients at x is called the subdifferential, and is denoted by ∂fðxÞ.Considering the absolute value function jxj, the subdifferential is E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 3 2 6 ; 5 2 8 ∂jxj ¼ 8 < : Subgradient methods are subgradient-based iterative methods for solving nondifferentiable convex minimization problems.
Adam is an algorithm for first-order (sub)gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.The method is aimed toward machine learning problems with large datasets and/or highdimensional parameter spaces.The method is also appropriate for nonstationary objectives and problems with very noisy and/ or sparse gradients.Adam works well in practice and compares favorably with other stochastic optimization methods regarding the computational performance and convergence rate. 26It is straightforward to implement, is computationally efficient, and has little memory requirements, which is robust and well suited to TPM.Compared with the stochastic proximal gradient descent (SPGD) algorithm reported in Ref.21, our GPU-based CNN trained with the Adam algorithm for optimization converges to the same SNR level and achieves twice the rate of convergence as shown in Fig. 10.To show the higher convergence rate of Adam fairly, we use the proposed l 1 -norm loss function for both the Adam and SPGD training processes here, thus producing the same reconstructed SNR after convergence.

Conclusion
We have demonstrated a GPU-based implementation of deep CNN to model the propagation of light in inhomogeneous sample for TPM and have applied it to both synthetic and biological samples.The experimental results verify its superior reconstruction performance over other tomographic reconstruction methods, especially when we take fewer measurements.Furthermore, our CNN is much more general under different optical systems and arbitrary illumination patterns as its design is illuminationindependent.Importantly, this approach can not only enlarge the applications of optical tomography in biomedical imaging, but also open rich perspectives for the potential of deep neural networks in the optical society.

Fig. 2
Fig. 2 Detailed schematic of our CNN architecture, indicating the number of layers (Nz), nodes (Nx × Ny) in each layer and operations between adjacent layers.Here, kerðω x ; ω y Þ signifies

Fig. 1 .Fig. 4 Fig. 5 4 Discussion 4 . 1
Fig. 4 Reconstruction results of three 5 μm beads.Comparison of the cross-sectional slices of the 3-D refractive index distribution of the sample along the x − y , x − z, and y − z planes reconstructed by (a) proposed CNN, (b) CNN with l 1 fitting, l 2 regularization (L1 L2) and the regularization coefficient of 5, (c) CNN with l 2 fitting, l 1 regularization (L2 L1) and the regularization coefficient of 0.1, (d) learning approach23 implemented on the same CNN settings (LA) with the regularization coefficient of 0.6, (e) optical diffraction tomography based on the Rytov approximation (ODT)13 with the positivity constraint and 100 iterations, and (f) iterative reconstruction based on the filtered backprojection method4 with the positivity constraint and 400 iterations.Scale bar, 5 μm.

Fig. 6 Fig. 7 Fig. 8
Fig. 6 Performance analysis for proposed approach with the same hyperparameter selection.(a) The curve of the reconstructed SNR versus the noise level and (b) the curve of the reconstructed SNR versus the number of the incident angles.

Fig. 9 Fig. 10
Fig. 9 Reconstructed SNR plotted as a function of the number of iterations for different reconstruction methods on simulated data.Hyperparameters are declared in Sec.3.1.