Complex-domain-enhancing neural network for large-scale coherent imaging

Abstract. Large-scale computational imaging can provide a remarkable space-bandwidth product that is beyond the limit of optical systems. In coherent imaging (CI), the joint reconstruction of amplitude and phase further expands the information throughput and sheds light on label-free observation of biological samples at the micro- or even nanoscale. Existing large-scale CI techniques usually require multiple scanning/modulation steps to guarantee measurement diversity and long exposure times to achieve a high signal-to-noise ratio. Such cumbersome procedures restrict clinical applications for rapid and low-phototoxicity cell imaging. In this work, a complex-domain-enhancing neural network for large-scale CI, termed CI-CDNet, is proposed for various large-scale CI modalities with satisfactory reconstruction quality and efficiency. CI-CDNet is able to exploit the latent coupling information between amplitude and phase (such as their shared features), realizing multidimensional representations of the complex wavefront. The cross-field characterization framework provides strong generalization and robustness for various coherent modalities, allowing high-quality and efficient imaging under extremely short exposure time and small data volume. We apply CI-CDNet to various large-scale CI modalities, including Kramers–Kronig-relations holography, Fourier ptychographic microscopy, and lensless coded ptychography. A series of simulations and experiments validate that CI-CDNet can reduce exposure time and data volume by more than 1 order of magnitude. We further demonstrate that the high-quality reconstruction of CI-CDNet benefits the subsequent high-level semantic analysis.


Introduction
Large-scale coherent imaging (CI) has brought about a paradigm shift in our understanding of optical imaging, from morphological manifestation to quantitative measurement. [1][2][3][4][5][6][7][8][9] The information throughput of an optical system is defined by the space-bandwidth product (SBP), which represents the number of optically resolvable spots within the field of view (FOV). 6,10 In CI, the joint reconstruction of amplitude and phase further expands the SBP to billions, realizing both wide FOV and high-resolution imaging. 9,11,12 The remarkable throughput and resolving capacity provide cellular and molecular insights for biomedical research. [13][14][15] Large-scale CI techniques generally require certain types of diversity measurements in the spatial domain (e.g., lensless on-chip systems [16][17][18][19][20] ) or the Fourier domain (e.g., Fourier ptychography 8,9 ). Tens or hundreds of intensity-only measurements are often needed to reconstruct the sample's complex wavefront. Such high-volume data make large-scale imaging time-consuming and computationally expensive. Although reducing the measurement data volume and exposure time is a straightforward strategy, it would sacrifice imaging resolution and signal-to-noise ratio (SNR).
Image denoising has emerged as an effective method for improving imaging quality and confronting insufficient measurement and illumination. However, the conventional model-driven denoising techniques [21][22][23][24][25] suffer from high computational complexity, making them impractical for high-throughput CI. Recent deep-learning technology introduces a data-driven strategy for image enhancement tasks, providing rapid and flexible solutions for computational imaging. 26,27 In one instance, 27 a convolutional neural network (CNN) is able to learn a mapping from noisy images to noise-free images directly, reducing the reconstruction time by several orders of magnitude. Although CNN-based techniques have achieved great success in real-domain image denoising, there are several challenges for large-scale CI. First, the existing CNN architectures, training strategies, and data set degradation models are designed for intensity-only images. They do not consider the amplitude-phase correlations of complex-domain signals that have been widely used in neuroscience and speech signal processing. [28][29][30][31] Second, conventional real-domain enhancement neural networks typically rely on distinct parameters to capture the characteristics of amplitude and phase separately, so satisfactory denoising performance comes at the cost of efficiency. 32,33 Third, image denoising often smooths edges and sacrifices imaging resolution, which contradicts the goal of superresolution coherent reconstruction. In summary, the existing large-scale CI techniques require a trade-off between imaging quality and efficiency, which restricts their clinical applications for rapid and low-phototoxicity imaging. 34
Recent advancements in complex-domain neural networks 28,30,35 have achieved significant success in complex signal processing. For example, Trabelsi et al. 30 applied them to speech signal processing, resulting in improved accuracy. Zhang et al. 35 combined complex-domain neural networks with deep unfolding techniques to achieve high-quality lensless imaging. In this work, we introduce the complex-domain neural network to enhance large-scale CI, termed CI-CDNet. We demonstrate its wide applications for various large-scale CI modalities with remarkable quality and efficiency. CI-CDNet effectively utilizes latent coupling information, which involves the feature aliasing between amplitude and phase images, to overcome the reconstruction ambiguity associated with phase information. By doing so, it enables multidimensional representations of the complex wavefront, thereby facilitating the effective suppression of complex multisource measurement noise in computational imaging while preserving fine details and achieving high imaging resolution. In addition, CI-CDNet processes the complex wavefront in a one-step and end-to-end manner, maintaining remarkable performance and efficiency. Specifically, we derived the two-dimensional complex-domain convolution unit and the corresponding activation function, and built a comprehensive multisource noise model for CI, which includes speckle noise, Poisson noise, Gaussian noise, and superresolution reconstruction noise. We then trained CI-CDNet using the derived multisource noise model and demonstrated it in various large-scale CI modalities, including noniterative Kramers-Kronig-relations (KKR) holography, 3,36-38 Fourier ptychographic microscopy (FPM), 8,9 and lensless coded ptychography (LCP). 19,[39][40][41] The results indicate that CI-CDNet obtains state-of-the-art performance in accuracy, computational efficiency, and imaging resolution. It is able to reduce exposure time and data volume by more than 1 order of magnitude. Finally, we further demonstrated that the high-quality reconstruction of the proposed technique benefits the subsequent high-level semantic analysis, such as cell segmentation and virtual staining.

Complex-Domain Neural Network
The architecture of the proposed CI-CDNet is presented in Fig. 1(a). The input contains a complex wavefront and a noise map. The noise map makes the denoising degree adjustable in the iterative reconstruction, which allows smoothness and fidelity to be balanced (Note 1 in the Supplemental Material). The backbone of CI-CDNet is a complex-domain U-Net that contains multiple residual blocks to increase the modeling capacity. Specifically, it contains four downsampling and upsampling scales with 64, 128, 256, and 512 channels, respectively. Each scale has an identity skip connection between the 2 × 2 complex-domain strided convolution (CD-SConv) downsampling and the 2 × 2 complex-domain transposed convolution (CD-TConv) upsampling operations. In addition, we employed successive complex-domain residual blocks, which consist of CD-Conv, CD-ReLU, and CD-Conv, in the downscaling and upscaling of each scale. CI-CDNet uses complex-domain operations and blocks as its basic units. The detailed formalism of each block is shown below.

Complex-domain convolution
Figure 1(b) shows the complex convolution operator. Assume that the complex feature map and convolution kernel are represented as $F = F_R + iF_I$ and $K = K_R + iK_I$, respectively, where $F_R$ and $K_R$ are the real parts, $F_I$ and $K_I$ are the imaginary parts, and $i$ is the imaginary unit. Then, the complex-domain convolution can be written as

$$F * K = (F_R * K_R - F_I * K_I) + i(F_R * K_I + F_I * K_R), \tag{1}$$

where $*$ denotes the convolution operation. The complex-domain convolution can also be presented in a matrix format as follows:

$$\begin{bmatrix} \Re(F * K) \\ \Im(F * K) \end{bmatrix} = \begin{bmatrix} K_R & -K_I \\ K_I & K_R \end{bmatrix} * \begin{bmatrix} F_R \\ F_I \end{bmatrix}. \tag{2}$$
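As an illustration, the decomposition of a complex convolution into four real convolutions can be checked numerically. The following is a minimal sketch in plain Python, not the CI-CDNet implementation; the helper names `conv2d_valid` and `complex_conv2d` and the toy arrays are assumptions made only for this example.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (kernel flipped) on nested lists of numbers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            for u in range(kh):
                for v in range(kw):
                    out[r][c] += image[r + u][c + v] * kernel[kh - 1 - u][kw - 1 - v]
    return out


def complex_conv2d(f_re, f_im, k_re, k_im):
    """Complex convolution built from four real convolutions.

    Returns the real and imaginary parts of (f_re + i f_im) * (k_re + i k_im).
    """
    rr = conv2d_valid(f_re, k_re)
    ii = conv2d_valid(f_im, k_im)
    ri = conv2d_valid(f_re, k_im)
    ir = conv2d_valid(f_im, k_re)
    out_re = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(rr, ii)]
    out_im = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ri, ir)]
    return out_re, out_im


# Small integer-valued toy data (hypothetical) so the comparison is exact.
F_RE = [[1, 2, 0], [0, 1, 3], [2, 1, 1]]
F_IM = [[0, 1, 1], [2, 0, 1], [1, 1, 0]]
K_RE = [[1, 0], [2, 1]]
K_IM = [[0, 1], [1, 0]]

out_re, out_im = complex_conv2d(F_RE, F_IM, K_RE, K_IM)

# Reference: the same convolution carried out directly on Python complex numbers.
F = [[complex(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(F_RE, F_IM)]
K = [[complex(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(K_RE, K_IM)]
direct = conv2d_valid(F, K)
```

Here `direct[r][c]` should equal `complex(out_re[r][c], out_im[r][c])` at every position. The same identity holds if the kernel is not flipped (cross-correlation, as commonly used in deep-learning frameworks), because it relies only on the linearity of the operation.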

Complex-domain activation function
The activation function plays an important role in increasing the nonlinear modeling ability of a neural network. We employed the rectified linear unit (ReLU) as the activation function and applied it to the real and imaginary parts independently. Thus, the complex-domain activation function (CReLU) can be expressed as

$$\mathrm{CReLU}(F) = \mathrm{ReLU}(F_R) + i\,\mathrm{ReLU}(F_I), \tag{3}$$

where the ReLU is

$$\mathrm{ReLU}(x) = \max(0, x). \tag{4}$$
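A minimal sketch of this activation on scalar complex values, assuming plain Python complex numbers rather than network tensors (the function names `relu` and `crelu` are chosen here for illustration):

```python
def relu(x):
    """Real-valued rectified linear unit."""
    return max(0.0, x)


def crelu(z):
    """Apply ReLU to the real and imaginary parts of a complex value independently."""
    return complex(relu(z.real), relu(z.imag))


samples = [complex(1.5, -2.0), complex(-0.5, 0.25), complex(-1.0, -1.0)]
activated = [crelu(z) for z in samples]
```

A value passes through unchanged only when both its real and imaginary parts are positive; otherwise the negative part is set to zero.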

Complex-domain weight initialization
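A complex value with a mean of zero is employed to implement weight initialization,

$$W = |W| e^{i\theta}, \tag{5}$$

where $|W|$ and $\theta$ are the amplitude and phase, respectively. In our implementation, $|W|$ follows a Rayleigh distribution and $\theta$ follows a uniform distribution in the range $(-\pi, \pi)$. The variance of the complex-domain weight is

$$\mathrm{Var}(W) = E[WW^*] - (E[W])^2 = E[|W|^2] - (E[W])^2. \tag{6}$$

Because $W$ is symmetrically distributed around 0, $E[W] = 0$, and thus

$$\mathrm{Var}(W) = E[|W|^2]. \tag{7}$$

It is hard to compute $E[|W|^2]$ directly. 30 We can introduce an auxiliary variable $\mathrm{Var}(|W|)$, which can be obtained through

$$\mathrm{Var}(|W|) = E[|W|^2] - (E[|W|])^2. \tag{8}$$

Putting Eqs. (7) and (8) together, $\mathrm{Var}(|W|)$ can be written as $\mathrm{Var}(|W|) = \mathrm{Var}(W) - (E[|W|])^2$. Thus, the variance of $W$ is expressed as

$$\mathrm{Var}(W) = \mathrm{Var}(|W|) + (E[|W|])^2. \tag{9}$$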
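The variance relation can be checked empirically with a small sampling experiment. This is a sketch under stated assumptions: the Rayleigh scale `sigma = 1 / sqrt(fan_in)` (a He-style choice that gives a weight variance of `2 / fan_in`) and the value of `fan_in` are hypothetical and are not taken from the text above.

```python
import cmath
import math
import random


def sample_complex_weights(count, sigma, rng):
    """Draw W = |W| exp(i*theta) with |W| ~ Rayleigh(sigma), theta ~ U(-pi, pi)."""
    weights = []
    for _ in range(count):
        # Inverse-CDF sampling of the Rayleigh distribution.
        magnitude = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        phase = rng.uniform(-math.pi, math.pi)
        weights.append(cmath.rect(magnitude, phase))
    return weights


rng = random.Random(0)
fan_in = 4                        # hypothetical number of input units
sigma = 1.0 / math.sqrt(fan_in)   # assumed He-style scale, so Var(W) = 2 / fan_in
weights = sample_complex_weights(20000, sigma, rng)

mean_w = sum(weights) / len(weights)                       # estimate of E[W]
var_w = sum(abs(w) ** 2 for w in weights) / len(weights)   # estimate of E[|W|^2]
mean_abs = sum(abs(w) for w in weights) / len(weights)     # estimate of E[|W|]
var_abs = var_w - mean_abs ** 2                            # estimate of Var(|W|)
```

For a Rayleigh-distributed amplitude with scale `sigma`, the estimates should approach `E[|W|] = sigma * sqrt(pi / 2)` and `E[|W|^2] = 2 * sigma**2`, while `mean_w` stays close to zero because the phase is uniform.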

Multisource Noise Model for Large-Scale CI
In general, the measurement noise is modeled as additive Gaussian noise. Although it has been validated that a CNN trained with synthetic Gaussian noise data can remove mixed noise when a large noise variance is set, 27 the image details would be sacrificed. To overcome this limitation, we built a multisource CI noise model to match the real-world noise, as shown in Fig. 1(c). Specifically, we considered the following four noise types.

Gaussian noise
Additive white Gaussian noise models the generalized detector noise, such as nonuniform illumination noise and thermal noise. We added Gaussian noise to the training data with a random noise variance (from 0 to 0.3).

Poisson noise
Poisson noise models the statistical characteristics of photons, which are related to light intensity. It is severe under low-light and short-exposure conditions. To simulate different Poisson noise levels, we multiplied the complex-domain images by a random coefficient $10^{\alpha}$ ($\alpha \in [2, 3]$). After adding Poisson noise, the images were divided by $10^{\alpha}$.

Speckle noise
Speckle noise usually appears in CI modalities. It is a multiplicative noise that can be modeled by a Gaussian distribution. We simulated multiplicative speckle noise with the same variance range as the Gaussian noise.

Superresolution noise
Large-scale CI usually employs superresolution reconstruction techniques; for instance, ptychographic imaging synthesizes measurements in the spatial or Fourier domain to extend the SBP. Although superresolution reconstruction does not introduce noise, it magnifies the noise and affects its distribution. To model the superresolution noise, we utilized bicubic interpolation 42 to resize the noisy complex-domain wavefront with a superresolution ratio of 2.
We utilized a random shuffle strategy to add the abovementioned multisource noise to the real and imaginary parts of the complex wavefront. Specifically, the additive Gaussian noise is added first due to its strong generalization for different noise sources. After that, the speckle noise and Poisson noise are applied randomly with a probability of 50%. Finally, we resized the noisy wavefront to simulate the superresolution reconstruction noise.
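The order of the degradation steps can be sketched as follows. This is an illustrative, dependency-free approximation and not the training code: the names `sample_poisson` and `add_multisource_noise` are hypothetical, channel values are assumed to be normalized to [0, 1], negative values are clipped to zero before Poisson sampling (an assumption), each noise type draws its own random variance (an assumption), and the final bicubic resizing step is omitted because it would require an image library.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson sample via Knuth's method, split into chunks to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        threshold = math.exp(-chunk)
        product = rng.random()
        count = 0
        while product > threshold:
            product *= rng.random()
            count += 1
        total += count
        lam -= chunk
    return total


def add_multisource_noise(channel, rng):
    """Degrade one real-valued channel (values assumed to lie in [0, 1])."""
    # 1) Additive Gaussian noise is always applied first (variance in [0, 0.3]).
    variance = rng.uniform(0.0, 0.3)
    noisy = [x + rng.gauss(0.0, math.sqrt(variance)) for x in channel]
    # 2) Multiplicative speckle noise, applied with probability 0.5.
    if rng.random() < 0.5:
        speckle_var = rng.uniform(0.0, 0.3)
        noisy = [x + x * rng.gauss(0.0, math.sqrt(speckle_var)) for x in noisy]
    # 3) Poisson noise, applied with probability 0.5: scale by 10**alpha,
    #    sample, then divide back by the same factor.
    if rng.random() < 0.5:
        scale = 10.0 ** rng.uniform(2.0, 3.0)
        noisy = [sample_poisson(max(x, 0.0) * scale, rng) / scale for x in noisy]
    return noisy


rng = random.Random(1)
wavefront = [complex(0.8, 0.1), complex(0.5, 0.4), complex(0.2, 0.7), complex(0.9, 0.3)]
noisy_re = add_multisource_noise([z.real for z in wavefront], rng)
noisy_im = add_multisource_noise([z.imag for z in wavefront], rng)
noisy_wavefront = [complex(a, b) for a, b in zip(noisy_re, noisy_im)]
```

The real and imaginary parts are degraded by separate calls, mirroring the description that noise is added to both parts of the complex wavefront.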

Training Details
We employed 10,000 synthetic data sets and added the multisource noise to train CI-CDNet (Note 1 in the Supplemental Material). We used the L1 loss and the Adam optimizer to update parameters with a batch size of 16. The network was trained for 400 epochs with an initial learning rate of $1 \times 10^{-5}$, which was halved every 150 epochs. The training was implemented in PyTorch 1.8.1 on an NVIDIA 2080 Ti GPU and took about 4 days.
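The step-decay schedule described here can be written out explicitly. The function name `learning_rate` is hypothetical, and the sketch assumes the decay takes effect at epochs 150 and 300 (counting from 0).

```python
def learning_rate(epoch, initial=1e-5, factor=0.5, step=150):
    """Step decay: multiply the rate by `factor` once every `step` epochs."""
    return initial * factor ** (epoch // step)


# Learning rate at the start, around each decay point, and in the final epoch.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 149, 150, 299, 300, 399)}
```

Over 400 epochs the rate therefore takes three values: 1e-5, 5e-6, and 2.5e-6. In PyTorch, a comparable schedule could presumably be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=150, gamma=0.5)`.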

Results
We applied the proposed CI-CDNet to enhance the reconstructed wavefront and explored its potential for reducing exposure time and data volume. The comparison methods included BM3D, 24 complex-domain BM3D (CD-BM3D), 25 and a conventional real-domain neural network (Real-NN). Real-NN has the same architecture and training process as CI-CDNet. These comparison methods are state-of-the-art representatives of model-driven and data-driven methods. We employed BM3D and Real-NN to denoise the amplitude and phase of the complex-domain wavefront independently. CD-BM3D and CI-CDNet were used to denoise the complex wavefront directly.

Kramers-Kronig-Relations Holography
Wavefront reconstruction via KKR is a recent high-SBP and noniterative CI technique, 3,36-38 which has been used in both two-dimensional holographic imaging 36,38 and three-dimensional refractive index tomography. 3,37 KKR relates the real and imaginary parts of a complex function that is analytic in the upper half-plane and requires multiple measurements with different illumination angles 37 or aperture modulations 38 to satisfy the analyticity. We applied these denoising methods to the KKR reconstructed wavefront, aiming to reduce exposure time and accelerate measurement acquisition.
Our experimental setup is shown in Fig. 2(a). It contained a 532 nm laser diode (Thorlabs DJ532-40) as a light source, an objective (10× Mitutoyo Plan Apo infinity-corrected objective, 0.28 NA), a reflective spatial light modulator (Holoeye LC-R 1080), and a camera (Allied Vision Prosilica GX 6600) with 5.5 μm pixel size. We employed the aperture modulation strategy to satisfy the analyticity of KKR. Specifically, the Fourier plane was relayed outside the objective onto the spatial light modulator (SLM) plane, and the edge of the generated modulation aperture strictly crosses the objective's pupil center. To obtain the complete Fourier spectrum within the pupil, we implemented four modulations and acquired corresponding intensity measurements under 1 to 1000 ms exposure time (Note 3 in the Supplemental Material). The reconstruction result of KKR using 1000 ms measurements was used as the ground truth (GT) to quantitatively compare the performance of different techniques. Figure 2(b) shows the resolution test results of Siemens star under 1 ms exposure time. Due to the short exposure time, the results of KKR recovery contained serious background noise and detail loss. Although the conventional denoising algorithms can suppress noise, the resolution was sacrificed (as presented in the cross-sectional curve). In comparison, the proposed CI-CDNet outperformed other methods in both noise suppression and resolution maintenance. Figure 2(c) shows the running time (ms) of different methods. The proposed CI-CDNet had the best running efficiency. Quantitatively, CI-CDNet reduced running time by 2 orders of magnitude compared with the conventional model-based techniques (BM3D and CD-BM3D).
Then, we employed a biological sample to quantitatively explore the performance of CI-CDNet for reducing exposure time. Figure 3(a) shows the results of a papillary thyroid carcinoma slide under 1 ms exposure time. Figure 3(b) shows the quantitative results under different exposure times. The evaluation indexes included peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). We can see that the result of CI-CDNet under 1 ms exposure time is close to the result of KKR under 50 ms exposure time (more results can be seen in Note 5 in the Supplemental Material). Thus, CI-CDNet can reduce the exposure time by more than 1 order of magnitude.
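For reference, PSNR can be computed as in the sketch below. It assumes images flattened to sequences of real values normalized to a peak of 1.0; the function name `psnr` and the toy data are illustrative, and the evaluation code used for the reported numbers may differ (SSIM is not shown).

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


ground_truth = [0.0, 0.5, 1.0, 0.5]      # hypothetical reference pixels
reconstruction = [0.0, 0.5, 0.9, 0.5]    # hypothetical reconstructed pixels
score = psnr(ground_truth, reconstruction)
```

In this toy case the mean squared error is 0.0025, so the score is 10·log10(400), roughly 26 dB. Every 10 dB gain corresponds to a tenfold reduction in mean squared error.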

Fourier Ptychographic Microscopy
FPM is a novel technique for wide-field and high-resolution imaging. 8,9 It extends microscopy's bandwidth to a billion pixels
Figures 4(d) and 4(e) show the quantitative results (blood smear) of amplitude and phase, respectively. We can see that the conventional AP algorithm (baseline) failed due to serious noise and distortion. The regularization methods with real-domain denoising techniques (BM3D and Real-NN) were able to enhance the imaging resolution and could resolve group 7, element 5 of the USAF target, but the spatial distortion seriously affected the reconstruction quality (especially the phase images of the blood smear). CD-BM3D outperformed the real-domain denoising techniques, with higher resolution that resolved group 7, element 6 of the USAF target and better phase image quality of the blood smear. However, its high computational complexity and long running time make it unsuitable for rapid large-scale imaging. In comparison, the proposed CI-CDNet obtained the best performance. It resolved group 8, element 2 of the USAF target and recovered clear cell structures for both amplitude and phase images of the blood smear. The PSNR and SSIM indexes also validated the advantage of the proposed CI-CDNet. Figure 4(f) presents the running time (s) of different methods. We should note that the running time of these enhancing methods included the iteration time of the data-fidelity terms based on AP. Benefiting from the one-step and end-to-end strategies, CI-CDNet is efficient in the iterative reconstruction, consuming only about a quarter of the running time of the conventional BM3D method.

Lensless Coded Ptychography
LCP with a random diffuser has emerged as a low-cost high-SBP technique that can bypass the throughput limit of optical systems. 19,[39][40][41] In LCP, a diffuser is placed between the sample and detector to modulate the wavefront and encode the high-frequency information (Note 3 in the Supplemental Material). In general, LCP requires nearly thousands of LR measurements to iteratively recover the high-resolution sample and the unknown diffuser profile simultaneously, which makes the data acquisition time-consuming and cumbersome. Thus, we aim at reducing data volume requirements and acquisition time. Figure 5(a) shows the experimental setup. We applied glass etching chemicals to a coverslip and coated it with carbon nanoparticles to produce a random diffuser. It realized micrometer-level phase scattering and subwavelength intensity absorption. The light source was a fiber-coupled diode with 532 nm wavelength. We used an unstained blood smear as a sample and continuously moved it to 900 $x$-$y$ positions. The shift step size was 1 to 3 μm to balance the motion blur and similarity. A detector (Sony IMX226, 1.85 μm pixel size) was used to capture the corresponding intensity diffraction images at a fixed frame rate (30 FPS), and the data collection consumed ∼30 s. We compared the ePIE algorithm, 44 Real-NN, and the CI-CDNet regularization algorithm (Note 2 in the Supplemental Material) for 4× superresolution reconstruction of the sample and the diffuser profile using only 50 captured images. The BM3D and CD-BM3D methods failed due to their excessive computational complexity and unacceptably long running times. The results of ePIE using 900 images were regarded as the GT. The recovered complex-domain diffuser profile and sample shift positions are shown in Fig. S4 of Note 3 in the Supplemental Material. Figures 5(b) and 5(c) show the results of amplitude and phase using CI-CDNet. The reconstructed complex-domain images have 6144 pixels × 6144 pixels and a 7.5 mm FOV.
Figure 5(d) shows the close-ups of three regions of interest (ROIs). The pseudo-color part is the phase and the gray part is the amplitude. The proposed CI-CDNet can suppress background noise efficiently, providing high-fidelity results for label-free cell observation. Moreover, CI-CDNet can reconstruct the discoid mature erythrocyte, as indicated by the red arrow in the ROI-I3. The quantitative results of Table 1 and visual results in Note 5 in the Supplemental Material show that the result of CI-CDNet using 50 images is close to the results of ePIE using 500 images. Thus, CI-CDNet can reduce data volume by 1 order of magnitude.
The satisfactory performance of CI-CDNet benefits the subsequent high-level semantic analysis. We demonstrated the high-accuracy white blood cell segmentation 45,46 and virtual

Conclusion and Discussion
We proposed a novel large-scale CI technique with a complex-domain-enhancing neural network, termed CI-CDNet. CI-CDNet introduces complex-domain operations to the CNN, which can exploit the latent correlations between amplitude and phase. In this way, the proposed technique breaks the inherent restriction of the conventional real-domain neural network, realizing cross-field and joint representation of the complex wavefront. Furthermore, a multisource noise model of large-scale CI was built to train CI-CDNet. The high-accuracy noise model benefits the network's domain-adaptation ability from synthetic data to real data, improving its performance in various degraded scenes. The data-driven and end-to-end design gives CI-CDNet low computational complexity for large-scale CI. We compared CI-CDNet with model-driven methods (BM3D and CD-BM3D) and data-driven methods (Real-NN and dual-channel neural network; see Note 8 in the Supplemental Material) in a series of large-scale CI modalities, including KKR holography, FPM, and LCP. The results validated its state-of-the-art performance for extremely small data volume and short exposure time. Specifically, in KKR, CI-CDNet can reduce the exposure time by more than 1 order of magnitude. In FPM, CI-CDNet improved the PSNR of amplitude and phase by more than 11 and 18 dB, respectively. In LCP, it reduced the data volume by nearly 1 order of magnitude.
To conclude, the proposed technique breaks the trade-off among computational complexity, generalization, and reconstruction accuracy. It can be extended to more generalized frameworks and applications in future work. The noise map of CI-CDNet is an essential parameter for its performance. In our implementation, it relied on heuristic estimation and manual adjustment, and it was difficult to estimate accurately for real noise. Recent blind denoising 48 and reinforcement learning 49 techniques are expected to solve this problem and realize automatic noise map estimation and parameter adjustment during iterations.
The current CI-CDNet requires a prereconstructed intermediate step. The two-step processing is unsuitable for computation resource-limited platforms that require real-time imaging. In addition, the performance of CI-CDNet depends on the accuracy of the prereconstruction. End-to-end learning for different modalities using the proposed technique is an effective way to avoid the intermediate step, but the generalization would be sacrificed, and the neural network would need to be retrained for different imaging modalities. An alternative solution is combining physics-informed frameworks, such as deep image prior 50 and deep unfolding 51 techniques. They incorporate the physics model and the built-in smoothness prior of the neural network to optimize the imaging tasks. Nevertheless, their large memory requirement for graphics processing units is a bottleneck for ultralarge-scale imaging.
We believe that the complex-domain neural network is potentially even more broadly transformative for optimizing the whole imaging workflow. Specifically, it can be introduced to the joint optimization of imaging setup and reconstruction, 52 for instance, the illumination angle, modulation pattern, imaging distance, or even more generalized physical parameters. In addition, the application scenarios can also be extended, for example, to multidimensional voxel reconstruction and holographic image segmentation and recognition. This may open new insight into complex wavefront representation in various optoelectronics fields.

Data Availability
The neural network and the pretrained model of CI-CDNet are available at github.com/bianlab/CI-CDNet.