Harnessing the magic of light: spatial coherence instructed swin transformer for universal holographic imaging

Abstract. Holographic imaging poses significant challenges when facing real-time disturbances introduced by dynamic environments. Existing deep-learning methods for holographic imaging often depend solely on a specific condition based on the given data distributions, which hinders their generalization across multiple scenes. One critical problem is how to guarantee the alignment between any given downstream task and the pretrained model. We analyze the physical mechanism of image degradation caused by turbulence and propose a swin transformer-based method, termed the train-with-coherence-swin (TWC-Swin) transformer, which uses spatial coherence (SC) as adaptable physical prior information to precisely align image restoration tasks in arbitrary turbulent scenes. The light-processing system (LPR) we designed enables manipulation of SC and simulation of arbitrary turbulence. Qualitative and quantitative evaluations demonstrate that the TWC-Swin method is superior to traditional convolution frameworks and realizes image restoration under various turbulence conditions, which suggests its robustness, powerful generalization capability, and adaptability to unknown environments. Our research reveals the significance of physical prior information at the intersection of optics and deep learning and provides an effective solution for model-to-task alignment schemes, which will help to unlock the full potential of deep learning for all-weather optical imaging across terrestrial, marine, and aerial domains.


Introduction
Holographic imaging is an interdisciplinary field that combines optics, computer science, and applied mathematics to generate holographic images using numerical algorithms. Although the concept of using computers to generate holograms can be traced back to the 1960s, it was not until the emergence of digital imaging and processing techniques in the 1990s that computational holography began to develop into a viable technology. 1,2 In the 1990s, digital holography started to gain more attention due to advancements in computer technology and digital image processing. 3 In recent years, holographic imaging has continued to advance, with new research and technology enabling even more sophisticated holographic imaging capabilities. [6][7][8][9][10] Spatial coherence (SC) is a critical factor that determines the quantity and quality of high-frequency information carried by the light beam in holographic imaging. High-frequency information is crucial for achieving high resolution and capturing fine details in an image. When the SC of the light source is low, the phase relationship of the beam becomes chaotic, causing the interference pattern to be washed out and resulting in insufficient transmission of high-frequency information. As a result, the reconstructed image has a lower resolution and less fine-detail information, as the high-frequency information needed to capture these details has been lost. Therefore, high-SC light is preferred for holographic imaging to ensure that sufficient high-frequency information is present in the interference pattern and the hologram, resulting in high-resolution and detailed reconstructed images. However, the SC of light sources is often very low in complex scenes, which leads to image degradation and loss of details. [13][14][15] Oceanic and atmospheric turbulence may profoundly influence optical imaging, engendering distortions and deterioration in photographs acquired through cameras and alternative optical detection devices. The distortion and degradation of images caused by oceanic turbulence occur because the turbulent motions in the water column cause variations in the refractive index of the water, which in turn leads to variations in the path of light as it travels through the water. Atmospheric turbulence occurs because the Earth's atmosphere is not uniform and contains regions of varying temperature and density, which can cause variations in the refractive index of the air. Whether the turbulence is oceanic or atmospheric, as the beam passes through these regions of varying refractive index, the phase correlation changes and the SC is distorted, causing the image to become blurred and distorted, or even completely lost. [17][18][19][20][21][22][23] It is difficult to use the same methods to simultaneously resolve holographic imaging problems in low-SC scenes and under multiple intensities of turbulence. Although low SC and turbulence may not appear to be correlated at first glance, their influence on computational holography can both be described through the concept of SC. As a result, we can transform the aforementioned issues into the imaging problem of different SCs and leverage the advantages of deep learning to train a generalized model that can achieve image restoration for any turbulence intensity and low SC.
Artificial intelligence for optics has unparalleled advantages, especially in the field of holography. For example, deep learning can address challenging inverse problems in holographic imaging, where the objective is to recover the original scene or object properties from observed images or measurements, and can enhance the resolution of optical imaging systems beyond their traditional diffraction limit, [24][25][26][27][28][29][30] etc. Research at the intersection of optics and deep learning aims to solve many tasks with one model, and one important problem is how to guarantee alignment between the distribution of any given downstream data and tasks and the pretrained model. Without such alignment, a given model and its weights can only be applied to a specific environment. Our research uses SC as adaptable real-time physical prior information to precisely align any scene with pretrained models. By combining the most advanced deep-learning algorithms, the residual network 31 and the swin transformer, 32 we propose our deep-learning-based methodology, termed the train-with-coherence-swin (TWC-Swin) method. It can achieve the restoration of computational holographic imaging under any low SC and turbulence.
We summarize the innovations of this paper as follows.
(1) We designed an LPR to simultaneously acquire two outputs: computational holographic imaging results and corresponding interference fringes under different SCs and turbulent scenes [Fig. 1(a)]. We manipulated SC by changing the distance between lens 1 and the rotating diffuser (RD). The SC can be calculated using the obtained interference fringes, while the imaging results serve as training and testing data for the neural network. In our experiment, we employed partially coherent light as the light source and loaded the turbulence phase using a spatial light modulator (SLM) with a loading frequency of 20 Hz. The original data sets consist of natural images characterized by complex elements and low sparsity, rather than simple symbols or letters.
(2) The core of the TWC-Swin method is a swin adapter and a swin-model network [Fig. 1(b)]. Table S1 in the Supplementary Material provides the correspondence between SC and swin-model space. These weights are obtained by network training at different SCs. Our swin model utilizes the swin-transformer algorithm and incorporates local-global convolution residuals to construct its architecture. Additionally, the preprocessing and postprocessing modules are exclusively based on convolutions. The detailed internal architecture of the swin model is shown in Fig. S1 in the Supplementary Material. Compared to convolutional neural network and pure swin-transformer frameworks, the swin model exhibits superior performance. In addition, the TWC-Swin method takes about 28 ms to process a picture, which enables real-time video-level restoration.
(3) Our model was trained only on different SCs yet performed well on turbulence data, despite never being trained on such data. We can effectively restore holographic images under various turbulent scenes using a swin adapter. This suggests that our method has strong generalization capability and learned universal features of image degradation and restoration during training that are applicable beyond the specific conditions of the training data.
(4) Our strategy for training neural networks with SC as the physical prior information is applicable to arbitrary neural networks, not limited to the swin model. It showcases the advantage of "transferability," meaning that it can be transferred or applied effectively across different neural network architectures and tasks. Moreover, we achieve model-to-task alignment using physical principles and demonstrate the strategy's ability to incorporate physical principles into the learning process, which can be particularly beneficial for tasks involving physical data or phenomena. This lends a degree of interpretability to the model, as it leverages known physical principles to inform its predictions.
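The distance-priority selection performed by the swin adapter in point (2) reduces to a nearest-neighbor lookup over the model space. A minimal sketch is shown below; the SC values and weight labels are hypothetical placeholders, not the paper's actual trained weights.

```python
import numpy as np

def select_model(measured_sc, model_space):
    """Distance-priority selection: pick the pretrained weights whose
    training-time spatial coherence (SC) is closest to the measured SC.

    model_space: dict mapping a training SC value -> model weights (any object).
    """
    # Find the SC key with the minimum absolute distance to the measurement.
    best_sc = min(model_space, key=lambda sc: abs(sc - measured_sc))
    return best_sc, model_space[best_sc]

# Hypothetical model space: 11 SC levels, mirroring the 11 training groups.
space = {round(sc, 2): f"weights_{i}" for i, sc in enumerate(np.linspace(0.3, 1.0, 11))}
sc, weights = select_model(0.41, space)
```

Because the lookup is a single `min` over 11 keys, switching models adds negligible latency to the reported 28 ms per frame.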

Scheme of the LPR
Figure 1(a) shows the LPR. The high-coherence light source generated by the solid-state laser (CNI, MLL-FN, 532 nm) is polarized horizontally after passing through a half-wave plate and a polarization beam splitter, allowing it not only to match the modulation mode of the SLM but also to adjust the beam intensity. The RD (DHC, GCL-201) is used to reduce the SC of the light source, with the degree of reduction depending on the radius of the incident beam on the RD: the larger the radius, the lower the SC of the output light source (see Note 2 in the Supplementary Material). In the experiment, we control the incident beam radius by adjusting the distance between lens 1 (L1, 100 mm) and the RD. After being collimated by lens 2 (L2, 100 mm), the beam is incident on SLM1 (HDSLM80R), which is loaded with the turbulent phase, continuously refreshed at a rate of 20 Hz. After passing through the turbulence, the beam is split into two parts by a beam splitter. The first part employs Michelson interference to capture interference fringes and measure the SC of the light. The second part is used for holographic imaging, with the phase hologram of the image loaded onto SLM2 (PLUTO). A high-pass filter is employed to filter out the unmodulated zero-order diffraction pattern, and the final imaging result is captured by the complementary metal-oxide-semiconductor camera (CMOS, Sony, E3ISPM). In summary, we control the SC of the light source by adjusting the distance between lens L1 and the RD. We simulate a turbulent environment using SLM1, with the intensity of the turbulence depending on the loaded turbulent phase. If turbulence is not required, SLM1 can be turned off, in which case it functions simply as a mirror.
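The SC measured from the Michelson arm can be estimated from the fringe visibility, V = (I_max - I_min)/(I_max + I_min), whose modulus equals the degree of coherence for equal-intensity arms. A minimal sketch on synthetic fringes (the 60% modulation depth is illustrative only):

```python
import numpy as np

def fringe_visibility(intensity_profile):
    """Estimate the modulus of the degree of coherence from a 1D cut
    through the Michelson interference pattern, using the classic
    visibility formula V = (I_max - I_min) / (I_max + I_min)."""
    i_max = float(np.max(intensity_profile))
    i_min = float(np.min(intensity_profile))
    return (i_max - i_min) / (i_max + i_min)

# Synthetic fringes with 60% modulation depth: V should come out near 0.6.
x = np.linspace(0, 4 * np.pi, 1000)
profile = 1.0 + 0.6 * np.cos(x)
v = fringe_visibility(profile)
```

In practice one would average over several fringe periods of the CMOS2 capture to suppress detector noise before applying the formula.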

Oceanic Turbulence and Atmospheric Turbulence
The turbulence intensity in the experiment is determined by the spatial power spectrum of the turbulence. The function for the spatial power spectrum of the turbulent refractive-index fluctuations used in this paper is based on the assumption that turbulence is homogeneous and isotropic. We use the Nikishov power spectrum to describe oceanic turbulence: 33

$$\Phi_n(\kappa) = 0.388 \times 10^{-8}\, \varepsilon^{-1/3} \kappa^{-11/3} \left[1 + 2.35(\kappa\eta)^{2/3}\right] \frac{\chi_t}{\omega^2} \left(\omega^2 e^{-A_T \delta} + e^{-A_S \delta} - 2\omega\, e^{-A_{TS} \delta}\right),$$

with $\delta = 8.284(\kappa\eta)^{4/3} + 12.978(\kappa\eta)^2$, where $\kappa$ is the spatial wavenumber of turbulent fluctuations, $\varepsilon$ is the dissipation rate of turbulent kinetic energy per unit mass, $\eta = 10^{-3}$ m is the Kolmogorov microscale (inner scale), and $\omega$ is the index of the relative strength of temperature and salinity fluctuations. $A_T = 1.863 \times 10^{-2}$, $A_S = 1.9 \times 10^{-4}$, and $A_{TS} = 9.41 \times 10^{-3}$. $\chi_t$ represents the rate of dissipation of mean-square temperature, which varies from $10^{-10}$ K$^2$/s in deep water to $10^{-4}$ K$^2$/s in surface water. We changed the oceanic turbulence intensity only by adjusting $\chi_t$; the greater the value of $\chi_t$, the stronger the oceanic turbulence. Detailed parameter settings for the power spectrum of oceanic turbulence can be found in Table S2 in the Supplementary Material.
For atmospheric turbulence, we use the non-Kolmogorov power spectrum, 34

$$\Phi_n(\kappa) = A(\alpha)\, C_n^2\, \frac{\exp(-\kappa^2/\kappa_m^2)}{(\kappa^2 + \kappa_0^2)^{\alpha/2}}, \quad 0 \le \kappa < \infty, \quad 3 < \alpha < 4,$$

where $\alpha$ is the refractive-index power spectral density power law, $A(\alpha) = \Gamma(\alpha - 1)\cos(\alpha\pi/2)/(4\pi^2)$, $\kappa_m = c(\alpha)/l_0$, and $\kappa_0 = 2\pi/L_0$. $l_0$ and $L_0$ represent the inner and outer scales, respectively, and $C_n^2$ denotes the refractive-index structure constant. We changed the atmospheric turbulence intensity only by adjusting $C_n^2$; the greater the value of $C_n^2$, the stronger the atmospheric turbulence. Detailed parameter settings for the power spectrum of atmospheric turbulence can be found in Table S2 in the Supplementary Material. After setting reasonable parameters and returning to the spatial domain through the inverse Fourier transform, the turbulent phase can be obtained, which is then input into SLM1 to simulate the turbulent scene.
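The final step above (filtering random noise by the power spectrum and inverse-Fourier-transforming back to the spatial domain) is the standard spectral method for generating phase screens. The sketch below uses a generic von Kármán phase spectrum as a stand-in for the oceanic/atmospheric spectra, and all numerical values are illustrative, not the paper's Table S2 settings.

```python
import numpy as np

def phase_screen(n=256, dx=1e-4, r0=0.05, L0=10.0, seed=0):
    """Sketch of the spectral (FFT) method for a random turbulence phase
    screen: filter complex Gaussian noise with the square root of a
    (von Karman) phase power spectrum, then inverse-FFT to real space.
    n: grid size, dx: pixel pitch [m], r0: Fried parameter [m],
    L0: outer scale [m]."""
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(n, d=dx)            # spatial frequencies [1/m]
    fxx, fyy = np.meshgrid(fx, fx)
    f2 = fxx ** 2 + fyy ** 2
    # Von Karman phase PSD; the outer-scale term regularizes f -> 0.
    psd = 0.023 * r0 ** (-5 / 3) * (f2 + 1 / L0 ** 2) ** (-11 / 6)
    psd[0, 0] = 0.0                          # remove the piston term
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    df = 1.0 / (n * dx)
    screen = np.fft.ifft2(np.sqrt(psd) * noise) * (n * df)
    return np.real(screen)                   # phase in radians

phi = phase_screen()
```

Refreshing the random seed per frame would mimic the 20 Hz stream of independent turbulent phases loaded onto SLM1.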

Data Acquisition
Low SC and turbulence are different physical scenarios, but the influence of these scenarios on holographic imaging can be described through SC. Based on the above method, we use only the data obtained under different SCs for model training, and all other data are used for testing [Fig. 1(g)]. The process of data acquisition is as follows.
(1) For low-SC data sets, we do not consider turbulence, turn off SLM1, and adjust the SC of the light source only by changing the distance between lens L1 and the RD. The initial distance is the focal length of lens L1, and the rotation speed of the diffuser is 200 r/min.
(2) Phase holograms are generated from the original images through the iterative Gerchberg-Saxton algorithm 35 and loaded into SLM2 sequentially. The imaging results are captured by CMOS1, while CMOS2 captures the interference fringes used to calculate the degree of coherence.
(3) After all the data are captured, we increase the distance between L1 and the RD. Each increment is 0.1 times the focal length of L1; we keep the rotation speed of the diffuser constant and then repeat step 2.
(4) For the turbulence data set, the random turbulence phase generated by the simulation is loaded into SLM1 at a frequency of 20 Hz, and the distance between L1 and the RD is kept at the focal length of L1. We input the phase hologram into SLM2. CMOS2 captures the interference fringes, and CMOS1 captures 500 images. The final imaging result is obtained by the weighted average of these images.
(5) We change the intensity and type of turbulence loaded on SLM1 and then repeat step 4.
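The hologram generation in step 2 uses the Gerchberg-Saxton algorithm; a minimal Fourier-transform sketch is given below. Grid size, iteration count, and the square test target are illustrative choices, not the paper's settings.

```python
import numpy as np

def gerchberg_saxton(target_amplitude, iterations=50, seed=0):
    """Minimal sketch of the iterative Gerchberg-Saxton algorithm used
    to compute a phase-only hologram: alternate between the SLM plane
    (unit amplitude, free phase) and the image plane (target amplitude,
    free phase), linked by Fourier transforms."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, target_amplitude.shape)
    field = target_amplitude * np.exp(1j * phase)
    for _ in range(iterations):
        # Back-propagate to the SLM plane and enforce unit amplitude.
        slm = np.exp(1j * np.angle(np.fft.ifft2(field)))
        # Propagate to the image plane and enforce the target amplitude.
        img = np.fft.fft2(slm)
        field = target_amplitude * np.exp(1j * np.angle(img))
    return np.angle(np.fft.ifft2(field))  # phase-only hologram in [-pi, pi]

target = np.zeros((64, 64))
target[24:40, 24:40] = 1.0  # simple square target
hologram = gerchberg_saxton(target)
```

The returned phase map is what would be quantized to the SLM's phase levels before display.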
Our original images consist of public data sets, such as the Berkeley segmentation data set (BSD), 36 CelebFaces attributes high-quality data set (CelebA), 37 Flickr data set (Flickr), 38 Webvision data set (WED), 39 and DIV2K data set (DIV). 40 The training set is composed only of images captured by CMOS1 in steps 2 and 3.
In the training phase, we divide the training data into 11 groups based on SC and send them to the network for training in turn. Therefore, we obtain a model space containing swin models with different weights. In the testing phase, the swin adapter is a program that receives the SC information of the light source and selects the optimal model in the model space to achieve the image restoration task. Here we use distance-priority mode: the swin adapter selects the weight parameters closest to the measured SC. The test set comes from the images generated in steps 4 and 5. Note that none of the test sets were seen during training; they are blind to the network. Our model was implemented using PyTorch; the detailed architecture can be found in Note 1 in the Supplementary Material. We use adaptive moment estimation with weight decay (AdamW) as the optimizer, 41 which updates the weights with an initial learning rate of 0.0005 and a 50% drop every 10 epochs. The total number of epochs is 100. Mean-squared error (MSE) is the loss function of the network. All training and testing stages run on an NVIDIA RTX 3080 Ti graphics card, and a full training period takes about 12 h. To effectively verify the performance of our method, a series of credible image quality assessment measures was applied. The full-reference measures include peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and Pearson correlation coefficient (PCC), which provide an assessment of a single image in relation to perceived visual quality. See Note 4 in the Supplementary Material for descriptions of the evaluation indices.
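The training configuration described above (AdamW, initial learning rate 5e-4 halved every 10 epochs, MSE loss, 100 epochs) can be sketched in PyTorch as follows. The single-convolution model and random tensors are toy placeholders for the swin model and the captured image pairs, used here only to show the optimizer and schedule wiring.

```python
import torch
from torch import nn, optim

# Placeholder stand-in for the swin model; the real architecture is in
# Note 1 / Fig. S1 of the Supplementary Material.
model = nn.Conv2d(1, 1, 3, padding=1)

# AdamW with the paper's stated schedule: initial lr 5e-4, halved every
# 10 epochs (StepLR), MSE loss, 100 epochs total.
optimizer = optim.AdamW(model.parameters(), lr=5e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
criterion = nn.MSELoss()

degraded = torch.rand(4, 1, 32, 32)      # toy stand-ins for CMOS1 captures
ground_truth = torch.rand(4, 1, 32, 32)  # toy stand-ins for target images

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(degraded), ground_truth)
    loss.backward()
    optimizer.step()
    scheduler.step()  # halves the learning rate every 10 epochs

final_lr = optimizer.param_groups[0]["lr"]
```

After 100 epochs the learning rate has been halved 10 times, ending at 5e-4 / 1024; in the full pipeline this loop would be repeated once per SC group to populate the 11-model space.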

Results and Discussion
This section primarily showcases the performance of our method under various SCs and turbulent scenes. We simulated different strengths of oceanic and atmospheric turbulence, enhancing the diversity of turbulence intensities and types. Additionally, we conducted comparative analyses with traditional convolutional-residual networks and performed ablation studies to reinforce the validity and efficiency of our proposed method. It is important to emphasize that our training data exclusively consisted of holographic imaging results obtained under different SC conditions, with none of the test data used during the training phase. We present 11 groups of test results, each representing a different SC level and containing samples from five distinct data sets. As described in Sec. 2, the SC of the light source can be altered by adjusting the distance between the RD and L1. It is evident that as the SC decreases, the quality of holographic imaging deteriorates significantly, exhibiting high levels of noise and blurriness. Simultaneously, the decrease in SC corresponds to a reduction in light efficiency, resulting in darker images that ultimately become indiscernible. After processing through the trained network, these degraded images become smoother, with improved sharpness, enhanced details, and reduced noise. Remarkably, even in low-SC conditions where the original images captured by the CMOS1 sensor lack any discernible details, our network successfully reconstructs a significant portion of the elements. To accurately evaluate the effectiveness of image restoration, we present the evaluation indices (SSIM and PCC) comparing the original and reconstructed images with respect to the ground truth for different SCs [Fig. 1(f)]; additional results are given in Table S3 in the Supplementary Material. The quantitative results further validate the significant improvement achieved in various indicators of the reconstructed images compared to the original ones, approaching the ground truth.

Performance on Oceanic Turbulence and Atmospheric Turbulence
Owing to the stochastic variations of the refractive index within oceanic and atmospheric turbulence, the phase information of light beams becomes distorted, thereby reducing SC and degrading the quality of computational holography images. This issue can be effectively addressed using the TWC-Swin method. It should be mentioned that none of the images captured under turbulent scenes were used in training. Figure 4 demonstrates the remarkable image restoration capability of the TWC-Swin method under varying intensities of oceanic and atmospheric turbulence. As discussed in Sec. 2, the turbulence intensity depends on certain variates of the power spectrum function, where stronger turbulence presents more complex simulated turbulence phases, as shown in Figs. 4(A5) and 4(O5). We carried out experiments under five distinct intensities of both oceanic and atmospheric turbulence and simultaneously measured the SC of the light source for selecting the optimal model. It should be noted that the turbulence phase loaded on the SLM is continuously refreshed (20 Hz). To provide stronger evidence, we present the evaluation indices (SSIM and PCC) for oceanic and atmospheric turbulence in Tables 2 and 3 and Fig. 1(h), whereas additional indices (MSE and PSNR) can be found in Tables S4 and S5 in the Supplementary Material. Our analysis concluded that as the turbulence intensity increases, the SC experiences a downturn, which subsequently degrades image quality. Nevertheless, our proposed method is capable of overcoming these adverse effects and effectively improving the image quality regardless of the turbulence intensity. Our model learns the universal features of image degradation and restoration that depend on SC. This further demonstrates the strong generalization capability of the network trained with SC as physical prior information and its ability to apply learned knowledge from the training set to new, unseen scenes. This versatility is a desirable trait in a neural network, as it suggests the method's potential for broad application.
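Two of the full-reference indices used throughout, PSNR and PCC, reduce to a few lines of NumPy, as sketched below on synthetic data; SSIM is more involved and is typically taken from a library such as scikit-image. The reference image and noise level here are illustrative.

```python
import numpy as np

def psnr(img, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images on [0, data_range]."""
    mse = np.mean((img - ref) ** 2)
    return 10 * np.log10(data_range ** 2 / mse)

def pcc(img, ref):
    """Pearson correlation coefficient between two images."""
    return float(np.corrcoef(img.ravel(), ref.ravel())[0, 1])

# Synthetic example: a random "ground truth" and a lightly corrupted copy.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
noisy = np.clip(ref + 0.05 * rng.normal(size=ref.shape), 0, 1)
```

Higher is better for both: PSNR grows without bound as the MSE shrinks, while PCC saturates at 1 for a perfect (up to affine rescaling) reconstruction.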

Comparison between Different Methods and Ablation Study
In this section, we conduct a comprehensive comparative study of different methodologies, assessing their performance and efficacy in restoring images under challenging conditions of low SC and turbulent scenes. Traditional convolution-fusion framework methods, U-net 42 and U-RDN, 13 were compared to demonstrate the power of the proposed swin model. In our network architecture, the swin transformer serves as a robust backbone module, responsible for extracting high-level features from the input. Its special working mechanism gives it powerful hierarchical representation and global perception capabilities. However, direct output from the swin transformer often exhibits artifacts and high noise levels in image restoration tasks. Therefore, it is necessary to add lightweight convolutional layers as postprocessing blocks. Convolution layers capture local features of the image through local receptive fields, aiding in a better understanding of image details and textures while facilitating mapping from high-dimensional to low-dimensional spaces, resulting in high-quality output. To validate the effectiveness of the postprocessing block in the swin model, we conducted an ablation study, in which we created a control group named pure swin, obtained by removing the postprocessing block from the swin model. The training processes and data sets of all methods are consistent. Figure 5 shows detailed comparisons of images processed by the various methods. Figure 6 illustrates the quantitative results of the different methods on various data sets. More qualitative results are provided in Figs. S3 and S4 in the Supplementary Material. Comparing the visual output of pure swin and the swin model, we found that the pure swin framework produces black spots, resulting in blurred perception; its SSIM is 0.8396, a 7% reduction. This is because the swin transformer lacks the ability to sense local features and perform dimensional mapping. Convolutional layers can fill this gap by offering a mechanism to refine and enhance local features passed from the swin transformer blocks. The ablation study (comparison with pure swin) validates that the postprocessing module is indispensable for the swin model.
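The backbone-plus-convolutional-postprocessing pattern with a residual connection, as motivated above, can be sketched as follows. The swin blocks are replaced by an identity placeholder, so this is an illustration of the wiring only, not the paper's actual swin model (see Fig. S1 for that).

```python
import torch
from torch import nn

class RestorationNet(nn.Module):
    """Hypothetical sketch of the architecture pattern described above:
    convolutional pre/post-processing around a global-feature backbone,
    with a residual connection from input to output."""

    def __init__(self, channels=32):
        super().__init__()
        self.pre = nn.Conv2d(1, channels, 3, padding=1)  # local feature lift
        self.backbone = nn.Identity()                    # stand-in for swin blocks
        self.post = nn.Sequential(                       # post-processing block
            nn.Conv2d(channels, channels, 3, padding=1), # refine local detail
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),        # map back to image space
        )

    def forward(self, x):
        feats = self.backbone(self.pre(x))
        return x + self.post(feats)  # residual: predict a correction to the input

out = RestorationNet()(torch.rand(2, 1, 32, 32))
```

Removing `self.post` (the "pure swin" ablation) would leave the backbone's output unrefined, which is the configuration the ablation study shows to produce artifacts.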
We tested the performance of other networks under the same conditions. Our proposed network outperforms the other methods, presenting the lowest noise and the best evaluation indices. Tables S6 and S7 in the Supplementary Material provide a detailed quantitative comparison of the performance across different models and different SCs. In the task of image restoration under low SC, our proposed methodology exhibits superior performance across all evaluative indices when juxtaposed with alternative approaches. Figure 7 shows the comparative performance of the various methods when faced with image degradation due to various turbulence types and intensities. We observed that all networks trained with SC, not just the swin model, exhibit the ability to significantly improve image quality under turbulent scenes. This is an exciting result, as it signifies the successful integration of physical prior information into network training, enabling the networks to be applied to multiple tasks and scenarios.

Conclusions
By leveraging SC as physical prior information and harnessing advanced deep-learning algorithms, we proposed a methodology, TWC-Swin, which demonstrates exceptional capabilities in simultaneously restoring images in low-SC and random turbulent scenes. Our multicoherence and multiturbulence holographic imaging data sets, consisting of natural images, were created by the LPR, which can simulate different SCs and turbulence scenes (see Sec. 2). Though the swin model used in the tests was trained solely on the multicoherence data set, it achieves promising results in low-SC, oceanic turbulence, and atmospheric turbulence scenes. The key is that we capture the common physical property in these scenes, SC, and use it as physical prior information to generate a training set, so that the TWC-Swin method exhibits remarkable generalization capabilities, effectively restoring images from unseen scenes beyond the training set. Furthermore, through a series of rigorous experiments and comparisons, we have established the superiority of the swin model over traditional convolutional frameworks and alternative methods in terms of image restoration, from both qualitative and quantitative analysis (see Sec. 3). The integration of SC as a fundamental guiding principle in network training has proven to be a powerful strategy for aligning downstream tasks with pretrained models.
Our research findings offer guidance not only for the domain of optical imaging but also for integration with the segment anything model (SAM), 43 extending its applicability to multiphysics scenarios. For instance, in turbulent scenes, our methodology can be implemented for preliminary image processing, enabling the utilization of unresolved images for precise image recognition and segmentation tasks facilitated by SAM. Moreover, our experimental scheme also provides a simple idea for turbulence detection. Our research contributes valuable insights into the use of deep-learning algorithms for addressing image degradation problems in multiple scenes and highlights the importance of incorporating physical principles into network training. It is foreseeable that our research can serve as a successful case for the combination of deep learning and holographic imaging, facilitating the synergistic advancement of the fields of optics and computer science.

Fig. 1
Fig. 1 Principle and performance of the TWC-Swin method. (a) LPR. SC modulation can adjust the SC by changing the distance D. Holographic modulation is used to load the phase hologram. The LPR generates two outputs, one for calculating SC and the other for network input. HWP, half-wave plate; PBS, polarized beam splitter; L, lens; RD, rotating diffuser; SLM, spatial light modulator; F, filter; D, distance between L1 and RD. (b) The detailed flow of the TWC-Swin method. The swin adapter can select the optimal model from the model space by obtaining SC. The color picture represents a case in progress. (c) Swin-model space and architecture of the swin model. The architecture of M1 to M11 is the same; only the weights are different. The weights are obtained by network training at different distances. (d) The correspondence between SC and swin-model space. See Table S1 in the Supplementary Material for detailed data. (e) Inputs and outputs of the swin model with different SCs. (f) SSIM and PCC of swin-model outputs at different SCs. (g) Training and test data acquisition process. The training data did not contain any turbulence. (h) SSIM and PCC of swin-model outputs in different turbulent scenes.

Fig. 2

Fig. 2 Original images captured by CMOS1 and restored images processed by the TWC-Swin method under different SCs; 11 groups of test results are shown.

Fig. 3
Fig. 3 Average results of the evaluation indices for each test data set. The coherence is 0.368. Results for other coherences are provided in Fig. S2 in the Supplementary Material. All evaluation indices demonstrate that our method possesses strong image restoration ability under low SC.
Figure 3 illustrates the average evaluation indices for each test set. Only partial results are shown here; more detailed results are included in Fig. S2 in the Supplementary Material. It can be seen that each evaluation index has risen significantly compared to the original images after processing by the TWC-Swin method, indicating a substantial improvement in image quality. Moreover, the network demonstrates its robust generalization capability by performing image restoration on multiple test sets, which are beyond the scope of the training set. This implies that our method has effectively learned the underlying patterns in the data during training and can apply these patterns to unseen data, resulting in successful image restoration.

Fig. 5
Fig. 5 Visualization of the performance of different methods. The SSIM is shown in the bottom left corner. Our method presents the best performance, shown by smoother images with lower noise. (a) Sample selected from the WED data set and magnified insets of the red bounding region. (b) Sample selected from the Flickr data set and magnified insets of the red bounding region. The pure swin model can be obtained by removing the postprocessing block of the swin model (Video 3, MP4, 0.6 MB [URL: https://doi.org/10.1117/1.AP.5.6.066003.s3]).

Fig. 6
Fig. 6 Performance of different methods on various data sets with SC of 0.494. Our model outperforms other methods across various data sets and indices.

Table 1
Quantitative analysis of evaluation indices (SSIM and PCC) at different SCs and test samples.a f1 is the focal length of L1. SC means spatial coherence of the light source.
a Bold values indicate the better index.

Table 2
Quantitative analysis of evaluation indices (SSIM and PCC) at different oceanic turbulence intensities.a
a Bold values indicate the better index.

Table 3
Quantitative analysis of evaluation indices (SSIM and PCC) at different atmospheric turbulence intensities.a
a Bold values indicate the better index.