Open Access
25 October 2023
Harnessing the magic of light: spatial coherence instructed swin transformer for universal holographic imaging
Xin Tong, Renjun Xu, Pengfei Xu, Zishuai Zeng, Shuxi Liu, Daomu Zhao
Abstract

Holographic imaging poses significant challenges when facing real-time disturbances introduced by dynamic environments. Existing deep-learning methods for holographic imaging often depend solely on the specific conditions of the given data distributions, which hinders their generalization across multiple scenes. One critical problem is how to guarantee the alignment between any given downstream task and the pretrained model. We analyze the physical mechanism of image degradation caused by turbulence and propose a swin-transformer-based method, termed the train-with-coherence-swin (TWC-Swin) transformer, which uses spatial coherence (SC) as adaptable physical prior information to precisely align image restoration tasks in arbitrary turbulent scenes. The light-processing system (LPR) we designed enables manipulation of SC and simulation of any turbulence. Qualitative and quantitative evaluations demonstrate that the TWC-Swin method is superior to traditional convolution frameworks and realizes image restoration under various turbulences, which suggests its robustness, powerful generalization capability, and adaptability to unknown environments. Our research reveals the significance of physical prior information at the intersection of optics and deep learning and provides an effective solution for model-to-task alignment, which will help to unlock the full potential of deep learning for all-weather optical imaging across terrestrial, marine, and aerial domains.

1.

Introduction

Holographic imaging is an interdisciplinary field that combines optics, computer science, and applied mathematics to generate holographic images using numerical algorithms. Although the concept of using computers to generate holograms can be traced back to the 1960s, it was not until the emergence of digital imaging and processing techniques in the 1990s that computational holography began to develop into a viable technology,1,2 and digital holography gained broader attention as computer technology and digital image processing advanced.3 In recent years, holographic imaging has continued to advance, and researchers have developed increasingly sophisticated numerical algorithms, such as compressive sensing, sparse coding, and deep-learning techniques.4–10

Spatial coherence (SC) is a critical factor that determines the quantity and quality of high-frequency information carried by the light beam in holographic imaging. High-frequency information is crucial for achieving high resolution and capturing fine details in an image. When the SC of the light source is low, the phase relationship of the beam becomes chaotic, causing the interference pattern to be washed out and resulting in insufficient transmission of high-frequency information. As a result, the reconstructed image has a lower resolution and less fine-detail information, as the high-frequency information needed to capture these details has been lost. Therefore, high SC light is preferred for holographic imaging to ensure that sufficient high-frequency information is present in the interference pattern and the hologram, resulting in high-resolution and detailed reconstructed images. However, the SC of light sources is often very low in complex scenes, which leads to image degradation and loss of details. Therefore, how to restore images under low-SC light sources is a challenging issue.11–15

Oceanic and atmospheric turbulence may profoundly influence optical imaging, engendering distortions and deterioration in photographs acquired through cameras and alternative optical detection devices. The distortion and degradation of images caused by oceanic turbulence occur because the turbulent motions in the water column cause variations in the refractive index of the water, which in turn leads to variations in the path of light as it travels through the water. Atmospheric turbulence occurs because the Earth's atmosphere is not uniform and contains regions of varying temperature and density, which can cause variations in the refractive index of the air. Whether it is oceanic turbulence or atmospheric turbulence, as the beam passes through these regions of varying refractive index, the phase correlation changes and the SC is distorted, causing the image to become blurred and distorted, or even completely lost. Massive efforts were devoted to finding a solution for imaging in various turbulences.16–23 However, it is difficult to use the same methods to simultaneously resolve holographic imaging problems with low-SC scenes and multiple intensities of turbulence. Although low SC and turbulence may not appear to be correlated at first glance, their influence on computational holography can both be described through the concept of SC. As a result, we can transform the aforementioned issues into the imaging problem of different SCs and leverage the advantages of deep learning to train a generalized model that can achieve image restoration for any turbulence intensity and low SC.

Artificial intelligence for optics has unparalleled advantages, especially in the field of holography. For example, deep learning can address challenging inverse problems in holographic imaging, where the objective is to recover the original scene or object properties from observed images or measurements, and enhance the resolution of optical imaging systems beyond their traditional diffraction limit,24–30 etc. Intersection research of optics and deep learning aims to solve massive tasks with one model, and one important problem is how to guarantee the alignment between the distribution of any given downstream data and tasks with pretrained models. Without such alignment, a given model and its weights can only be applied to one specific environment. Our research uses SC as adaptable real-time physical prior information to precisely align any scene with pretrained models. By combining the most advanced deep-learning algorithms, the residual network31 and the swin transformer,32 we propose our deep-learning-based methodology, termed the train-with-coherence-swin (TWC-Swin) method. It can achieve the restoration of computational holographic imaging under any low SC and turbulence.

We summarize the innovations of this paper as follows.

  • (1) We designed an LPR to simultaneously acquire two outputs: computational holographic imaging results and corresponding interference fringes under different SCs and turbulent scenes [Fig. 1(a)]. We manipulated SC by changing the distance between lens 1 and the rotating diffuser (RD). The SC can be calculated using the obtained interference fringes, while the imaging results serve as training and testing data for the neural network. In our experiment, we employed partially coherent light as the light source and loaded the turbulence phase using a spatial light modulator (SLM) with a loading frequency of 20 Hz. The original data sets consist of natural images characterized by complex elements and low sparsity, rather than simple symbols or letters.

  • (2) The core of the TWC-Swin method is a swin adapter and swin-model network [Fig. 1(b)]. The swin adapter can select the optimal model from the model space by obtaining SC. The architecture of the swin model in model space is the same; only the weights are different [Figs. 1(c) and 1(d)]. Table S1 in the Supplementary Material provides the correspondence between SC and swin-model space. These weights are obtained by network training at different SCs. Our swin model utilizes the swin-transformer algorithm and incorporates local–global convolution residuals to construct its architecture. Additionally, the preprocessing and postprocessing modules are exclusively based on convolutions. The detailed internal architecture of the swin model is shown in Fig. S1 in the Supplementary Material. Compared to the convolutional neural network and pure swin-transformer frameworks, the swin model exhibits superior performance. In addition, the TWC-Swin method takes about 28 ms to process a picture, which can achieve real-time video-level restoration.

  • (3) Our model was only trained on different SCs and performed well on turbulence data, despite not being trained on such data. We can effectively restore holographic images under various turbulent scenes using a swin adapter. This suggests that our method has strong generalization capability and learned universal features of image degradation and restoration during training that are applicable beyond the specific conditions of the training data.

  • (4) Our strategy for training neural networks with SC as the physical prior information is applicable to arbitrary neural networks not limited to the swin model. It showcases the advantage of “transferability,” meaning that it can be transferred or applied effectively across different neural network architectures and tasks. Moreover, we achieve model-to-task alignment using physical principles and demonstrate the strategy’s ability to incorporate physical principles into the learning process, which can be particularly beneficial for tasks involving physical data or phenomena. This lends a degree of interpretability to the model, as it leverages known physical principles to inform its predictions.

Fig. 1

Principle and performance of TWC-Swin method. (a) LPR. SC modulation can adjust the SC by changing the distance D. Holographic modulation is used to load the phase hologram. The LPR generates two outputs, one for calculating SC and the other for network input. HWP, half-wave plate; PBS, polarized beam splitter; L, lens; RD, rotating diffuser; SLM, spatial light modulator; F, filter. D, distance between L1 and RD. (b) The detailed flow of the TWC-Swin method. The swin adapter can select the optimal model from the model space by obtaining SC. The color picture represents a case in progress. (c) Swin-model space and architecture of the swin model. The architecture of M1–M11 is the same; only the weights are different. The weights are obtained by network training at different distances. (d) The correspondence between SC and swin-model space. See Table S1 in the Supplementary Material for detailed data. (e) Inputs and outputs of the swin model with different SCs. (f) SSIM and PCC of swin-model outputs at different SCs. (g) Training and test data acquisition process. The training data did not contain any turbulence. (h) SSIM and PCC of swin-model outputs at different turbulent scenes.


2.

Materials and Methods

2.1.

Scheme of the LPR

Figure 1(a) shows the LPR. The high-coherence light source generated by the solid-state laser (CNI, MLL-FN, 532 nm) is polarized horizontally after passing through a half-wave plate and a polarization beam splitter, allowing it not only to match the modulation mode of the SLM but also to adjust the beam intensity. The RD (DHC, GCL-201) is used to reduce the SC of the light source, with the degree of reduction depending on the radius of the incident beam on the RD: the larger the radius, the lower the SC of the output light source (see Note 2 in the Supplementary Material). In the experiment, we control the incident beam radius by adjusting the distance between lens 1 (L1, 100 mm) and the RD. After being collimated by lens 2 (L2, 100 mm), the beam is incident on SLM1 (HDSLM80R), which is loaded with the turbulent phase and continuously refreshed at a rate of 20 Hz. After passing through the turbulence, the beam is split into two parts by a beam splitter. The first part employs Michelson interference to capture interference fringes and measure the SC of the light. The second part is used for holographic imaging, with the phase hologram of the image loaded onto SLM2 (PLUTO). A high-pass filter is employed to filter out the unmodulated zero-order diffraction pattern, and the final imaging result is captured by the complementary metal-oxide-semiconductor (CMOS) camera (Sony, E3ISPM). In summary, we control the SC of the light source by adjusting the distance between lens L1 and the RD. We simulate a turbulent environment using SLM1, with the intensity of the turbulence depending on the loaded turbulent phase. If turbulence is not required, SLM1 can be turned off, in which case it simply acts as a mirror.
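
The coherence measurement in the first arm reduces, in practice, to reading the visibility of the Michelson fringes captured by CMOS2. As a minimal numerical sketch (our own, not the paper's code), the function below estimates the visibility V = (Imax − Imin)/(Imax + Imin) from a fringe image; for roughly equal arm intensities, V approximates the modulus of the degree of coherence.

```python
import numpy as np

def estimate_visibility(fringe_image: np.ndarray) -> float:
    """Estimate fringe visibility V = (Imax - Imin) / (Imax + Imin).

    `fringe_image` is a 2D intensity frame from the interference arm.
    With equal-intensity interferometer arms, V approximates the modulus
    of the complex degree of coherence of the source.
    """
    # Average along the direction parallel to the fringes (assumed here to
    # be the rows) to suppress detector noise before taking the extrema.
    profile = fringe_image.astype(float).mean(axis=0)
    i_max, i_min = profile.max(), profile.min()
    return float((i_max - i_min) / (i_max + i_min))
```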

2.2.

Oceanic Turbulence and Atmospheric Turbulence

The turbulence intensity in the experiment is determined by the spatial power spectrum of the turbulence. The function of the spatial power spectrum of the turbulent refractive-index fluctuations used in this paper is based on the assumption that turbulence is homogeneous and isotropic. We use the Nikishov power spectrum to describe oceanic turbulence:33

Eq. (1)

$$
\begin{aligned}
\Phi_n(\kappa) &= 0.388\times10^{-8}\,\varepsilon^{-1/3}\,\kappa^{-11/3}\left[1+2.35(\kappa\eta)^{2/3}\right]f(\kappa,\omega,\chi_t),\\
f(\kappa,\omega,\chi_t) &= \chi_t\left[\exp(-A_T\delta)+\omega^{-2}\exp(-A_S\delta)-2\omega^{-1}\exp(-A_{TS}\delta)\right],\\
\delta &= 8.248(\kappa\eta)^{4/3}+12.978(\kappa\eta)^{2},
\end{aligned}
$$
where κ is the spatial wavenumber of turbulent fluctuations, κ = (κ_x^2 + κ_y^2 + κ_z^2)^(1/2); ε is the dissipation rate of turbulent kinetic energy per unit mass; η = 10^-3 m is the Kolmogorov microscale (inner scale); and ω is the index of the relative strength of temperature and salinity fluctuations. A_T = 1.863 × 10^-2, A_S = 1.9 × 10^-4, and A_TS = 9.41 × 10^-3. χ_t stands for a variate that represents the rate of dissipation of mean-square temperature, which varies from 10^-10 K^2/s in deep water to 10^-4 K^2/s in surface water. We only changed the oceanic turbulence intensity by adjusting χ_t; the greater the value of χ_t is, the stronger the oceanic turbulence is. Detailed parameter settings for the power spectrum of oceanic turbulence can be found in Table S2 in the Supplementary Material.
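
As a rough guide to how Eq. (1) can be evaluated numerically, the sketch below implements the Nikishov spectrum with the constants quoted above. It is our illustration rather than the authors' code, and the default ω and ε values are placeholders, not the settings listed in Table S2 in the Supplementary Material.

```python
import numpy as np

# Constants quoted below Eq. (1); ETA is the Kolmogorov inner scale (m).
A_T, A_S, A_TS = 1.863e-2, 1.9e-4, 9.41e-3
ETA = 1e-3

def nikishov_spectrum(kappa, chi_t, omega=-2.5, eps=1e-7):
    """Oceanic refractive-index power spectrum Phi_n(kappa) of Eq. (1).

    kappa : spatial wavenumber of turbulent fluctuations (rad/m)
    chi_t : dissipation rate of mean-square temperature (K^2/s)
    omega : relative strength of temperature and salinity fluctuations
    eps   : dissipation rate of turbulent kinetic energy (m^2/s^3)
    """
    delta = 8.248 * (kappa * ETA) ** (4 / 3) + 12.978 * (kappa * ETA) ** 2
    f = chi_t * (np.exp(-A_T * delta) + omega ** -2 * np.exp(-A_S * delta)
                 - 2 * omega ** -1 * np.exp(-A_TS * delta))
    return (0.388e-8 * eps ** (-1 / 3) * kappa ** (-11 / 3)
            * (1 + 2.35 * (kappa * ETA) ** (2 / 3)) * f)
```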

For atmospheric turbulence, we use the non-Kolmogorov power spectrum,34

Eq. (2)

$$
\begin{aligned}
\Phi_n(\kappa) &= A(\alpha)\,C_n^2\,\frac{\exp(-\kappa^2/\kappa_m^2)}{(\kappa^2+\kappa_0^2)^{\alpha/2}}\quad(0\le\kappa<\infty,\;3<\alpha<4),\\
A(\alpha) &= \frac{1}{4\pi^2}\,\Gamma(\alpha-1)\cos\!\left(\frac{\pi\alpha}{2}\right),\qquad
c(\alpha)=\left[\frac{2\pi}{3}\,\Gamma\!\left(\frac{5-\alpha}{2}\right)A(\alpha)\right]^{1/(\alpha-5)},\\
\kappa_m &= \frac{c(\alpha)}{l_0},\qquad \kappa_0=\frac{2\pi}{L_0},
\end{aligned}
$$
where α is the power law of the refractive-index power spectral density, l_0 and L_0 represent the inner and outer scales, respectively, and C_n^2 denotes the refractive-index structure constant. We only changed the atmospheric turbulence intensity by adjusting C_n^2; the greater the value of C_n^2 is, the stronger the atmospheric turbulence is. Detailed parameter settings for the power spectrum of atmospheric turbulence can be found in Table S2 in the Supplementary Material. After setting reasonable parameters and returning to the space domain through the inverse Fourier transform, the turbulent phase can be obtained, which is then input into SLM1 to simulate the turbulent scene.
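
The last sentence compresses two steps: evaluating the refractive-index spectrum of Eq. (2) and synthesizing a random phase screen from it through an inverse Fourier transform. The sketch below illustrates both with the standard spectral-synthesis recipe; the grid size, layer thickness, parameter defaults, and normalization convention are illustrative assumptions on our part, not the settings used in the experiment.

```python
import numpy as np
from scipy.special import gamma

def non_kolmogorov_spectrum(kappa, cn2, alpha=11/3, l0=1e-3, L0=10.0):
    """Atmospheric refractive-index power spectrum Phi_n(kappa) of Eq. (2)."""
    A = gamma(alpha - 1) * np.cos(np.pi * alpha / 2) / (4 * np.pi ** 2)
    c = (2 * np.pi / 3 * gamma((5 - alpha) / 2) * A) ** (1 / (alpha - 5))
    k_m, k_0 = c / l0, 2 * np.pi / L0
    return A * cn2 * np.exp(-kappa ** 2 / k_m ** 2) / (kappa ** 2 + k_0 ** 2) ** (alpha / 2)

def phase_screen(spectrum_fn, n=512, dx=1e-4, wavelength=532e-9, dz=50.0, **kw):
    """Random turbulent phase screen (radians) by spectral (FFT) synthesis."""
    k = 2 * np.pi / wavelength
    kx = 2 * np.pi * np.fft.fftfreq(n, d=dx)         # angular spatial frequencies
    kxx, kyy = np.meshgrid(kx, kx)
    kappa = np.hypot(kxx, kyy)
    kappa[0, 0] = kappa[0, 1]                        # avoid the singular DC bin
    # Phase power spectrum of a thin turbulent layer of thickness dz.
    phi_psd = 2 * np.pi * k ** 2 * dz * spectrum_fn(kappa, **kw)
    dk = 2 * np.pi / (n * dx)
    noise = (np.random.randn(n, n) + 1j * np.random.randn(n, n)) / np.sqrt(2)
    screen = np.real(np.fft.ifft2(noise * np.sqrt(phi_psd) * dk)) * n ** 2
    return screen                                    # wrap and load onto SLM1

# Example: one atmospheric screen at Cn^2 = 1e-14 (units of m^(3-alpha)).
phi = phase_screen(non_kolmogorov_spectrum, cn2=1e-14)
```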

2.3.

Data Acquisition

Low SC and turbulence are different physical scenarios, but the influence of these scenarios on holographic imaging can be described through SC. Based on the above method, we only use the data obtained under different SCs for model training, and any other data are used for testing [Fig. 1(g)]. The process of data acquisition is as follows.

  • (1) For low-SC data sets, we do not consider turbulence, turn off SLM1, and only adjust the SC of the light source by changing the distance between lens L1 and the RD. The initial distance is the focal length of lens L1 and the rotation speed of the diffuser is 200 r/min.

  • (2) Phase holograms of the original images are generated through the iterative Gerchberg–Saxton algorithm35 (a minimal sketch of this loop follows the list) and loaded onto SLM2 sequentially. The imaging results are captured by CMOS1, and CMOS2 captures the interference fringes used to calculate the degree of coherence.

  • (3) After all the data are captured, we increase the distance between L1 and RD. Each increment is 0.1 times the focal length of L1, and we keep the rotation speed of the diffuser constant and then repeat step 2.

  • (4) For the turbulence data set, the random turbulence phase generated by the simulation was loaded into SLM1 at a frequency of 20 Hz, and the distance between L1 and RD was kept as the focal length of L1. We input the phase hologram into SLM2. CMOS2 captures the interference fringes and CMOS1 captures 500 images. The final imaging result can be obtained by weighting and averaging these images.

  • (5) We change the intensity and type of turbulence loaded on SLM1 and then repeat step 4.
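
A minimal sketch of the Gerchberg–Saxton loop referenced in step 2 is given below. It assumes a Fourier-plane configuration and a phase-only SLM; the function name and iteration count are ours, not taken from the paper.

```python
import numpy as np

def gerchberg_saxton(target_intensity, iterations=50):
    """Phase-only Fourier hologram of a target image via the iterative GS loop.

    Returns the hologram phase in radians, which would be displayed on SLM2.
    """
    target = np.sqrt(np.asarray(target_intensity, dtype=float))  # amplitude
    field = np.exp(1j * 2 * np.pi * np.random.rand(*target.shape))
    for _ in range(iterations):
        image = np.fft.fft2(field)                      # hologram -> image plane
        image = target * np.exp(1j * np.angle(image))   # impose target amplitude
        field = np.fft.ifft2(image)                     # back to hologram plane
        field = np.exp(1j * np.angle(field))            # enforce phase-only SLM
    return np.angle(field)
```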

Our original images consist of public data sets, such as the Berkeley segmentation data set (BSD),36 Celebfaces attributes high-quality data set (CelebA),37 Flickr data set (Flickr),38 Webvision data set (WED),39 and DIV2k data set (DIV).40 The training set is only composed of images captured by CMOS1 in steps 2 and 3.

In the training phase, we divide the training data into 11 groups based on SC and send them to the network for training in turn. In this way, we obtain a model space containing swin models with different weights. In the testing phase, the swin adapter is a program that receives the measured SC of the light source and selects the optimal model in the model space for the image restoration task. Here we use the distance-priority mode, in which the swin adapter selects the weights whose training SC is closest to the measured SC. The test set comes from the images generated in steps 4 and 5. Note that none of the test sets have been trained on; they are blind to the network. Our model was implemented using PyTorch; the detailed architecture can be found in Note 1 in the Supplementary Material. We use adaptive moment estimation with weight decay (AdamW) as the optimizer,41 with an initial learning rate of 0.0005 and a 50% drop every 10 epochs; the total number of epochs is 100. Mean-squared error (MSE) is the loss function of the network. All training and testing stages are run on an NVIDIA GTX3080Ti graphics card, and a full training period takes about 12 h. To effectively verify the performance of our method, a series of credible image quality assessment measures were applied. The full-reference measures include peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and Pearson correlation coefficient (PCC), which assess a single image in relation to perceived visual quality. See Note 4 in the Supplementary Material for descriptions of the evaluation indices.
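
To make the distance-priority selection and the quoted training settings concrete, the following PyTorch sketch shows one possible implementation. The model-space dictionary, its SC keys, and the weight-file names are hypothetical stand-ins for the 11 trained weight sets; only the optimizer, learning-rate schedule, and loss follow the description above.

```python
import torch

# Hypothetical model space: training SC -> path of the weights trained at that
# coherence (only 5 of the 11 entries are shown here).
MODEL_SPACE = {0.494: "swin_f1.pth", 0.419: "swin_1.3f1.pth",
               0.368: "swin_1.5f1.pth", 0.311: "swin_1.7f1.pth",
               0.245: "swin_2f1.pth"}

def swin_adapter(measured_sc: float, model: torch.nn.Module) -> torch.nn.Module:
    """Distance-priority mode: load the weights whose training SC is closest
    to the SC measured from the interference fringes."""
    best_sc = min(MODEL_SPACE, key=lambda sc: abs(sc - measured_sc))
    model.load_state_dict(torch.load(MODEL_SPACE[best_sc]))
    return model.eval()

def make_training_objects(model: torch.nn.Module):
    """AdamW with initial LR 5e-4 halved every 10 epochs, and an MSE loss."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    criterion = torch.nn.MSELoss()
    return optimizer, scheduler, criterion
```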

3.

Results and Discussion

This section primarily showcases the performance of our method under various SCs and turbulent scenes. We simulated different strengths of oceanic and atmospheric turbulence, enhancing the diversity of turbulence intensities and types. Additionally, we conducted comparative analyses with traditional convolutional-residual networks and performed ablation studies to reinforce the validity and efficiency of our proposed method. It is important to emphasize that our training data exclusively consisted of holographic imaging results obtained under different SC conditions, with none of the test data used during the training phase.

3.1.

Performance on Low SC

Figures 2 and 1(e) show the original images captured by CMOS1 and restored images processed by the TWC-Swin method under different SCs. We present 11 groups of test results, each representing a different SC level and containing samples from five distinct data sets. As described in Sec. 2, the SC of the light source can be altered by adjusting the distance between RD and L1. It is evident that as the SC decreases, the quality of holographic imaging deteriorates significantly, exhibiting high levels of noise and blurriness. Simultaneously, the decrease in SC corresponds to a reduction in light efficiency, resulting in darker images that ultimately become indiscernible. After processing through the trained network, these degraded images become smoother, with improved sharpness, enhanced details, and reduced noise. Remarkably, even in low SC conditions where the original images captured by the CMOS1 sensor lack any discernible details, our network successfully reconstructs a significant portion of the elements. To accurately evaluate the effectiveness of image restoration, we present the evaluation indices (SSIM and PCC), comparing the original and reconstructed images with respect to the ground truth for different SCs [Fig. 1(f) and Table 1]. Other indices are provided in Table S3 in the Supplementary Material. The quantitative results further validate the significant improvement achieved in various indicators of the reconstructed images compared to the original ones, approaching the ground truth. Figure 3 illustrates the average evaluation indices for each test set. Here only partial results are shown; more detailed results are included in Fig. S2 in the Supplementary Material. It can be seen that each evaluation index of images has risen significantly compared to the original images after being processed by the TWC-Swin method, indicating a substantial improvement in the image quality. Moreover, the network demonstrates its robust generalization capability by performing image restoration on multiple test sets, which are beyond the scope of the training set. This implies that our method has effectively learned the underlying patterns in the data during training and can apply these patterns to unseen data, resulting in successful image restoration.
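
For reference, the full-reference indices quoted throughout this section follow their standard definitions; the sketch below computes PCC and PSNR with NumPy (SSIM, being a windowed metric, is typically taken from an image-processing library such as scikit-image).

```python
import numpy as np

def pcc(img, ref):
    """Pearson correlation coefficient between a restored image and its ground truth."""
    a, b = img.ravel().astype(float), ref.ravel().astype(float)
    return float(np.corrcoef(a, b)[0, 1])

def psnr(img, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((img.astype(float) - ref.astype(float)) ** 2)
    return float(10 * np.log10(data_range ** 2 / mse))
```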

Fig. 2

Qualitative analysis of our method’s performance at the different SCs. Input, raw image captured by CMOS1. Output, image processed by the network. (a)–(k) Different SCs: (a) D=f1, SC is 0.494; (b) D=1.1f1, SC is 0.475; (c) D=1.2f1, SC is 0.442; (d) D=1.3f1, SC is 0.419; (e) D=1.4f1, SC is 0.393; (f) D=1.5f1, SC is 0.368; (g) D=1.6f1, SC is 0.337; (h) D=1.7f1, SC is 0.311; (i) D=1.8f1, SC is 0.285; (j) D=1.9f1, SC is 0.25; and (k) D=2f1, SC is 0.245. D means the distance between L1 and RD in the LPR and f1 is the focal length of L1. Our method can achieve improved image quality under low SC (Video 1, MP4, 1.5 MB [URL: https://doi.org/10.1117/1.AP.5.6.066003.s1]).


Table 1

Quantitative analysis of evaluation indices (SSIM and PCC) at different SCs and test samples.^a f1 is the focal length of L1. SC means spatial coherence of the light source.

| SC | SSIM (BSD / CelebA / Flickr / WED / DIV) | PCC (BSD / CelebA / Flickr / WED / DIV) |
| Input_f1, SC = 0.494 | 0.5893 / 0.5943 / 0.4296 / 0.6155 / 0.4625 | 0.9368 / 0.9575 / 0.9210 / 0.9146 / 0.8753 |
| Output_f1 | 0.8984 / 0.8908 / 0.8523 / 0.9019 / 0.8940 | 0.9807 / 0.9893 / 0.9848 / 0.9930 / 0.9819 |
| Input_1.3f1, SC = 0.419 | 0.5775 / 0.5415 / 0.3917 / 0.6245 / 0.4184 | 0.8953 / 0.9303 / 0.8588 / 0.9149 / 0.8043 |
| Output_1.3f1 | 0.9189 / 0.8842 / 0.8676 / 0.8997 / 0.8918 | 0.9843 / 0.9928 / 0.9880 / 0.9928 / 0.9827 |
| Input_1.5f1, SC = 0.368 | 0.6178 / 0.5394 / 0.2777 / 0.5677 / 0.3892 | 0.8957 / 0.9211 / 0.8396 / 0.8961 / 0.8144 |
| Output_1.5f1 | 0.8906 / 0.8513 / 0.8171 / 0.8541 / 0.8622 | 0.9691 / 0.9881 / 0.9783 / 0.9869 / 0.9680 |
| Input_1.7f1, SC = 0.311 | 0.6040 / 0.5017 / 0.3183 / 0.5510 / 0.4136 | 0.8303 / 0.9035 / 0.8511 / 0.8568 / 0.7979 |
| Output_1.7f1 | 0.8624 / 0.7791 / 0.7483 / 0.8013 / 0.8038 | 0.9644 / 0.9787 / 0.9702 / 0.9759 / 0.9583 |
| Input_2f1, SC = 0.245 | 0.4881 / 0.4469 / 0.3073 / 0.5271 / 0.3643 | 0.8072 / 0.8817 / 0.7557 / 0.8326 / 0.7196 |
| Output_2f1 | 0.8146 / 0.7540 / 0.6962 / 0.7722 / 0.7572 | 0.9431 / 0.9713 / 0.9505 / 0.9631 / 0.9341 |
| Ground truth | 1 / 1 / 1 / 1 / 1 | 1 / 1 / 1 / 1 / 1 |

^a Bold values indicate the better index.

Fig. 3

Average results of the evaluation indices for each test data set. The coherence is 0.368. Results of other coherences are provided in Fig. S2 in the Supplementary Material. All evaluation indices demonstrate that our method possesses strong image restoration ability under low SC.


3.2.

Performance on Oceanic Turbulence and Atmospheric Turbulence

Owing to the stochastic variations of the refractive index within oceanic and atmospheric turbulence, the phase information of light beams becomes distorted, thereby reducing SC and degrading the quality of computational holography images. This issue can be effectively addressed using the TWC-Swin method. It should be mentioned that none of the images captured under turbulent scenes were used to train the network. Figure 4 demonstrates the remarkable image restoration capability of the TWC-Swin method under varying intensities of oceanic and atmospheric turbulence. As discussed in Sec. 2, the turbulence intensity depends on certain variables of the power-spectrum function, where stronger turbulence presents more complex simulated turbulence phases, as shown in Figs. 4(A5) and 4(O5). We carried out experiments under five distinct intensities of both oceanic and atmospheric turbulence, and simultaneously measured the SC of the light source for selecting the optimal model. It should be noted that the turbulence phase loaded on the SLM is continuously refreshed (20 Hz). To provide stronger evidence, we present the evaluation indices (SSIM and PCC) for oceanic and atmospheric turbulence in Tables 2 and 3 and Fig. 1(h), whereas additional indices (MSE and PSNR) can be found in Tables S4 and S5 in the Supplementary Material. Our analysis concluded that as the turbulence intensity increases, the SC decreases, which in turn degrades image quality. Nevertheless, our proposed method is capable of overcoming these adverse effects and effectively improving the image quality regardless of the turbulence intensity. Our model learns the universal features of image degradation and restoration that depend on SC. This further demonstrates the strong generalization capability of the network trained with SC as physical prior information and the ability to apply learned knowledge from the training set to new, unseen scenes. This versatility is a desirable trait in a neural network, as it suggests the method's potential for broad application.

Fig. 4

Qualitative analysis of our method's performance across varying intensities of (a) oceanic and (b) atmospheric turbulence. The network trained with coherence as physical prior information can effectively overcome the impact of turbulence on imaging and improve image quality. (O1)–(O5) mean oceanic turbulence phase and (A1)–(A5) mean atmospheric turbulence phase. (O1) χ_t = 10^-9 K^2/s, coherence is 0.491. (O2) χ_t = 10^-7 K^2/s, coherence is 0.482. (O3) χ_t = 2 × 10^-7 K^2/s, coherence is 0.447. (O4) χ_t = 4 × 10^-7 K^2/s, coherence is 0.404. (O5) χ_t = 10^-6 K^2/s, coherence is 0.373. (A1) C_n^2 = 10^-14 m^(3-α), coherence is 0.507. (A2) C_n^2 = 1.5 × 10^-13 m^(3-α), coherence is 0.459. (A3) C_n^2 = 2.5 × 10^-13 m^(3-α), coherence is 0.43. (A4) C_n^2 = 3.5 × 10^-13 m^(3-α), coherence is 0.403. (A5) C_n^2 = 5 × 10^-13 m^(3-α), coherence is 0.378. Other parameter settings of the turbulent power spectrum function can be found in Table S2 in the Supplementary Material (Video 2, MP4, 36.4 MB [URL: https://doi.org/10.1117/1.AP.5.6.066003.s2]).


Table 2

Quantitative analysis of evaluation indices (SSIM and PCC) at different oceanic turbulence intensities.^a

| Oceanic turbulence | SSIM (BSD / CelebA / Flickr / WED / DIV) | PCC (BSD / CelebA / Flickr / WED / DIV) |
| Input (O1) | 0.5331 / 0.6773 / 0.6810 / 0.6016 / 0.7018 | 0.8978 / 0.9404 / 0.8876 / 0.9096 / 0.8718 |
| Output (O1) | 0.8088 / 0.7916 / 0.8368 / 0.8077 / 0.8172 | 0.9303 / 0.9707 / 0.9334 / 0.9560 / 0.9044 |
| Input (O2) | 0.5098 / 0.6566 / 0.6690 / 0.5716 / 0.5371 | 0.8855 / 0.9329 / 0.8786 / 0.8970 / 0.8494 |
| Output (O2) | 0.7823 / 0.7609 / 0.8015 / 0.7819 / 0.8005 | 0.9211 / 0.9611 / 0.9209 / 0.9448 / 0.8901 |
| Input (O3) | 0.4950 / 0.6538 / 0.6575 / 0.5455 / 0.5281 | 0.8764 / 0.9313 / 0.8585 / 0.8916 / 0.8371 |
| Output (O3) | 0.7191 / 0.7169 / 0.8434 / 0.7378 / 0.7984 | 0.8896 / 0.9413 / 0.8871 / 0.9344 / 0.8793 |
| Input (O4) | 0.4796 / 0.6408 / 0.6474 / 0.5034 / 0.5074 | 0.8774 / 0.9245 / 0.8576 / 0.8664 / 0.8130 |
| Output (O4) | 0.7060 / 0.6932 / 0.7287 / 0.6718 / 0.7217 | 0.8847 / 0.9379 / 0.8835 / 0.8892 / 0.8213 |
| Input (O5) | 0.4519 / 0.6041 / 0.6202 / 0.4446 / 0.4945 | 0.8456 / 0.9075 / 0.8287 / 0.8281 / 0.7631 |
| Output (O5) | 0.6899 / 0.6721 / 0.7225 / 0.6286 / 0.6958 | 0.8909 / 0.9415 / 0.8888 / 0.8839 / 0.8152 |
| Ground truth | 1 / 1 / 1 / 1 / 1 | 1 / 1 / 1 / 1 / 1 |

^a Bold values indicate the better index.

Table 3

Quantitative analysis of evaluation indices (SSIM and PCC) at different atmospheric turbulence intensities.^a

| Atmospheric turbulence | SSIM (BSD / CelebA / Flickr / WED / DIV) | PCC (BSD / CelebA / Flickr / WED / DIV) |
| Input (A1) | 0.5738 / 0.6821 / 0.6988 / 0.6495 / 0.6338 | 0.9014 / 0.9404 / 0.8929 / 0.9160 / 0.9766 |
| Output (A1) | 0.7798 / 0.7741 / 0.8337 / 0.8161 / 0.8231 | 0.9361 / 0.9564 / 0.9215 / 0.9574 / 0.9116 |
| Input (A2) | 0.5311 / 0.6513 / 0.6727 / 0.5743 / 0.5701 | 0.8797 / 0.9264 / 0.8676 / 0.8896 / 0.8279 |
| Output (A2) | 0.7312 / 0.6938 / 0.7699 / 0.6960 / 0.7581 | 0.8920 / 0.9353 / 0.8924 / 0.9141 / 0.8643 |
| Input (A3) | 0.5083 / 0.6383 / 0.6785 / 0.5348 / 0.5720 | 0.8688 / 0.9202 / 0.8493 / 0.8747 / 0.8081 |
| Output (A3) | 0.6615 / 0.6797 / 0.7427 / 0.6362 / 0.7369 | 0.8843 / 0.9392 / 0.8708 / 0.8919 / 0.8418 |
| Input (A4) | 0.4965 / 0.6264 / 0.6635 / 0.5202 / 0.5575 | 0.8590 / 0.9161 / 0.8364 / 0.8673 / 0.8040 |
| Output (A4) | 0.6915 / 0.6751 / 0.7287 / 0.6336 / 0.7273 | 0.8789 / 0.9308 / 0.8705 / 0.8855 / 0.8331 |
| Input (A5) | 0.4959 / 0.6153 / 0.6595 / 0.4840 / 0.5407 | 0.8524 / 0.9080 / 0.8263 / 0.8493 / 0.7862 |
| Output (A5) | 0.6761 / 0.6893 / 0.7201 / 0.6127 / 0.6802 | 0.8719 / 0.9465 / 0.8875 / 0.8749 / 0.8255 |
| Ground truth | 1 / 1 / 1 / 1 / 1 | 1 / 1 / 1 / 1 / 1 |

^a Bold values indicate the better index.

3.3.

Comparison between Different Methods and Ablation Study

In this section, we conduct a comprehensive comparative study of different methodologies, assessing their performance and efficacy in restoring images under challenging conditions of low SC and turbulent scenes. Traditional convolution-fusion framework methods, U-net,42 and U-RDN13 were compared to demonstrate the power of the proposed swin model.

In our network architecture, the swin transformer serves as a robust backbone module, responsible for extracting high-level features from the input. The special working mechanism gives it powerful hierarchical representation and global perception capabilities. However, direct output from the swin transformer often exhibits artifacts and high noise levels in image restoration tasks. Therefore, it is necessary to add lightweight convolutional layers as postprocessing blocks. Convolution layers capture local features of the image through local receptive fields, aiding in a better understanding of image details and textures while facilitating mapping from high-dimensional to low-dimensional spaces, resulting in high-quality output. To validate the effectiveness of the postprocessing block in the swin model, we conduct an ablation study. In the ablation study, we created a control group named pure swin, which was obtained by removing the postprocessing block from the swin model. The training processes and data sets of all methods are consistent. Figure 5 shows detailed comparisons of images processed by various methods. Figure 6 illustrates the quantitative comparison between different methods on various data sets. More qualitative results are provided in Figs. S3 and S4 in the Supplementary Material. Comparing the visual output results of pure swin and the swin model, we found that the pure swin framework produces black spots, resulting in a blurred appearance; the SSIM is 0.8396, a 7% reduction. This is because the swin transformer lacks the ability to sense local features and perform dimensional mapping. Convolutional layers can fill this gap by offering a mechanism to refine and enhance local features after the swin transformer blocks. The ablation study (compared with pure swin) validates that the postprocessing module is indispensable for the swin model.
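
To make the ablation concrete, the sketch below shows one way the layout described here could be wired up: a convolutional preprocessing block, a swin-transformer backbone (represented by a placeholder module), and the lightweight convolutional postprocessing block that is removed in the pure swin control. It is our schematic under these assumptions, not the released architecture.

```python
import torch.nn as nn

class SwinRestorer(nn.Module):
    """Schematic restoration network: conv preprocessing, transformer backbone,
    and conv postprocessing, joined by residual connections."""

    def __init__(self, channels=1, width=64, backbone: nn.Module = None):
        super().__init__()
        self.pre = nn.Sequential(                  # local feature extraction
            nn.Conv2d(channels, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, width, 3, padding=1))
        # Stand-in for the shifted-window transformer stages of the real model.
        self.backbone = backbone if backbone is not None else nn.Identity()
        self.post = nn.Sequential(                 # removed in the "pure swin" ablation
            nn.Conv2d(width, width, 3, padding=1), nn.GELU(),
            nn.Conv2d(width, channels, 3, padding=1))

    def forward(self, x):
        feat = self.pre(x)
        feat = self.backbone(feat) + feat          # global residual around backbone
        return self.post(feat) + x                 # residual back to the degraded input
```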

Fig. 5

Visualization of performance of different methods. The SSIM is shown in the bottom left corner. Our method presents the best performance, which is shown by smoother images with lower noise. (a) Sample selected from the WED data set and magnified insets of the red bounding region. (b) Sample selected from the Flickr data set and magnified insets of the red bounding region. The pure swin model can be obtained by removing the postprocessing block of the swin model (Video 3, MP4, 0.6 MB [URL: https://doi.org/10.1117/1.AP.5.6.066003.s3]).


Fig. 6

Performance between different methods on various data sets with SC being 0.494. Our model outperforms other methods across various data sets and indices.


We tested the performance of other networks under the same conditions. Our proposed network outperforms other methods by presenting the lowest noise and best evaluation index. Tables S6 and S7 in the Supplementary Material provide a detailed quantitative comparison of the performance across different models and different SCs. In the task of image restoration under low SC, our proposed methodology exhibits superior performance across all evaluative indices when juxtaposed with alternative approaches. Figure 7 shows the comparative performance of various methods when faced with image degradation due to various turbulence types and intensities. We observed that all networks trained with SC exhibit the ability to significantly improve the image quality under turbulent scenes and not just the swin model. This is an exciting result, as it signifies the successful integration of physical prior information into network training, enabling the networks to be applied to multiple tasks and scenarios.

Fig. 7

(a), (b) Performance comparison between different methods at various turbulent scenes. (A1) C_n^2 = 10^-14 m^(3-α), coherence is 0.506. (A2) C_n^2 = 1.5 × 10^-13 m^(3-α), coherence is 0.459. (O1) χ_t = 10^-9 K^2/s, coherence is 0.491. (O2) χ_t = 10^-7 K^2/s, coherence is 0.482. Note that all methods are trained with coherence as physical prior information and improve image quality under turbulence conditions. This demonstrates that incorporating appropriate physical prior information can help the network cope with multiscene tasks.


4.

Conclusions

By leveraging the SC as physical prior information and harnessing advanced deep-learning algorithms, we proposed a methodology, TWC-Swin, which demonstrates exceptional capabilities in simultaneously restoring images in low SC and random turbulent scenes. Our multicoherence and multiturbulence holographic imaging data sets, consisting of natural images, are created by the LPR, which can simulate different SCs and turbulence scenes (see Sec. 2). Though the swin model used in the tests was trained solely on the multicoherence data set, it achieves promising results on low-SC, oceanic turbulence, and atmospheric turbulence scenes. The key is that we capture the common physical property in these scenes, SC, and use it as physical prior information to generate a training set, so that the TWC-Swin method exhibits remarkable generalization capabilities, effectively restoring images from unseen scenes beyond the training set. Furthermore, through a series of rigorous experiments and comparisons, we have established the superiority of the swin model over traditional convolutional frameworks and alternative methods in terms of image restoration, from both qualitative and quantitative analysis (see Sec. 3). The integration of SC as a fundamental guiding principle in network training has proven to be a powerful strategy in aligning downstream tasks with pretrained models.

Our research findings offer guidance not only for the domain of optical imaging but also for the integration with the segment anything model (SAM),43 extending its applicability to multiphysics scenarios. For instance, in turbulent scenes, our methodology can be implemented for preliminary image processing, enabling the utilization of unresolved images for precise image recognition and segmentation tasks facilitated by SAM. Moreover, our experimental scheme also provides a simple idea for turbulence detection. Our research contributes valuable insights into the use of deep-learning algorithms for addressing image degradation problems in multiple scenes and highlights the importance of incorporating physical principles into network training. It is foreseeable that our research can serve as a successful case for the combination of deep learning and holographic imaging in the future, which facilitates the synergistic advancement of the fields of optics and computer science.

Code and Data Availability

The codes of the TWC-Swin method, trained model, as well as some example images for testing, are publicly available at https://github.com/tongxinoptica/TWC-Swin. All relevant data that support the findings of this work are available from the corresponding author upon reasonable request. The parameter settings of TWC-Swin used in synthesizing the training and evaluation data sets will be publicly available along with this paper.

Acknowledgments

This work was mainly supported by the National Natural Science Foundation of China (Grants Nos. 12174338 and 11874321) received by D.M.Z. All authors contributed to the discussions and preparation of the manuscript.44 The authors declare no competing interests.

References

1. L. B. Lesem, P. M. Hirsch, and J. A. Jordan, "Scientific applications: computer synthesis of holograms for 3D display," Commun. ACM 11, 661–674 (1968). https://doi.org/10.1145/364096.364111
2. M. Lurie, "Fourier-transform holograms with partially coherent light: holographic measurement of spatial coherence," J. Opt. Soc. Am. 58, 614–619 (1968). https://doi.org/10.1364/JOSA.58.000614
3. U. Schnars and W. Jüptner, "Direct recording of holograms by a CCD target and numerical reconstruction," Appl. Opt. 33, 179–181 (1994). https://doi.org/10.1364/AO.33.000179
4. R. Horisaki et al., "Compressive propagation with coherence," Opt. Lett. 47, 613–616 (2022). https://doi.org/10.1364/OL.444772
5. D. Blinder et al., "Signal processing challenges for digital holographic video display systems," Signal Process. Image Commun. 70, 114–130 (2019). https://doi.org/10.1016/j.image.2018.09.014
6. H. Ko and H. Y. Kim, "Deep learning-based compression for phase-only hologram," IEEE Access 9, 79735–79751 (2021). https://doi.org/10.1109/ACCESS.2021.3084800
7. L. Shi et al., "Towards real-time photorealistic 3D holography with deep neural networks," Nature 591, 234–239 (2021). https://doi.org/10.1038/s41586-020-03152-0
8. C. Lee et al., "Deep learning based on parameterized physical forward model for adaptive holographic imaging with unpaired data," Nat. Mach. Intell. 5, 35–45 (2023). https://doi.org/10.1038/s42256-022-00584-3
9. X. Guo et al., "Stokes meta-hologram toward optical cryptography," Nat. Commun. 13, 6687 (2022). https://doi.org/10.1038/s41467-022-34542-9
10. H. Yang et al., "Angular momentum holography via a minimalist metasurface for optical nested encryption," Light Sci. Appl. 12, 79 (2023). https://doi.org/10.1038/s41377-023-01125-2
11. R. Fiolka, K. Si, and M. Cui, "Complex wavefront corrections for deep tissue focusing using low coherence backscattered light," Opt. Express 20, 16532–16543 (2012). https://doi.org/10.1364/OE.20.016532
12. S. Lim et al., "Optimal spatial coherence of a light-emitting diode in a digital holographic display," Appl. Sci. 12, 4176 (2022). https://doi.org/10.3390/app12094176
13. Y. Deng and D. Chu, "Coherence properties of different light sources and their effect on the image sharpness and speckle of holographic displays," Sci. Rep. 7, 5893 (2017). https://doi.org/10.1038/s41598-017-06215-x
14. X. Tong et al., "A deep-learning approach for low-spatial-coherence imaging in computer-generated holography," Adv. Photonics Res. 4, 2200264 (2023). https://doi.org/10.1002/adpr.202200264
15. Y. Peng et al., "Speckle-free holography with partially coherent light sources and camera-in-the-loop calibration," Sci. Adv. 7, eabg5040 (2021). https://doi.org/10.1126/sciadv.abg5040
16. F. Wang et al., "Propagation of coherence-OAM matrix of an optical beam in vacuum and turbulence," Opt. Express 31, 20796–20811 (2023). https://doi.org/10.1364/OE.489324
17. D. Jin et al., "Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learning," Nat. Mach. Intell. 3, 876–884 (2021). https://doi.org/10.1038/s42256-021-00392-1
18. Q. Zhang et al., "Effect of oceanic turbulence on the visibility of underwater ghost imaging," J. Opt. Soc. Am. A 36, 397–402 (2019). https://doi.org/10.1364/JOSAA.36.000397
19. K. Wang et al., "Deep learning wavefront sensing and aberration correction in atmospheric turbulence," PhotoniX 2, 8 (2021). https://doi.org/10.1186/s43074-021-00030-4
20. Y. Chen et al., "A wavelet based deep learning method for underwater image super resolution reconstruction," IEEE Access 8, 117759–117769 (2020). https://doi.org/10.1109/ACCESS.2020.3004141
21. L. Zhang et al., "Restoration of single pixel imaging in atmospheric turbulence by Fourier filter and CGAN," Appl. Phys. B 127, 45 (2021). https://doi.org/10.1007/s00340-021-07596-8
22. Y. Baykal, Y. Ata, and M. C. Gökçe, "Underwater turbulence, its effects on optical wireless communication and imaging: a review," Opt. Laser Technol. 156, 108624 (2022). https://doi.org/10.1016/j.optlastec.2022.108624
23. J. Bertolotti and O. Katz, "Imaging in complex media," Nat. Phys. 18, 1008–1017 (2022). https://doi.org/10.1038/s41567-022-01723-8
24. T. Zeng, Y. Zhu, and E. Y. Lam, "Deep learning for digital holography: a review," Opt. Express 29, 40572–40593 (2021). https://doi.org/10.1364/OE.443367
25. A. Khan et al., "GAN-Holo: generative adversarial networks-based generated holography using deep learning," Complexity 2021, 6662161 (2021). https://doi.org/10.1155/2021/6662161
26. M. Liao et al., "Scattering imaging as a noise removal in digital holography by using deep learning," New J. Phys. 24, 083014 (2022). https://doi.org/10.1088/1367-2630/ac8308
27. T. Shimobaba et al., "Deep-learning computational holography: a review," Front. Photonics 3, 854391 (2022). https://doi.org/10.3389/fphot.2022.854391
28. Y. Rivenson, Y. Wu, and A. Ozcan, "Deep learning in holography and coherent imaging," Light Sci. Appl. 8, 85 (2019). https://doi.org/10.1038/s41377-019-0196-0
29. Z. Chen et al., "Physics-driven deep learning enables temporal compressive coherent diffraction imaging," Optica 9, 677 (2022). https://doi.org/10.1364/OPTICA.454582
30. Y. Jo et al., "Holographic deep learning for rapid optical screening of anthrax spores," Sci. Adv. 3, e1700606 (2017). https://doi.org/10.1126/sciadv.1700606
31. K. He et al., "Deep residual learning for image recognition," in IEEE Conf. Comput. Vis. and Pattern Recognit. (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
32. Z. Liu et al., "Swin transformer: hierarchical vision transformer using shifted windows," in Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 10012–10022 (2021). https://doi.org/10.1109/ICCV48922.2021.00986
33. V. V. Nikishov, "Spectrum of turbulent fluctuations of the sea-water refraction index," Int. J. Fluid Mech. Res. 27, 82–98 (2000). https://doi.org/10.1615/InterJFluidMechRes.v27.i1.70
34. B. E. Stribling, B. M. Welsh, and M. C. Roggemann, "Optical propagation in non-Kolmogorov atmospheric turbulence," Proc. SPIE 2471, 181–195 (1995). https://doi.org/10.1117/12.211927
35. R. W. Gerchberg, "A practical algorithm for the determination of phase from image and diffraction plane pictures," Optik 35, 237–246 (1972).
36. D. Martin et al., "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. Eighth IEEE Int. Conf. Comput. Vis., pp. 416–423 (2001). https://doi.org/10.1109/ICCV.2001.937655
37. Z. Liu et al., "Deep learning face attributes in the wild," in IEEE Int. Conf. Comput. Vis., pp. 3730–3738 (2015). https://doi.org/10.1109/ICCV.2015.425
38. P. Young et al., "From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguist. 2, 67–78 (2014). https://doi.org/10.1162/tacl_a_00166
39. W. Li et al., "WebVision database: visual learning and understanding from web data," (2017).
40. E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: dataset and study," in IEEE Conf. Comput. Vis. and Pattern Recognit. Workshops (CVPRW), pp. 1122–1131 (2017). https://doi.org/10.1109/CVPRW.2017.150
41. I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," (2017).
42. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," Lect. Notes Comput. Sci. 9351, 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28
43. A. Kirillov et al., "Segment anything," (2023).
44. F. Gori and M. Santarsiero, "Devising genuine spatial correlation functions," Opt. Lett. 32, 3531–3533 (2007). https://doi.org/10.1364/OL.32.003531

Biography

Xin Tong is a PhD student at the School of Physics, Zhejiang University, Hangzhou, China. He received his BS degree in physics from Zhejiang University of Science and Technology, Hangzhou, China. His current research interests include holographic imaging, deep learning, computational imaging, and partial coherence theory.

Renjun Xu received his PhD from the University of California, Davis, California, United States. He is a ZJU100 Young Professor and a PhD supervisor at the Center for Data Science, Zhejiang University, Hangzhou, China. He was the senior director of data and artificial intelligence at VISA Inc. His research interests include machine learning, alignment techniques for large-scale pretrained models, transfer learning, space editing, transformation, generation, and the interdisciplinarity of physics and mathematics.

Pengfei Xu is a PhD student at the School of Physics, Zhejiang University, Hangzhou, China. He received his BS degree in physics from Zhejiang University, Hangzhou, China, in 2017. His current research interests include computational holographic imaging, partially coherent structured light field, and vortex beam manipulation techniques.

Zishuai Zeng is a PhD student at the School of Physics, Zhejiang University, Hangzhou, China. He received his BS degree in 2019 from the School of Information Optoelectronic Science and Engineering at South China Normal University. His current research interests include computer-generated holography, as well as beam propagation transformation and computational imaging.

Shuxi Liu is a PhD student at the School of Physics, Zhejiang University, China. He received his BS degree in physics from Zhejiang University in 2022. His current research interests include catastrophe optics, optical vortex, and computational imaging.

Daomu Zhao received his PhD from Zhejiang University, Hangzhou, China. Since 2003, he has been a professor in the School of Physics at Zhejiang University. Now, he is the director of the Institute of Optoelectronic Physics at Zhejiang University. He has broad research interests in beam transmission, coherence and polarization theory, diffraction optics, holographic imaging, and deep learning.

CC BY: © The Authors. Published by SPIE and CLP under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Xin Tong, Renjun Xu, Pengfei Xu, Zishuai Zeng, Shuxi Liu, and Daomu Zhao "Harnessing the magic of light: spatial coherence instructed swin transformer for universal holographic imaging," Advanced Photonics 5(6), 066003 (25 October 2023). https://doi.org/10.1117/1.AP.5.6.066003
Received: 10 July 2023; Accepted: 26 September 2023; Published: 25 October 2023
KEYWORDS

Education and training; Holography; Turbulence; Image restoration; Data modeling; Transformers; Atmospheric turbulence
