Complex-valued universal linear transformations and image encryption using spatially incoherent diffractive networks

As an optical processor, a Diffractive Deep Neural Network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees-of-freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent light. Similarly, D2NNs can also perform arbitrary linear intensity transformations with spatially incoherent illumination; however, under spatially incoherent light, these transformations are non-negative, acting on diffraction-limited optical intensity patterns at the input field-of-view (FOV). Here, we expand the use of spatially incoherent D2NNs to complex-valued information processing for executing arbitrary complex-valued linear transformations using spatially incoherent light. Through simulations, we show that as the number of optimized diffractive features increases beyond a threshold dictated by the multiplication of the input and output space-bandwidth products, a spatially incoherent diffractive visual processor can approximate any complex-valued linear transformation and be used for all-optical image encryption using incoherent illumination. The findings are important for the all-optical processing of information under natural light using various forms of diffractive surface-based optical processors.


Introduction
The recent resurgence of analog optical information processing has been spurred by advancements in artificial intelligence (AI), especially deep learning-based methods. These advances in data-driven learning methods have also benefitted optical hardware engineering, giving rise to new computing architectures such as Diffractive Deep Neural Networks (D2NN), which exploit the passive interaction of light with spatially engineered surfaces to perform visual information processing. D2NNs, also referred to as diffractive optical networks, diffractive networks or diffractive processors, have emerged as powerful all-optical processors [9,10] capable of completing various visual computing tasks at the speed of light propagation through thin passive optical devices; examples of such tasks include image classification [11-13], information encryption [14,15], quantitative phase imaging (QPI) [16,17], among others [18-22]. Diffractive optical networks comprise a set of spatially engineered surfaces, the transmission (and/or reflection) profiles of which are optimized using machine learning techniques. After their digital optimization, a one-time effort, these diffractive surfaces are fabricated and assembled in 3D to form an all-optical visual processor, which axially extends at most a few hundred wavelengths (λ).
Our earlier work [10,23] demonstrated that a spatially coherent D2NN can perform arbitrary complex-valued linear transformations between a pair of arbitrary input and output apertures if its design has a sufficient number (N) of optimized diffractive features, i.e., N ≥ N_i N_o, where N_i and N_o represent the space-bandwidth products of the input and output apertures, respectively. In other words, N_i and N_o represent the size of the desired complex-valued linear transformation A ∈ ℂ^{N_o×N_i} that can be all-optically performed by an optimized D2NN. For a phase-only diffractive network, i.e., when only the phase profile of each diffractive layer is trainable, the sufficient condition becomes N ≥ 2N_i N_o due to the reduced degrees-of-freedom within the diffractive volume. Similar conclusions can be reached for a diffractive network that operates under spatially incoherent illumination: Rahman et al. [24] demonstrated that a diffractive network can be optimized to perform an arbitrary non-negative linear transformation of optical intensity through phase-only diffractive processors with N ≥ 2N_i N_o. Other optical approaches were also developed to process complex-valued input data with spatially incoherent light [1,25-27]; however, these earlier systems are limited to one-dimensional (1D) optical inputs and do not cover arbitrary input and output apertures, limiting their functionality and processing throughput. An extension of these earlier 1D input approaches introduced the processing of 2D incoherent source arrays using relatively bulky and demanding optical projection systems that are hard to operate at the diffraction limit of light [28,29].
Here, we demonstrate the processing of complex-valued data with compact diffractive optical networks under spatially incoherent illumination. We show that a spatially incoherent diffractive network that axially spans <100λ can perform any arbitrary complex-valued linear transformation on complex-valued input data with negligible error if the number of optimizable diffractive features is above a threshold dictated by the multiplication of the input and output space-bandwidth products, determined by both the spatial extent and the pixel size of the input and output apertures. To represent complex-valued spatial information using spatially incoherent illumination, we preprocessed the input information by mapping complex-valued data to a real and non-negative, optical intensity-based representation at the input field-of-view (FOV) of the diffractive network. We term this mapping the 'mosaicing' operation, indicating the utilization of multiple intensity pixels at the input FOV to represent one complex-valued input data point. Similarly, we used a postprocessing step, which involved mapping the output FOV intensity patterns back to the complex number domain, which we termed the 'demosaicing' operation. Through these mosaicing/demosaicing operations, we show that a spatially incoherent D2NN can be optimized to perform an arbitrary complex-valued linear transformation between its input and output apertures while providing optical information encryption. The presented spatially incoherent visual information processor, with its universality and thin form factor (<100λ), shows significant promise for image encryption and computational imaging applications under natural light.

Results
Figure 1a outlines a spatially incoherent D2NN architecture to synthesize an arbitrary complex-valued linear transformation (A) such that o = A i, where the input is i ∈ ℂ^{N_i}, the target is o ∈ ℂ^{N_o}, and A ∈ ℂ^{N_o×N_i}. The mosaicing process involves finding the non-negative (optical intensity-based) representation of each complex-valued element of i using p non-negative values; here, p bases, b_q, q = 0, ⋯, p−1 (see Figure 1c), are used for the intensity-based encoding of complex numbers. Based on this representation, the 2D input aperture of a spatially incoherent D2NN will have pN_i non-negative (optical intensity) values, denoted as i_int ∈ ℝ_+^{pN_i}, representing the input information under spatially incoherent illumination. The output intensity distribution, denoted as ô_int ∈ ℝ_+^{pN_o}, undergoes a demosaicing process where a complex number is synthesized from the intensity values of p output pixels, yielding the complex output vector ô ∈ ℂ^{N_o} such that ô ≈ o.
In our analyses, we used p = 3, except in Supplementary Figure S3, where p = 4 results are shown for comparison. We chose the basis complex numbers as b_q = exp(j2πq/p), q = 0, ⋯, p−1, such that the set of bases {b_q} is closed under multiplication, and the product of any two bases in the set is also a basis; for example, for p = 3 we have b_m b_n = b_{(m+n) mod 3}. Based on this representation of information, with p = 3 and bases b_0, b_1, b_2, we can decompose any arbitrarily selected complex-valued transformation matrix A into p = 3 matrices (A_0, A_1, A_2) with real non-negative entries such that:

A = A_0 b_0 + A_1 b_1 + A_2 b_2. (1)

For a given complex-valued input i = i_0 b_0 + i_1 b_1 + i_2 b_2, where i_q ∈ ℝ_+^{N_i}, the corresponding target output vector can be written as:

o = A i = b_0 (A_0 i_0 + A_2 i_1 + A_1 i_2) + b_1 (A_1 i_0 + A_0 i_1 + A_2 i_2) + b_2 (A_2 i_0 + A_1 i_1 + A_0 i_2), (2)

i.e., we have:

o_int = [o_0; o_1; o_2] = A_int [i_0; i_1; i_2] = A_int i_int, (3)

with a non-negative real-valued matrix A_int:

A_int = [A_0, A_2, A_1; A_1, A_0, A_2; A_2, A_1, A_0]. (4)

For p = 4, where b_m b_n = b_{(m+n) mod 4} and i = i_0 b_0 + i_1 b_1 + i_2 b_2 + i_3 b_3, a similar analysis yields:

A_int = [A_0, A_3, A_2, A_1; A_1, A_0, A_3, A_2; A_2, A_1, A_0, A_3; A_3, A_2, A_1, A_0]. (5)

Based on these equations, one can conclude that to all-optically implement an arbitrary complex-valued transformation o = A i using a spatially incoherent D2NN, the layers of the D2NN need to be optimized to perform an intensity linear transformation A_int ∈ ℝ_+^{pN_o × pN_i} such that o_int = A_int i_int. The entire system, upon convergence, performs the predefined complex-valued linear transformation A on any given input data using spatially incoherent light, based on Eqs. 2 and 4. In the following sections, we numerically explore the number of optimizable diffractive features (N) needed for an accurate approximation of A using a spatially incoherent D2NN.
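The block structure of Eq. 4 can be verified numerically. Below is a minimal NumPy sketch (our illustration, not the authors' code) that decomposes a random A and i into non-negative components, applies the intensity transformation A_int, and confirms that demosaicing recovers o = A i; the shift-to-zero decomposition used here is one valid choice, relying on Σ_q b_q = 0.

```python
import numpy as np

p = 3
b = np.exp(2j * np.pi * np.arange(p) / p)  # bases b_q = exp(j*2*pi*q/p)

def decompose(v):
    """Non-negative coefficients c_q with v = sum_q c_q * b_q (elementwise).

    Uses c_q = (2/p)*Re(conj(b_q)*v), then shifts all coefficients so the
    smallest is zero; the shift is allowed because sum_q b_q = 0."""
    v = np.asarray(v, dtype=complex)
    c = (2 / p) * np.real(np.conj(b).reshape((p,) + (1,) * v.ndim) * v)
    return c - c.min(axis=0, keepdims=True)  # shape (p,) + v.shape, all >= 0

rng = np.random.default_rng(0)
Ni = No = 16
A = rng.uniform(0, 1, (No, Ni)) * np.exp(2j * np.pi * rng.uniform(size=(No, Ni)))
i_vec = rng.uniform(0, 1, Ni) * np.exp(2j * np.pi * rng.uniform(size=Ni))

A0, A1, A2 = decompose(A)                 # Eq. 1: A = A0*b0 + A1*b1 + A2*b2
i0, i1, i2 = decompose(i_vec)

A_int = np.block([[A0, A2, A1],           # Eq. 4: non-negative (p*No) x (p*Ni)
                  [A1, A0, A2],
                  [A2, A1, A0]])
i_int = np.concatenate([i0, i1, i2])      # mosaiced non-negative input

o_int = A_int @ i_int                     # Eq. 3: intensity transformation
o_hat = sum(bq * oq for bq, oq in zip(b, np.split(o_int, p)))  # demosaicing

assert np.allclose(o_hat, A @ i_vec)      # recovers o = A i
assert (A_int >= 0).all() and (i_int >= 0).all()
```

The assertions confirm that the purely non-negative intensity transformation of Eq. 3 reproduces the complex-valued target after demosaicing.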

Complex-valued linear transformations through spatially incoherent diffractive networks
We numerically demonstrated the capabilities of diffractive optical processors to universally perform any arbitrarily chosen complex-valued linear transformation with spatially incoherent light. Throughout the paper, we used N_i = N_o = 16. To visually represent the data, we rearranged the 16-element vectors into 4 × 4 arrays of complex numbers, hereafter referred to as the "complex image." We arbitrarily selected a desired complex-valued transformation, A ∈ ℂ^{16×16}, as shown in Figure 1(b).
To explore the number of diffractive features needed, we trained nine models with varying values of N and evaluated the mean-squared-error (MSE) between the numerically measured all-optical intensity transformation (Â_int) and the target, A_int (see Fig. 2). Our results, summarized in Figure 2, highlight that with a sufficient number of optimizable diffractive features, i.e., N ≥ 2(pN_i)(pN_o) = 2p²N_iN_o, our system achieves a negligible approximation error with respect to the target A_int ∈ ℝ_+^{48×48}. In Fig. 2c, we also visualize the resulting all-optical intensity transformation Â_int compared to the ground truth A_int. In essence, this comparison reveals the spatially varying incoherent point-spread-functions (PSFs) of our diffractive system optimized using deep learning; a negligible MSE between Â_int and A_int shows that the resulting spatially varying incoherent PSFs match the target set of PSFs dictated by A_int.
We also evaluated the numerical accuracy of our complex-valued transformation in an end-to-end manner, as illustrated in Fig. 2d. For this numerical test, we sequentially set each entry of i to b_0, evaluated the corresponding complex output ô, and stacked these outputs to form Â_{b_0}, where the subscript indicates that the measurement was evaluated using the complex impulse along the basis b_0 as input. Then, we repeated this process for the other two bases to obtain Â_{b_1} and Â_{b_2}, and stacked these matrices.

Complex number-based image encryption using spatially incoherent diffractive networks
In this section, we demonstrate a complex number-based image encryption-decryption scheme using a spatially incoherent D2NN. In the first scheme, shown in Figure 1d, the message is encoded into a complex image, employing either amplitude and phase encoding or real and imaginary part encoding. Then, a digital lock encrypts the image by applying a linear transformation (A^{-1}) to conceal the original message within the image. At the optical receiver, the encrypted message is deciphered by an optimized incoherent D2NN that all-optically implements the inverse transformation, A. In an alternative scheme, depicted in Figure 1e, the key and lock are switched, i.e., the spatially incoherent D2NN is used to encrypt the message with a complex-valued A while the decryption step involves digital inversion using A^{-1}.
For our analysis, we used the letters 'U', 'C', 'L', 'A' as sample messages. 'U' and 'C' are used in amplitude-phase based encoding (Figure 3), whereas 'L' and 'A' are used for real-imaginary based encoding of information (Supplementary Figure S1), forming complex number-based images. To accurately model the spatially incoherent propagation [24] of light through the D2NN, we averaged the output intensities over a large number (N_r = 20,000) of randomly generated 2D phase profiles at the input (see the Methods section for details).
In Figure 3(a), we show the results corresponding to digital encryption and optical diffractive decryption, i.e., the system shown in Figure 1d. The digitally encrypted complex information i = A^{-1} o, together with its intensity representation i_int, are shown in Fig. 3(a). The optically decrypted output ô (through the spatially incoherent D2NN) and its intensity-based representation ô_int are shown in the same figure, together with the resulting error maps, i.e., |ô − o|² and |ô_int − o_int|², which reveal a very small amount of error. This agreement between the recovered and the ground truth messages in both the intensity and complex-valued domains confirms the accuracy of the diffractive decryption process through an optimized spatially incoherent D2NN. Figure 3(b) shows the successful performance of the sister scheme (Figure 1e), which involves diffractive encryption through a spatially incoherent D2NN and digital decryption, also revealing a negligible amount of error in both |A^{-1}ô − i|² and |ô_int − o_int|². As reported in Supplementary Figure S1, we also conducted a numerical experiment using the letters 'L' and 'A', encoded using the real and imaginary parts of the message. The visualizations are arranged the same way as in Figure 3; for both schemes depicted in Figure 1d,e, the amount of error between the recovered and the original messages is negligible, affirming the success of the real and imaginary part-based encoding method.
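The digital-lock/diffractive-key arithmetic of Figure 1d can be sketched in a few lines of NumPy. This is an idealized model (our illustration) that ignores the diffractive approximation error; a 16-element message vector stands in for the 4 × 4 complex image, and A is taken as a unitary matrix, consistent with the QR step described in the Methods.

```python
import numpy as np

rng = np.random.default_rng(1)
# Well-conditioned complex-valued "lock": the unitary factor of a QR
# decomposition (condition number of one).
A, _ = np.linalg.qr(rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16)))

o = rng.normal(size=16) + 1j * rng.normal(size=16)  # complex message
i_enc = np.linalg.inv(A) @ o    # digital encryption with A^{-1} (Fig. 1d)
o_hat = A @ i_enc               # ideal diffractive decryption performs A

assert np.allclose(o_hat, o)    # message recovered
```

Swapping the two steps (encrypting with A, decrypting with A^{-1}) models the sister scheme of Figure 1e in exactly the same way.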

Different mosaicing and demosaicing schemes in a spatially incoherent D 2 NN
How we assign each element of the vectors i_int and o_int to the pixels at the input and output FOVs of the diffractive network does not affect the final accuracy of the image/message reconstruction. For example, we can arrange the FOVs in such a manner that the components i_{int,q} corresponding to a basis b_q are assigned to neighboring pixels, in two adjacent rows, as shown in Supplementary Figure S2a; in an alternative implementation, the assignment/mapping can be completely arbitrary, which is equivalent to applying a random permutation operation on the input and output vectors (see the Methods section). When compared to each other, these two mosaicing and demosaicing schemes show negligible differences in the error of the final reconstruction of the letters 'U', 'C', 'L', 'A', as shown in Supplementary Figure S2b. These results underscore that the specific arrangement of the mosaicing/demosaicing schemes at the input and output FOVs does not impact the performance of the incoherent D2NN system.
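The equivalence of the two pixel-assignment schemes can be checked directly: a random permutation of input and output pixels only redefines the target intensity matrix and leaves the realized transformation unchanged. A small sketch (our illustration, with the 48-pixel size corresponding to p = 3 and N_i = N_o = 16):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 48                                    # p*Ni = p*No intensity pixels
A_int = rng.uniform(0, 1, (n, n))         # any non-negative intensity transform
i_int = rng.uniform(0, 1, n)

P_i = np.eye(n)[rng.permutation(n)]       # arbitrary input-pixel assignment
P_o = np.eye(n)[rng.permutation(n)]       # arbitrary output-pixel assignment

A_perm = P_o @ A_int @ P_i.T              # redefined target for the D2NN
o_perm = A_perm @ (P_i @ i_int)           # output under permuted mosaicing

assert np.allclose(P_o.T @ o_perm, A_int @ i_int)  # demosaicing undoes P_o
```

Because P_i and P_o are orthogonal permutation matrices, P_i.T plays the role of P_i^{-1}, so the redefined matrix P_o A_int P_i^T reproduces the original transformation exactly.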

Discussion
In this manuscript, we employed a data-free PSF-based D2NN optimization method (see the Methods section) [24] since we can determine the non-negative intensity transformation A_int from the target complex-valued transformation A based on the mosaicing and demosaicing schemes; the columns of A_int represent the desired spatially varying PSFs of the D2NN. The advantage of this data-free learning-based D2NN optimization approach is that computationally demanding simulation of wave propagation with large N_r is not required during the training, i.e., N_r = 1 is sufficient for simulating the spatially varying PSFs; hence the training time is much shorter. On the other hand, this approach necessitates prior knowledge of A_int, which might not always be available, e.g., for tasks such as data classification. An alternative to this data-free PSF-based optimization approach is to train the diffractive network in an end-to-end manner, using a data-driven direct training approach [24]. This strategy proceeds by minimizing the differences between the outputs and the targets over a large number of randomly generated examples, thereby learning the spatially varying PSFs implicitly from numerous input-target intensity patterns corresponding to the desired task, instead of learning from an explicitly predetermined A_int. This direct approach, however, requires a longer training time, necessitating the simulation of incoherent propagation for each training sample on a large dataset.
In our presented approach, the choice of p is not restricted to p = 3, as we have used throughout the main text. As another example of encoding, we show the image encryption results with p = 4 in Supplementary Fig. S3, where the four bases are b_q = exp(jπq/2) (q = 0, 1, 2, 3). The reconstructed 'U', 'C', 'L', 'A' letters are also reported in the same figure, confirming that given sufficient degrees-of-freedom (with N ≥ 2p²N_iN_o), the linear transformation performances are similar to each other. However, compared to p = 3, the choice of p = 4 necessitates 4/3 times more pixels on both the diffractive network input and output FOVs, reducing the throughput (or spatial density) of complex-valued linear transformations that can be performed using a spatially incoherent D2NN. Accordingly, more diffractive features and a larger number of independent degrees of freedom (by 16/9-fold) are required within the D2NN volume to achieve an output performance level comparable to a design with p = 3.
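As a quick numerical check of the p = 4 encoding, the bases {1, j, −1, −j} are closed under multiplication, and the 16/9-fold feature overhead relative to p = 3 follows directly from the sufficient condition N ≥ 2p²N_iN_o:

```python
import numpy as np

p = 4
b = np.exp(2j * np.pi * np.arange(p) / p)        # 1, j, -1, -j
for m in range(p):
    for n in range(p):
        # closure under multiplication: b_m * b_n = b_((m+n) mod p)
        assert np.isclose(b[m] * b[n], b[(m + n) % p])

Ni = No = 16
features = lambda p_: 2 * (p_ * Ni) * (p_ * No)  # sufficient N = 2*p^2*Ni*No
assert np.isclose(features(4) / features(3), 16 / 9)
```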
Our framework offers several flexibilities in implementation, which could be useful for different applications. First, the flexibility to arbitrarily permute the input and the output pixels following different mosaicing and demosaicing schemes (as introduced earlier in the Results section) could enhance the security of optical information transmission. A user would not be able to either spam or hack valuable information that is transferred optically without specific knowledge of the mosaicing and demosaicing schemes, thus ensuring the security of this scheme. Second, the flexibility in choosing p, as discussed above, could be useful in adding an extra layer of security against unauthorized access. Furthermore, we can use different sets of bases for the mosaicing and demosaicing operations by applying offset phase angles φ_i and φ_o, respectively, to the original bases b_q = exp(j2πq/p). Regarding image encryption-related applications, we demonstrated two approaches (Figures 1d-e) to utilize D2NNs for encryption or decryption. However, it is also possible to deploy a pair of diffractive systems in tandem, with one undertaking the matrix operation A for encryption and the other undertaking the inverse operation A^{-1} for decryption. Furthermore, potential extensions of our work could explore a harmonized integration of polarization state controls [30] and wavelength multiplexing [31] to build a multi-faceted, fortified encryption platform. In addition to increasing the data throughput, these additional degrees of freedom enabled by different illumination wavelengths and polarization states would further enhance the security of a diffractive processor-based system.
To conclude, we demonstrated the capability of spatially incoherent diffractive networks to perform arbitrary complex-valued linear transformations. By incorporating various forms of mosaicing and demosaicing operations, we paved the way for a wider array of applications by leveraging incoherent D2NNs for complex-valued data processing. We also showcased potential applications of these spatially incoherent D2NNs for complex number-based image encryption and decryption, highlighting the security benefits arising from the system's flexibility. Our exploration marks a significant stride toward enhanced versatility and robustness in optical information processing with spatially incoherent diffractive systems that can work under natural light.

Methods

Linear transformation matrix
In this paper, we use N_i = N_o = 16 so that A ∈ ℂ^{16×16}; see Figure 1b. To generate A, we randomly sampled the amplitude of each element from the uniform distribution U(0, 1) and the phase from U(0, 2π). For the encryption application, to ensure that the result of inversion is not sensitive to small errors, we performed QR-factorization on A to obtain a condition number of one [32].

Real-valued non-negative representation of complex numbers

The desired all-optical intensity transformation A_int between i_int and o_int is derived from the target complex-valued linear transformation A following Eqs. 1 and 5. We should note that deriving A_int from A requires mapping each complex element a to its real and non-negative representation (a_0, ⋯, a_{p−1}) based on the p ≥ 3 complex bases b_q such that a = Σ_{q=0}^{p−1} a_q b_q. Since the bases sum to zero, this representation is not unique; to define a unique mapping, we imposed additional constraints: a_q = 0 if Re(b_q a*) ≤ Re(b_m a*) for all m, i.e., the coefficient of the basis farthest in angle from a is set to zero; here a* represents the complex conjugate of a. The same constraints were also used while mapping the complex input vectors i to the real and non-negative intensity vectors i_int.
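The QR step above can be verified numerically: assuming the unitary factor Q of a randomly drawn complex matrix is kept as the final A, its condition number is one, so A^{-1} equals the conjugate transpose A^H and amplifies no errors during digital decryption. A minimal check:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.uniform(0, 1, (16, 16)) * np.exp(2j * np.pi * rng.uniform(size=(16, 16)))
A, _ = np.linalg.qr(M)               # keep the unitary factor as the final A

assert np.isclose(np.linalg.cond(A), 1.0)                    # condition number 1
assert np.allclose(A.conj().T @ A, np.eye(16), atol=1e-10)   # A^{-1} = A^H
```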

Mosaicing and demosaicing schemes
For the mosaicing (demosaicing) assignment of each element of i_int (o_int) to one of the pN_i (pN_o) pixels of the 2D input (output) FOV, the arrangement can be regular, e.g., in a row-major order as shown in Supplementary Figure S2(a), 'Regular mosaicing'.

We used N_r = 20,000 for estimating the incoherent output intensity I_o(x, y) corresponding to any arbitrary input intensity I(x, y). However, for evaluating the spatially varying incoherent PSFs of the D2NN, N_r = 1 is sufficient.

Coherent propagation of optical fields: 𝔇{•}
The propagation of spatially coherent light through a diffractive processor, denoted by 𝔇{•}, involves a series of interactions with consecutive diffractive surfaces, interleaved by wave propagation through the free space separating these surfaces. We assume that these modulations are introduced by phase-only diffractive surfaces, i.e., the field amplitude remains unchanged during the light-matter interaction. Specifically, we assume that the k-th diffractive surface alters the incident optical field u(x, y) in a localized manner according to the optimized phase values φ_k(x, y) of its diffractive features, resulting in the phase-modulated field u(x, y) exp(jφ_k(x, y)). The diffractive surfaces are coupled by free-space propagation, allowing the light to travel from one surface to the next. We used the angular spectrum method to simulate the free-space propagation [33]:

u(x, y; z = z_0 + d) = ℱ^{-1}{ℱ{u(x, y; z = z_0)} × H(f_x, f_y; d)}, (9)

where ℱ denotes the 2D Fourier transform and H(f_x, f_y; d) is the free-space transfer function for a propagation distance d.
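Eq. (9) can be sketched in NumPy as follows (our illustration, not the authors' implementation; here H is the free-space transfer function with evanescent components discarded):

```python
import numpy as np

def angular_spectrum(u, wavelength, dx, d):
    """Propagate field u (square grid, pixel pitch dx) over distance d."""
    n = u.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2     # (1/lambda)^2 - fx^2 - fy^2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0))  # axial wavenumber
    H = np.exp(1j * kz * d) * (arg > 0)           # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(u) * H)

# A normally incident plane wave only picks up the on-axis phase term:
u_out = angular_spectrum(np.ones((32, 32)), wavelength=0.5e-6, dx=1e-6, d=10e-6)
assert np.allclose(np.abs(u_out), 1.0)
```

A phase-only layer is then applied between successive propagations as `u = u * np.exp(1j * phi_k)`, completing one coherent pass through 𝔇{•}.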

Figures and captions
Figure 2d represents one of these complex output vectors, while the corresponding target vectors are presented in the same figure through amp(o) and phase(o). The small magnitude of the error ε = |ô − o|² shown in Fig. 2d illustrates the success of this spatially incoherent D2NN model in accurately approximating the complex-valued linear transformation o = A i, implemented for an arbitrarily selected A.

Figure 1. (a) Complex-valued universal linear transformations using spatially incoherent diffractive optical networks. (b) Amplitude and phase of the target complex-valued linear transformation. (c) Mosaicing and demosaicing processes. (d-e) Image encryption. In (d), complex-valued images are digitally encrypted (A^{-1}) and subsequently decrypted using the diffractive system that performs A (diffractive key). In (e), the encryption is performed through the spatially incoherent diffractive network (diffractive lock) and the decryption is performed digitally (digital key).


Figure 2. Performance of spatially incoherent diffractive networks on arbitrary complex-valued linear transformations. (a) All-optical linear transformation error as a function of the number of diffractive features (N). The red dot represents the design corresponding to the results shown in (b-d). (b) The phase profiles of the K = 4 diffractive layers of the optimized model (N = 2(pN_i)(pN_o)). (c) Evaluation of the resulting all-optical intensity transformation, i.e., the spatially varying PSFs. (d) The complex linear transformation evaluation. For o_int and o, |•|² represents an elementwise operation.

Figure 3. Image encryption with the letters 'U' and 'C' encoded into the amplitude and phase, respectively, of the complex-valued image. (a) The input, target, output, and the approximation error, both in the complex and real non-negative (intensity) domains. The original information is represented by o, while i is obtained by digitally encrypting o following Figure 1d. (b) The input, the output (resulting from optical encryption), the digitally decrypted output, and the error between the input and the decrypted output. The result of digital decryption matches the input information. The second row shows the corresponding input, target and output intensities and the approximation error. |•|² represents an elementwise operation.
Alternatively, the pixel assignment on the input (output) FOV can follow any arbitrary mapping defined by a permutation matrix P_i (P_o) operating on the input (output) vector; see Supplementary Figure S2(a), 'Arbitrary mosaicing'. For such cases, when ordered in a row-major format, the intensities on the input (output) FOV, i_int (o_int), can be written as i_int = P_i [i_0^T ⋯ i_{p−1}^T]^T (o_int = P_o [o_0^T ⋯ o_{p−1}^T]^T). Accordingly, such an arbitrary arrangement of pixels was accounted for by redefining the all-optical intensity transformation as P_o A_int P_i^T.

The 1D vector i_int is rearranged into a 2D intensity distribution I(x, y) at the input FOV of the D2NN. To numerically model the spatially incoherent propagation of the input intensity distribution I(x, y) through the D2NN, we coherently propagated the optical field √I exp(jθ) through the trainable diffractive surfaces to the output plane, where θ is a random 2D phase distribution, i.e., θ(x, y) ~ U(0, 2π) for all (x, y). If we denote the coherent field propagation operator as 𝔇{•} (see the 'Coherent propagation of optical fields' subsection), then the instantaneous output intensity is |𝔇{√I(x, y) exp(jθ(x, y))}|². The average output intensity can be approximately calculated by repeating the coherent wave propagation 𝔇{•} N_r times, each time with a different random phase distribution θ_r(x, y), and averaging the resulting N_r output intensities.
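The averaging step above can be sketched as follows (a minimal illustration; `propagate` is a hypothetical stand-in for the coherent operator 𝔇{•} and is assumed to return the output-plane field on the same grid):

```python
import numpy as np

def incoherent_output(I_in, propagate, n_r=1000, rng=None):
    """Estimate the incoherent output intensity by averaging n_r coherent
    propagations of sqrt(I_in)*exp(j*theta) with random phases theta."""
    rng = rng if rng is not None else np.random.default_rng()
    amp = np.sqrt(I_in)
    acc = 0.0
    for _ in range(n_r):
        theta = rng.uniform(0, 2 * np.pi, size=I_in.shape)
        acc = acc + np.abs(propagate(amp * np.exp(1j * theta))) ** 2
    return acc / n_r

# Sanity check with a trivial identity operator: the random phases cancel in
# the intensity, so the estimate is exact for any n_r.
I_in = np.random.default_rng(4).uniform(0, 1, (8, 8))
assert np.allclose(incoherent_output(I_in, lambda u: u, n_r=3), I_in)
```

With a real diffractive stack supplied as `propagate`, the estimate converges to the incoherent output intensity as n_r grows, consistent with the N_r = 20,000 used in the Results.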