Open Access
19 January 2024

Complex-valued universal linear transformations and image encryption using spatially incoherent diffractive networks
Abstract

As an optical processor, a diffractive deep neural network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees of freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent light. Similarly, D2NNs can also perform arbitrary linear intensity transformations with spatially incoherent illumination; however, under spatially incoherent light, these transformations are nonnegative, acting on diffraction-limited optical intensity patterns at the input field of view. Here, we expand the use of spatially incoherent D2NNs to complex-valued information processing for executing arbitrary complex-valued linear transformations using spatially incoherent light. Through simulations, we show that as the number of optimized diffractive features increases beyond a threshold dictated by the multiplication of the input and output space-bandwidth products, a spatially incoherent diffractive visual processor can approximate any complex-valued linear transformation and be used for all-optical image encryption using incoherent illumination. The findings are important for the all-optical processing of information under natural light using various forms of diffractive surface-based optical processors.

1. Introduction

The recent resurgence of analog optical information processing has been spurred by advancements in artificial intelligence (AI), especially deep-learning-based inference methods.1–9 These advances in data-driven learning methods have also benefited optical hardware engineering, giving rise to new computing architectures such as diffractive deep neural networks (D2NNs), which exploit the passive interaction of light with spatially engineered surfaces to perform visual information processing. D2NNs, also referred to as diffractive optical networks, diffractive networks, or diffractive processors, have emerged as powerful all-optical processors9,10 capable of completing various visual computing tasks at the speed of light propagation through thin passive optical devices; examples of such tasks include image classification,11–13 information encryption,14–17 and quantitative phase imaging (QPI),18,19 among others.20–24 Diffractive optical networks comprise a set of spatially engineered surfaces, the transmission (and/or reflection) profiles of which are optimized using machine-learning techniques. After their digital optimization (a one-time effort), these diffractive surfaces are fabricated and assembled in 3D to form an all-optical visual processor, which axially extends at most a few hundred wavelengths (λ).

Our earlier work10,25 demonstrated that a spatially coherent D2NN can perform arbitrary complex-valued linear transformations between a pair of arbitrary input and output apertures if its design has a sufficient number ($N$) of diffractive features that are optimized, i.e., $N \ge N_i N_o$, where $N_i$ and $N_o$ represent the space-bandwidth product of the input and output apertures, respectively. In other words, $N_i$ and $N_o$ represent the size of the desired complex-valued linear transformation $A \in \mathbb{C}^{N_o \times N_i}$ that can be all-optically performed by an optimized D2NN. For a phase-only diffractive network, i.e., only the phase profile of each diffractive layer is trainable, the sufficient condition becomes $N \ge 2N_i N_o$ due to the reduced degrees of freedom within the diffractive volume. Similar conclusions can be reached for a diffractive network that operates under spatially incoherent illumination: Rahman et al.26 demonstrated that a diffractive network can be optimized to perform an arbitrary nonnegative linear transformation of optical intensity through phase-only diffractive processors with $N \ge 2N_i N_o$. However, encoding information with spatially incoherent light inherently confines both the input and output to nonnegative values, as they are represented by intensity patterns at the input and output apertures of a D2NN. To process complex-valued data with spatially incoherent light, other optical approaches were also developed;1,27–29 however, these earlier systems are limited to one-dimensional (1D) optical inputs and do not cover arbitrary input and output apertures, limiting their functionality and processing throughput. An extension of these earlier 1D input approaches introduced the processing of 2D incoherent source arrays using relatively bulky and demanding optical projection systems that are hard to operate at the diffraction limit of light.30,31

Here, we demonstrate the processing of complex-valued data with compact diffractive optical networks under spatially incoherent illumination. We show that a spatially incoherent diffractive network that axially spans <100×λ can perform any arbitrary complex-valued linear transformation on complex-valued input data with negligible error if the number of optimizable diffractive features is above a threshold dictated by the multiplication of the input and output space-bandwidth products, determined by both the spatial extent and the pixel size of the input and output apertures. To represent complex-valued spatial information using spatially incoherent illumination, we preprocessed the input information by mapping complex-valued data to a real and nonnegative, optical intensity-based representation at the input field of view (FOV) of the diffractive network. We term this mapping the “mosaicking” operation, indicating the utilization of multiple intensity pixels at the input FOV to represent one complex-valued input data point. Similarly, we used a postprocessing step, which involved mapping the output FOV intensity patterns back to the complex number domain, which we termed the “demosaicking” operation. Through these mosaicking/demosaicking operations, we show that a spatially incoherent D2NN can be optimized to perform an arbitrary complex-valued linear transformation between its input and output apertures while providing optical information encryption. The presented spatially incoherent visual information processor, with its universality and thin form factor (<100×λ), shows significant promise for image encryption and computational imaging applications under natural light.

2. Results

Figure 1(a) outlines a spatially incoherent D2NN architecture to synthesize an arbitrary complex-valued linear transformation ($A$) such that $o = Ai$, where the input is $i \in \mathbb{C}^{N_i}$, the target is $o \in \mathbb{C}^{N_o}$, and $A \in \mathbb{C}^{N_o \times N_i}$. The mosaicking process involves finding the nonnegative (optical intensity-based) representation of each complex-valued element of $i$ using $E$ nonnegative values; here, $E$ bases, $e_k$, $k = 0, \ldots, E-1$ [see Fig. 1(c)], are used for representing the intensity-based encoding of complex numbers. Based on this representation, the 2D input aperture of a spatially incoherent D2NN will have $EN_i$ nonnegative (optical intensity) values, denoted as $i_r \in \mathbb{R}_+^{EN_i}$, representing the input information under spatially incoherent illumination. The output intensity distribution, denoted with $\hat{o}_r \in \mathbb{R}_+^{EN_o}$, undergoes a demosaicking process where a complex number is synthesized from the intensity values of $E$ output pixels, yielding the complex output vector $\hat{o} \in \mathbb{C}^{N_o}$ such that $\hat{o} \approx Ai$.

Fig. 1

(a) Complex-valued universal linear transformations using spatially incoherent diffractive optical networks. (b) Amplitude and phase of the target complex-valued linear transformation. (c) Mosaicking and demosaicking processes. (d)–(e) Image encryption. (d) Complex-valued images are digitally encrypted ($A^{-1}$) and subsequently decrypted using the diffractive system that performs $A$ (diffractive key). (e) The encryption is performed through the spatially incoherent diffractive network (diffractive lock), and the decryption is performed digitally (digital key).


In our analyses, we used $E = 3$, except in Fig. S5 in the Supplementary Material, where $E = 4$ results are shown for comparison. We chose the basis complex numbers as $e_k = \exp(jk\frac{2\pi}{E})$, $k = 0, \ldots, E-1$, such that the set of bases $S$ is closed under multiplication, and the product of any two of the bases in the set is also a basis; for example, for $E = 3$ we have $e_k e_l = e_{(k+l) \bmod 3}$. Based on this representation of information, with $E = 3$ and $e_0$, $e_1$, $e_2$, we can decompose any arbitrarily selected complex-valued transformation matrix $A$ into $E = 3$ matrices ($A_0$, $A_1$, $A_2$) with real nonnegative entries such that

Eq. (1)

$A = e_0 A_0 + e_1 A_1 + e_2 A_2.$

For a given complex-valued input $i = e_0 i_0 + e_1 i_1 + e_2 i_2$, where $i_k \in \mathbb{R}_+^{N_i}$, the corresponding target output vector can be written as

Eq. (2)

$o = Ai = (e_0 A_0 + e_1 A_1 + e_2 A_2)(e_0 i_0 + e_1 i_1 + e_2 i_2),$

Eq. (3)

$o = e_0(A_0 i_0 + A_2 i_1 + A_1 i_2) + e_1(A_1 i_0 + A_0 i_1 + A_2 i_2) + e_2(A_2 i_0 + A_1 i_1 + A_0 i_2),$
i.e., we have

Eq. (4)

$o_r = \begin{bmatrix} o_0 \\ o_1 \\ o_2 \end{bmatrix} = \begin{bmatrix} A_0 & A_2 & A_1 \\ A_1 & A_0 & A_2 \\ A_2 & A_1 & A_0 \end{bmatrix} \begin{bmatrix} i_0 \\ i_1 \\ i_2 \end{bmatrix} = A_r i_r,$
with a nonnegative real-valued matrix $A_r$

Eq. (5)

$A_r = \begin{bmatrix} A_0 & A_2 & A_1 \\ A_1 & A_0 & A_2 \\ A_2 & A_1 & A_0 \end{bmatrix}.$

For $E = 4$, where $e_k e_l = e_{(k+l) \bmod 4}$ and $A = e_0 A_0 + e_1 A_1 + e_2 A_2 + e_3 A_3$, a similar analysis yields

Eq. (6)

$A_r = \begin{bmatrix} A_0 & A_3 & A_2 & A_1 \\ A_1 & A_0 & A_3 & A_2 \\ A_2 & A_1 & A_0 & A_3 \\ A_3 & A_2 & A_1 & A_0 \end{bmatrix}.$

Based on these equations, one can conclude that to all-optically implement an arbitrary complex-valued transformation $o = Ai$ using a spatially incoherent D2NN, the layers of the D2NN need to be optimized to perform an intensity linear transformation $A_r \in \mathbb{R}_+^{EN_o \times EN_i}$ (i.e., with $E^2 N_i N_o$ nonnegative entries) such that $o_r = A_r i_r$. The entire system, upon convergence, performs the predefined complex-valued linear transformation $A$ on any given input data using spatially incoherent light, based on Eqs. (2) and (4). In the following sections, we numerically explore the number of optimizable diffractive features ($N$) needed for accurate approximation of $A$ using a spatially incoherent D2NN.
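As a sanity check of Eqs. (1)–(5), the short NumPy sketch below (our own illustration, not taken from the authors' code) starts from arbitrary nonnegative blocks $A_k$ and $i_k$, builds the block matrix $A_r$ for $E = 3$, and verifies that demosaicking $A_r i_r$ reproduces $Ai$:

```python
import numpy as np

E, Ni, No = 3, 16, 16
e = np.exp(1j * 2 * np.pi * np.arange(E) / E)   # bases e_k = exp(j*k*2*pi/E)

rng = np.random.default_rng(0)
A_blocks = rng.uniform(size=(E, No, Ni))        # nonnegative blocks A_0, A_1, A_2
i_blocks = rng.uniform(size=(E, Ni))            # nonnegative blocks i_0, i_1, i_2

A = sum(e[k] * A_blocks[k] for k in range(E))   # Eq. (1): complex-valued A
i = sum(e[k] * i_blocks[k] for k in range(E))   # complex-valued input i

# Eq. (5): block matrix with A_{(k-n) mod E} in block-row k, block-column n
A_r = np.block([[A_blocks[(k - n) % E] for n in range(E)] for k in range(E)])
i_r = np.concatenate(i_blocks)                  # mosaicked (intensity) input vector
o_r = A_r @ i_r                                 # nonnegative intensity transformation, Eq. (4)

# Demosaicking: o = sum_k e_k * o_k
o_hat = sum(e[k] * o_r[k * No:(k + 1) * No] for k in range(E))
print(np.allclose(o_hat, A @ i))                # True
```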

2.1. Complex-Valued Linear Transformations through Spatially Incoherent Diffractive Networks

We numerically demonstrated the capabilities of diffractive optical processors to universally perform any arbitrarily chosen complex-valued linear transformation with spatially incoherent light. Throughout the paper, we used $N_i = N_o = 16$. To visually represent the data, we rearranged the 16-element vectors into $4\times4$ arrays of complex numbers, hereafter referred to as the "complex image." We arbitrarily selected a desired complex-valued transformation, $A \in \mathbb{C}^{16\times16}$, as shown in Fig. 1(b).

To explore the number of diffractive features needed, we trained nine models with varying values of $N$ and evaluated the mean-squared-error (MSE) between the numerically measured ($\hat{A}_r$) and the target all-optical linear transformation, $A_r$ (see Fig. 2). Our results, summarized in Fig. 2, highlight that with a sufficient number of optimizable diffractive features, i.e., $N \ge 2E^2 N_i N_o = 2N_{i,r}N_{o,r}$, our system achieves a negligible approximation error with respect to the target $A_r \in \mathbb{R}_+^{48\times48}$. In Fig. 2(c), we also visualize the resulting all-optical intensity transformation $\hat{A}_r$ compared to the ground truth $A_r$. In essence, this comparison reveals the spatially varying incoherent point spread functions (PSFs) of our diffractive system optimized using deep learning; a negligible MSE between $\hat{A}_r$ and $A_r$ shows that the resulting spatially varying incoherent PSFs match the target set of PSFs dictated by $A_r$.

Fig. 2

Performance of spatially incoherent diffractive networks on arbitrary complex-valued linear transformations. (a) The all-optical linear transformation error as a function of the number of diffractive features ($N$). The red dot represents the design corresponding to the results shown in (b)–(d). (b) The phase profiles of the $K = 4$ diffractive layers of the optimized model ($N = 2\times2N_{i,r}N_{o,r}$). (c) Evaluation of the resulting all-optical intensity transformation, i.e., the spatially varying PSFs. (d) The complex linear transformation evaluation. For $\varepsilon_r$ and $\varepsilon$, $|\cdot|^2$ represents an element-wise operation.


We also evaluated the numerical accuracy of our complex-valued transformation in an end-to-end manner, as illustrated in Fig. 2(d). For this numerical test, we sequentially set each entry of $i$ to $e_0$, evaluated the corresponding complex output $\hat{o}$, and stacked them to form $\hat{A}_0$, where the subscript represents that the measurement was evaluated using the complex impulse along the basis $e_0$ as input. Then, we repeated this process for the other two bases to obtain $\hat{A}_1$ and $\hat{A}_2$, and stacked these matrices as a block matrix $[\hat{A}_0|\hat{A}_1|\hat{A}_2]$, shown in Fig. 2(d). Each row of the images $\mathrm{amp}(\hat{o})$ and $\mathrm{phase}(\hat{o})$ in Fig. 2(d) represents one of these complex output vectors, while the corresponding target vectors are presented in the same figure through $\mathrm{amp}(o)$ and $\mathrm{phase}(o)$. The small magnitude of the error $\varepsilon = |\hat{o} - o|^2$ shown in Fig. 2(d) illustrates the success of this spatially incoherent D2NN model in accurately approximating the complex-valued linear transformation $o = Ai$, implemented for an arbitrarily selected $A$.

2.2. Complex Number-based Image Encryption Using Spatially Incoherent Diffractive Networks

In this section, we demonstrate a complex number-based image encryption–decryption scheme using a spatially incoherent D2NN. In the first scheme, shown in Fig. 1(d), the message is encoded into a complex image, employing either amplitude and phase encoding or real and imaginary part encoding. Then, a digital lock encrypts the image by applying a linear transformation ($A^{-1}$) to conceal the original message within the image. At the optical receiver, the encrypted message is deciphered by an optimized incoherent D2NN that all-optically implements the inverse transformation, $A$. In an alternative scheme, as depicted in Fig. 1(e), the key and lock are switched, i.e., the spatially incoherent D2NN is used to encrypt the message with a complex-valued $A$, while the decryption step involves the digital inversion using $A^{-1}$.

For our analysis, we used the letters "U," "C," "L," and "A" as sample messages. "U" and "C" are used in amplitude-phase-based encoding (Fig. 3), whereas "L" and "A" are used for real-imaginary-based encoding of information (Fig. S1 in the Supplementary Material), forming complex-number-based images. To accurately model the spatially incoherent propagation26 of light through the D2NN, we averaged the output intensities over a large number ($N_\varphi = 20{,}000$) of randomly generated 2D phase profiles at the input (see Sec. 4 for details).

Fig. 3

Image encryption with the letters "U" and "C" encoded into the amplitude and phase, respectively, of the complex-valued image. (a) The input, target, output, and the approximation error, both in the complex and real nonnegative (intensity) domains. The original information is represented by $o$, while $i$ is obtained by digitally encrypting $o$ following Fig. 1(d). (b) The input, output (resulting from optical encryption), and digitally decrypted output, and the error between the input and the decrypted output. The result of digital decryption matches the input information. The second row shows the corresponding input, target, and output intensities and the approximation error. $|\cdot|^2$ represents an element-wise operation.


In Fig. 3(a), we show the results corresponding to digital encryption and optical diffractive decryption, i.e., the system shown in Fig. 1(d). The digitally encrypted complex information $i = A^{-1}o$, and its intensity representation $i_r$, are shown in Fig. 3(a). The optically decrypted output $\hat{o}$ (through the spatially incoherent D2NN) and its intensity-based representation $\hat{o}_r$ are shown in the same Fig. 3(a), together with the resulting error maps, i.e., $|\hat{o} - o|^2$ and $|\hat{o}_r - o_r|^2$, which reveal a very small degree of error. This agreement of the recovered and the ground-truth messages in both the intensity and complex-valued domains confirms the accuracy of the diffractive decryption process through an optimized spatially incoherent D2NN. Figure 3(b) shows the successful performance of the sister scheme [Fig. 1(e)], which involves diffractive encryption through a spatially incoherent D2NN and digital decryption, also revealing a negligible amount of error in both $|A^{-1}\hat{o} - i|^2$ and $|\hat{o}_r - o_r|^2$. As reported in Fig. S1 in the Supplementary Material, we also conducted a numerical experiment using the letters "L" and "A," encoded using the real and imaginary parts of the message. The visualizations are arranged the same way as in Fig. 3, where for both schemes depicted in Figs. 1(d) and 1(e), the degree of error between the recovered and the original messages is negligible, affirming the success of using the real and imaginary part-based encoding method. For the assessment of the approximation errors when the number of diffractive features is smaller, we compared the decryption performance of three models with different numbers of diffractive features/neurons, i.e., $N = (0.5, 0.7, 2)\times2E^2N_iN_o$, for the same setup outlined in Fig. S1(a) in the Supplementary Material. The results are summarized in Fig. S2 in the Supplementary Material: for models with $N < 2E^2N_iN_o$, the decryption quality is compromised, exhibiting a pixel absolute error of $>0.1$. However, this error reduces to $<0.05$ for $N = 4E^2N_iN_o$, where the decrypted images display significantly enhanced contrast and reduced noise levels.

To further evaluate the efficacy of our encryption method, we analyzed the complex image entropy, examining both the real and imaginary components separately (refer to Sec. 4 for details). The original image $i$, the D2NN-encrypted output $\hat{o}$, and the digitally encrypted output $Ai$, along with the corresponding image entropies, are shown in Fig. S3(a) in the Supplementary Material for two complex image examples. We repeated this analysis for a set of 1000 complex images, with the resulting entropy distributions reported in Fig. S3(b) in the Supplementary Material. These results demonstrate that the entropy of the encrypted images is statistically higher than that of the original images. This increase in entropy signifies a heightened level of randomness within the encrypted images, thereby validating the effectiveness of our encryption process. In addition, the entropy distributions of the D2NN-encrypted images show excellent agreement with those of the corresponding digitally encrypted images, further demonstrating the success of our spatially incoherent optical encryption scheme.

2.3. Different Mosaicking and Demosaicking Schemes in a Spatially Incoherent D2NN

How we assign each element in the vectors $i_r$ and $o_r$ to the pixels at the input and output FOVs of the diffractive network does not affect the final accuracy of the image/message reconstruction. For example, we can arrange the FOVs in such a manner that the components $i_{r,k}$ corresponding to a basis $e_k$ are assigned to neighboring pixels, in two adjacent rows, as shown in Fig. S4(a) in the Supplementary Material; in an alternative implementation, the assignment/mapping can be completely arbitrary, which is equivalent to applying a random permutation operation on the input and output vectors (see Sec. 4). When compared to each other, these two mosaicking and demosaicking schemes show negligible differences in the error of the final reconstruction of the letters "U," "C," "L," and "A," as shown in Fig. S4(b) in the Supplementary Material. These results underscore that the specific arrangement of the mosaicking/demosaicking schemes at the input and output FOVs does not impact the performance of the incoherent D2NN system.

3. Discussion and Conclusion

In this article, we employed a data-free PSF-based D2NN optimization method (see Sec. 4),26 since we can determine the nonnegative intensity transformation $A_r$ from the target complex-valued transformation $A$ based on the mosaicking and demosaicking schemes; the columns of $A_r$ represent the desired spatially varying PSFs of the D2NN. The advantage of this data-free learning-based D2NN optimization approach is that computationally demanding simulation of wave propagation with large $N_\varphi$ is not required during the training. Coherent propagation is appropriate for simulating the spatially varying PSFs, point by point, since a point emitter at the input aperture coherently interferes with itself during optical diffraction within a D2NN; this approach makes the training time much shorter. On the other hand, this approach necessitates prior knowledge of $A_r$, which might not always be available, e.g., for tasks such as data classification. An alternative to this data-free PSF-based optimization approach is to train the diffractive network in an end-to-end manner, using a data-driven direct training approach.26 This strategy proceeds by minimizing the differences between the outputs and the targets over a large number of randomly generated examples, thereby learning the spatially varying PSFs implicitly from numerous input-target intensity patterns corresponding to the desired task, instead of learning from an explicitly predetermined $A_r$. This direct approach, however, requires a longer training time, necessitating the simulation of incoherent propagation for each training sample on a large data set.

In our presented approach, the choice of $E$ is not restricted to $E = 3$, as we have used throughout the main text. As another example of encoding, we show the image encryption results with $E = 4$ in Fig. S5 in the Supplementary Material, where the four bases are $\exp(j\frac{\pi}{2}k)$ ($k = 0, 1, 2, 3$). The reconstructed "U," "C," "L," and "A" letters are also reported in the same figure, confirming that given sufficient degrees of freedom (with $N \ge 2E^2N_iN_o$), the linear transformation performances are similar to each other. However, compared to $E = 3$, this choice of $E = 4$ necessitates $4/3$ times more pixels on both the diffractive network input and output FOVs—reducing the throughput (or spatial density) of complex-valued linear transformations that can be performed using a spatially incoherent D2NN. Accordingly, more diffractive features and a larger number of independent degrees of freedom (by $16/9$-fold) are required within the D2NN volume to achieve an output performance level that is comparable to a design with $E = 3$. Note that while $E \ge 3$ is sufficient to reconstruct the original complex-valued images regardless of the image complexity, the redundancy provided by larger $E$ values might offer increased resilience against noise at the cost of reducing the image-processing throughput (per input aperture area) with larger $E$.

Our framework offers several flexibilities in implementation, which could be useful for different applications. First, the flexibility to arbitrarily permute the input and the output pixels following different mosaicking and demosaicking schemes (as introduced earlier in Sec. 2) could enhance the security of optical information transmission. A user would not be able to either spam or hack valuable information that is transferred optically without specific knowledge of the mosaicking and demosaicking schemes, thus ensuring the security of this scheme. Note that this enhancement in security is achieved without adding complexity to the system by just permuting the assignment of data elements to the pixels of the input and output devices, e.g., spatial light modulators (SLMs) and complementary metal-oxide-semiconductor (CMOS) detector arrays. Second, the flexibility in choosing $E$, as discussed above, could be useful in adding an extra layer of security against unauthorized access, albeit with a trade-off in system throughput that comes with larger $E$. Furthermore, we can use different sets of bases for the mosaicking and demosaicking operations by applying offset phase angles $\theta_i$ and $\theta_o$, respectively, to the original bases $e_k = \exp(jk\frac{2\pi}{E})$, $k = 0, \ldots, E-1$. This will result in a set of modified/encrypted bases: $e_{k,i} = \exp[j(k\frac{2\pi}{E} + \theta_i)]$ for mosaicking and $e_{k,o} = \exp[j(k\frac{2\pi}{E} + \theta_o)]$ for demosaicking. This powerful flexibility in representation further enhances the security of the system.

Regarding image encryption-related applications, we demonstrated two approaches [Figs. 1(d) and 1(e)] to utilize D2NNs for encryption or decryption. However, it is also possible to deploy a pair of diffractive systems in tandem, with one undertaking the matrix operation $A$ for encryption and the other undertaking the inverse operation $A^{-1}$ for decryption. Furthermore, potential extensions of our work could explore a harmonized integration of polarization state controls32 and wavelength multiplexing33 to build a multifaceted, fortified encryption platform. In addition to increasing the data throughput, these additional degrees of freedom enabled by different illumination wavelengths and polarization states would further enhance the security of a diffractive processor-based system.

In this work, we focused on the numerical analysis of the presented concept. However, we should note that various D2NNs designed using deep-learning-based approaches have been experimentally validated over different parts of the electromagnetic spectrum, e.g., from terahertz (THz)9,14 to near-infrared (NIR)15 and visible wavelengths,24 showing a good agreement between the numerical and experimental results. To address some of the experimental challenges associated with fabrication errors and mechanical misalignments, a “vaccination” strategy34,35 has been devised. This approach enhances the robustness of the diffractive optical designs by incorporating such aberrations/imperfections as random variables during the training phase, thereby preparing the system to better withstand and adapt to the uncertainties inherent in real-world experimental conditions.

Although spatially coherent light is more suitable for complex-valued information processing in laboratory settings, the use of spatially incoherent light offers various practical advantages. For example, speckle noise, which is inevitable in coherent systems, can be suppressed by using partially or fully incoherent illumination. An additional benefit of spatially incoherent designs is the range of viable illumination sources that can be used: instead of using specialized coherent sources, a spatially incoherent system can work with standard light-emitting diodes (LEDs), or even under natural light, which is important for some applications of diffractive information processing.

To conclude, we demonstrated the capability of spatially incoherent diffractive networks to perform arbitrary complex-valued linear transformations. By incorporating various forms of mosaicking and demosaicking operations, we paved the way for a wider array of applications by leveraging incoherent D2NNs for complex-valued data processing. We also showcased potential applications of these spatially incoherent D2NNs for complex number-based image encryption or decryption, highlighting the security benefits arising from the system’s flexibility. Our exploration marks a significant stride toward enhanced versatility and robustness in optical information processing with spatially incoherent diffractive systems that can work under natural light.

4. Appendix: Methods

4.1. Linear Transformation Matrix

In this paper, we use $N_i = N_o = 16$ so that $A \in \mathbb{C}^{16\times16}$; see Fig. 1(b). To generate $A$, we randomly sample the amplitude of each element from the uniform distribution $\mathrm{Uniform}(0, 1)$ and the phases from $\mathrm{Uniform}(0, 2\pi)$. For the encryption application, to ensure that the result of inversion is not sensitive to small errors, we performed QR-factorization on $A$ and used the resulting unitary factor as the transformation matrix, which has a condition number of one.36

4.2. Real-Valued Nonnegative Representation of Complex Numbers

Following Eq. (4), the complex-valued input and target vectors $i \in \mathbb{C}^{N_i}$ and $o \in \mathbb{C}^{N_o}$ are represented by the corresponding real and nonnegative intensity vectors $i_r = [i_0^T \cdots i_{E-1}^T]^T \in \mathbb{R}_+^{EN_i}$ and $o_r = [o_0^T \cdots o_{E-1}^T]^T \in \mathbb{R}_+^{EN_o}$, where $i = \sum_{k=0}^{E-1} e_k i_k$ and $o = \sum_{k=0}^{E-1} e_k o_k$. The desired all-optical intensity transformation $A_r$ between $i_r$ and $o_r$ is derived from the target complex-valued linear transformation $A$ following Eqs. (1) and (5). We should note that deriving $A_r$ from $A$ requires mapping each complex element $a$ to its real and nonnegative representation $(a_0, \ldots, a_{E-1})$ based on the $E \ge 3$ complex bases $e_k$ such that $a = \sum_{k=0}^{E-1} e_k a_k$. To define a unique mapping, we follow an algorithm29 by imposing additional constraints: $a_k = 0$ if $\frac{2\pi}{E} \le \mathrm{phase}(e_k a^*) \le 2\pi - \frac{2\pi}{E}$, i.e., $a_k = 0$ if the angle between $a$ and $e_k$ is greater than $\frac{2\pi}{E}$; here $a^*$ represents the complex conjugate of $a$. The same constraints were also used while mapping the complex input vectors $i$ to the real and nonnegative intensity vectors $i_r$.
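For a single complex scalar, this constrained mapping amounts to expressing the number as a nonnegative combination of its two nearest bases. A minimal sketch of one such implementation for $E = 3$ is given below; the function name and structure are our own and may differ from the authors' code:

```python
import numpy as np

def nonneg_decompose(a, E=3):
    """Nonnegative coefficients (a_0, ..., a_{E-1}) with a = sum_k a_k*exp(j*k*2*pi/E),
    setting a_k = 0 whenever the angle between a and e_k exceeds 2*pi/E, so that only
    the two bases adjacent to a are used."""
    coeffs = np.zeros(E)
    if a == 0:
        return coeffs
    theta = np.angle(a) % (2 * np.pi)
    k = int(theta // (2 * np.pi / E)) % E               # nearest basis below the angle of a
    l = (k + 1) % E                                     # adjacent basis above
    e_k = np.exp(1j * 2 * np.pi * k / E)
    e_l = np.exp(1j * 2 * np.pi * l / E)
    M = np.array([[e_k.real, e_l.real], [e_k.imag, e_l.imag]])
    coeffs[[k, l]] = np.linalg.solve(M, [a.real, a.imag])  # both entries are nonnegative
    return coeffs

a = 0.8 * np.exp(1j * 2.1)                              # an arbitrary complex number
c = nonneg_decompose(a)
e = np.exp(1j * 2 * np.pi * np.arange(3) / 3)
print(c, np.isclose(np.sum(c * e), a))                  # reconstruction check: True
```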

4.3. Mosaicking and Demosaicking Schemes

For the mosaicking (demosaicking) assignment of each element of $i_r$ ($o_r$) to one of the $N_{i,r} = EN_i$ ($N_{o,r} = EN_o$) pixels of the 2D input (output), the arrangement of the FOV can be regular, e.g., in a row-major order as shown in Fig. S4(a) in the Supplementary Material, "Regular mosaicking." Alternatively, the pixel assignment on the input (output) FOV can follow any arbitrary mapping, which can be defined by a permutation matrix $P_i$ ($P_o$) operating on the input (output) vector; see Fig. S4(a) in the Supplementary Material, "Arbitrary mosaicking." For such cases, when ordered in a row-major format, the intensities on the input (output) FOVs $i_r$ ($o_r$) can be written as $i_r = P_i[i_0^T \cdots i_{E-1}^T]^T$ ($o_r = P_o[o_0^T \cdots o_{E-1}^T]^T$). Accordingly, such an arbitrary arrangement of pixels was accounted for by redefining the all-optical intensity transformation as $P_o A_r P_i^T$.
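The equivalence between the regular and the arbitrarily permuted arrangements can be checked in a few lines; the sketch below (our own illustration) verifies that $P_o A_r P_i^T$ applied to the permuted input yields the permuted output:

```python
import numpy as np

rng = np.random.default_rng(1)
Ni_r, No_r = 48, 48                          # E*Ni input and E*No output pixels (E = 3)
A_r = rng.uniform(size=(No_r, Ni_r))         # any nonnegative intensity transformation
i_r = rng.uniform(size=Ni_r)                 # row-major ("regular") mosaicked input

P_i = np.eye(Ni_r)[rng.permutation(Ni_r)]    # arbitrary input pixel assignment
P_o = np.eye(No_r)[rng.permutation(No_r)]    # arbitrary output pixel assignment

A_r_arbitrary = P_o @ A_r @ P_i.T            # redefined all-optical transformation
# The permuted system applied to the permuted input gives the permuted output:
print(np.allclose(A_r_arbitrary @ (P_i @ i_r), P_o @ (A_r @ i_r)))   # True
```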

4.4. Spatially Incoherent Light Propagation through a D2NN

The 1D vector $i_r$ is rearranged as a 2D distribution of intensity $I(x, y)$ at the input FOV of the D2NN. To numerically model the spatially incoherent propagation of the input intensity distribution $I(x, y)$ through the D2NN, we coherently propagated the optical field $\sqrt{I}\exp(j\varphi)$ through the trainable diffractive surfaces to the output plane, where $\varphi$ is a random 2D phase distribution, i.e., $\varphi(x, y) \sim \mathrm{Uniform}(0, 2\pi)$ for all $(x, y)$. If we denote the coherent field propagation operator as $\mathcal{D}\{\cdot\}$ (see Sec. 4.5), then the instantaneous output intensity is $|\mathcal{D}\{\sqrt{I(x, y)}\exp[j\varphi(x, y)]\}|^2$, and the time-averaged output intensity $O(x, y)$ for spatially incoherent light can be written as

Eq. (7)

$O(x, y) = \left\langle\left|\mathcal{D}\{\sqrt{I(x, y)}\,\exp[j\varphi(x, y)]\}\right|^2\right\rangle.$

The average output intensity can be approximately calculated by repeating the coherent wave propagation $\mathcal{D}\{\cdot\}$ $N_\varphi$ times, each time with a different random phase distribution $\varphi_r(x, y)$, and averaging the resulting $N_\varphi$ output intensities,

Eq. (8)

$O(x, y) = \lim_{N_\varphi\to\infty}\frac{1}{N_\varphi}\sum_{r=1}^{N_\varphi}\left|\mathcal{D}\{\sqrt{I(x, y)}\,\exp[j\varphi_r(x, y)]\}\right|^2.$

We used $N_\varphi = 20{,}000$ for estimating the incoherent output intensity $O(x, y)$ corresponding to any arbitrary input intensity $I(x, y)$. Note that when only one pixel at the input aperture is activated, with all other input pixels being inactive with zero intensity, as is the case while evaluating the spatially varying PSFs, the application of Eq. (8) becomes redundant, although one could still use it. In this scenario, all the light diffracted from a single point source is mutually coherent. Consequently, for the purposes of evaluating the spatially varying PSFs of the system, as elaborated later in Sec. 4.7, employing a coherent propagation model for each point emitter at the input aperture is accurate and provides a faster solution.
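A minimal Monte Carlo sketch of Eq. (8) is shown below. So that the result can be checked analytically, the coherent operator $\mathcal{D}\{\cdot\}$ is replaced by a fixed complex matrix T acting on the flattened field; T is a stand-in we introduce here, not part of the paper. For fully random input phases, the expected output intensity is simply $|T|^2$ applied to the input intensity, and the random-phase average converges to it:

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_phi = 48, 20000                       # number of pixels and of random phase draws

# Stand-in for the coherent operator D{.}: a fixed complex matrix acting on the
# flattened field (the real operator is the cascaded propagation of Sec. 4.5).
T = rng.normal(size=(n_pix, n_pix)) + 1j * rng.normal(size=(n_pix, n_pix))
I_in = rng.uniform(size=n_pix)                 # input intensity pattern

O_avg = np.zeros(n_pix)
for _ in range(n_phi):                         # Monte Carlo estimate of Eq. (8)
    phi = rng.uniform(0.0, 2 * np.pi, size=n_pix)
    field_out = T @ (np.sqrt(I_in) * np.exp(1j * phi))
    O_avg += np.abs(field_out) ** 2
O_avg /= n_phi

# For spatially incoherent light, the expected output intensity is |T|^2 applied to I_in
print(np.allclose(O_avg, (np.abs(T) ** 2) @ I_in, rtol=0.05))   # True (up to Monte Carlo noise)
```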

4.5. Coherent Propagation of Optical Fields: $\mathcal{D}\{\cdot\}$

The propagation of spatially coherent light patterns through a diffractive processor, denoted by $\mathcal{D}\{\cdot\}$, involves a series of interactions with consecutive diffractive surfaces, interleaved by wave propagation through the free space separating these surfaces. We assume that these modulations are introduced by phase-only diffractive surfaces, i.e., the field amplitude remains unchanged during the light–matter interaction. Specifically, we assume that a diffractive surface alters the incident optical field, symbolized as $u(x, y)$, in a localized manner according to the optimized phase values $\varphi_M(x, y)$ of the diffractive features, resulting in the phase-modulated field $u(x, y)\exp[j\varphi_M(x, y)]$. The diffractive surfaces are coupled by free-space propagation, allowing the light to travel from one surface to the next. We used the angular spectrum method to simulate the free-space propagation,37

Eq. (9)

$u(x, y; z = z_0 + d) = \mathcal{F}^{-1}\{\mathcal{F}\{u(x, y; z = z_0)\}\times H(f_x, f_y; d)\},$
where $\mathcal{F}\{\cdot\}$ is the 2D Fourier transform and $\mathcal{F}^{-1}\{\cdot\}$ is its inverse operation. $H(f_x, f_y; d)$ is the free-space transfer function corresponding to propagation distance $d$. For wavelength $\lambda$,

Eq. (10)

$H(f_x, f_y; d) = \begin{cases}\exp\left[j\frac{2\pi}{\lambda}d\sqrt{1-(\lambda f_x)^2-(\lambda f_y)^2}\right], & f_x^2 + f_y^2 < 1/\lambda^2\\ 0, & \text{otherwise.}\end{cases}$

The fields were discretized with a lateral sampling interval of $\delta \approx 0.53\lambda$ to accommodate all the propagating modes and sufficiently zero-padded to remove aliasing artifacts.38
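A compact implementation of Eqs. (9) and (10) is sketched below (our own illustration; the zero-padding mentioned above is omitted for brevity, whereas the actual design pipeline includes it):

```python
import numpy as np

def angular_spectrum_propagate(u, wavelength, d, dx):
    """Propagate a sampled coherent field u (2D complex array) over a distance d in free
    space using the angular spectrum method, Eqs. (9) and (10); dx is the lateral sampling
    interval. Zero-padding against aliasing is omitted here for brevity."""
    ny, nx = u.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.where(arg > 0,
                 np.exp(1j * (2 * np.pi / wavelength) * d * np.sqrt(np.maximum(arg, 0.0))),
                 0.0)                               # evanescent components are discarded
    return np.fft.ifft2(np.fft.fft2(u) * H)

# Quick check: a uniform plane wave keeps unit intensity after propagation
wavelength = 1.0
u0 = np.ones((64, 64), dtype=complex)
u1 = angular_spectrum_propagate(u0, wavelength, d=40 * wavelength, dx=0.53 * wavelength)
print(np.allclose(np.abs(u1), 1.0))                 # True
```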

4.6. Diffractive Network Architecture

We modeled the diffractive surfaces by their laterally discretized heights $h$, which correspond to phase delays $\varphi_M = \frac{2\pi}{\lambda}(n-1)h$, where $n$ is the refractive index of the material. The connectivity between consecutive diffractive layers9 was kept equal across the diffractive designs with varying $N$ by setting the separation between the layers as $d = \frac{W\delta}{\lambda}$, where the width of each diffractive layer is $W = \sqrt{N/K}\,\delta$. Here, $K$ is the number of diffractive layers; we used $K = 4$ throughout the paper.
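For concreteness, the few lines below evaluate these geometry relations for the largest design reported here ($N = 2\times2N_{i,r}N_{o,r}$ with $E = 3$ and $N_i = N_o = 16$); this is our own back-of-the-envelope illustration based on the relations as reconstructed above:

```python
import numpy as np

wavelength = 1.0                    # all lengths expressed in units of the wavelength
delta = 0.53 * wavelength           # lateral size of a diffractive feature
K = 4                               # number of diffractive layers
N = 2 * 2 * 48 * 48                 # example: N = 2 x 2*N_ir*N_or with E = 3, Ni = No = 16

W = np.sqrt(N / K) * delta          # width of each (square) diffractive layer
d = W * delta / wavelength          # layer separation preserving full optical connectivity
print(f"{N // K} features/layer, W = {W:.1f} wavelengths, d = {d:.1f} wavelengths")
```

With these numbers, the four layers alone span roughly $4d \approx 54$ wavelengths along the optical axis, which is consistent with the $<100\times\lambda$ footprint quoted in Sec. 1 (the exact span also includes the input and output distances).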

4.7. Training and Evaluation of Spatially Incoherent Diffractive Processors

For performing an arbitrary complex-valued linear transformation with a diffractive processor, we used the PSF-based data-free design approach, where the diffractive features were optimized so that the all-optical intensity transformation of the diffractive processor achieves $\hat{A}_r \approx A_r$. To evaluate $\hat{A}_r$, we used $EN_i$ intensity vectors $\{i_{r,t}\}_{t=1}^{EN_i}$, where $i_{r,t}[l] = 1$ if $l = t$ and 0 otherwise. In other words, $\{i_{r,t}\}_{t=1}^{EN_i}$ are unit impulses located at different input pixels. We simulated the all-optical output intensity vectors $\{o_{r,t}\}_{t=1}^{EN_i}$ corresponding to these unit impulses and stacked them, i.e.,

Eq. (11)

$\hat{A}_r = [o_{r,1}\,|\,o_{r,2}\,|\cdots|\,o_{r,EN_i}].$

Finally, we compensated for the optical diffraction efficiency-related scale mismatch through multiplication by a scalar, i.e.,

Eq. (12)

$\hat{A}_r \leftarrow \sigma\hat{A}_r,$
where $\sigma$ was defined as

Eq. (13)

$\sigma = \frac{\sum_{n=1}^{EN_i}\sum_{m=1}^{EN_o} A_r[m, n]\,\hat{A}_r[m, n]}{\sum_{n=1}^{EN_i}\sum_{m=1}^{EN_o}\left(\hat{A}_r[m, n]\right)^2}.$

The MSE loss function to be minimized was defined as

Eq. (14)

$\mathcal{L}_{\mathrm{PSF}} = \frac{1}{N_i N_o}\sum_{n=1}^{EN_i}\sum_{m=1}^{EN_o}\left(A_r[m, n] - \hat{A}_r[m, n]\right)^2.$

The height $h$ of the diffractive features at each layer was constrained between zero and a maximum value $h_{\max}$ by employing a latent variable $h_{\mathrm{latent}}$. The relationship between the constrained height $h$ and the latent variable $h_{\mathrm{latent}}$ was defined as $h = \frac{h_{\max}}{2}\times[\sin(h_{\mathrm{latent}}) + 1]$, where we chose $h_{\max} = \frac{\lambda}{n-1}$, which corresponds to a differential phase modulation of $2\pi$. The latent variables were initialized randomly from the standard normal distribution $\mathcal{N}(0, 1)$.

The optimization of the diffractive layers was carried out using the AdamW optimizer39 for 12,000 iterations, with an initial learning rate of $10^{-3}$. The model state corresponding to the minimum of the MSEs evaluated after every 400 iterations was selected for the final evaluation. The D2NN models were implemented and trained using PyTorch (v1.12.1)40 with Compute Unified Device Architecture (CUDA) version 12.2. Training and testing were done on GeForce RTX 3090 graphics processing units (GPUs) in workstations with 256 GB of random-access memory (RAM) and an Intel Core i9 central processing unit (CPU). The training time of the models varied with the size of the models. For example, the model used in Figs. 2(b) and 2(c) took around 1 h for 12,000 iterations. Inference for each input vector with $N_\varphi = 20{,}000$ takes around 30 s.
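The sketch below illustrates the overall structure of this PSF-based optimization in PyTorch: the sine-based height constraint, the scale compensation of Eq. (13), and an AdamW loop minimizing the loss of Eq. (14). It is a deliberately simplified illustration rather than the authors' released code: the diffractive forward model is replaced by a differentiable placeholder, the refractive index is an assumed value, and the iteration count is shortened.

```python
import torch

E, Ni, No = 3, 16, 16
n_refr, wavelength = 1.5, 1.0                      # assumed refractive index (illustrative)
h_max = wavelength / (n_refr - 1)                  # full 2*pi differential phase modulation

h_latent = torch.randn(4, 48, 48, requires_grad=True)   # latent heights of K = 4 layers
A_r_target = torch.rand(E * No, E * Ni)                  # target intensity transformation A_r

def psf_matrix(h_latent):
    """Placeholder forward model: maps latent heights to an estimated intensity
    transformation. A real model would propagate a point source from every input pixel
    through the K phase-only layers and record the output intensities."""
    h = 0.5 * h_max * (torch.sin(h_latent) + 1.0)        # heights constrained to [0, h_max]
    phi = 2 * torch.pi * (n_refr - 1.0) / wavelength * h # phase delays of the layers
    fields = torch.fft.fft2(torch.exp(1j * phi))         # stand-in for cascaded propagation
    return torch.abs(fields.sum(dim=0))                  # 48 x 48 nonnegative "PSF matrix"

optimizer = torch.optim.AdamW([h_latent], lr=1e-3)       # 12,000 iterations in the paper
for step in range(200):
    A_r_hat = psf_matrix(h_latent)
    sigma = (A_r_target * A_r_hat).sum() / (A_r_hat ** 2).sum()   # scale factor, Eq. (13)
    loss = ((A_r_target - sigma * A_r_hat) ** 2).mean()           # MSE loss, cf. Eq. (14)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```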

To visualize the all-optical transformation error in Fig. 2, we used the error matrix $\varepsilon_r = (A_r - \hat{A}_r)^2$; here, $(\cdot)^2$ denotes an element-wise operation. To evaluate the error $\varepsilon$ of the complex linear transformation, we applied demosaicking to the columns of $\hat{A}_r$ to form the block matrix $[\hat{A}_0|\cdots|\hat{A}_{E-1}] \in \mathbb{C}^{N_o\times EN_i}$. Here, the subscript $k$ indicates that $\hat{A}_k$ is measured by applying the columns of $e_k\mathbf{I}$ (the identity matrix scaled by $e_k$) as input and stacking the corresponding demosaicked (complex-valued) output vectors. Accordingly, we have

Eq. (15)

$\varepsilon = \left|\left[\,e_0 A - \hat{A}_0\;|\cdots|\;e_{E-1}A - \hat{A}_{E-1}\,\right]\right|^2.$
Here, $|\cdot|^2$ represents an element-wise operation.

4.8. Entropy Evaluation

For the evaluation of the image encryption strength, we computed the entropy separately for the real and imaginary parts of a complex image as follows:

Eq. (16)

$H^{\mathrm{Re/Im}}(x) = -\sum_i p_i^{\mathrm{Re/Im}}\log\left(p_i^{\mathrm{Re/Im}}\right),$
where we calculated the distribution of either the real or imaginary part (denoted by the superscript) over the pixels of $x$; here, $x$ denotes the complex image, and $p_i$ is the probability (normalized histogram count) for a certain pixel value $i$.
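A straightforward way to compute these entropies is sketched below with NumPy; the histogram bin count is our own assumption, as it is not specified in the text:

```python
import numpy as np

def complex_image_entropy(x, bins=16):
    """Entropy of the real and imaginary parts of a complex image x, following Eq. (16);
    the probabilities p_i are normalized histogram counts. The bin count is an assumption
    made for this illustration."""
    entropies = {}
    for name, part in (("Re", x.real), ("Im", x.imag)):
        counts, _ = np.histogram(part.ravel(), bins=bins)
        p = counts[counts > 0] / counts.sum()
        entropies[name] = float(-np.sum(p * np.log(p)))
    return entropies

rng = np.random.default_rng(3)
img = rng.uniform(size=(4, 4)) + 1j * rng.uniform(size=(4, 4))   # a toy 4 x 4 complex image
print(complex_image_entropy(img))
```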

For the histograms presented in Fig. S3(b) in the Supplementary Material, the data set is adapted from the Extended MNIST (EMNIST).41 For the creation of the input complex images, we randomly selected two distinct images from the EMNIST data set, using one as the real part and the other as the imaginary part of the complex image. To ensure compatibility with the input dimensionality, these images were bilinearly downsampled to a resolution of $4\times4$ pixels. We randomly formed a set of 1000 such complex images to compile the histograms presented in Fig. S3(b) in the Supplementary Material.

Code and Data Availability

The data and methods required to assess the conclusions drawn in this study are included within the main text and supplementary information files. The optimization of machine-learning models used in this research was conducted using the publicly available PyTorch library. Additional data can be requested from the corresponding author.

Acknowledgments

The Ozcan Research Group at UCLA acknowledges the support of the U.S. Department of Energy (DOE), Office of Basic Energy Sciences, Division of Materials Sciences and Engineering under Award # DE-SC0023088.

References

1. J. W. Goodman, A. R. Dias, and L. M. Woody, "Fully parallel, high-speed incoherent optical method for performing discrete Fourier transforms," Opt. Lett. 2(1), 1–3 (1978). https://doi.org/10.1364/OL.2.000001

2. H. Kwon et al., "Nonlocal metasurfaces for optical signal processing," Phys. Rev. Lett. 121(17), 173004 (2018). https://doi.org/10.1103/PhysRevLett.121.173004

3. R. Hamerly et al., "Large-scale optical neural networks based on photoelectric multiplication," Phys. Rev. X 9(2), 021032 (2019). https://doi.org/10.1103/PhysRevX.9.021032

4. A. Silva et al., "Performing mathematical operations with metamaterials," Science 343(6167), 160–163 (2014). https://doi.org/10.1126/science.1242818

5. B. J. Shastri et al., "Photonics for artificial intelligence and neuromorphic computing," Nat. Photonics 15(2), 102–114 (2021). https://doi.org/10.1038/s41566-020-00754-y

6. G. Wetzstein et al., "Inference in artificial intelligence with deep optics and photonics," Nature 588(7836), 39–47 (2020). https://doi.org/10.1038/s41586-020-2973-6

7. X. Xu et al., "11 TOPS photonic convolutional accelerator for optical neural networks," Nature 589(7840), 44–51 (2021). https://doi.org/10.1038/s41586-020-03063-0

8. S. M. Kamali et al., "Angle-multiplexed metasurfaces: encoding independent wavefronts in a single metasurface under different illumination angles," Phys. Rev. X 7(4), 041056 (2017). https://doi.org/10.1103/PhysRevX.7.041056

9. X. Lin et al., "All-optical machine learning using diffractive deep neural networks," Science 361(6406), 1004–1008 (2018). https://doi.org/10.1126/science.aat8084

10. O. Kulce et al., "All-optical synthesis of an arbitrary linear transformation using diffractive surfaces," Light Sci. Appl. 10(1), 196 (2021). https://doi.org/10.1038/s41377-021-00623-5

11. M. S. S. Rahman et al., "Ensemble learning of diffractive optical networks," Light Sci. Appl. 10(1), 14 (2021). https://doi.org/10.1038/s41377-020-00446-w

12. J. Li et al., "Class-specific differential detection in diffractive optical neural networks improves inference accuracy," Adv. Photonics 1(4), 046001 (2019). https://doi.org/10.1117/1.AP.1.4.046001

13. M. S. S. Rahman and A. Ozcan, "Time-lapse image classification using a diffractive neural network" (2022).

14. B. Bai et al., "To image, or not to image: class-specific diffractive cameras with all-optical erasure of undesired objects," eLight 2(1), 14 (2022). https://doi.org/10.1186/s43593-022-00021-3

15. B. Bai et al., "Data-class-specific all-optical transformations and encryption," Adv. Mater. 35(31), 2212091 (2023). https://doi.org/10.1002/adma.202212091

16. Y. Gao et al., "Multiple-image encryption and hiding with an optical diffractive neural network," Opt. Commun. 463, 125476 (2020). https://doi.org/10.1016/j.optcom.2020.125476

17. Y. Su et al., "Optical image conversion and encryption based on structured light illumination and a diffractive neural network," Appl. Opt. 62(23), 6131–6139 (2023). https://doi.org/10.1364/AO.495542

18. D. Mengu and A. Ozcan, "All-optical phase recovery: diffractive computing for quantitative phase imaging," Adv. Opt. Mater. 10(15), 2200281 (2022). https://doi.org/10.1002/adom.202200281

19. C.-Y. Shen et al., "Multispectral quantitative phase imaging using a diffractive optical network," Adv. Intell. Syst. 5(11), 2300300 (2023). https://doi.org/10.1002/aisy.202300300

20. T. Yan et al., "Fourier-space diffractive deep neural network," Phys. Rev. Lett. 123(2), 023901 (2019). https://doi.org/10.1103/PhysRevLett.123.023901

21. B. Bai et al., "Pyramid diffractive optical networks for unidirectional magnification and demagnification" (2023).

22. E. Goi, S. Schoenhardt, and M. Gu, "Direct retrieval of Zernike-based pupil functions using integrated diffractive deep neural networks," Nat. Commun. 13(1), 7531 (2022). https://doi.org/10.1038/s41467-022-35349-4

23. Z. Huang et al., "All-optical signal processing of vortex beams with diffractive deep neural networks," Phys. Rev. Appl. 15(1), 014037 (2021). https://doi.org/10.1103/PhysRevApplied.15.014037

24. X. Luo et al., "Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible," Light Sci. Appl. 11(1), 158 (2022). https://doi.org/10.1038/s41377-022-00844-2

25. O. Kulce et al., "All-optical information-processing capacity of diffractive surfaces," Light Sci. Appl. 10(1), 25 (2021). https://doi.org/10.1038/s41377-020-00439-9

26. M. S. S. Rahman et al., "Universal linear intensity transformations using spatially incoherent diffractive processors," Light Sci. Appl. 12(1), 195 (2023). https://doi.org/10.1038/s41377-023-01234-y

27. B. H. Soffer et al., "Programmable real-time incoherent matrix multiplier for optical processing," Appl. Opt. 25(14), 2295–2305 (1986). https://doi.org/10.1364/AO.25.002295

28. W. Swindell, "A noncoherent optical analog image processor," Appl. Opt. 9(11), 2459–2469 (1970). https://doi.org/10.1364/AO.9.002459

29. J. W. Goodman and L. M. Woody, "Method for performing complex-valued linear operations on complex-valued data using incoherent light," Appl. Opt. 16(10), 2611–2612 (1977). https://doi.org/10.1364/AO.16.002611

30. W. Schneider and W. Fink, "Incoherent optical matrix multiplication," Opt. Acta Int. J. Opt. 22(11), 879–889 (1975). https://doi.org/10.1080/713818991

31. A. R. Dias, "Incoherent optical matrix-matrix multiplier" (1981).

32. Y. Li et al., "Universal polarization transformations: spatial programming of polarization scattering matrices using a deep learning-designed diffractive polarization transformer," Adv. Mater. 35(51), 2303395 (2023). https://doi.org/10.1002/adma.202303395

33. J. Li et al., "Massively parallel universal linear transformations using a wavelength-multiplexed diffractive optical network," Adv. Photonics 5(1), 016003 (2023). https://doi.org/10.1117/1.AP.5.1.016003

34. D. Mengu et al., "Misalignment resilient diffractive optical networks," Nanophotonics 9(13), 4207–4219 (2020). https://doi.org/10.1515/nanoph-2020-0291

35. D. Mengu, Y. Rivenson, and A. Ozcan, "Scale-, shift-, and rotation-invariant diffractive optical networks," ACS Photonics 8(1), 324–334 (2021). https://doi.org/10.1021/acsphotonics.0c01583

36. R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press (2012).

37. J. W. Goodman, Introduction to Fourier Optics, W. H. Freeman (2005).

38. T. Kozacki and K. Falaggis, "Angular spectrum-based wave-propagation method with compact space bandwidth for large propagation distances," Opt. Lett. 40(14), 3420–3423 (2015). https://doi.org/10.1364/OL.40.003420

39. I. Loshchilov and F. Hutter, "Decoupled weight decay regularization" (2019).

40. A. Paszke et al., "PyTorch: an imperative style, high-performance deep learning library," in Adv. Neural Inf. Process. Syst. (2019).

41. G. Cohen et al., "EMNIST: extending MNIST to handwritten letters," in Int. Joint Conf. Neural Netw. (IJCNN), 2921–2926 (2017). https://doi.org/10.1109/IJCNN.2017.7966217

Biographies of the authors are not available.

CC BY: © The Authors. Published by SPIE and CLP under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Xilin Yang, Md Sadman Sakib Rahman, Bijie Bai, Jingxi Li, and Aydogan Ozcan "Complex-valued universal linear transformations and image encryption using spatially incoherent diffractive networks," Advanced Photonics Nexus 3(1), 016010 (19 January 2024). https://doi.org/10.1117/1.APN.3.1.016010
Received: 19 October 2023; Accepted: 3 January 2024; Published: 19 January 2024
KEYWORDS: Image encryption, Education and training, Data processing, Image processing, Light sources and illumination, Point spread functions, Geometrical optics
