Large-scale linear operations are the cornerstone for performing complex computational tasks. Using optical computing to perform linear transformations offers potential advantages in terms of speed, parallelism, and scalability. Previously, the design of successive spatially engineered diffractive surfaces forming an optical network was demonstrated to perform statistical inference and compute an arbitrary complex-valued linear transformation using narrowband illumination. We report deep-learning-based design of a massively parallel broadband diffractive neural network for all-optically performing a large group of arbitrarily selected, complex-valued linear transformations between an input and output field of view, each with N_{i} and N_{o} pixels, respectively. This broadband diffractive processor is composed of N_{w} wavelength channels, each of which is uniquely assigned to a distinct target transformation; a large set of arbitrarily selected linear transformations can be individually performed through the same diffractive network at different illumination wavelengths, either simultaneously or sequentially (wavelength scanning). We demonstrate that such a broadband diffractive network, regardless of its material dispersion, can successfully approximate N_{w} unique complex-valued linear transforms with a negligible error when the number of diffractive neurons (N) in its design is ≥2N_{w}N_{i}N_{o}. We further report that the spectral multiplexing capability can be increased by increasing N; our numerical analyses confirm these conclusions for N_{w} > 180 and indicate that it can further increase to N_{w} ∼ 2000, depending on the upper bound of the approximation error. Massively parallel, wavelength-multiplexed diffractive networks will be useful for designing high-throughput intelligent machine-vision systems and hyperspectral processors that can perform statistical inference and analyze objects/scenes with unique spectral properties.
1. Introduction

Computing plays an increasingly vital role in constructing intelligent, digital societies. The exponentially growing power consumption of digital computers poses important challenges for large-scale computing. Optical computing can potentially provide advantages in terms of power efficiency, processing speed, and parallelism. Motivated by these potential advantages, the last few decades have witnessed various research and development efforts to advance optical computing.^{1}^{–}^{32} Synergies between optics and machine learning have enabled the design of novel optical components using deep-learning-based optimization,^{33}^{–}^{44} while also allowing the development of advanced optical/photonic information processing platforms for artificial intelligence.^{5}^{,}^{20}^{–}^{32}^{,}^{45} Among different optical computing designs, diffractive optical neural networks represent a free-space-based framework that can be used to perform computation, statistical inference, and inverse design of optical elements.^{22} A diffractive neural network is composed of multiple transmissive and/or reflective diffractive layers (or surfaces), which leverage light–matter interactions to jointly modulate the input light field and generate the desired output field. These passive diffractive layers, each containing thousands of spatially engineered diffractive features (termed “diffractive neurons”), are designed (optimized) in a computer using deep-learning tools, e.g., stochastic gradient descent and error backpropagation. Once the training process converges, the resulting diffractive layers are fabricated to form a passive, free-space optical processing unit that does not consume any power except for the illumination light. This framework is also scalable, since it can adapt to changes in the input field of view (FOV) or data dimensions by adjusting the size and/or the number of diffractive layers. 
Diffractive networks can directly access the 2D/3D input information of a scene or object and process the optical information encoded in the amplitude, phase, spectrum, and polarization of the input light, making them highly suitable as intelligent optical front ends for machine-vision systems. Diffractive neural networks have been used to perform various optical information processing tasks, including object classification,^{22}^{,}^{46}^{–}^{57} image reconstruction,^{52}^{,}^{58}^{,}^{59} all-optical phase recovery and quantitative phase imaging,^{60} class-specific imaging,^{61} super-resolution image displays,^{62} and logical operations.^{63}^{–}^{65} Employing successive spatially engineered diffractive surfaces as the backbone for the inverse design of deterministic optical elements has also enabled numerous applications, such as spatially controlled wavelength demultiplexing,^{66} pulse engineering,^{67} and orbital angular momentum multiplexing/demultiplexing.^{68} In addition to these task-specific applications, diffractive networks also serve as general-purpose computing modules that can be used to create compact, power-efficient all-optical processors. 
Recent efforts have shown that a diffractive network can be used to all-optically perform an arbitrarily selected, complex-valued linear transformation between its input and output FOVs with a negligible error when the number of trainable diffractive neurons ($N$) approaches ${N}_{i}{N}_{o}$, where ${N}_{i}$ and ${N}_{o}$ represent the number of pixels at the input and output FOVs, respectively.^{69} Using nontrainable, predetermined polarizer arrays within an isotropic diffractive network, a polarization-encoded diffractive processor was also demonstrated to accurately perform a group of ${N}_{p}=4$ distinct complex-valued linear transformations using a single system with $N\ge {N}_{p}{N}_{i}{N}_{o}=4{N}_{i}{N}_{o}$; in this case, each one of these four optical transformations can be accessed through a different combination of the input/output polarization states.^{70} This polarization-encoded diffractive system is limited to a multiplexing factor of ${N}_{p}=4$, since an additional desired transformation matrix that can be assigned to a new combination of input–output polarization states can be written as a linear combination of the four linear transforms that are already learned by the diffractive processor.^{70} These former works involved monochromatic diffractive networks where a single illumination wavelength encoded the input information channels. In this paper, we rigorously address and analyze the following question. Let us imagine an optical black-box (composed of diffractive surfaces and/or reconfigurable spatial light modulators): how can that black-box be designed to simultaneously implement, e.g., ${N}_{w}>1000$ independent linear transformations corresponding to $>1000$ different matrix multiplications (with $>1000$ different independent matrices) at ${N}_{w}>1000$ different unique wavelengths? 
More specifically, here we report the use of a wavelength multiplexing scheme to create a broadband diffractive optical processor, which massively increases the throughput of all-optical computing by performing a group of distinct linear transformations in parallel using a single diffractive network. By encoding the input/output information of the target linear transforms using ${N}_{w}$ different wavelengths (i.e., ${\lambda}_{1},{\lambda}_{2},\dots ,{\lambda}_{{N}_{w}}$), we created a single broadband diffractive network to simultaneously perform a group of ${N}_{w}$ arbitrarily selected, complex-valued linear transforms with negligible error. We demonstrate that $N\ge 2{N}_{w}{N}_{i}{N}_{o}$ diffractive neurons are required to successfully implement ${N}_{w}$ complex-valued linear transforms using a broadband diffractive processor, where the thickness values of its diffractive neurons constitute the only variables optimized during the deep-learning-based training process. Without loss of generality, we numerically demonstrate wavelength-multiplexed universal linear transformations with ${N}_{w}>180$, which can be further increased to ${N}_{w}\sim 2000$ depending on the acceptable approximation error threshold. We also demonstrate that these wavelength-multiplexed universal linear transformations can be implemented even with a flat material dispersion, where the refractive index ($n$) of the material at the selected wavelength channels is the same, i.e., $n(\lambda )\approx {n}_{o}$ for $\lambda \in \{{\lambda}_{1},{\lambda}_{2},\dots ,{\lambda}_{{N}_{w}}\}$. The training process of these wavelength-multiplexed diffractive networks was adaptively balanced across the different wavelengths of operation such that the all-optical linear transformation accuracies of the different channels were similar to each other, without introducing a bias toward any wavelength channel or the corresponding linear transform. 
It is important to emphasize that the goal of this work is not to train the broadband diffractive network to implement the correct linear transformations for only a few input–output field pairs. We are not aiming to use the diffractive layers as a form of metamaterial that can output different images or optical fields at different wavelengths. Instead, our goal is to generalize the performance of our broadband diffractive processor to infinitely many pairs of input and output complex fields that satisfy the target linear transformation at each spectral channel, thus achieving universal all-optical computing of multiple complex-valued matrix–vector multiplications, accessed by a set of illumination wavelengths (${N}_{w}\gg 1$). Moreover, we would like to clarify that the wavelength multiplexing scheme used in our framework should not be confused with other efforts that integrated wavelength-division multiplexing (WDM) technologies into optical neural computing, such as in Refs. 71–73. In these earlier works, WDM was utilized to encode the 1D input/output information to perform a vector–matrix multiplication operation, where the optical network was designed to perform only one linear transformation based on a single input data vector, producing a single output vector that is spectrally encoded. In our work, however, we use wavelength multiplexing to perform multiple independent linear transformations (${N}_{w}\gg 1$) within a single optical network architecture, where each of these complex-valued linear transformations can be accessed at a distinct wavelength (simultaneously or sequentially). 
Also, the input and output fields of each of these linear transformations in our framework are spatially encoded in 2D at the input/output FOVs using the same wavelength, rather than being spectrally encoded, as demonstrated in earlier WDM-based designs.^{71}^{–}^{73} This unique feature allows our diffractive network to all-optically perform a large group of independent linear transformations in parallel by sharing the same 2D input/output FOVs. Compared to the previous literature, this paper has various unique aspects: (1) this is the first demonstration of a spatially engineered diffractive system that achieves spectrally multiplexed universal linear transformations; (2) the level of massive multiplexing reported through a single wavelength-multiplexed diffractive network (e.g., ${N}_{w}>180$) is significantly larger than that of other multiplexing channels, including polarization diversity,^{70} and this number can be further increased to ${N}_{w}\approx 2000$ with more diffractive neurons ($N$) used in the network design; (3) the deep-learning-based training of the diffractive layers used adaptive spectral weights to equalize the performances of all the linear transformations assigned to the ${N}_{w}$ different wavelengths; (4) the capability to perform multiple linear transformations using wavelength multiplexing does not require any wavelength-sensitive optical elements to be added into the diffractive network design, except for wavelength scanning or broadband illumination with demultiplexing filters; and (5) this wavelength-multiplexed diffractive processor can be implemented using various materials with different dispersion properties (including materials with a flat dispersion curve) and is widely applicable to different parts of the electromagnetic spectrum, including the visible band. 
Furthermore, we would like to emphasize that since each dielectric feature of this wavelength-multiplexed diffractive processor is based on material thickness variations, it simultaneously modulates all the wavelengths within the spectrum of interest. This means that each wavelength channel within the set of ${N}_{w}$ unique wavelengths has a different error gradient with respect to the optical transformation assigned to it, and therefore, the diffractive layer optimization spanning ${N}_{w}$ wavelengths deviates from the ideal optimization path of an individual wavelength. Since the diffractive layers considered here do not possess any spectral selectivity, we used a training loss function that simultaneously takes into account all the wavelength channels, finding a locally optimal common solution across all the ${N}_{w}$ wavelengths that accurately performs all the desired ${N}_{w}$ transformations. This behavior is quite different from the earlier generations of monochromatic diffractive processors^{69} that optimized the phase profiles of the diffractive layers for only one wavelength assigned to one optical transformation. Based on the massive parallelism exhibited by this broadband diffractive network, we believe that this platform and the underlying concepts can be used to develop optical processors operating at different parts of the spectrum with extremely high computing throughput. Its throughput can be further increased by expanding the range and/or the number of encoding wavelengths, as well as by combining wavelength multiplexing with other multiplexing schemes such as polarization encoding. 
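The wavelength coupling described above can be illustrated with a minimal single-neuron model: a thickness value $t$ imparts a phase delay of roughly $2\pi (n-1)t/\lambda$ relative to free-space propagation, which differs at every channel. The sketch below is our own illustration — the function name, the refractive-index value, and the dispersion-free assumption are ours, not taken from the paper's forward model (Sec. 4):

```python
import numpy as np

def neuron_phase(thickness, wavelength, n=1.72):
    """Phase delay (radians) of a dielectric neuron of the given thickness,
    relative to free-space propagation over the same distance.
    Assumes a dispersion-free refractive index n (illustrative value)."""
    return 2 * np.pi * (n - 1) * thickness / wavelength

# A single thickness value yields a different phase at each wavelength channel,
# which is why the N_w channels cannot be optimized independently:
lams = np.array([0.73, 0.80, 0.87])  # mm, illustrative terahertz channels
phases = neuron_phase(1.0, lams)     # three distinct phase values
```

Because the phase scales as $1/\lambda$, longer-wavelength channels see smaller phase delays from the same physical thickness, illustrating why the training has to reconcile conflicting per-channel gradients.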
The reported framework would be valuable for the development of multicolor and hyperspectral machine-vision systems that perform statistical inference based on the spatial and spectral information of an object or a scene, which may find applications in various fields, including biomedical imaging, remote sensing, analytical chemistry, and material science.

2. Results

2.1. Design of Wavelength-Multiplexed Diffractive Optical Networks for Massively Parallel Universal Linear Transformations

Throughout this paper, the terms “diffractive deep neural network,” “diffractive neural network,” “diffractive optical network,” and “diffractive network” are used interchangeably. Figure 1 illustrates the schematic of our broadband diffractive optical network design for massively parallel, wavelength-multiplexed all-optical computing. The broadband diffractive network, composed of eight successive diffractive layers, contains in total $N$ diffractive neurons with their thickness values as learnable variables, which are jointly trained to perform a group of ${N}_{w}$ linear transformations between the input and output FOVs through ${N}_{w}$ parallel wavelength channels. More details about this diffractive architecture, its optical forward model, and training details can be found in Sec. 4. To start, a group of ${N}_{w}$ different wavelengths, ${\lambda}_{1},{\lambda}_{2},\dots ,{\lambda}_{{N}_{w}}$, is selected to serve as the wavelength channels of the broadband diffractive processor, encoding different input complex fields and performing different target transformations (see Fig. 1). 
For the implementation of the broadband diffractive designs in this paper, we fixed the mean value ${\lambda}_{m}$ of this group of wavelengths $\{{\lambda}_{1},{\lambda}_{2},\dots ,{\lambda}_{{N}_{w}}\}$, i.e., ${\lambda}_{m}=\frac{1}{{N}_{w}}\sum _{w=1}^{{N}_{w}}{\lambda}_{w}$, and assigned these wavelengths to be equally spaced between ${\lambda}_{1}=0.9125{\lambda}_{m}$ and ${\lambda}_{{N}_{w}}=1.0875{\lambda}_{m}$. Unless otherwise specified, we chose ${\lambda}_{m}$ to be 0.8 mm in our numerical simulations, as it aligns with the terahertz band that was experimentally used in several of our previous works.^{50}^{,}^{52}^{,}^{58}^{,}^{59}^{,}^{61}^{,}^{62}^{,}^{66}^{,}^{67} Without loss of generality, the wavelengths used for the design of the broadband diffractive processors can also be selected from other parts of the electromagnetic spectrum, such as the visible band, for which the related simulation results and analyses can be found in Sec. 3. Based on scalar diffraction theory, the broadband optical fields propagating through the diffractive system are simulated at these selected wavelengths using a sampling period of $0.5{\lambda}_{m}$ along both the horizontal and vertical directions. We also select $0.5{\lambda}_{m}$ as the size of the individual neurons on the diffractive layers. With these selections, our optical forward model includes all the propagating modes that are transmitted through the diffractive layers. Let $\mathit{i}$ and ${\mathit{o}}^{\prime}$ be the complex-valued, vectorized versions of the 2D input and output broadband complex fields at the input and output FOVs of the diffractive network, respectively, as shown in Fig. 1. 
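As a concrete illustration of the channel selection described above, the short sketch below (our own; variable names are arbitrary) constructs the equally spaced wavelength set whose mean is λ_m:

```python
import numpy as np

def wavelength_channels(lam_m=0.8, n_w=8):
    """N_w equally spaced wavelengths (in mm) between 0.9125*lam_m and
    1.0875*lam_m; by symmetry their mean equals lam_m."""
    return np.linspace(0.9125 * lam_m, 1.0875 * lam_m, n_w)

lams = wavelength_channels()       # 0.73 mm ... 0.87 mm for lam_m = 0.8 mm
neuron_size = 0.5 * 0.8            # lateral sampling period / neuron size, 0.5*lam_m
```

Because the endpoints are placed symmetrically around λ_m, the mean of the channel set is fixed at λ_m regardless of N_w.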
We denote ${\mathit{i}}_{w}$ and ${\mathit{o}}_{w}^{\prime}$ as the complex fields generated by sampling the optical fields at the wavelength ${\lambda}_{w}$ $(w\in \{1,2,\dots ,{N}_{w}\})$ within the input and output FOVs, respectively, and then vectorizing the resulting 2D matrices in column-major order. According to this notation, ${\mathit{i}}_{w}$ and ${\mathit{o}}_{w}^{\prime}$ represent the input and output of the ${w}^{\text{th}}$ wavelength channel in our wavelength-multiplexed diffractive network, respectively. In the following analyses, without loss of generality, the number of pixels at the input and output FOVs is selected to be the same, i.e., ${N}_{i}={N}_{o}$. To implement ${N}_{w}$ target linear transformations, we randomly generated ${N}_{w}$ complex-valued matrices ${\mathit{A}}_{1},{\mathit{A}}_{2},\dots ,{\mathit{A}}_{{N}_{w}}$, each composed of ${N}_{i}\times {N}_{o}$ entries, to serve as a group of unique arbitrary linear transformations to be all-optically implemented using a wavelength-multiplexed diffractive processor. All these matrices, ${\mathit{A}}_{1},{\mathit{A}}_{2},\dots ,{\mathit{A}}_{{N}_{w}}$, are generated using unique random seeds to ensure that they are different; we further confirmed the differences between these randomly generated matrices by calculating the cosine similarity values between any two combinations of the matrices in a given set (see e.g., Fig. S1 in the Supplementary Material). For each unique matrix ${\mathit{A}}_{w}\in \{{\mathit{A}}_{1},{\mathit{A}}_{2},\dots ,{\mathit{A}}_{{N}_{w}}\}$, we randomly generated a total of 70,000 complex-valued input field vectors $\{{\mathit{i}}_{w}\}$ and created the corresponding output field vectors $\{{\mathit{o}}_{w}\}$ by calculating ${\mathit{o}}_{w}={\mathit{A}}_{w}{\mathit{i}}_{w}$. We separated these input–output complex field pairs into three individual sets for training, validation, and testing, each containing 55,000, 5000, and 10,000 samples, respectively. 
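The data generation described above can be sketched as follows. This is a minimal version with a reduced sample count (the paper uses 70,000 fields per matrix, split 55,000/5000/10,000); the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
Ni = No = 64  # 8x8 pixels at the input and output FOVs

# One arbitrarily selected complex-valued target transformation A_w
A_w = rng.standard_normal((No, Ni)) + 1j * rng.standard_normal((No, Ni))

# Random complex input fields i_w and ground-truth outputs o_w = A_w i_w
n_samples = 700  # 70,000 in the paper
i_w = rng.standard_normal((n_samples, Ni)) + 1j * rng.standard_normal((n_samples, Ni))
o_w = i_w @ A_w.T  # row k holds A_w @ i_w[k]

# Split into training/validation/testing subsets (55/5/10 ratio, as in the paper)
i_train, i_val, i_test = np.split(i_w, [550, 600])
```

Repeating this for each of the ${N}_{w}$ matrices yields one supervised data set per wavelength channel, all sharing the same input/output FOV dimensions.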
By increasing the size of these training data sets to $>\mathrm{100,000}$ input–output pairs of randomly generated complex fields, it is possible to further improve the transformation accuracy of the trained broadband diffractive networks; since this does not change the general conclusions of this work, it is left as future work. More details on the generation of the training and testing data can be found in Sec. 4. Based on the notations introduced above, the objective of training our wavelength-multiplexed diffractive processor is that, for any of its wavelength channels operating at ${\lambda}_{w}$ $(w\in \{1,2,\dots ,{N}_{w}\})$, the diffractive output fields $\{{\mathit{o}}_{w}^{\prime}\}$ computed from any given inputs $\{{\mathit{i}}_{w}\}$ should provide a match to the output ground-truth (target) fields $\{{\mathit{o}}_{w}\}$. If this can be achieved for any arbitrary choice of $\{{\mathit{i}}_{w}\}$, this means that the all-optical transformations ${\mathit{A}}_{w}^{\prime}$ performed by the trained broadband diffractive system at different wavelength channels constitute an accurate approximation to their ground-truth (target) transformation matrices ${\mathit{A}}_{w}$, where $w\in \{1,2,\dots ,{N}_{w}\}$. As the first step of our analysis, we selected the input/output field size to be ${N}_{i}={N}_{o}=8\times 8=64$ and started to train broadband diffractive processors with ${N}_{w}=2$, 4, 8, 16, and 32 wavelength channels. Results and analysis of implementing more wavelength channels (e.g., ${N}_{w}>100$) through a single diffractive processor will be provided in later sections. For this task, we randomly generated a set of 32 different matrices with dimensions of $64\times 64$, i.e., ${\mathit{A}}_{1},{\mathit{A}}_{2},\dots ,{\mathit{A}}_{32}$, with their first eight visualized (as examples) in Fig. 2(a) with their amplitude and phase components. 
Figure S1a in the Supplementary Material also reports the cosine similarity values between these randomly generated 32 matrices, confirming that they are all very close to 0. For each ${N}_{w}$ mentioned above, we also trained several broadband diffractive designs with different numbers of trainable diffractive neurons, i.e., $N\in \{3900;8200;\mathrm{16,900};\mathrm{32,800};\mathrm{64,800};\mathrm{131,100};\mathrm{265,000}\}$, all using the same training data sets $\{({\mathit{i}}_{w},{\mathit{o}}_{w})\}$, randomly generated based on the target transformations $\{{\mathit{A}}_{w}\}$ ($w\in \{1,2,\dots ,{N}_{w}\}$), and the same number of training epochs. To benchmark the performance of these wavelength-multiplexed diffractive networks for each $N$, we also trained monochromatic diffractive networks without any wavelength multiplexing as our baseline, which can approximate only one target linear transformation using a single wavelength (i.e., ${N}_{w}=1$). Here we simply select ${\lambda}_{m}$ as the operating wavelength of this baseline monochrome diffractive network used for comparison. During the training of these diffractive networks, the mean squared error (MSE) loss is calculated per wavelength channel to make the diffractive output fields come as close to the ground-truth (target) fields as possible. However, in the wavelength-multiplexed diffractive models, treating all these channels equally in the final loss function would result in biased all-optical transformation accuracies, since longer wavelengths present lower spatial resolution. To address this issue and equalize the all-optical transformation accuracies of all the wavelengths within the selected channel set, we devised a strategy that adaptively adjusts the weight coefficients applied to the loss terms of these channels during the training process (see Sec. 4 for details). 
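One simple way to implement such channel balancing — a hypothetical sketch, not necessarily the exact weighting rule detailed in Sec. 4 — is to weight each channel's MSE term by its running loss, so that lagging (e.g., longer-wavelength) channels receive proportionally larger gradients:

```python
import numpy as np

def balanced_loss(channel_mses, running_mses, momentum=0.99):
    """Adaptively weighted total loss across N_w wavelength channels.
    Channels whose running loss is above the average get weights > 1,
    steering the optimizer toward the weaker channels."""
    running = momentum * running_mses + (1 - momentum) * channel_mses
    weights = running / running.mean()
    return float(np.sum(weights * channel_mses)), running
```

For example, two channels with current (and running) losses 1.0 and 3.0 receive weights 0.5 and 1.5, respectively, so the weaker channel dominates the gradient until the accuracies equalize.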
After the deep-learning-based training of the broadband diffractive designs introduced above is completed, the resulting all-optical diffractive transformations of these models are summarized in Figs. 2(b)–2(d). We quantified the generalization performance of these broadband diffractive networks on the blind testing data set for each transformation using three different metrics: (1) the normalized transformation MSE (${\mathrm{MSE}}_{\text{Transformation}}$), (2) the cosine similarity (CosSim) between the all-optical transforms and the target transforms, and (3) the MSE between the diffractive network output fields and their ground-truth output fields (${\mathrm{MSE}}_{\text{Output}}$).^{53}^{,}^{69} More details about the definitions of these performance metrics are provided in Sec. 4. For the diffractive designs with different numbers of wavelength channels (${N}_{w}=1$, 2, 4, 8, 16, and 32), we report these performance metrics in Figs. 2(b)–2(d) as a function of the number of trainable diffractive neurons ($N$). The performance metrics reported in Fig. 2 refer to the mean values calculated across all the wavelength channels, whereas the results of the individual wavelength channels are shown in Fig. 3. In Fig. 2(b), it can be seen that the transformation errors of all the trained diffractive models decrease monotonically as $N$ increases, which is expected due to the increased degrees of freedom in the diffractive processor. Also, the approximation errors of the regular diffractive networks without wavelength multiplexing, i.e., ${N}_{w}=1$, approach 0 as $N$ approaches $2{N}_{i}{N}_{o}\approx 8200$. This observation confirms the conclusion obtained in our previous works,^{69}^{,}^{70} i.e., a phase-only monochrome diffractive network requires at least $2{N}_{i}{N}_{o}$ diffractive neurons to approximate a target complex-valued linear transformation with negligible error. 
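The three metrics can be computed along the following lines (our own implementations; the exact normalizations used in the paper are those defined in Sec. 4):

```python
import numpy as np

def cos_sim(A_opt, A_target):
    """Cosine similarity between the flattened all-optical and target matrices."""
    a, b = A_opt.ravel(), A_target.ravel()
    return np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))

def mse_transformation(A_opt, A_target):
    """Transformation MSE, normalized by the target matrix energy
    (our normalization convention, assumed for illustration)."""
    return np.mean(np.abs(A_opt - A_target) ** 2) / np.mean(np.abs(A_target) ** 2)

def mse_output(o_pred, o_true):
    """MSE between diffractive output fields and their ground truth."""
    return np.mean(np.abs(o_pred - o_true) ** 2)
```

Note that `cos_sim` is invariant to a global scaling of the all-optical transform, whereas the two MSE metrics are not, which is why all three are reported together.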
On the other hand, for the wavelength-multiplexed diffractive models with ${N}_{w}$ different wavelength channels that are trained to approximate ${N}_{w}$ unique linear transforms, we see in Fig. 2 that the approximation errors approach 0 as $N$ approaches $2{N}_{w}{N}_{i}{N}_{o}$. This finding indicates that, compared to a baseline monochrome diffractive model that can only perform a single transform, performing multiple distinct transforms using wavelength multiplexing within a single diffractive network requires its number of trainable neurons $N$ to be increased by ${N}_{w}$-fold. This conclusion is further supported by the results of the other two performance metrics, CosSim and ${\mathrm{MSE}}_{\text{Output}}$, as shown in Figs. 2(c) and 2(d): as $N$ approaches $2{N}_{w}{N}_{i}{N}_{o}$, the CosSim and ${\mathrm{MSE}}_{\text{Output}}$ of the wavelength-multiplexed diffractive models approach 1 and 0, respectively. To reveal the linear transformation performance of the individual wavelength channels in our wavelength-multiplexed diffractive processors, in Fig. 3, we show the channel-wise output field errors $({\mathrm{MSE}}_{\text{Output}})$ of the wavelength-multiplexed diffractive networks with ${N}_{w}=2$, 4, 8, 16, and 32 and $N=2{N}_{w}{N}_{i}{N}_{o}$. Figure 3 indicates that the ${\mathrm{MSE}}_{\text{Output}}$ values of these individual channels are very close to each other in all the designs with different ${N}_{w}$, demonstrating no significant performance bias toward any specific wavelength channel or target transform. For comparison, we also show, in Fig. S2 in the Supplementary Material, the resulting ${\mathrm{MSE}}_{\text{Output}}$ of the diffractive model with ${N}_{w}=8$ and $N=2{N}_{w}{N}_{i}{N}_{o}=16{N}_{i}{N}_{o}$ when our channel-balancing training strategy with adaptive weights was not used (see Sec. 4). 
There is a large variation in the output field errors among the different wavelength channels when adaptive weights are not used during training; in fact, the channels assigned to longer wavelengths tend to show markedly inferior transformation performance, which highlights the significance of using our balancing strategy during the training process. Stated differently, unless a channel-balancing strategy is employed during the training phase, longer wavelengths suffer from relatively lower spatial resolution and face increased all-optical transformation errors compared to the shorter wavelength channels. To visually demonstrate the success of our broadband diffractive system in performing a group of linear transformations using wavelength multiplexing, in Fig. 4, we show examples of the ground-truth transformation matrices (i.e., ${\mathit{A}}_{w}$) and their all-optical counterparts (i.e., ${\mathit{A}}_{w}^{\prime}$) resulting from the diffractive designs with ${N}_{w}=8$ and $N\in \{2{N}_{w}{N}_{i}{N}_{o}=16{N}_{i}{N}_{o}=\mathrm{64,800};4{N}_{w}{N}_{i}{N}_{o}=32{N}_{i}{N}_{o}=\mathrm{131,100}\}$. The absolute amplitude and phase errors between the two (${\mathit{A}}_{w}$ and ${\mathit{A}}_{w}^{\prime}$) are also reported in the same figure. Moreover, in Fig. 5 and Fig. S3 in the Supplementary Material, we present some exemplary complex-valued input–output optical fields from the same set of diffractive designs with $N=4{N}_{w}{N}_{i}{N}_{o}=\mathrm{131,100}$ and $N=2{N}_{w}{N}_{i}{N}_{o}=\mathrm{64,800}$, respectively. These results, summarized in Figs. 4 and 5 and Fig. S3 in the Supplementary Material, reveal that, when $N\ge 2{N}_{w}{N}_{i}{N}_{o}$, the all-optical transformation matrices and the output complex fields of all the wavelength channels match their ground-truth targets very well, with negligible error, which is also in line with our earlier observations in Fig. 2. 
2.2. Limits of ${N}_{w}$: Scalability of Wavelength Multiplexing in Diffractive Networks

We have so far demonstrated that a single broadband diffractive network can be designed to simultaneously perform a group of ${N}_{w}$ arbitrary complex-valued linear transformations, with ${N}_{w}=2$, 4, 8, 16, and 32 (Figs. 2 and 3). Next, we explore the feasibility of implementing a significantly larger number of wavelength channels in our system to better understand the limits of ${N}_{w}$. Due to our limited computational resources, to simulate the behavior of larger ${N}_{w}$ values, we selected ${N}_{i}={N}_{o}=5\times 5$ and ${N}_{w}\in \{1,2,4,8,16,32,64,128,184\}$. Accordingly, we generated a new set of 184 different arbitrarily selected complex-valued matrices with dimensions of $25\times 25$, i.e., ${\mathit{A}}_{1},{\mathit{A}}_{2},\dots ,{\mathit{A}}_{184}$, as the target linear transformations to be all-optically implemented. The cosine similarity values between these randomly generated matrices are reported in Fig. S1b in the Supplementary Material, confirming that they are all very close to 0. We also created training, validation, and testing data sets based on these new target transformation matrices following the same approach as in the previous section: for each transformation matrix, we randomly generated 55,000, 5000, and 10,000 field samples for the training, validation, and testing data sets, respectively. Then, using the training field data sets, we trained broadband diffractive designs with ${N}_{w}$ different wavelength channels, where the ${N}_{w}$ target transforms were taken as the first ${N}_{w}$ matrices in the randomly generated set $\{{\mathit{A}}_{1},{\mathit{A}}_{2},\dots ,{\mathit{A}}_{184}\}$. 
For each ${N}_{w}$ choice, we also trained diffractive models with different numbers of diffractive neurons, including $N=1.5{N}_{w}{N}_{i}{N}_{o}$, $N=2{N}_{w}{N}_{i}{N}_{o}$, and $N=3{N}_{w}{N}_{i}{N}_{o}$. The all-optical transformation performance metrics of the resulting diffractive networks on the testing data sets are shown in Fig. 6 as a function of ${N}_{w}$. Figures 6(a)–6(c) reveal that the all-optical transformations of the diffractive designs with different $N$ show some increased error as ${N}_{w}$ increases. For the diffractive models with $N=3{N}_{w}{N}_{i}{N}_{o}$, the all-optical transformation errors $({\mathrm{MSE}}_{\text{Transformation}})$ at smaller ${N}_{w}$ remain extremely small and do not exhibit the same performance degradation with increasing ${N}_{w}$; only for ${N}_{w}>10$ do we see an error increase in the all-optical transformations for $N=3{N}_{w}{N}_{i}{N}_{o}$. By comparing the linear transformation performance of the models with different $N$, Fig. 6 clearly reveals that adding more diffractive neurons to a broadband diffractive network design can greatly improve its transformation performance, which is especially important for operation at a large ${N}_{w}$. By fitting a line to the data points shown in Figs. 6(a) and 6(c), we can extrapolate to larger ${N}_{w}$ values and predict an all-optical transformation error bound as a function of ${N}_{w}$. With these fitted (dashed) lines shown in Figs. 6(a) and 6(c), we get a coarse prediction of the linear transformation performance of a broadband diffractive model with a significantly larger number of wavelength channels ${N}_{w}$, which is challenging to simulate due to our limited computer memory and speed. 
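The extrapolation step described here amounts to a straight-line fit on logarithmic axes. The following sketch illustrates the procedure only — the sample points are made up for illustration and are not the paper's measured data:

```python
import numpy as np

# Hypothetical (N_w, MSE_Transformation) samples following a log-log trend
n_w = np.array([8, 16, 32, 64, 128])
mse = np.array([1e-4, 2.2e-4, 4.8e-4, 1.1e-3, 2.4e-3])  # illustrative values only

# Fit log10(MSE) = a * log10(N_w) + b, then extrapolate to a larger N_w
a, b = np.polyfit(np.log10(n_w), np.log10(mse), 1)
mse_at_2000 = 10 ** (a * np.log10(2000) + b)  # predicted error bound at N_w = 2000
```

Intersecting such fitted lines for different $N$ budgets, or intersecting them with a chosen error threshold, yields the kind of ${N}_{w}$ bounds discussed in the text.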
Interestingly, these three fitted lines (corresponding to diffractive designs with $N=1.5{N}_{w}{N}_{i}{N}_{o}$, $N=2{N}_{w}{N}_{i}{N}_{o}$, and $N=3{N}_{w}{N}_{i}{N}_{o}$) intersect with each other at a point around ${N}_{w}=\mathrm{10,000}$ with an ${\mathrm{MSE}}_{\text{Transformation}}$ of $\sim 0.2$ and an ${\mathrm{MSE}}_{\text{Output}}$ of $\sim 0.03$. This level of transformation error coincides with the error levels observed at the beginning of our training, implying that a broadband diffractive model with ${N}_{w}\approx \mathrm{10,000}$, even after training, would only exhibit a performance level comparable to an untrained model. These analyses indicate that, for a broadband diffractive network trained with $N\le 3{N}_{w}{N}_{i}{N}_{o}$ and a training data set of 55,000 optical field pairs, there is an empirical multiplexing upper bound of ${N}_{w}\approx \mathrm{10,000}$. However, before reaching this ultimate limit, the desired level of approximation accuracy will, in practice, set the actual limit of ${N}_{w}$. For example, based on visual inspection and the calculated peak signal-to-noise ratio (PSNR) values, one can empirically choose a blind testing error of ${\mathrm{MSE}}_{\text{Output}}\sim {10}^{-3}$ as a threshold for the diffractive network’s all-optical approximation error; this threshold corresponds to a mean PSNR value of $\sim 20\ \mathrm{dB}$, calculated for the diffractive network output fields against their ground truth (see Fig. S4 in the Supplementary Material). We marked this ${\mathrm{MSE}}_{\text{Output}}$-based performance threshold in Fig. 6(c) using a black dashed line; it corresponds to a transformation error $({\mathrm{MSE}}_{\text{Transformation}})$ of $\sim 9\times {10}^{-3}$, which was also marked in Fig. 6(a) with a black dashed line.
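The fit-and-extrapolate procedure above can be sketched as a linear fit in log–log space (an assumption on our part, since the fitting coordinates are not restated here; the error values below are illustrative placeholders, not the paper's measured data):

```python
import numpy as np

# Illustrative placeholder errors (NOT the paper's data): MSE_transformation vs. N_w
Nw = np.array([8.0, 16.0, 32.0, 64.0, 128.0, 184.0])
mse_3N = np.array([1e-5, 3e-5, 9e-5, 2.7e-4, 8.1e-4, 1.4e-3])  # for N = 3*Nw*Ni*No

# fit a line in log-log space: log10(MSE) = slope * log10(Nw) + intercept
slope, intercept = np.polyfit(np.log10(Nw), np.log10(mse_3N), 1)

def predict_mse(nw):
    """Extrapolate the fitted error bound to a larger channel count nw."""
    return 10.0 ** (slope * np.log10(nw) + intercept)

# largest N_w that still satisfies an empirical error threshold
threshold = 9e-3
nw_limit = 10.0 ** ((np.log10(threshold) - intercept) / slope)
```

Intersecting two such fitted lines (for different neuron budgets $N$) gives the crossing point analogous to the ${N}_{w}\approx \mathrm{10,000}$ estimate discussed above.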
Based on these empirical performance thresholds set by ${\mathrm{MSE}}_{\text{Output}}\approx {10}^{-3}$ and $\mathrm{PSNR}\approx 20\ \mathrm{dB}$, we can infer that a broadband diffractive processor with $N=3{N}_{w}{N}_{i}{N}_{o}$ can accommodate up to ${N}_{w}\sim 2000$ wavelength channels, where $\sim 2000$ different linear transformations can be performed through a single broadband diffractive processor within the performance bounds shown in Figs. 6(a) and 6(c) (see the purple dashed lines). The same analysis reveals a reduced upper bound of ${N}_{w}\sim 600$ for the diffractive network designs with $N=2{N}_{w}{N}_{i}{N}_{o}$ (see the green dashed lines).

2.3. Impact of Material Dispersion and Losses on Wavelength-Multiplexed Diffractive Networks

In the previous section, we showed that a broadband diffractive processor can be designed to implement $>180$ different target linear transforms simultaneously, and that this number can be further extended to ${N}_{w}\sim 2000$ based on an all-optical approximation error threshold of ${\mathrm{MSE}}_{\text{Output}}\approx {10}^{-3}$. In this section, we provide additional analyses of material-related factors that impact the accuracy of wavelength-multiplexed computing through broadband diffractive networks. For example, the selection of materials with different dispersion properties (i.e., the real and imaginary parts of the refractive index as a function of the wavelength) will impact the light–matter interactions at different illumination wavelengths. To numerically explore the impact of material dispersion and related optical losses, we took the broadband diffractive network design shown in Fig. 6 with ${N}_{w}=128$ and $N=3{N}_{w}{N}_{i}{N}_{o}$ and retrained it using different materials.
The first material we selected is a lossy polymer that is widely employed as a 3D printing material; this material was used to fabricate diffractive networks that operate at the terahertz part of the spectrum.^{52}^{,}^{66}^{,}^{67} The dispersion curves of this lossy material are shown in Fig. S5a in the Supplementary Material, which were also used in the design of the diffractive networks reported in the previous sections (with ${\lambda}_{m}=0.8\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$). As a second material choice for comparison, we selected a lossless dielectric material, for which we took N-BK7 glass as an example and used its dispersion to simulate our wavelength-multiplexed diffractive processor design at the visible wavelengths with ${\lambda}_{m}=530\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{nm}$; the dispersion curves of this material are reported in Fig. S5b in the Supplementary Material. As a third material choice for comparison, we considered a hypothetical scenario where the material of the diffractive layers had a flat dispersion at around ${\lambda}_{\mathrm{m}}=0.8\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$, with no absorption and a constant refractive index ($\sim 1.72$) across all the selected wavelength channels of interest; see the refractive index curve of this “dispersion-free” material in Fig. S5c in the Supplementary Material. After the training of the diffractive network models using these different materials selected for comparison, we summarized their all-optical linear transformation performance in Figs. 7(a)–7(c) (see the purple bars). These results reveal that all three diffractive models with different material choices achieved negligible all-optical transformation errors, regardless of their dispersion characteristics. This confirms the feasibility of extending our wavelength-multiplexed diffractive processor designs to other spectral bands with vastly different material dispersion features. 
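The role of dispersion and loss can be illustrated with the thin-element transmittance model described in Sec. 4 [Eqs. (6) and (7)]; a minimal sketch, where the material constants below ($n\approx 1.72$, and the nonzero extinction coefficient $\kappa$) are illustrative example values rather than measured dispersion data:

```python
import numpy as np

def neuron_transmittance(h, lam, n_real, kappa):
    """Thin-element complex transmission at wavelength lam for neuron thickness h:
    amplitude set by the extinction coefficient kappa, phase by the index contrast vs air."""
    amplitude = np.exp(-2.0 * np.pi * kappa * h / lam)
    phase = (n_real - 1.0) * 2.0 * np.pi * h / lam
    return amplitude * np.exp(1j * phase)

lam_m = 0.8e-3          # 0.8 mm design wavelength (terahertz band)
h = 1.0 * lam_m         # an example neuron thickness

# hypothetical dispersion-free lossless material: constant n ~ 1.72, kappa = 0
t_flat = neuron_transmittance(h, lam_m, 1.72, 0.0)

# illustrative lossy case (kappa value made up): absorption lowers the output efficiency
t_lossy = neuron_transmittance(h, lam_m, 1.72, 0.05)
```

The lossless material yields a pure phase modulation ($|t|=1$), whereas a nonzero $\kappa$ attenuates the field through each layer, which is why the lossy polymer design shows a much lower output diffraction efficiency in Fig. 7(d).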
In addition to the all-optical transformation accuracy, the output diffraction efficiency ($\eta $) of these diffractive network models is also practically important. As shown in Fig. 7(d), due to the absorption by the layers, the diffractive network model using the lossy polymer material presents a very poor output diffraction efficiency $\eta $ compared to the other two diffractive models that used lossless materials. In addition to the absorption of light through the diffractive layers, a wavelength-multiplexed diffractive network also suffers from optical losses due to the propagating waves that leak out of the diffractive processor volume. This second source of optical loss within a diffractive network can be strongly mitigated through the incorporation of diffraction-efficiency-related penalty terms^{52}^{,}^{66}^{,}^{67}^{,}^{69} into the training loss function (see Sec. 4 for details). The results of using such a diffraction-efficiency-related penalty term during training are presented in Figs. 7(a)–7(d) (yellow bars), which indicate that the output diffraction efficiencies of the corresponding models were improved by $>589$-fold (up to 1479-fold) compared to their counterparts that were trained without such a penalty term [see Fig. 7(d)]. We also show, in Figs. 7(e) and 7(f), the output diffraction efficiencies of the individual wavelength channels trained without and with the diffraction-efficiency penalty term, respectively. These results also revealed that the diffraction-efficiency-related penalty term used during training not only improved the overall output efficiency of the diffractive processor design but also helped to mitigate the imbalance of diffraction efficiencies among different wavelength channels [see Figs. 7(e) and 7(f)]. These improvements come at a cost, however; as shown in Figs.
7(a)–7(c), there is some degradation in the all-optical transformation performance of the diffractive networks that are trained with a diffraction-efficiency-related penalty term. However, this relative degradation in the all-optical transformation performance is still acceptable, since a cosine similarity value of 0.996 to 0.998 is maintained in each case [see Fig. 7(b), yellow bars].

2.4. Impact of Limited Bit Depth on the Accuracy of Wavelength-Multiplexed Diffractive Networks

The bit depth of a broadband diffractive network refers to the finite number of thickness levels that each diffractive neuron can have on top of a common base thickness of each diffractive layer. For example, in a broadband diffractive network with a bit depth of 8, its diffractive neurons will be trained to have at most ${2}^{8}=256$ different thickness values that are distributed between a predetermined minimum thickness and a maximum thickness value. To mechanically support each diffractive layer, the minimum thickness is always positive, acting as the base thickness of each layer. To analyze the impact of this bit depth on the linear transformation performance and accuracy of our wavelength-multiplexed diffractive networks, we took the ${N}_{w}=184$ channel diffractive design reported in the previous sections (trained using a data format with 32-bit depth) and retrained it from scratch under different bit depths, including 4, 8, and 12. Based on the same test data set, the all-optical linear transformation performance metrics of the resulting diffractive networks are reported in Fig. 8 as a function of $N$. Figure 8 reveals that a 12-bit depth is practically identical to using a 32-bit depth in terms of the all-optical transformation accuracy that can be achieved for the ${N}_{w}=184$ target linear transformations.
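The thickness quantization described above can be sketched as follows (a minimal sketch; the thickness range is taken from Sec. 4, and whether the ${2}^{q}$ levels include both endpoints of the range is our assumption):

```python
import numpy as np

def quantize_thickness(h_learnable, h_max, bit_depth):
    """Round each learnable thickness to one of 2**bit_depth equally spaced levels in [0, h_max]."""
    levels = 2 ** bit_depth
    step = h_max / (levels - 1)
    return np.clip(np.round(np.asarray(h_learnable) / step), 0, levels - 1) * step

h_max = 1.25                                # in units of lambda_m, as in Sec. 4
h = np.array([0.0, 0.31, 0.624, 1.25])      # example learnable thickness values

h_q12 = quantize_thickness(h, h_max, 12)    # 4096 levels: practically identical to 32-bit
h_q8 = quantize_thickness(h, h_max, 8)      # 256 levels: small extra error
h_q4 = quantize_thickness(h, h_max, 4)      # 16 levels: visibly coarser rounding
```

The worst-case rounding error is half of one level spacing, i.e., ${h}_{\mathrm{max}}/(2({2}^{q}-1))$, which shrinks rapidly with increasing bit depth $q$.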
Furthermore, a bit depth of 8 can also be used for a broadband diffractive network design to maintain its all-optical transformation performance with a relatively small error increase, which can be compensated for with an increase in $N$, as illustrated in Fig. 8. These observations from Fig. 8 highlight (1) the importance of having a sufficient bit depth in the design and fabrication of a broadband diffractive processor and (2) the importance of $N$ as a way to boost the all-optical transformation performance under a limited diffractive neuron bit depth.

2.5. Impact of Wavelength Precision or Jitter on the Accuracy of Wavelength-Multiplexed Diffractive Networks

Another possible factor that may cause systematic errors in our framework is the wavelength precision or jitter. To analyze the wavelength-encoding-related errors, we used the four-channel wavelength-multiplexed diffractive network model with $N\approx 2{N}_{w}{N}_{i}{N}_{o}=8{N}_{i}{N}_{o}$ and ${N}_{i}={N}_{o}={8}^{2}$ that was presented in Fig. 3(b). We deliberately shifted the illumination wavelength used for each encoding channel away from the preselected wavelength used during the training (i.e., ${\lambda}_{1}=0.9125{\lambda}_{m}$, ${\lambda}_{2}=0.9708{\lambda}_{m}$, ${\lambda}_{3}=1.0292{\lambda}_{m}$, and ${\lambda}_{4}=1.0875{\lambda}_{m}$). The resulting linear transformation performance of the ${N}_{w}=4$ channels using different performance metrics is summarized in Figs. 9(a)–9(c) as a function of the illumination wavelength. All of these results in Fig. 9 show that as the illumination wavelengths used for each encoding channel gradually deviate from their designed/assigned wavelengths (used during the training of the wavelength-multiplexed diffractive network), their all-optical transformation accuracy begins to degrade.
To shed more light on this, we used the previous performance threshold based on ${\mathrm{MSE}}_{\text{Output}}\approx {10}^{-3}$ as an empirical criterion to estimate the tolerable range of illumination wavelength errors, which revealed an acceptable bandwidth of $\sim 0.002{\lambda}_{m}$ for each one of the encoding wavelength channels. Stated differently, when a given illumination wavelength is within $\pm \sim 0.001{\lambda}_{m}$ of the corresponding preselected wavelength assigned for that spectral channel, the degradation of the linear transformation accuracy at the output of the wavelength-multiplexed diffractive network will satisfy ${\mathrm{MSE}}_{\text{Output}}\le {10}^{-3}$. In practical applications, this level of spectral precision can be routinely achieved by using high-performance wavelength scanning sources^{74}^{,}^{75} (e.g., swept-source lasers) or narrow passband thin-film filters.

2.6. Permutation-Based Encoding and Decoding Using Wavelength-Multiplexed Diffractive Networks

So far, we have demonstrated the design of wavelength-multiplexed diffractive processors that allow a massive number of unique complex-valued linear transformations to be computed, all in parallel, within a single diffractive optical network. To exemplify some of the potential applications of this broadband diffractive processor design, here we demonstrate permutation matrix-based optical transforms, which have significance for telecommunications (e.g., channel routing and interconnects), information security, and data processing (see Fig. 10). Similar to the approaches introduced earlier, we randomly generated eight permutation matrices, ${\mathit{P}}_{1},{\mathit{P}}_{2},\dots ,{\mathit{P}}_{8}$ [see Fig. 10(b)] and trained a wavelength-multiplexed diffractive network with ${N}_{w}=8$ and $N=2{N}_{w}{N}_{i}{N}_{o}=16{N}_{i}{N}_{o}=64,800$; this architecture has the same configuration as the one shown in Fig. 3(c), and Fig.
4 (middle column), except that it uses these new permutation matrices as the target transforms. After its training, we show in Fig. 10(a) examples of permutation-based encoding of input images using the trained broadband diffractive network. After being all-optically processed by our wavelength-multiplexed diffractive network design, all the input images (${\mathit{i}}_{\mathrm{w}}$) are simultaneously permuted (encoded) according to the permutation matrices assigned to the corresponding wavelength channels, resulting in the output fields ${\mathit{o}}_{\mathrm{w}}^{\prime}$, which match their ground truth ${\mathit{o}}_{\mathrm{w}}$ very well [see Fig. 10(a)]. Stated differently, the trained wavelength-multiplexed diffractive processor can successfully synthesize the correct output field ${\mathit{o}}_{w}={\mathit{P}}_{w}{\mathit{i}}_{w}$ for all the possible input fields ${\mathit{i}}_{w}$, since it presents an all-optical approximation of ${\mathit{P}}_{w}$ for $w\in \{\mathrm{1,2},\dots ,8\}$. Similarly, we show in Fig. S6 in the Supplementary Material that the same wavelength-multiplexed permutation transformation network can be used to all-optically decode the encoded/permuted patterns. In this case, the input encoded fields are generated by transforming (permuting) the same input images using the inverses of the permutation matrices ${\mathit{P}}_{1},{\mathit{P}}_{2},\dots ,{\mathit{P}}_{8}$. The results shown in Fig. S6 in the Supplementary Material indicate that the wavelength-multiplexed diffractive network can all-optically perform simultaneous decoding of all the input images, matching their ground truth very well.

2.7. Experimental Validation of a Wavelength-Multiplexed Diffractive Network

Next, we performed a proof-of-concept experimental validation of our diffractive network using wavelength-multiplexed permutation operations. With the frequency-tunable continuous-wave terahertz (THz) setup shown in Fig. 11(a) (see Sec.
4 for its implementation details), we tested a wavelength-multiplexed diffractive network design with ${N}_{w}=2$ and ${N}_{i}={N}_{o}={3}^{2}$, where the two wavelength channels were chosen as ${\lambda}_{1}=0.667\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$ and ${\lambda}_{2}=0.698\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$. Each one of these two wavelength channels in this experimental design is assigned to a unique, arbitrarily generated target permutation matrix (${\mathit{P}}_{1}$ and ${\mathit{P}}_{2}$, see Fig. S7 in the Supplementary Material), such that any spatially structured pattern at the input FOV can be all-optically permuted by the diffractive optical network to form different desired patterns at the output FOV, performing ${\mathit{P}}_{1}$ and ${\mathit{P}}_{2}$ under ${\lambda}_{1}$ and ${\lambda}_{2}$ illumination, respectively. For this, we used a diffractive network architecture with three diffractive layers, with each layer having $120\times 120$ diffractive features, each with a lateral size of 0.4 mm ($\sim 0.59{\lambda}_{m}$). The axial spacing between any two of the adjacent layers (including the input/output planes) in this design was set as 20 mm ($\sim 29.3{\lambda}_{m}$). During the training process, a total of 55,000 randomly generated input–output field pairs corresponding to the target permutation matrices (${\mathit{P}}_{1}$ and ${\mathit{P}}_{2}$) were used to update the thickness values of these diffractive layers. After the training converged, the resulting diffractive layers were fabricated using a 3D printer and mechanically assembled to form a physical wavelength-multiplexed diffractive optical permutation processor, as shown in Figs. 11(b)–11(d). To experimentally test the performance of this 3D-fabricated wavelength-multiplexed diffractive network, different input patterns from the blind testing set (never used in training) were also 3D-printed and used as the input test objects. 
The experimental test results are reported in Fig. 11(e), revealing that the output patterns for all these input patterns show good agreement with their numerically simulated counterparts and the ground-truth images. The success of these experimental results further confirms the feasibility of our wavelength-multiplexed diffractive optical transformation networks.

3. Discussion

We demonstrated wavelength-multiplexed diffractive network designs that can perform massively parallel universal linear transformations through a single diffractive processor. We also quantified the limits of ${N}_{w}$ and the impact of material dispersion, bit depth, and wavelength precision/jitter on the all-optical transformation performance of broadband diffractive networks. In addition to these, other factors may limit the performance of broadband diffractive processors, including the lateral and axial misalignments of diffractive layers, surface reflections, and other imperfections introduced during the fabrication. To mitigate some of these practical issues, various approaches, such as high-precision lithography and antireflection coatings, can be utilized in the fabrication process of a diffractive network. As demonstrated in our previous work,^{50}^{,}^{52}^{,}^{61} it is also possible to mitigate the performance degradation resulting from some of these experimental factors by incorporating them as random errors into the physical forward model used during the training process, which is referred to as “vaccination” of the diffractive network. The reported wavelength-multiplexed diffractive processor represents a milestone in expanding the parallelism of diffractive all-optical computing, simultaneously covering a large group of complex-valued linear transformations.
Compared to our previous work,^{70} where a monochromatic diffractive optical network was integrated with polarization-sensitive elements to achieve multiplexing of four independent linear transformations, the multiplexing factor $({N}_{w})$ of a wavelength-multiplexed diffractive network is significantly increased to more than 180, and can further reach ${N}_{w}\sim 2000$, revealing a major improvement in the all-optical processing throughput. Moreover, the physical architecture of this wavelength-multiplexed computing framework is also relatively simple, since it does not rely on any additional optical modulation elements, e.g., spectral filters; it solely utilizes the different phase modulation values of the same diffractive layers at different wavelengths of light, also being compatible with different materials with various dispersion properties (including flat dispersion, as illustrated in Fig. 7). One could perhaps argue that, equivalent to a wavelength-multiplexed diffractive network that uses $N$ trainable diffractive features to compute ${N}_{w}$ independent target linear transformations, we could utilize a set of ${N}_{w}$ separately optimized monochromatic diffractive networks, each assigned to perform one of the ${N}_{w}$ target linear transforms using $N/{N}_{w}$ diffractive features. However, such a multipath design involving ${N}_{w}$ different monochromatic diffractive networks (one for each target transformation) would require bulky optical routing for fan-in/fan-out, which would introduce additional insertion losses, noise, and misalignment errors into the system, thus hurting the energy efficiency, performance, and compactness of the optical processor. Considering the fact that we covered ${N}_{w}>180$ in this work, such an approach of using ${N}_{w}$ separate monochromatic diffractive networks is not a feasible strategy that can compete with a wavelength-multiplexed design. 
Furthermore, if additional multiplexing schemes other than the wavelength multiplexing reported here were to be used, such as temporal multiplexing that switches between different diffractive networks, they would also require additional optoelectronic control elements, further increasing the hardware complexity of the system, which would not be feasible for a large ${N}_{w}$. It is worth further emphasizing that even if multiple separately optimized monochromatic diffractive networks could be trained to individually perform different target linear transforms at different wavelengths, it is not possible to directly combine the converged/optimized layers of these diffractive networks to match the broadband operation of the wavelength-multiplexed diffractive network presented here. Since these monochromatic networks are individually trained using only a single illumination wavelength, the modulation optimized for each wavelength would, under broadband illumination, distort the fields of the other wavelengths, and the transformation accuracies of the channels would be collectively hampered. This, once again, highlights the significance of our wavelength multiplexing scheme: a wavelength-multiplexed diffractive optical network can be realized through the engineering of the surface profiles of dielectric diffractive layers with arbitrary dispersion properties, provided that these profiles are designed by simultaneously taking into account all the ${N}_{w}$ wavelength channels, with phase modulation values that are mutually coupled to each other. To the best of our knowledge, there has not been a demonstration of a design for the all-optical implementation of a complex-valued, arbitrary linear transformation using metasurfaces or metamaterials. In principle, having different diffractive metaunits placed on the same substrate to perform different transformations at different wavelengths could be attempted as an alternative approach to what we presented in this paper.
However, such an approach would face severe challenges since (1) at large spectral multiplexing factors (${N}_{w}\gg 1$) shown in this work, the lateral period for each spectral metadesign will substantially increase per substrate, lowering the accuracy of each transformation; (2) at each illumination wavelength, the other metaunits designed for (assigned to) the other spectral components, will also introduce “cross-talk fields” that will severely contaminate the desired responses at each wavelength and cannot be neglected since ${N}_{w}\gg 1$; (3) the phase responses of the spectrally encoded metaunits, in general, cover a small angular range, leading to low numerical aperture (NA) solutions compared to the diffractive solutions reported in this work, where NA = 1 (in air); the low NA of metaunits fundamentally limits the space-bandwidth product of each transformation channel; and (4) if multiple layers of metasurfaces are used in a given design, all of these aforementioned sources of errors associated with spectral metaunits will accumulate and get amplified through the subsequent field propagation in a cascaded manner, causing severe degradations to the final output fields, compared to the desired fields. Perhaps due to these significant challenges outlined here, metasurface or metamaterial-based diffractive designs have not yet been reported as a solution to perform universal linear transformations—neither an arbitrary complex-valued linear transformation nor a group of linear transformations through some form of multiplexing. As we have shown in Sec. 2, a diffractive neuron number of $N\ge 2{N}_{w}{N}_{i}{N}_{o}$ is required for a wavelength-multiplexed diffractive network to successfully implement ${N}_{w}$ different complex-valued linear transforms. 
Compared to the previous complex-valued monochrome (${N}_{w}=1$) diffractive designs,^{69} the additional factor of 2 in $N$ results from the fact that the only trainable degrees of freedom for a broadband wavelength-multiplexed diffractive design are the thickness values of the diffractive neurons, whereas the ${N}_{w}$ different target transformations are all complex-valued. Stated differently, the resulting modulation values of different wavelengths through each diffractive neuron are mutually coupled through the dispersion of the material and depend on the neuron thickness. Finally, we would like to emphasize that this presented framework can operate at various parts of the electromagnetic spectrum, including the visible band, so that the set of wavelength channels used to perform the transformation multiplexing can match with the light source and/or the spectral signals emitted from or reflected by the objects. In practice, this massively parallel linear transformation capability can be utilized in an optical processor to perform distinct statistical inference tasks using different wavelength channels, bringing in additional throughput and parallelism to optical computing. This wavelength-multiplexed diffractive network design might also inspire the development of new multicolor and hyperspectral machine-vision systems, where all-optical information processing is performed simultaneously based on both the spatial and spectral features of the input objects. The resulting hyperspectral or multispectral diffractive output fields can enable new optical visual processing systems that can identify or encode input objects with unique spectral properties. 
As another possibility, novel multispectral display systems can be created using these wavelength-multiplexed diffractive output fields to reconstruct spectroscopic images or light fields from compressed or distorted input spectral signals.^{62} All these possibilities enabled by wavelength-multiplexed diffractive optical processors can inspire numerous applications in biomedical imaging, remote sensing, analytical chemistry, material science, and many other fields.

4. Appendix: Materials and Methods

4.1. Forward Model of the Broadband Diffractive Neural Network

A wavelength-multiplexed diffractive network consists of successive diffractive layers that collectively modulate the incoming broadband optical fields. In the forward model of our numerical simulations, the diffractive layers are assumed to be thin optical modulation elements, where the ${m}^{\text{th}}$ feature on the ${k}^{\text{th}}$ layer at a spatial location $({x}_{m},{y}_{m},{z}_{m})$ represents a wavelength-dependent complex-valued transmission coefficient ${t}^{k}$ given by Eq. (1)$${t}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )={a}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )\mathrm{exp}(j{\varphi}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )).$$Each diffractive feature acts as the source of a secondary wave, Eq. (2)$${f}_{m}^{k}(x,y,z)=\frac{z-{z}_{m}}{{r}^{2}}\left(\frac{1}{2\pi r}+\frac{1}{j\lambda}\right)\mathrm{exp}\left(\frac{j2\pi r}{\lambda}\right),$$where $r=\sqrt{{(x-{x}_{m})}^{2}+{(y-{y}_{m})}^{2}+{(z-{z}_{m})}^{2}}$, so that the field right after the ${k}^{\text{th}}$ layer becomes Eq. (3)$${E}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )={t}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )\cdot \sum _{n\in S}{E}^{k-1}({x}_{n},{y}_{n},{z}_{n},\lambda )\cdot {f}_{n}^{k-1}({x}_{m},{y}_{m},{z}_{m}).$$For the diffractive models used for numerical analyses, we chose ${\lambda}_{m}/2$ as the smallest sampling period for the simulation of the complex optical fields and also used ${\lambda}_{m}/2$ as the smallest feature size of the diffractive layers. In the input and output FOVs, a $4\times 4$ binning is performed on the simulated optical fields, resulting in a pixel size of $2{\lambda}_{m}$ for the input/output fields.
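A minimal sketch of this forward model [Eqs. (2) and (3)] via direct summation of secondary waves is shown below (coordinates in units of the wavelength; the two-neuron layers and the phase values are illustrative, and a practical implementation would use vectorized or FFT-based propagation instead of this $O({N}^{2})$ double loop):

```python
import numpy as np

def rs_kernel(dx, dy, dz, lam):
    """Secondary-wave term f of Eq. (2), evaluated at an offset (dx, dy, dz) from the neuron."""
    r = np.sqrt(dx * dx + dy * dy + dz * dz)
    return (dz / r**2) * (1.0 / (2.0 * np.pi * r) + 1.0 / (1j * lam)) * np.exp(1j * 2.0 * np.pi * r / lam)

def propagate_layer(E_prev, coords_prev, coords_next, t_next, lam):
    """Eq. (3): superpose the secondary waves of layer k-1 at layer k, then modulate by t^k."""
    E_next = np.zeros(len(coords_next), dtype=complex)
    for m, (xm, ym, zm) in enumerate(coords_next):
        for En, (xn, yn, zn) in zip(E_prev, coords_prev):
            E_next[m] += En * rs_kernel(xm - xn, ym - yn, zm - zn, lam)
    return np.asarray(t_next) * E_next

lam = 1.0                                   # work in units of the wavelength
prev = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]   # two neurons on layer k-1
nxt = [(0.0, 0.0, 30.0), (0.5, 0.0, 30.0)]  # two neurons on layer k, 30*lam downstream
E0 = np.ones(2, dtype=complex)              # uniform incident field
t1 = np.exp(1j * np.pi / 2) * np.ones(2)    # an example pure-phase modulation layer
E1 = propagate_layer(E0, prev, nxt, t1, lam)
```

Repeating `propagate_layer` once per diffractive layer, with the wavelength-dependent `t` of each layer, yields the output field for each wavelength channel.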
The axial distance ($d$) between the successive layers (including the diffractive layers and the input/output planes) in our diffractive processor designs is empirically selected as $d=0.5{D}_{\text{layer}}$, where ${D}_{\text{layer}}$ represents the lateral size of each diffractive layer. The thickness value $h$ of each neuron of a diffractive layer is composed of two parts, ${h}_{\text{learnable}}$ and ${h}_{\text{base}}$, as follows: Eq. (4)$$h={h}_{\text{learnable}}+{h}_{\text{base}},$$where ${h}_{\text{learnable}}$ denotes the learnable thickness parameter of each diffractive feature and is confined between ${h}_{\mathrm{min}}=0$ and ${h}_{\mathrm{max}}=1.25{\lambda}_{m}$ for all the diffractive models used for numerical analyses in this paper. When a modulation with $q$-bit depth is applied to the diffractive model, ${h}_{\text{learnable}}$ is rounded to the nearest of ${2}^{q}$ different, equally spaced levels within the range $[0,{h}_{\mathrm{max}}]$. The additional base thickness ${h}_{\text{base}}$ is a constant, chosen as $0.25{\lambda}_{m}$, to serve as substrate support for the diffractive neurons. To enforce the constraint applied to ${h}_{\text{learnable}}$, an associated latent trainable variable ${h}_{v}$ was defined through an analytical mapping. Note that before the training starts, the ${h}_{v}$ values of all the diffractive neurons were randomly initialized with a normal distribution (a mean value of 0 and a standard deviation of 1). Based on these definitions, the amplitude and phase components of the complex transmittance of the ${m}^{\text{th}}$ neuron, i.e., ${a}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )$ and ${\varphi}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )$, can be written as functions of the thickness of each neuron ${h}_{m}$ and the incident wavelength $\lambda $: Eq. (6)$${a}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )=\mathrm{exp}\left(-\frac{2\pi \kappa (\lambda ){h}_{m}^{k}}{\lambda}\right),$$Eq.
(7)$${\varphi}^{k}({x}_{m},{y}_{m},{z}_{m},\lambda )=(n(\lambda )-{n}_{\text{air}})\frac{2\pi {h}_{m}^{k}}{\lambda}.$$

4.2. Preparation of the Linear Transformation Data Sets

In this paper, the input and output FOVs of the diffractive networks are assumed to have the same size, which is set as $8\times 8$, $5\times 5$, or $3\times 3$ pixels based on the assigned linear transformation tasks, i.e., ${\mathit{i}}_{w},{\mathit{o}}_{w}\in {\mathbb{C}}^{8\times 8}$, ${\mathbb{C}}^{5\times 5}$, or ${\mathbb{C}}^{3\times 3}$ ($w\in \{1,2,\dots ,{N}_{w}\}$). Accordingly, the size of the target complex-valued transformation matrices ${\mathit{A}}_{w}$ is equal to $64\times 64$, $25\times 25$, or $9\times 9$, respectively, i.e., ${\mathit{A}}_{w}\in {\mathbb{C}}^{64\times 64}$ ($w\in \{1,2,\dots ,32\}$), ${\mathit{A}}_{w}\in {\mathbb{C}}^{25\times 25}$ ($w\in \{1,2,\dots ,184\}$), or ${\mathit{A}}_{w}\in {\mathbb{C}}^{9\times 9}$ ($w\in \{1,2\}$). For arbitrary linear transformations, the amplitude and phase components of all these target matrices ${\mathit{A}}_{w}$ were generated with uniform ($U$) distributions of $U[0,1]$ and $U[0,2\pi ]$, respectively, using the pseudorandom number generation function random.uniform() built into NumPy. For the arbitrarily selected permutation transformations, all the target matrices ${\mathit{A}}_{w}$ (also denoted as ${\mathit{P}}_{w}$) were generated by permuting an identity matrix of the same size as ${\mathit{P}}_{w}$ using the pseudorandom matrix permutation function random.permutation() built into NumPy. Different random seeds were used to generate these transformation matrices to ensure that they were unique.
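The data-set preparation described in this section can be sketched as follows (the seed and the $3\times 3$ FOV size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)   # illustrative seed
Ni = No = 9                       # 3x3 input/output FOV, flattened to length-9 vectors

# arbitrary complex-valued transform: amplitude ~ U[0,1), phase ~ U[0, 2*pi)
A = rng.uniform(0, 1, (No, Ni)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (No, Ni)))

# permutation transform: a randomly permuted identity matrix
P = np.eye(Ni)[rng.permutation(Ni)]

# one input/ground-truth field pair per target matrix: o = A_w @ i
i_field = rng.uniform(0, 1, Ni) * np.exp(1j * rng.uniform(0, 2 * np.pi, Ni))
o_arbitrary = A @ i_field
o_permuted = P @ i_field          # a permutation only reorders the entries of i
```

Since permutation matrices are orthogonal, applying the transpose ${\mathit{P}}^{T}={\mathit{P}}^{-1}$ to the encoded field recovers the original input, which is the decoding operation demonstrated in Fig. S6.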
For training a broadband diffractive network with ${N}_{w}$ wavelength channels, the amplitude and phase components of the input fields ${\mathit{i}}_{w}$ ($w\in \{1,2,\dots ,{N}_{w}\}$) were randomly generated with uniform ($U$) distributions of $U[0,1]$ and $U[0,2\pi ]$, respectively. The ground-truth (target) fields ${\mathit{o}}_{w}$ ($w\in \{1,2,\dots ,{N}_{w}\}$) were generated by calculating ${\mathit{o}}_{w}={\mathit{A}}_{w}{\mathit{i}}_{w}$. For each ${\mathit{A}}_{w}$ ($w\in \{1,2,\dots ,{N}_{w}\}$), we generated a total of 70,000 input/output complex optical fields to form a data set, which was then divided into three parts: training, validation, and testing, containing 55,000, 5000, and 10,000 complex-valued optical field pairs, respectively.

4.3. Training Loss Function

For each wavelength channel, the normalized MSE loss function is defined as Eq. (8)$${\mathcal{L}}_{\mathrm{MSE},w}=E\left[\frac{1}{{N}_{o}}\sum _{n=1}^{{N}_{o}}|\hat{{\mathit{o}}_{w}}[n]-\hat{{\mathit{o}}_{w}^{\prime}}[n]{|}^{2}\right]=E\left[\frac{1}{{N}_{o}}\sum _{n=1}^{{N}_{o}}{|{\sigma}_{w}{\mathit{o}}_{w}[n]-{\sigma}_{w}^{\prime}{\mathit{o}}_{w}^{\prime}[n]|}^{2}\right],$$with the output normalization coefficient given by Eq. (10)$${\sigma}_{w}^{\prime}=\frac{{\sum}_{n=1}^{{N}_{o}}{\sigma}_{w}{\mathit{o}}_{w}[n]{\mathit{o}}_{w}^{\prime *}[n]}{{\sum}_{n=1}^{{N}_{o}}|{\mathit{o}}_{w}^{\prime}[n]{|}^{2}}.$$During the training of each broadband diffractive network, all the wavelength channels are simultaneously simulated, and the training data are fed into the channels at the same time. The wavelength-multiplexed diffractive network is trained based on the loss averaged across the different wavelength channels; the total loss function $\mathcal{L}$ that we used can be written as Eq. (11)$$\mathcal{L}=\frac{1}{{N}_{w}}\sum _{w=1}^{{N}_{w}}{\alpha}_{w}{\mathcal{L}}_{\mathrm{MSE},w},$$where ${\alpha}_{w}$ is the adaptive spectral weight coefficient applied to the loss of the ${w}^{\text{th}}$ wavelength channel, which was used to balance the performance achieved by different wavelength channels during the optimization process.
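The adaptive spectral weighting [Eqs. (11) and (12)] can be sketched as follows (a minimal sketch; taking the best-performing channel as the reference channel ${w}_{\mathrm{ref}}$ is our assumption, since its definition is not restated here, and the per-channel losses are illustrative numbers):

```python
import numpy as np

def update_weights(alpha, mse_per_channel, ref=None):
    """Eq. (12): raise the weight of channels whose MSE lags behind the reference channel's,
    clipping the result at 0. Taking the best channel as the reference is our assumption."""
    mse = np.asarray(mse_per_channel, dtype=float)
    if ref is None:
        ref = int(np.argmin(mse))
    return np.maximum(0.1 * (mse - mse[ref]) + np.asarray(alpha, dtype=float), 0.0)

def total_loss(alpha, mse_per_channel):
    """Eq. (11): spectrally weighted average of the per-channel MSE losses."""
    return float(np.mean(np.asarray(alpha) * np.asarray(mse_per_channel)))

alpha = np.ones(4)                      # initialized to 1 for every wavelength channel
mse = [2e-3, 1e-3, 4e-3, 1e-3]          # illustrative per-channel losses after one step
alpha = update_weights(alpha, mse)
loss = total_loss(alpha, mse)
```

Channels that lag behind the reference receive a slightly larger weight at each step, nudging the optimization toward a balanced performance across all ${N}_{w}$ channels.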
The initial values of ${\alpha}_{w}$ for all the wavelength channels are set as 1. After the optimization begins, ${\alpha}_{w}$ is adaptively updated after each training step using the following equation:

Eq. (12)$${\alpha}_{w}\leftarrow \mathrm{max}(0.1\times ({\mathcal{L}}_{\mathrm{MSE},w}-{\mathcal{L}}_{\mathrm{MSE},{w}_{\mathrm{ref}}})+{\alpha}_{w},0),$$

where ${w}_{\mathrm{ref}}$ denotes the reference wavelength channel. To increase the output diffraction efficiencies of the diffractive networks, we incorporated an additional efficiency penalty term into the loss function of Eq. (11):

Eq. (13)$$\mathcal{L}=\frac{1}{{N}_{w}}\sum _{w=1}^{{N}_{w}}({\alpha}_{w}{\mathcal{L}}_{\mathrm{MSE},w}+\beta {\mathcal{L}}_{\mathrm{eff},w}),$$

where $\beta$ is the weight of the efficiency penalty term ${\mathcal{L}}_{\mathrm{eff},w}$, defined as

Eq. (14)$${\mathcal{L}}_{\mathrm{eff},w}=\begin{cases}{\eta}_{\mathrm{th}}-{\eta}_{w},& \text{if }{\eta}_{\mathrm{th}}\ge {\eta}_{w}\\ 0,& \text{if }{\eta}_{\mathrm{th}}<{\eta}_{w}\end{cases},$$

where ${\eta}_{w}$ is the output diffraction efficiency of the ${w}^{\text{th}}$ wavelength channel and ${\eta}_{\mathrm{th}}$ is the target efficiency threshold.

4.4. Performance Metrics Used for the Quantification of the All-Optical Transformation Errors

To quantitatively evaluate the transformation results of the wavelength-multiplexed diffractive networks, four different performance metrics were calculated per wavelength channel of the diffractive designs using the blind testing data set: (1) the normalized transformation MSE (${\mathrm{MSE}}_{\text{Transformation}}$), (2) the cosine similarity ($\mathrm{CosSim}$) between the all-optical transforms and the target transforms, (3) the normalized MSE between the diffractive network output fields and their ground truth (${\mathrm{MSE}}_{\text{Output}}$), and (4) the output diffraction efficiency [Eq. (15)]. The transformation error for the ${w}^{\text{th}}$ wavelength channel of the wavelength-multiplexed diffractive network, ${\mathrm{MSE}}_{\text{Transformation},w}$, is defined as Eq.
(16)$${\mathrm{MSE}}_{\text{Transformation},w}=\frac{1}{{N}_{i}{N}_{o}}\sum _{n=1}^{{N}_{i}{N}_{o}}{\left|{\mathit{a}}_{w}[n]-{m}_{w}{\mathit{a}}_{w}^{\prime}[n]\right|}^{2}=\frac{1}{{N}_{i}{N}_{o}}\sum _{n=1}^{{N}_{i}{N}_{o}}{\left|{\mathit{a}}_{w}[n]-\hat{{\mathit{a}}_{w}^{\prime}}[n]\right|}^{2},$$

where the scaling coefficient ${m}_{w}$ is given by

Eq. (17)$${m}_{w}=\frac{{\sum}_{n=1}^{{N}_{i}{N}_{o}}{\mathit{a}}_{w}[n]{\mathit{a}}_{w}^{\prime *}[n]}{{\sum}_{n=1}^{{N}_{i}{N}_{o}}{\left|{\mathit{a}}_{w}^{\prime}[n]\right|}^{2}}.$$

The cosine similarity between the all-optical diffractive transform and its target (ground truth) for the ${w}^{\text{th}}$ wavelength channel, ${\text{CosSim}}_{w}$, is defined as

Eq. (18)$${\text{CosSim}}_{w}=\frac{\left|{\mathit{a}}_{w}^{H}{\hat{\mathit{a}}}_{w}^{\prime}\right|}{\sqrt{{\sum}_{n=1}^{{N}_{i}{N}_{o}}{\left|{\mathit{a}}_{w}[n]\right|}^{2}}\sqrt{{\sum}_{n=1}^{{N}_{i}{N}_{o}}{\left|\hat{{\mathit{a}}_{w}^{\prime}}[n]\right|}^{2}}}.$$

The normalized MSE between the diffractive network outputs and their ground truth for the ${w}^{\text{th}}$ wavelength channel, ${\mathrm{MSE}}_{\text{Output},w}$, is defined using the same formula as in Eq. (8), except that $E[\cdot ]$ is calculated across the entire testing set.

4.5. Training-Related Details

All the diffractive optical networks used in this work were trained using PyTorch (v1.11.0, Meta Platforms Inc.). We selected the AdamW optimizer^{76}^{,}^{77} for training all the models; its parameters were taken as the default values and kept identical across the models. The batch size was set as 8. The learning rate, starting from an initial value of 0.001, was set to decay at a rate of 0.5 every 10 epochs. The training of each diffractive network model was performed for 50 epochs. The best models were selected based on the MSE loss calculated on the validation data set. For the training of our diffractive models, we used a workstation with a GeForce RTX 3090 graphics processing unit (Nvidia Inc.)
and an Intel^{®} Core™ i9-12900F central processing unit (Intel Inc.) with 64 GB of RAM, running the Windows 11 operating system (Microsoft Inc.). The typical time required for training a wavelength-multiplexed diffractive network model with, e.g., ${N}_{w}=128$ and $N=1.5{N}_{w}{N}_{i}{N}_{o}$ is $\sim 50\,\mathrm{h}$.

4.6. Experimental Terahertz Setup

The diffractive layers used in our experiments were fabricated using a 3D printer (PR110, CADworks3D). The input test objects and their holders were also 3D-printed (Objet30 Pro, Stratasys). After the printing process, the input objects were coated with aluminum foil to define the light-blocking areas, leaving openings at specific positions to define the transmitted pixels of the input patterns. The designed holder was used to assemble the diffractive layers and objects, mechanically maintaining their relative spatial positions in line with our numerical design. To test our fabricated wavelength-multiplexed diffractive network design, we adopted a THz continuous-wave scanning system, whose schematic is presented in Fig. 11(a). A WR2.2 modular amplifier/multiplier chain (AMC) followed by a compatible diagonal horn antenna (Virginia Diode Inc.) was used as the THz source. For each measurement, a 10-dBm RF input signal at 12.500 or 11.944 GHz (${f}_{\mathrm{RF}1}$) was fed to the input of the AMC and multiplied 36 times to generate output radiation at 450 or 430 GHz, respectively, corresponding to the illumination wavelengths ${\lambda}_{1}=0.667\,\mathrm{mm}$ and ${\lambda}_{2}=0.698\,\mathrm{mm}$ used for the two wavelength channels. A 1-kHz square wave was also generated to modulate the AMC output for lock-in detection. By placing the wavelength-multiplexed diffractive network 600 mm away from the exit aperture of the THz source, an approximately uniform plane wave was created, impinging on the input FOV of the diffractive network.
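As a quick numeric sanity check of the frequency plan above (our own helper script, not part of the described setup), multiplying the RF seed frequencies by the ×36 chain reproduces the two illumination wavelengths:

```python
# Sanity check of the x36 amplifier/multiplier chain frequency plan.
C = 299_792_458.0  # speed of light, m/s

def carrier_wavelength_mm(f_rf_ghz, multiplier=36):
    """Free-space wavelength (in mm) of the AMC output when the RF seed
    at f_rf_ghz (GHz) is multiplied `multiplier` times."""
    f_hz = f_rf_ghz * 1e9 * multiplier
    return C / f_hz * 1e3

lam1 = carrier_wavelength_mm(12.500)  # ~0.667 mm (450 GHz channel)
lam2 = carrier_wavelength_mm(11.944)  # ~0.698 mm (430 GHz channel)
```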
The intensity distribution within the output FOV of the diffractive network was scanned at a step size of 2 mm by a single-pixel mixer/AMC detector (Virginia Diode Inc.), which was mounted on an $XY$ positioning stage formed by combining two linear motorized stages (Thorlabs NRT100). For illumination at ${\lambda}_{1}=0.667\,\mathrm{mm}$ or ${\lambda}_{2}=0.698\,\mathrm{mm}$, a 10-dBm sinusoidal signal at 12.472 or 11.917 GHz (${f}_{\mathrm{RF}2}$), respectively, was generated as a local oscillator and sent to the detector to downconvert the output signal to 1 GHz. After being amplified by a low-noise amplifier (Mini-Circuits ZRL-1150-LN+) with a gain of 80 dB, the downconverted signal was filtered by a 1-GHz ($\pm 10\,\mathrm{MHz}$) bandpass filter (KL Electronics 3C40-1000/T10-O/O) and attenuated by a tunable attenuator (HP 8495B) for linear calibration. This final signal was then measured by a low-noise power detector (Mini-Circuits ZX47-60), whose output voltage was read by a lock-in amplifier (Stanford Research SR830) using the 1-kHz square wave as the reference signal and calibrated to a linear scale. In our postprocessing, cropping and pixel binning were applied to each measured intensity field to match the pixel size and position of the output FOV used in the design phase, resulting in the output measurement images shown in Fig. 11(e).

Acknowledgments

The authors acknowledge the US Air Force Office of Scientific Research funding (Grant No. FA9550-21-1-0324). A. O. conceived and initiated the research. J. L. and T. G. conducted the experiments and processed the resulting data. B. B. and Y. L. contributed to the PyTorch implementation of the diffractive network simulations. All the authors contributed to the preparation of the paper. A. O. supervised the research.
Data and Materials Availability

The deep-learning models reported in this work used standard libraries and scripts that are publicly available in PyTorch. All the data and methods needed to evaluate the conclusions of this work are presented in the main text and Supplementary Material. Additional data can be requested from the corresponding author.

References

1. D. R. Solli and B. Jalali, "Analog optical computing," Nat. Photonics 9(11), 704–706 (2015). https://doi.org/10.1038/nphoton.2015.208
2. G. Wetzstein et al., "Inference in artificial intelligence with deep optics and photonics," Nature 588(7836), 39–47 (2020). https://doi.org/10.1038/s41586-020-2973-6
3. B. J. Shastri et al., "Photonics for artificial intelligence and neuromorphic computing," Nat. Photonics 15(2), 102–114 (2021). https://doi.org/10.1038/s41566-020-00754-y
4. H. Zhou et al., "Photonic matrix multiplication lights up photonic accelerator and beyond," Light Sci. Appl. 11(1), 30 (2022). https://doi.org/10.1038/s41377-022-00717-8
5. D. Mengu et al., "At the intersection of optics and deep learning: statistical inference, computing, and inverse design," Adv. Opt. Photonics 14(2), 209–290 (2022). https://doi.org/10.1364/AOP.450345
6. L. Cutrona et al., "Optical data processing and filtering systems," IRE Trans. Inf. Theory 6(3), 386–400 (1960). https://doi.org/10.1109/TIT.1960.1057566
7. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. U. S. A. 79(8), 2554–2558 (1982). https://doi.org/10.1073/pnas.79.8.2554
8. D. Psaltis and N. Farhat, "Optical information processing based on an associative-memory model of neural nets with thresholding and feedback," Opt. Lett. 10(2), 98–100 (1985). https://doi.org/10.1364/OL.10.000098
9. N. H. Farhat et al., "Optical implementation of the Hopfield model," Appl. Opt. 24(10), 1469–1475 (1985). https://doi.org/10.1364/AO.24.001469
10. K. Wagner and D. Psaltis, "Multilayer optical learning networks," Appl. Opt. 26(23), 5061–5076 (1987). https://doi.org/10.1364/AO.26.005061
11. D. Psaltis et al., "Holography in artificial neural networks," Nature 343(6256), 325–330 (1990). https://doi.org/10.1038/343325a0
12. K. Vandoorne et al., "Parallel reservoir computing using optical amplifiers," IEEE Trans. Neural Networks 22(9), 1469–1481 (2011). https://doi.org/10.1109/TNN.2011.2161771
13. A. Silva et al., "Performing mathematical operations with metamaterials," Science 343(6167), 160–163 (2014). https://doi.org/10.1126/science.1242818
14. K. Vandoorne et al., "Experimental demonstration of reservoir computing on a silicon photonics chip," Nat. Commun. 5(1), 3541 (2014). https://doi.org/10.1038/ncomms4541
15. J. Carolan et al., "Universal linear optics," Science 349(6249), 711–716 (2015). https://doi.org/10.1126/science.aab3642
16. J. Chang et al., "Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification," Sci. Rep. 8(1), 12324 (2018). https://doi.org/10.1038/s41598-018-30619-y
17. N. M. Estakhri, B. Edwards, and N. Engheta, "Inverse-designed metastructures that solve equations," Science 363(6433), 1333–1338 (2019). https://doi.org/10.1126/science.aaw2498
18. J. Dong et al., "Optical reservoir computing using multiple light scattering for chaotic systems prediction," IEEE J. Sel. Top. Quantum Electron. 26(1), 7701012 (2020). https://doi.org/10.1109/JSTQE.2019.2936281
19. U. Teğin et al., "Scalable optical learning operator," Nat. Comput. Sci. 1(8), 542–549 (2021). https://doi.org/10.1038/s43588-021-00112-0
20. Y. Shen et al., "Deep learning with coherent nanophotonic circuits," Nat. Photonics 11(7), 441–446 (2017). https://doi.org/10.1038/nphoton.2017.93
21. A. N. Tait et al., "Neuromorphic photonic networks using silicon photonic weight banks," Sci. Rep. 7(1), 7430 (2017). https://doi.org/10.1038/s41598-017-07754-z
22. X. Lin et al., "All-optical machine learning using diffractive deep neural networks," Science 361(6406), 1004–1008 (2018). https://doi.org/10.1126/science.aat8084
23. J. Bueno et al., "Reinforcement learning in a large-scale photonic recurrent neural network," Optica 5(6), 756–760 (2018). https://doi.org/10.1364/OPTICA.5.000756
24. Y. Zuo et al., "All-optical neural network with nonlinear activation functions," Optica 6(9), 1132–1137 (2019). https://doi.org/10.1364/OPTICA.6.001132
25. T. W. Hughes et al., "Wave physics as an analog recurrent neural network," Sci. Adv. 5(12), eaay6946 (2019). https://doi.org/10.1126/sciadv.aay6946
26. J. Feldmann et al., "All-optical spiking neurosynaptic networks with self-learning capabilities," Nature 569(7755), 208–214 (2019). https://doi.org/10.1038/s41586-019-1157-8
27. M. Miscuglio and V. J. Sorger, "Photonic tensor cores for machine learning," Appl. Phys. Rev. 7(3), 031404 (2020). https://doi.org/10.1063/5.0001942
28. H. Zhang et al., "An optical neural chip for implementing complex-valued neural network," Nat. Commun. 12(1), 457 (2021). https://doi.org/10.1038/s41467-020-20719-7
29. J. Feldmann et al., "Parallel convolutional processing using an integrated photonic tensor core," Nature 589(7840), 52–58 (2021). https://doi.org/10.1038/s41586-020-03070-1
30. X. Xu et al., "11 TOPS photonic convolutional accelerator for optical neural networks," Nature 589(7840), 44–51 (2021). https://doi.org/10.1038/s41586-020-03063-0
31. L. G. Wright et al., "Deep physical neural networks trained with backpropagation," Nature 601(7894), 549–555 (2022). https://doi.org/10.1038/s41586-021-04223-6
32. F. Ashtiani, A. J. Geers, and F. Aflatouni, "An on-chip photonic deep neural network for image classification," Nature 606(7914), 501–506 (2022). https://doi.org/10.1038/s41586-022-04714-0
33. D. Liu et al., "Training deep neural networks for the inverse design of nanophotonic structures," ACS Photonics 5(4), 1365–1369 (2018). https://doi.org/10.1021/acsphotonics.7b01377
34. W. Ma, F. Cheng, and Y. Liu, "Deep-learning-enabled on-demand design of chiral metamaterials," ACS Nano 12(6), 6326–6334 (2018). https://doi.org/10.1021/acsnano.8b03569
35. J. Peurifoy et al., "Nanophotonic particle simulation and inverse design using artificial neural networks," Sci. Adv. 4(6), eaar4206 (2018). https://doi.org/10.1126/sciadv.aar4206
36. I. Malkiel et al., "Plasmonic nanostructure design and characterization via deep learning," Light Sci. Appl. 7(1), 60 (2018). https://doi.org/10.1038/s41377-018-0060-7
37. Z. Liu et al., "Generative model for the inverse design of metasurfaces," Nano Lett. 18(10), 6570–6576 (2018). https://doi.org/10.1021/acs.nanolett.8b03171
38. S. So and J. Rho, "Designing nanophotonic structures using conditional deep convolutional generative adversarial networks," Nanophotonics 8(7), 1255–1261 (2019). https://doi.org/10.1515/nanoph-2019-0117
39. W. Ma et al., "Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy," Adv. Mater. 31(35), 1901111 (2019). https://doi.org/10.1002/adma.201901111
40. S. An et al., "A deep learning approach for objective-driven all-dielectric metasurface design," ACS Photonics 6(12), 3196–3207 (2019). https://doi.org/10.1021/acsphotonics.9b00966
41. J. Jiang et al., "Free-form diffractive metagrating design based on generative adversarial networks," ACS Nano 13(8), 8872–8878 (2019). https://doi.org/10.1021/acsnano.9b02371
42. C. Qian et al., "Deep-learning-enabled self-adaptive microwave cloak without human intervention," Nat. Photonics 14(6), 383–390 (2020). https://doi.org/10.1038/s41566-020-0604-2
43. Z. Liu et al., "Compounding meta-atoms into metamolecules with hybrid artificial intelligence techniques," Adv. Mater. 32(6), 1904790 (2020). https://doi.org/10.1002/adma.201904790
44. H. Ren et al., "Three-dimensional vectorial holography based on machine learning inverse design," Sci. Adv. 6(16), eaaz4261 (2020). https://doi.org/10.1126/sciadv.aaz4261
45. C. Zuo and Q. Chen, "Exploiting optical degrees of freedom for information multiplexing in diffractive neural networks," Light Sci. Appl. 11(1), 208 (2022). https://doi.org/10.1038/s41377-022-00903-8
46. D. Mengu et al., "Analysis of diffractive optical neural networks and their integration with electronic neural networks," IEEE J. Sel. Top. Quantum Electron. 26(1), 3700114 (2020). https://doi.org/10.1109/JSTQE.2019.2921376
47. J. Li et al., "Class-specific differential detection in diffractive optical neural networks improves inference accuracy," Adv. Photonics 1(4), 046001 (2019). https://doi.org/10.1117/1.AP.1.4.046001
48. T. Yan et al., "Fourier-space diffractive deep neural network," Phys. Rev. Lett. 123(2), 023901 (2019). https://doi.org/10.1103/PhysRevLett.123.023901
49. D. Mengu, Y. Rivenson, and A. Ozcan, "Scale-, shift-, and rotation-invariant diffractive optical networks," ACS Photonics 8(1), 324–334 (2020). https://doi.org/10.1021/acsphotonics.0c01583
50. D. Mengu et al., "Misalignment resilient diffractive optical networks," Nanophotonics 9(13), 4207–4219 (2020). https://doi.org/10.1515/nanoph-2020-0291
51. M. S. S. Rahman et al., "Ensemble learning of diffractive optical networks," Light Sci. Appl. 10(1), 14 (2021). https://doi.org/10.1038/s41377-020-00446-w
52. J. Li et al., "Spectrally encoded single-pixel machine vision using diffractive networks," Sci. Adv. 7(13), eabd7690 (2021). https://doi.org/10.1126/sciadv.abd7690
53. O. Kulce et al., "All-optical information-processing capacity of diffractive surfaces," Light Sci. Appl. 10(1), 25 (2021). https://doi.org/10.1038/s41377-020-00439-9
54. T. Zhou et al., "Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit," Nat. Photonics 15(5), 367–373 (2021). https://doi.org/10.1038/s41566-021-00796-w
55. H. Chen et al., "Diffractive deep neural networks at visible wavelengths," Engineering 7(10), 1483–1491 (2021). https://doi.org/10.1016/j.eng.2020.07.032
56. C. Liu et al., "A programmable diffractive deep neural network based on a digital-coding metasurface array," Nat. Electron. 5(2), 113–122 (2022). https://doi.org/10.1038/s41928-022-00719-9
57. D. Mengu et al., "Classification and reconstruction of spatially overlapping phase images using diffractive optical networks," Sci. Rep. 12(1), 8446 (2022). https://doi.org/10.1038/s41598-022-12020-y
58. Y. Luo et al., "Computational imaging without a computer: seeing through random diffusers at the speed of light," eLight 2(1), 4 (2022). https://doi.org/10.1186/s43593-022-00012-4
59. D. Mengu et al., "Diffractive interconnects: all-optical permutation operation using diffractive networks," Nanophotonics (2022). https://doi.org/10.1515/nanoph-2022-0358
60. D. Mengu and A. Ozcan, "All-optical phase recovery: diffractive computing for quantitative phase imaging," Adv. Opt. Mater. 10(15), 2200281 (2022). https://doi.org/10.1002/adom.202200281
61. B. Bai et al., "To image, or not to image: class-specific diffractive cameras with all-optical erasure of undesired objects," eLight 2(1), 14 (2022). https://doi.org/10.1186/s43593-022-00021-3
62. Ç. Işıl et al., "Super-resolution image display using diffractive decoders," Sci. Adv. 8(48), eadd3433 (2022). https://doi.org/10.1126/sciadv.add3433
63. C. Qian et al., "Performing optical logic operations by a diffractive neural network," Light Sci. Appl. 9(1), 59 (2020). https://doi.org/10.1038/s41377-020-0303-2
64. P. Wang et al., "Orbital angular momentum mode logical operation using optical diffractive neural network," Photonics Res. 9(10), 2116–2124 (2021). https://doi.org/10.1364/PRJ.432919
65. Y. Luo, D. Mengu, and A. Ozcan, "Cascadable all-optical NAND gates using diffractive networks," Sci. Rep. 12(1), 7121 (2022). https://doi.org/10.1038/s41598-022-11331-4
66. Y. Luo et al., "Design of task-specific optical systems using broadband diffractive neural networks," Light Sci. Appl. 8(1), 112 (2019). https://doi.org/10.1038/s41377-019-0223-1
67. M. Veli et al., "Terahertz pulse shaping using diffractive surfaces," Nat. Commun. 12(1), 37 (2021). https://doi.org/10.1038/s41467-020-20268-z
68. Z. Huang et al., "All-optical signal processing of vortex beams with diffractive deep neural networks," Phys. Rev. Appl. 15(1), 014037 (2021). https://doi.org/10.1103/PhysRevApplied.15.014037
69. O. Kulce et al., "All-optical synthesis of an arbitrary linear transformation using diffractive surfaces," Light Sci. Appl. 10(1), 196 (2021). https://doi.org/10.1038/s41377-021-00623-5
70. J. Li et al., "Polarization multiplexed diffractive computing: all-optical implementation of a group of linear transformations through a polarization-encoded diffractive network," Light Sci. Appl. 11(1), 153 (2022). https://doi.org/10.1038/s41377-022-00849-x
71. T. Ishihara et al., "An optical neural network architecture based on highly parallelized WDM-multiplier-accumulator," pp. 15–21 (2019). https://doi.org/10.1109/PHOTONICS49561.2019.00008
72. R. Hamerly et al., "Edge computing with optical neural networks via WDM weight broadcasting," Proc. SPIE 11804, 118041R (2021). https://doi.org/10.1117/12.2594886
73. A. Totovic et al., "Programmable photonic neural networks combining WDM with coherent linear optics," Sci. Rep. 12(1), 5605 (2022). https://doi.org/10.1038/s41598-022-09370-y
74. "TSL-570 | Santec Corporation: the photonics pioneer," https://www.santec.com/en/products/instruments/tunablelaser/TSL-570/
75. "MEMS-VCSEL swept-wavelength laser sources," https://www.thorlabs.com/newgrouppage9.cfm?objectgroup_id=12057
76. D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," (2014).
77. I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," in Int. Conf. Learn. Represent. (2019).
Biography

Jingxi Li received his BS degree in optoelectronic information science and engineering from Zhejiang University, Hangzhou, Zhejiang, China, in 2018. Currently, he is working toward his PhD in the Electrical and Computer Engineering Department, University of California, Los Angeles, California, United States. His work focuses on optical computing and information processing using diffractive networks, as well as computational optical imaging for biomedical applications.

Tianyi Gan received his BS degree in physics from Peking University, Beijing, China, in 2021. He is currently a PhD student in the Electrical and Computer Engineering Department at the University of California, Los Angeles. His research interests are terahertz sources and imaging.

Bijie Bai received her BS degree in measurement, control technology, and instrumentation from Tsinghua University, Beijing, China, in 2018. She is currently working toward her PhD in the Electrical and Computer Engineering Department, University of California, Los Angeles, CA, USA. Her research focuses on computational imaging for biomedical applications and the intersection of machine learning and optics.

Yi Luo received his BS degree in measurement, control technology, and instrumentation from Tsinghua University, Beijing, China, in 2016. He is currently working toward his PhD in the Bioengineering Department, University of California, Los Angeles, CA, USA. His work focuses on the development of computational imaging and sensing platforms.

Mona Jarrahi is a professor and the Northrop Grumman Endowed Chair in the Electrical and Computer Engineering Department at the University of California, Los Angeles, and the director of the Terahertz Electronics Laboratory.
She has made significant contributions to the development of ultrafast electronic and optoelectronic devices and integrated systems for terahertz, infrared, and millimeter-wave sensing, imaging, computing, and communication by utilizing innovative materials, nanostructures, and quantum structures, as well as innovative plasmonic and optical concepts.

Aydogan Ozcan is the Chancellor's Professor and the Volgenau Chair for Engineering Innovation at UCLA and an HHMI professor with the Howard Hughes Medical Institute. He is also the associate director of the California NanoSystems Institute. He is an elected fellow of the National Academy of Inventors and holds >60 issued/granted patents in microscopy, holography, computational imaging, sensing, mobile diagnostics, nonlinear optics, and fiber optics. He is also the author of one book and the co-author of >950 peer-reviewed publications in leading scientific journals and conferences. He is an elected fellow of Optica, AAAS, SPIE, IEEE, AIMBE, RSC, APS, and the Guggenheim Foundation, and is a lifetime fellow member of Optica, NAI, AAAS, and SPIE.