Multimode diffractive optical neural network

Abstract. On-chip diffractive optical neural networks (DONNs) bring the advantages of parallel processing and low energy consumption. However, an accurate representation of the optical field’s evolution in the structure cannot be provided using the previous diffraction-based analysis method. Moreover, the loss caused by the open boundaries poses challenges to applications. A multimode DONN architecture based on a more precise eigenmode analysis method is proposed. We have constructed a universal library of input, output, and metaline structures utilizing this method, and realized a multimode DONN composed of the structures from the library. On the designed multimode DONNs with only one layer of the metaline, the classification task of an Iris plants dataset is verified with an accuracy of 90% on the blind test dataset, and the performance of the one-bit binary adder task is also validated. Compared to the previous architectures, the multimode DONN exhibits a more compact design and higher energy efficiency.


Introduction
Current electronic computing devices face the challenges of limited bandwidth, high power consumption, and high cost. 1 These challenges have stimulated research on optical neural networks (ONNs), 2,3 which exploit the high bandwidth and high parallelism of light. These properties are manifested in ONNs composed of Mach-Zehnder interferometers (MZIs), [4][5][6] micro-ring resonators (MRRs), [7][8][9] scattering 10 and diffraction [11][12][13][14][15] structures. It is worth noting that on-chip ONNs are more competitive in portability and footprint, and some commercial companies have even been established. 166][27][28] Notwithstanding these advancements, certain challenges have emerged. Specifically, to ensure stable interference in the DONN, a relatively large spacing between metalines and open boundaries are required, leading to severe light leakage and a substantial footprint. In addition, the accuracy of the previous diffraction analysis method (DAM) decreases as the number of metalines increases. Meanwhile, the DAM is insufficient for analyzing the evolution and loss of the optical field in the output ports.
In this paper, we propose a multimode DONN structure, in which eigenmodes are utilized as neurons. In the multimode DONN, the metaline formed by Si etching slots manipulates the coupling between eigenmodes. This coupling mechanism physically realizes the connection of neurons. The corresponding eigenmode analysis method (EAM) is used to analyze the evolution of the optical field in the multimode DONN, and it offers higher accuracy and faster calculation than the DAM. Based on this method, a universal library including the metalines and the input and output structures is constructed. Through optimization, the assembled multimode DONNs complete the Iris dataset classification task and the one-bit binary adder task. With a smaller footprint and higher energy transfer efficiency, the multimode DONN has the potential to provide higher computing power for the next generation of artificial intelligence (AI) platforms.

Structure and Principle
Figure 1(a) shows the architecture of the multimode DONN. The input, output, and metaline structures are connected by the multimode waveguide, where the metaline consists of an arrangement of subwavelength units with a lateral period of 0.5 μm, including the Si etching slot region and the non-etched area, as shown in Fig. 1(b). The length and width of the Si etching slot are 1.1 and 0.2 μm, respectively. The input structure utilizes the width of the multimode waveguide, and several input waveguides are arranged appropriately to realize the input of the optical field modulated with information, as shown in Fig. 1(c). The optical field is guided by the multimode waveguide, then modulated by the metaline, and finally reaches the output structure. Two types of output structures are designed by multiplexing space or modes. One is a space-only multiplexing structure, where multiple inverse tapers are connected at the end of the multimode waveguide to become output waveguides, as shown in Fig. 1(e). The other is a structure that multiplexes both space and modes, as shown in Fig. 1(f). The inverse tapers are connected first, and then an asymmetric directional coupler is connected to construct a two-mode demultiplexer, which can guide the TE0 and TE1 modes in the bus waveguide to different output ports. The output of the multimode DONN is obtained by sampling the optical power with a photodetector (PD) at each output port.
By analyzing the transmission and coupling of the eigenmodes, the evolution of the optical field in the multimode DONN can be obtained. Therefore, the EAM is used to design and analyze the multimode DONN, and the eigenmodes in the multimode DONN are utilized as neurons, as demonstrated in the following. The multimode waveguide in Fig. 1(a), with a width and thickness of 6 and 0.22 μm, respectively, is taken as an example. There are a limited number ($N = 19$) of eigenmodes in the lateral direction. Any optical field $E$ that can propagate stably in the waveguide can be expanded into a superposition of the eigenmode optical fields $E_n$, 29 with superposition coefficients $a_n$:

$$E = \sum_{n=1}^{N} a_n E_n, \tag{1}$$

where $a_n$ is obtained from the overlap of $E$ with the $n$th eigenmode, and $E_n$ and $H_n$ represent the electric and magnetic fields of the preset $n$th eigenmode, respectively. The evolution of the optical field in the multimode waveguide is the result of multimode interference. Different eigenmodes propagate independently with their propagation constants $\beta_{\mathrm{eff}}[n]$, and the total power is $P = \sum_n |a_n|^2 P_n$, where $P_n$ is the power of the preset $n$th eigenmode. The part of the optical field that cannot propagate stably in the multimode waveguide appears outside Eq. (1) in the form of residuals, which are dissipated during propagation, so the actual optical field approaches $E$ as it propagates.
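The eigenmode expansion described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes scalar one-dimensional fields and uses ideal hard-wall sine profiles as stand-ins for the true vectorial eigenmodes of the Si waveguide, with a plain discrete inner product in place of the full electromagnetic overlap integral. The names `modes`, `inner`, and `decompose` are hypothetical.

```python
import numpy as np

# Assumption: scalar 1D fields and ideal hard-wall sine modes stand in for the
# true vectorial eigenmodes of the 6 um x 0.22 um Si waveguide.
WIDTH_UM = 6.0
N_MODES = 19
x = np.linspace(0.0, WIDTH_UM, 601)
dx = x[1] - x[0]

# modes[n] is the lateral profile of the (n+1)-th stand-in eigenmode.
modes = np.array([np.sin((n + 1) * np.pi * x / WIDTH_UM) for n in range(N_MODES)])


def inner(f, g):
    """Discrete overlap integral of two lateral field profiles."""
    return np.sum(np.conj(f) * g) * dx


def decompose(field):
    """Return the coefficients a_n by projecting the field onto each mode."""
    return np.array([inner(m, field) / inner(m, m) for m in modes])


# Build a guided field from known coefficients, then recover them.
rng = np.random.default_rng(0)
a_true = rng.normal(size=N_MODES) + 1j * rng.normal(size=N_MODES)
field = a_true @ modes
a_rec = decompose(field)

# Total power as the sum of modal powers, P = sum_n |a_n|^2 P_n.
mode_power = np.array([inner(m, m).real for m in modes])
total_power = np.sum(np.abs(a_rec) ** 2 * mode_power)
```

Because the stand-in modes are mutually orthogonal, the recovered coefficients match the ones used to build the field, and the summed modal power equals the power of the total field.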
Since the optical field response in linear materials satisfies the superposition principle, once the response $E_{\mathrm{response}}[n]$ of the metaline stimulated by the $n$th eigenmode is obtained, the response $E_{\mathrm{response}}$ of the metaline to any input $E$ can be obtained by summing $E_{\mathrm{response}}[n]$ weighted by $a_n$ in Eq. (2). Moreover, $E_{\mathrm{response}}[n]$ can also be formed by weighting the eigenmodes just as in Eq. (1), where the weight of the $m$th eigenmode is $w_{nm}$. In other words, the metaline makes a coupling connection with a fixed weight $w_{nm}$ between the $n$th eigenmode at the input and the $m$th eigenmode at the output. This connection can be fully expressed by the matrix $w_{NN}$ of $N \times N$ dimensions. Once $w_{NN}$ is obtained, $E_{\mathrm{response}}$ can be calculated:

$$E_{\mathrm{response}} = \sum_{n=1}^{N} a_n E_{\mathrm{response}}[n] = \sum_{n=1}^{N} a_n \sum_{m=1}^{N} w_{nm} E_m. \tag{3}$$

It should be noted that in Eq. (3), $E_m$ is only related to the multimode waveguide, and $w_{nm}$ is only related to the metaline. The coefficients $a_n$ fully express the input.
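In coefficient form, the coupling described above reduces to a vector-matrix product: the output coefficient of the $m$th eigenmode is $b_m = \sum_n a_n w_{nm}$. A minimal sketch follows, assuming a random complex matrix as a placeholder for a real metaline's $w_{NN}$ (the true entries come from simulation). The function name `metaline_output` is hypothetical.

```python
import numpy as np

N = 19  # number of lateral eigenmodes in the 6 um wide waveguide
rng = np.random.default_rng(1)

# Hypothetical stand-in for one metaline's mode coupling matrix w_NN; in the
# paper these entries come from var-FDTD responses, not random numbers.
w = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2 * N)


def metaline_output(a, w):
    """Output coefficients b_m = sum_n a_n * w[n, m] for input coefficients a."""
    return a @ w


a1 = rng.normal(size=N) + 1j * rng.normal(size=N)
a2 = rng.normal(size=N) + 1j * rng.normal(size=N)
b1 = metaline_output(a1, w)
b2 = metaline_output(a2, w)

# Exciting only the eigenmode with index 3 returns row 3 of w, i.e. the
# decomposition of that single-mode response.
single = metaline_output(np.eye(N)[3], w)
```

The superposition principle shows up as linearity: the response to `a1 + 2 * a2` equals `b1 + 2 * b2`.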
The multimode DONN serves as a mode converter, 30 as shown in Figs. 1(c)-1(f). The optical fields in the three input single-mode waveguides are phase- or amplitude-modulated and injected into the multimode waveguide, and the responses are decomposed into 19 eigenmodes. This process increases the dimensionality of the input data by mapping it onto multiple eigenmodes. The 19 eigenmodes carrying the information propagate forward independently with their respective propagation constants. Subsequently, the Si etching slots in the metaline perturb the phase distribution of the optical field, thereby changing the distribution of the 19 eigenmodes and achieving mutual coupling among them. The output structure allows the 19 eigenmodes to be coupled to the output waveguides. However, not all eigenmodes can couple losslessly to the output modes; otherwise, the reciprocity theorem would be violated. 31 The mode coupling matrix of the output structure determines the proportion of each eigenmode that contributes to the output, with the remaining portion dissipating as loss.
Through such multimode coupling, the complex connection of the neural network is realized physically. It should be noted that the eigenmodes in the multimode DONN are equivalent to neurons, instead of the slot groups 11 in the previous DONN, as discussed in Sec. 4.1.

Result
Based on the EAM proposed above, a universal library consisting of the metaline, the input, and the output structures is established. The assembled multimode DONN is designed to complete the verification tasks, which include the classification task of the Iris plants dataset and the one-bit binary adder.

Build Library: Metalines, Input, and Output Structures
The multimode waveguide with a width of 6 μm and a thickness of 0.22 μm is again used as the basic structure. As shown in Fig. 1(b), across the lateral direction of this multimode waveguide, there are 12 optional locations for placing the Si etching slot, with a lateral period of 0.5 μm. Each slot can be placed or removed, resulting in a total of $2^{12} = 4096$ different metalines. There is a total of 19 eigenmodes in the lateral direction. The response $E_{\mathrm{response}}[n]$ of each metaline excited by each input eigenmode is obtained by var-FDTD simulation. Subsequently, the response is used to calculate the mode coupling matrix $w_{NN}$ according to Eq. (2). This matrix is recorded in the library and associated with the identification number of the metaline. The metalines in the library can be called at will, and their corresponding matrices then take part in the design and calculation. For the visualization of the mode coupling matrices, please refer to Appendix B. The mode coupling matrices of the input and output structures proposed in Sec. 2 are obtained similarly. For the input structure shown in Fig. 1(c), the different input ports (IN) are injected with optical fields one at a time, and the responses are used to calculate $w^{\mathrm{in}}_{N \times IN}$ according to Eq. (2). It should be noted that the dimension of this matrix is the number of eigenmodes ($N$) multiplied by the number of IN. In the output structures shown in Figs. 1(e) and 1(f), the response on each port after each eigenmode excitation is obtained, and then the mode coupling matrix $w^{\mathrm{out}}_{OUT \times N}$ is obtained. This is a matrix with the number of output ports (OUT) multiplied by the eigenmode number ($N$). If higher-order eigenmodes are considered on the output waveguide, an additional dimension, i.e., the number of eigenmodes, is required. The constructed input and output structures realize the dimensionality increase and decrease of the data, and the metalines implement the complex connection.
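The enumeration of the 4096 metalines can be sketched as a mapping between an identification number and a 12-slot occupancy pattern. The binary encoding below (bit k of the ID controls slot position k) is an assumption made for illustration; the source does not state how the identification numbers are assigned. The helper names are hypothetical.

```python
N_SLOTS = 12     # candidate slot positions across the waveguide
PERIOD_UM = 0.5  # lateral period of the slot positions


def slots_from_id(metaline_id):
    """Decode a library ID into a tuple of 12 flags (1 = slot etched)."""
    # Assumption: bit k of the ID controls slot position k.
    if not 0 <= metaline_id < 2 ** N_SLOTS:
        raise ValueError("metaline_id out of range")
    return tuple((metaline_id >> k) & 1 for k in range(N_SLOTS))


def slot_centers_um(metaline_id):
    """Lateral offsets of the etched slots, measured from the first position."""
    flags = slots_from_id(metaline_id)
    return [k * PERIOD_UM for k, flag in enumerate(flags) if flag]


# One entry per possible metaline: 2**12 = 4096 patterns.
library = {i: slots_from_id(i) for i in range(2 ** N_SLOTS)}
```

In a full library, each key would additionally point to the stored $N \times N$ mode coupling matrix of that metaline.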
As shown in Fig. 2, when the task is defined, the input and output structures that fit the data dimension are picked out from the library, and they are combined with the metalines in the library to form the potential multimode DONN structures. The port-to-port transmission matrices of these potential structures can be quickly obtained by multiplying the mode coupling matrices of the separate parts, which avoids time-consuming electromagnetic simulations while maintaining high accuracy, as will be verified in Sec. 4.1. After that, the training dataset is loaded into the port-to-port transmission matrices of these potential DONN structures, and the output results are evaluated against a criterion such as the prediction accuracy or the desired logical result. A data augmentation approach 32 can be employed. Additional noise added to the training dataset 14 can enhance the robustness of the multimode DONN. The best of these structures is selected as the final multimode DONN design. In the next section, the photonic computing tasks will be validated.
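The matrix chaining behind the port-to-port transmission matrix can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the coupling matrices are random placeholders, the propagation constants `beta` and section lengths are hypothetical values, and free propagation between the parts is modeled as a diagonal phase matrix with an $e^{-i\beta L}$ sign convention (the source may fold propagation into the stored matrices).

```python
import numpy as np

N, N_IN, N_OUT = 19, 4, 3
rng = np.random.default_rng(2)


def random_complex(shape):
    """Placeholder coupling matrix; real entries would come from the library."""
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2 * N)


w_in = random_complex((N, N_IN))    # input ports -> eigenmodes
w_meta = random_complex((N, N))     # w[n, m]: input mode n -> output mode m
w_out = random_complex((N_OUT, N))  # eigenmodes -> output ports

# Hypothetical propagation constants (rad/um) of the 19 eigenmodes.
beta = np.linspace(11.0, 7.0, N)


def propagate(length_um):
    """Diagonal matrix of the phases accumulated over a waveguide section."""
    return np.diag(np.exp(-1j * beta * length_um))


def port_to_port(w_in, w_meta, w_out, l1_um, l2_um):
    """Chain input, propagation, metaline, propagation, and output stages."""
    # w_meta is indexed [input mode, output mode], hence the transpose.
    return w_out @ propagate(l2_um) @ w_meta.T @ propagate(l1_um) @ w_in


T = port_to_port(w_in, w_meta, w_out, 5.0, 5.0)
x = np.exp(1j * np.array([0.0, 0.5, 1.0, 1.5]))  # phase-encoded inputs
port_power = np.abs(T @ x) ** 2
```

Once `T` is assembled, evaluating a new input costs only one small matrix-vector product, which is why candidate structures can be screened without further electromagnetic simulation.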

Iris Classification
To complete the Iris classification task, the input structure with four ports matching the input data dimension and the output structure with three ports matching the number of classes are first selected from the library. Together with the metalines in the library, the assembled multimode DONN is used to complete the task, as shown in Fig. 3(a) (more details in Appendix A). Three kinds of Iris are classified according to the length and width of the sepals and petals. These data are normalized and mapped to the range from 0 to π, and are then phase-modulated onto the fundamental mode fields of the input waveguides. This optical field is fed into the multimode DONN and passes through the metaline. The category corresponding to the output port receiving the highest power is taken as the classification result.
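The encoding and decision steps above can be sketched in a few lines. The per-feature min-max normalization is an assumption (the source only states that the data are normalized and mapped to the range from 0 to π), the three feature rows are illustrative values, and the function names are hypothetical.

```python
import numpy as np

# Illustrative rows of (sepal length, sepal width, petal length, petal width)
# in cm; a real run would use the full training split.
features = np.array([[5.1, 3.5, 1.4, 0.2],
                     [7.0, 3.2, 4.7, 1.4],
                     [6.3, 3.3, 6.0, 2.5]])


def to_phase(values, lo, hi):
    """Min-max normalize each feature column and map it linearly onto [0, pi]."""
    return np.pi * (values - lo) / (hi - lo)


def input_fields(phases):
    """Unit-amplitude fundamental-mode fields carrying the encoded phases."""
    return np.exp(1j * phases)


def classify(port_powers):
    """Index of the output port that receives the highest power."""
    return int(np.argmax(port_powers))


# Assumption: the normalization bounds are taken per feature column.
lo, hi = features.min(axis=0), features.max(axis=0)
phases = to_phase(features, lo, hi)
fields = input_fields(phases)
```

Since only the phase is modulated, every input field keeps unit amplitude, which is consistent with the stable input power noted for phase modulation.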
To train the multimode DONN for the Iris classification task, the training methodology in Sec. 3.1 is employed. The training dataset is loaded into the port-to-port linear transformation matrices of the potential multimode DONNs, which are obtained by multiplying the mode coupling matrices of the separate parts in the library. The intensity at each output port is calculated, and the classification accuracy of each potential multimode DONN is recorded. The metaline numbered 1438 in the library has the highest accuracy and is selected as the preferred structure. The test dataset is loaded in the same way into the multimode DONN with the selected metaline, and the accuracy on the blind test dataset is 90%. The confusion matrix of the test dataset is shown in Fig. 3(c), and the fundamental mode amplitudes of the three output ports for the test dataset are shown in Fig. 3(d). The device is also verified by var-FDTD simulation, as shown in Fig. 3(b), which gives a correct classification result and the same accuracy on the Iris test dataset. Figure 3(d) also shows the output power. Compared with the previous works, 11,25,28 the energy efficiency has been significantly improved, which means a higher tolerance for detection noise. The computing part of the whole device occupies about 6 μm × 15 μm, showing a high degree of integration.
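The library search described above amounts to scoring every candidate metaline on the training data and keeping the best one. The sketch below uses synthetic placeholders throughout: random coupling matrices, random phases, random labels, and only 64 candidates instead of 4096. It illustrates the selection loop only, and propagation phases between the parts are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(3)
N, N_IN, N_OUT, N_CANDIDATES = 19, 4, 3, 64


def random_complex(shape):
    """Placeholder for a mode coupling matrix taken from the library."""
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2 * N)


# Synthetic stand-ins for the library entries and the training data.
w_in = random_complex((N, N_IN))
w_out = random_complex((N_OUT, N))
candidates = [random_complex((N, N)) for _ in range(N_CANDIDATES)]
phases = rng.uniform(0.0, np.pi, size=(90, N_IN))
labels = rng.integers(0, N_OUT, size=90)


def accuracy(w_meta, phases, labels):
    """Fraction of samples whose brightest output port matches the label."""
    T = w_out @ w_meta.T @ w_in                      # port-to-port matrix
    powers = np.abs(np.exp(1j * phases) @ T.T) ** 2  # samples x output ports
    return float(np.mean(np.argmax(powers, axis=1) == labels))


scores = [accuracy(w, phases, labels) for w in candidates]
best_id = int(np.argmax(scores))
```

With random labels the best score stays near chance level; with real training data and the simulated library matrices, the same loop would return the identification number of the preferred metaline.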

One-Bit Binary Adder
Similarly, based on the library, a three-input structure that multiplexes space, a metaline, and a four-output structure that multiplexes space and mode are assembled to complete a one-bit binary adder, as shown in Fig. 4(a) (more details in Appendix A). The four input cases in the truth table and a constant reference bias are modulated onto the phase of the input optical field, as shown in Table 1. Logic 0 (1) corresponds to a phase of 0 (π), and the reference bias is fixed at logic 1, i.e., a phase of π. The power of the symmetrical upper and lower ports is detected and compared. If the power of the upper (lower) port is higher, the output is 1 (0). Following the training methodology in Sec. 3.1, metaline number 347 is selected because it gives the higher contrast between the upper and lower ports.
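The phase encoding and the port comparison for the adder can be sketched as below. Two points are assumptions made for illustration: the target logic is taken to be a half adder (sum and carry of two bits), and ports (1, 2) and (3, 4) are taken to be the two upper/lower pairs. The source does not state which pair carries which output bit. The function names are hypothetical.

```python
import numpy as np


def encode_inputs(a, b):
    """Input fields for the two operand bits plus the constant reference bias."""
    # Logic 0 -> phase 0, logic 1 -> phase pi; the bias is fixed at logic 1.
    bits = np.array([a, b, 1])
    return np.exp(1j * np.pi * bits)


def decode_outputs(port_powers):
    """Compare each upper/lower port pair; a brighter upper port reads as 1."""
    # Assumption: ports (1, 2) and (3, 4) form the two upper/lower pairs.
    p1, p2, p3, p4 = port_powers
    return int(p1 > p2), int(p3 > p4)


# Target truth table, assuming half-adder semantics: (sum, carry).
truth_table = {(a, b): (a ^ b, a & b) for a in (0, 1) for b in (0, 1)}
```

For example, `decode_outputs([0.3, 0.1, 0.05, 0.2])` reads the first pair as 1 and the second pair as 0.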

Comparison of the Multimode DONN with the Previous DONNs
There are three structural differences between the previous DONNs based on DAM and the multimode DONNs. In the previous DONNs, first, metalines are arranged in a Si slab with laterally open borders, as shown in Fig. 5(a). The borders are left open to reduce reflection, but the energy leaking out is not utilized. Second, multiple 14,25 (≥2, normally) identical Si etching slots in the metaline need to be clustered together to form a quasi-periodic medium structure group, which occupies multiple lateral periods of the slot, so that the previous DONNs become very wide. Third, the spacing between adjacent metalines needs to be much greater than the lateral period of the slot; otherwise, stable far-field interference cannot be formed, and the phase shift created by the structure groups, as shown in Fig. 5(b), will also change due to an excessive inclination angle, 14,25 which seriously affects the accuracy of the DAM. All of the above problems arise from meeting the preconditions of the DAM, which significantly limits the integration capability and the optical energy efficiency of the previous DONNs. Since the multimode DONN no longer relies on the DAM, these problems can be avoided.
To demonstrate the advantages of EAM over DAM in terms of accuracy and computational overhead, the following structures are designed. Identical metalines are cascaded and deployed at the same positions in the laterally open Si slab and in the multimode waveguide with a transverse width of 20 μm, as shown in Figs. 6(a) and 6(c). The spacing from the input facet to the first metaline and from the first metaline to the second is 40 μm, which fits the length limit of the DAM as far as possible (more details in Appendix A). Ten groups of Si etching slots are deployed in each metaline, and the length of the groups is randomly set between 0 and 2.2 μm so that the phase modulation of each group can cover the entire 2π range, as shown in Fig. 5(b). The input waveguides are loaded with optical fields with random amplitude and phase. The discrepancy between the normalized optical fields $\hat{E}$ and $E$, obtained by DAM (EAM) and var-FDTD simulation, respectively, is measured by the root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{P} \sum_{p=1}^{P} \left| \hat{E}_p - E_p \right|^2},$$

where $P = 1122$ is the number of sampling points. As shown by the blue line in Fig. 6(b), the downward trend represents a gradual improvement of the accuracy of the DAM as the propagation length increases. This confirms the problem that the spacing between adjacent metalines cannot be too short.
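The RMSE between a modeled field and a reference field sampled at the same points can be computed as in the following minimal sketch. The two sampled profiles are synthetic placeholders, and how the fields are normalized before comparison is not specified here.

```python
import numpy as np


def rmse(field_model, field_reference):
    """Root mean square error between two equally sampled field profiles."""
    diff = np.asarray(field_model) - np.asarray(field_reference)
    # np.abs keeps the expression valid for complex-valued fields as well.
    return float(np.sqrt(np.mean(np.abs(diff) ** 2)))


P = 1122  # number of sampling points used in the comparison
reference = np.linspace(0.0, 1.0, P)  # synthetic stand-in for var-FDTD
model = reference + 0.05              # synthetic stand-in for DAM or EAM
```

A uniform offset of 0.05 between the two profiles gives an RMSE of 0.05, and identical profiles give zero.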
The RMSE of the DAM rises sharply after each metaline and reaches 0.124 at the end, which is 4.6 times that at the input (0.027). The difference between the optical fields calculated by the DAM and var-FDTD is obvious, as shown in Figs. 6(e) and 6(f). In contrast, the RMSE of the EAM grows slowly, as shown by the green line in Fig. 6(b). Therefore, the spacing does not significantly affect the accuracy of the EAM. After passing through the metalines, the RMSE does not rise appreciably, and the RMSE of 0.049 at the output is 1.63 times the RMSE of 0.03 at the input. As shown in Figs. 6(g)-6(i), in front of the metalines and at the end, the field obtained by the EAM agrees with var-FDTD more closely than the field obtained by the DAM. In the process of constructing Fig. 6(b), the DAM takes 7400 s, which is about 104 times the 71 s taken by the EAM. This demonstrates that the EAM has less computational overhead. A personal desktop computer was utilized for simulation and computation.
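The ratios quoted above follow directly from the reported numbers, as this short check shows. The variable names are hypothetical; the values are those stated in the text.

```python
# Reported RMSE values at the input and at the end of the structure.
rmse_dam_in, rmse_dam_out = 0.027, 0.124
rmse_eam_in, rmse_eam_out = 0.03, 0.049

# Reported computation times for constructing the RMSE curves.
time_dam_s, time_eam_s = 7400, 71

dam_growth = rmse_dam_out / rmse_dam_in  # error growth of the DAM
eam_growth = rmse_eam_out / rmse_eam_in  # error growth of the EAM
time_ratio = time_dam_s / time_eam_s     # DAM time relative to EAM time
```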
The comparison also reflects the optical energy savings of the multimode DONN. The ratio of the optical power obtained at the end cross-section of the structure to the input optical power is defined as the energy transfer efficiency ($T$). The structure with the multimode waveguide reflects the light that leaks from the open boundary in the previous structure, thereby increasing the energy transfer efficiency from $T = 0.68$ in the previous structure to $T = 0.95$. The remaining loss comes from dissipation in the transmission process. As the number of metalines increases, the difference in $T$ becomes more obvious. Higher transmission efficiency means a smaller input energy requirement and a lower required detection sensitivity, which is beneficial for reducing computing power consumption.
Table 1 The truth table of a one-bit binary adder.

By comparison, the EAM demonstrates higher analysis and design accuracy with less computational overhead. This significant advantage over the DAM facilitates the design of the multimode DONN, achieving both high precision and speed. The closed boundaries of the multimode DONN follow from the eigenmodes that serve as the foundation for the analysis. This enhances the energy transfer efficiency while allowing for a further reduction in the footprint of the multimode DONN to increase integration density. The following section provides evidence for this.

Footprint and Optical Loss
Making full use of the multiple modes helps to reduce the footprint. In the ONN implemented by the MZI network, [4][5][6] a single waveguide has only one eigenmode, and the coupling between adjacent eigenmodes is accomplished by a directional coupler. 17 The distance between the arms is generally kept at more than a few microns to ensure that no crosstalk occurs. The potential multimode capacity of this space is not fully utilized. To realize the coupling between the single modes, a multilayer directional coupler array 4,5,18,19 is necessary; in the multimode DONN, however, this coupling connection between the modes can be completed by a metaline. The previous DONN is designed based on the DAM. Since a single neuron must be mapped to a Si etching slot group, the lateral period of the neurons is about 1.5 μm, 11 while the lateral density in the multimode DONN is about 315 nm per neuron (mode), and the EAM ensures that the multimode DONN no longer requires a large metaline spacing. As a result, the footprint of the multimode DONN in this work is at least nine times smaller than that of previous works on identical or similar tasks (classification or convolution). The comparison results are shown in Table 2.
The optical loss introduced by the multimode DONN as a passive device is also considered. Taking the total optical energy injected into the device as the reference value, the maximum optical power at the output port is used to calculate the typical optical loss. However, the previous works 11,27,28,33 overlooked the output structures, so the comparison can only be based on the optical power at the output cross-section; the typical losses based on var-FDTD simulation are presented in the last column of Table 2. In Fig. 3(b), the typical loss of the multimode DONN in performing the Iris classification task is below 7.69 dB, which is lower than in the previous works. When the loss caused by the output structures of the other works is taken into account, this difference becomes even more pronounced. Hence, the multimode DONN exhibits energy-saving characteristics.
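For readers who want to move between power ratios and decibel losses, the conversion is sketched below. The function names are hypothetical. The two $T$ values are the cross-section energy transfer efficiencies reported in Sec. 4.1 for the 20 μm comparison structures, and 7.69 dB is the stated upper bound on the typical loss for the Iris task; these are different quantities and are converted here only for orientation.

```python
import math


def loss_db(power_ratio):
    """Optical loss in dB for a given output/input power ratio."""
    return -10.0 * math.log10(power_ratio)


def transmitted_fraction(loss_db_value):
    """Output/input power ratio corresponding to a loss in dB."""
    return 10.0 ** (-loss_db_value / 10.0)


loss_open = loss_db(0.68)                   # laterally open slab, T = 0.68
loss_multimode = loss_db(0.95)              # multimode waveguide, T = 0.95
iris_fraction = transmitted_fraction(7.69)  # power ratio at a 7.69 dB loss
```

Under these conversions, $T = 0.68$ and $T = 0.95$ correspond to roughly 1.7 and 0.2 dB, and a 7.69 dB loss corresponds to about 17% of the injected power reaching the brightest output port.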

Metalines and I/O Structure
The metaline is the core structure for computation in the multimode DONN and is selected from the library by the EAM. In this work, the task-agnostic library comprises 4096 metalines constructed by exploring the presence or absence of etching slots. As the etching slots in the metaline share the same design, the consistent errors that arise during fabrication can be incorporated into the structure during library construction. This helps mitigate the impact of fabrication errors. The computation of the mode coupling matrices for the metalines in the library was completed using three server-grade computers over about 60 h. When two metalines are symmetric to each other, their mode coupling matrices are directly related, which allows the calculation of some metalines to be omitted to save time. Phase-change materials 34,35 can fill the Si etching slots, and their two steady states of the refractive index correspond to the presence or absence of the etching slots. Cascading metalines improves the computational performance of the multimode DONN, since it enhances the diversity of the mode coupling matrices. For instance, the accuracy of the Iris classification task in a multimode DONN cascading two metalines can be further improved to 93.3% (more details can be found in Appendix A).
Structures that multiplex space or modes are adopted as the input-output configurations in this work. Many new compact and stable mode or meta-structure devices 36,37 can be included in the library once their mode coupling matrices are obtained. In this work, data are loaded into the multimode DONN by phase modulation, as metalines manipulate the phase of the optical field and the input power remains stable. In addition, phase modulators are simple and mature.
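The saving from symmetry can be estimated by counting how many slot patterns are distinct up to reflection. The sketch below assumes that "symmetric" refers to metalines that are lateral mirror images of each other and that a pattern is encoded as a 12-bit number. Both are assumptions made for illustration, and the actual relationship between the two coupling matrices is not reproduced here.

```python
N_SLOTS = 12  # candidate slot positions across the waveguide


def mirror(metaline_id):
    """ID of the laterally mirrored slot pattern (bit order reversed)."""
    bits = format(metaline_id, f"0{N_SLOTS}b")
    return int(bits[::-1], 2)


all_ids = range(2 ** N_SLOTS)

# Keep one representative per mirror pair: the smaller of the two IDs.
representatives = {min(i, mirror(i)) for i in all_ids}

# Patterns that are their own mirror image have no distinct partner.
self_symmetric = [i for i in all_ids if mirror(i) == i]
```

Under these assumptions, 64 of the 4096 patterns are their own mirror image, so at most 2080 of the 4096 metalines would need to be simulated explicitly.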

Scalability of the Multimode DONNs
To enhance the data processing capability of the multimode DONN, the following approaches can be considered. First, higher-order eigenmodes in the multimode DONN can be multiplexed to expand the input-output capacity, which requires additional higher-order mode multiplexers and demultiplexers. 38 Furthermore, by deploying multiple multimode DONNs in a distributed and layered manner, 39 the data processing capacity can be further increased, allowing optical fields to interact across multiple multimode DONNs.
The library-based EAM can be combined with other differential optimization methods. The multimode DONN designed based on the library can serve as a seed structure for optimization using particle swarm optimization 27 or the adjoint field method. 40 In addition, the EAM can skip over the positions where structures are not allowed to be deployed, as well as the input-output structures, which speeds up the calculation of the optical field.
Built upon the foundation of multimode waveguides, the multimode DONN is compatible with integration into multimode systems, 41 achieving an integrated solution for transmission and processing. Optoelectronic hybrid networks have emerged as a new application paradigm. 15 The multimode DONN can perform feature extraction and processing of data, while electronic neural networks carry out further computations on the data. The electronic neural network enhances the flexibility of the hybrid network 28 and can correct system errors 42 to adapt to more complex tasks.

Conclusion
In this paper, we introduce a compact multimode DONN structure, where eigenmodes are employed as neurons. Simultaneously, leveraging the proposed EAM, a universal library of structures, including metalines, input, and output structures, is established. Each structure is characterized by a mode coupling matrix. Through optimization, the most suitable structures are selected to compose the multimode DONN for validation tasks, including the Iris classification and the one-bit binary adder. For similar or identical tasks, the multimode

The average amplitude of the mode coupling matrix elements for each group of metalines is presented in the top images. The bottom images depict the variance. With an increasing number of etching slots, the mode coupling matrices gradually diverge from the identity matrix, and the variance initially increases and then decreases.

Fig. 1 Multimode DONN and EAM. (a) Multimode DONN. As an example, the width of the multimode waveguide is 6 μm. There are 19 eigenmodes in the lateral direction. (b) The details of the metaline. The length and width of the Si etching slot are 1.1 and 0.2 μm, respectively. There are 12 positions for the Si etching slot to be placed with a lateral period of 0.5 μm, which is filled with silica. As shown in (c)-(f), the coupling between the eigenmodes physically enables the network connection. (c) Input structure. Each input fundamental mode field excites the response separately, which is decomposed into the 19 eigenmodes. (d) The 19 eigenmodes propagate independently, and the 19 responses are excited after passing through the metaline. The responses are decomposed into the 19 eigenmodes again. (e) The output structure that multiplexes space. The 19 eigenmodes excite the 19 responses, then part of the energy in the responses is coupled to the three output waveguides, and the rest leaks out. (f) The output structure that jointly multiplexes space and mode, with a total of four output ports.

Figure 4(b) shows the var-FDTD simulation results for the four input cases. Moreover, the power of each output port is shown in Fig. 4(c) on the right.

Fig. 2 Training process and application demonstration of the multimode DONN composed of the structures in the library. When the task is defined, the training data are loaded into a variety of the potential multimode DONN structures composed of the input, output, and metalines in the library, as shown by the dotted lines. The performance of each potential multimode DONN is evaluated using the port-to-port transmission matrix and the best one is selected. Live or test data will be loaded in.

Fig. 3 The classification task of the Iris plants dataset. (a) Multimode DONN structure. The category corresponding to the output port receiving the highest power is judged as a classification result. PD, photodetector. (b) A set of Setosa class data is simulated by var-FDTD. (c) The confusion matrix of the test dataset. (d) Fundamental mode amplitudes for the three output ports of the test dataset. The gray and yellow bars mark the dataset presented in (b) and the three misclassified datasets, respectively.

Fig. 4 One-bit binary adder. (a) Multimode DONN and discriminant structure. (b) Var-FDTD simulation of four input cases. (c) The power of the four output ports normalized to the input port power. Ports 1 to 4 indicate the marked ports, as shown by the dashed gray lines.

Fig. 5 Previous DONN layout. (a) Every three identical Si etching slots form a group in the metaline, which is laid in a laterally open Si slab. w pq represents the diffractive connection between the points p and q, which are placed in the adjacent metalines. (b) Phase shift or transmittance versus the length of the group. Except for the length of the group, the parameters of the Si etching slot are consistent with Fig. 1(b) (more details in Appendix A).

Fig. 6 Comparison of the optical fields calculated by the DAM and EAM. (a) The amplitude of the optical field in the laterally open device obtained by var-FDTD. (b) RMSE of the DAM or EAM versus the propagation distance. The narrow gray strips mark the metalines. (c) The amplitude of the optical field in the multimode device obtained by var-FDTD simulation. (d)-(i) Comparison of the optical fields calculated by the DAM (EAM) or var-FDTD in front of the first and second metalines, and at the end.

Table 2 Comparison with previous works.