Spectral transfer-learning-based metasurface design assisted by complex-valued deep neural network

Abstract. Recently, deep learning has been used to establish the nonlinear and nonintuitive mapping between physical structures and electromagnetic responses of meta-atoms for higher computational efficiency. However, to obtain sufficiently accurate predictions, the conventional deep-learning-based method consumes excessive time to collect the data set, thus hindering its wide application in this interdisciplinary field. We introduce a spectral transfer-learning-based metasurface design method to achieve excellent performance on a small data set with only 1000 samples in the target waveband by utilizing open-source data from another spectral range. We demonstrate three transfer strategies and experimentally quantify their performance, among which the “frozen-none” strategy robustly improves the prediction accuracy by ∼26% compared to direct learning. We propose to use a complex-valued deep neural network during the training process to further improve the spectral predicting precision by ∼30% compared to its real-valued counterparts. We design several typical terahertz metadevices by employing a hybrid inverse model consolidating this trained target network and a global optimization algorithm. The simulated results successfully validate the capability of our approach. Our work provides a universal methodology for efficient and accurate metasurface design in arbitrary wavebands, which will pave the way toward the automated and mass production of metasurfaces.


Introduction
As compact 2D arrays of subwavelength artificial meta-atoms, metasurfaces [1] offer great design flexibility for controlling electromagnetic (EM) waves, which has aroused widespread enthusiasm among researchers across multiple wavebands in optics and photonics. [16] The common methodology of metasurface design is a one-off, trial-and-error process that relies heavily on physical insight and experience: the practical pattern of a metasurface is obtained by manually iterating numerical calculations, such as the finite-element method and the finite-difference time-domain method, until the desired performance criteria are reached.
The search procedure for the geometrical parameters of meta-atoms is not only time-consuming and inefficient but also prone to missing the global optimum, and this drawback worsens as the dimensions of the variables or objective functions increase. To overcome these limitations, deep learning has been employed as a powerful computational tool in metasurface design, [17][18][19][20][21][22] as it can reveal the underlying nonlinear and nonintuitive relations between the geometrical parameters and EM responses of the meta-atoms in a real-time and automatic way. [23,24][27][28][29][30][31] However, the current deep-learning-based methods face a trade-off between efficiency and accuracy because, owing to their data-hungry nature, they usually require a large task-specific data set for reliable performance. Collecting adequate data is slow and expensive, while training a DNN from scratch also consumes considerable time and computing resources. Once the feature space or distribution changes, even slightly, from that of the source task, the trained model may no longer work effectively. To make the deep-learning-based methods more versatile for metasurface design, some solutions have been proposed to improve the model performance with a small data set.
Transfer learning is a helpful framework that shares the knowledge and experience learned by the source model with the target one, so that it reduces the amount of required data and delivers rapid predictions with high accuracy. In nanophotonics and metasurface design, transfer-learning-based methods are used to migrate knowledge between different physical scenarios, such as photonic films with different numbers of layers, [32] and dielectric meta-atoms with different shapes or dimensions of cross sections. [33,34] Knowledge from other research fields can also be leveraged to study metasurfaces through transfer learning. For example, meta-atoms have been treated as images by a transfer-learning model based on GoogLeNet-Inception-V3 to classify their phase from 0 to 360 deg. [35] In addition to the transfer-learning-based method, data augmentation [36] and spectral scalability [37] have been explored to reduce the dependence on data size; however, these previous works were limited to a fixed spectral range of the labeled data set. Once the range of the working waveband or the number of sampling points changes, a new model must be trained, and the training data set must be prepared again for this new task. In particular, when material dispersion in different bands is considered, the scale invariance of Maxwell's equations no longer applies. The method that exploits spectral scalability through wavelength normalization to address this problem still suffers from a sudden deviation from the simulation data near the boundaries of the waveband range, which means that the EM responses of the meta-atoms are not perfectly scalable.
In this work, we introduce a metasurface design methodology empowered by transfer learning that utilizes the commonality of the EM characteristics of dielectric meta-atoms in different spectral ranges, thereby reducing the amount of required data by bridging the disparity in working frequencies. Specifically, we train the base model on an open-source data set [25] in the infrared (IR) band from 30 to 60 THz, store the knowledge gained in solving the source task, and transfer it to our terahertz (THz) spectral range from 0.5 to 1.0 THz to help the target task, which is trained on our small homemade data set. Here a complex-valued fully connected network that achieves high-performance spectral predicting ability is used in the source model, with an improvement of ∼30% compared to the sum of its real-valued counterparts. We demonstrate three transfer strategies and experimentally quantify their performance, among which the "frozen-none" strategy improves the prediction accuracy by ∼26% compared to direct learning. Further, as a proof-of-concept application, we propose several typical THz metadevices, including a metalens and a vortex beam generator, by employing the hybrid inverse model consolidating this trained target network and a global optimization algorithm. The simulated results demonstrate the reliability and scalability of our spectral transfer-learning-based metasurface design methodology assisted by a complex DNN (CDNN), which is of great significance for balancing the efficiency and accuracy of the deep-learning-based method, hence promoting metasurface studies in arbitrary wavebands.

Overall Framework of the Metasurface Design Methodology
Figure 1 schematically illustrates the spectral transfer-learning-based comprehensive metasurface design framework, which consists of three primary submodules: (i) a deep-learning-based forward prediction source model trained on the massive open-source labeled data, (ii) a target model that benefits from the knowledge transferred from the source model and is then fine-tuned with a small homemade data set, and (iii) a hybrid inverse model for the on-demand metasurface design implemented by combining the trained target model with the conditioned adaptive particle swarm optimization (CAPSO) algorithm. The required transmission spectrum is fed into the inverse model as the design goal to retrieve the geometrical dimensions of the candidate meta-atoms, which are evaluated by comparing the predicted EM responses output by the well-trained target model with the desired goal. Then the optimization algorithm iteratively updates the generated dimensions until the maximum epoch or convergence criterion is reached. The efficacy of our proposed methodology is ultimately demonstrated by the performance of the metasurface assembled from the optimal meta-atoms at each position.

Source Spectral Model Construction
The source model is a data-driven feed-forward neural network consisting of 11 fully connected layers, which addresses the regression problem between the structure parameters and the EM responses of the meta-atoms. The fully connected network is capable of unveiling this implicit nonlinear relationship in a simplified and stable form, especially when the variables can be parameterized as tensors. Here the input and output parameters of the network as well as all the trainable parameters (weights W and biases b) of each layer are extended to the complex domain to directly predict the complex transmission coefficients of meta-atoms, from which the phase and amplitude can be derived simultaneously. In addition, the corresponding network functions, such as normalization, loss, nonlinear activation, and regularization functions, should also be adjusted to a form suitable for complex numbers. More detailed information about the CDNN is given in Sec. 1 of the Supplementary Material. Using complex parameters has numerous advantages, including a richer and more versatile representation capacity and a more robust memory-retrieval mechanism, which have been demonstrated to improve the performance of CDNN-based computer vision and audio-signal-processing tasks compared to their real-valued counterparts. [38] The architecture of the CDNN is depicted in Fig. 2(a). The input is a tensor D of size 4 including the permittivity ε, radius r, height h, and the gap g between adjacent meta-atoms, and the output tensor T is the complex transmission coefficient sampled over 30 to 60 THz with an interval of 1 THz, giving a total of 31 elements. The supervised training process is conducted by minimizing the mean squared error (MSE) between the prediction generated by the network and the ground truth given by full-wave simulations [Eq. (1)], where m is the number of spectrum points, and T̂ and T are the predicted and simulated transmissions, respectively. T* is the complex conjugate of T, and Re and Im denote the real and imaginary parts of a complex number. The total data set is split 70/30 into training and test sets. As the learning curves in Fig. 2(b) show, the overall test MSE is ∼1.05 × 10⁻⁴ after 50,000 epochs. Figure 2(c) displays an MSE histogram of the predicted transmission from the test set, showing an average MSE of 1.048 × 10⁻⁴ and a 95% data demarcation line <3.5 × 10⁻⁴, consistent with the low prediction error exhibited by the CDNN after training. Once the complex transmission coefficients are determined by the network, the corresponding phase and amplitude are calculated together at each frequency, as shown in the right part of Fig. 2(a). Several samples randomly selected from the test set are presented in Fig. S1 in the Supplementary Material, from which we can see that the network predictions are in good agreement with the simulated truth values, even at resonant frequency points, which demonstrates that the CDNN has reasonably high predicting accuracy. To clarify the efficacy of the CDNN, we also trained two real-valued deep neural networks with the real (RDNN) and imaginary (IDNN) parts as outputs, respectively. Detailed information on these two networks is given in Sec. 1 of the Supplementary Material. The MSEs of the RDNN and IDNN are both ∼7.5 × 10⁻⁵ after 50,000 epochs, so their sum is ∼1.5 × 10⁻⁴; the MSE of the CDNN (∼1.05 × 10⁻⁴) is ∼30% less than this sum, indicating that the CDNN has superior performance compared with its real-valued counterparts. Given the vital role the source model plays in our transfer-learning-based method, such highly generalized predicting accuracy is critical to ensure that the target model acquires sufficiently reliable knowledge to facilitate the on-demand metasurface design in the target task.
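To make the complex-valued formulation more concrete, the following minimal sketch shows one complex fully connected layer, a complex MSE, and the extraction of amplitude and phase from the predicted coefficients. It is an illustration under stated assumptions, not the authors' implementation: the MSE is assumed to be the mean of (T̂ − T)(T̂ − T)* = Re(T̂ − T)² + Im(T̂ − T)² over the m spectrum points, which is consistent with the symbols defined above, and the split ReLU activation is a hypothetical choice (the actual complex activation is described only in the Supplementary Material). All function names are illustrative.

```python
import cmath


def complex_dense(x, weights, bias):
    """One fully connected layer with complex weights and biases: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def split_relu(z):
    """Assumed activation: ReLU applied separately to real and imaginary parts."""
    return [complex(max(v.real, 0.0), max(v.imag, 0.0)) for v in z]


def complex_mse(pred, truth):
    """Assumed loss: mean of Re(T_hat - T)^2 + Im(T_hat - T)^2 over m points."""
    m = len(truth)
    return sum((p - t).real ** 2 + (p - t).imag ** 2
               for p, t in zip(pred, truth)) / m


def amplitude_phase(t):
    """Convert complex transmission coefficients to (amplitude, phase) pairs."""
    return [cmath.polar(v) for v in t]
```

Because the network outputs a single complex number per frequency, amplitude and phase come from the same prediction, rather than from two separately trained real-valued networks.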

Transfer Knowledge to Target Spectral Model
Benefiting from the knowledge learned by the source model, the target model is trained on the target frequency domain (from 0.5 to 1.0 THz) to predict the complex transmission coefficients of the meta-atoms with a relatively small data set of 1000 samples. The relation between the bands of interest of the source and target tasks of spectral transfer learning is depicted in Fig. 3(a). The target model has the same network structure as the source model except for the output layer, whose dimensionality can be adjusted according to the sampling points of the target spectrum; in our case, it has 51 dimensions.
We propose three transfer-learning strategies, as depicted in Fig. 3(b). (i) The target network copies the first k layers from the source model as the initialization of the weights and biases, and the remaining layers of the target network are randomly initialized with a normal distribution. The entire target network is then fine-tuned on the target data set. This is called the "frozen-none" strategy. (ii) The target network copies all hidden layers (except for the last one, owing to the mismatch of the dimensions) from the trained source model as the initialization of all parameters. During the training process, the first k layers are frozen, meaning that they do not change with training, and only the remaining layers are fine-tuned. This is called the "copy-all" strategy. (iii) The target network copies the first k layers from the source model and freezes them, whereas the remaining layers are randomly initialized and fine-tuned. This is called the "hybrid-transfer" strategy. Further details of the training setup (learning rates, etc.) are given in Sec. 2 of the Supplementary Material. As the major challenge in transfer learning is to select the general layers and specific layers so as to avoid negative transfer between the source and target tasks, we conduct three sets of experiments to determine the best transfer-learning strategy as well as the most appropriate number of transfer layers. The learning curves of these three strategies are shown in Fig. 4(a), and the colors from dark to light indicate the number of transfer layers from less to more, i.e., from 1(0) to 10(9). We aim not to maximize the absolute performance of the target model, but rather to verify that the transfer-learning-based method has advantages over direct learning. By comparing their performances on the same data set under consistent experimental conditions, we can see that in the frozen-none case, the test error is ∼3.4 × 10⁻³ if trained from scratch, whereas it is around 2.7 × 10⁻³ with the transferred knowledge, no matter how many layers are copied. The prediction accuracy is improved by about 26% through transfer learning. In addition, the loss function of transfer learning converges earlier than that of direct learning, indicating a faster training speed. It takes an average of 72 min per 10,000 epochs of training on our computer equipped with an Nvidia RTX 3090 GPU, and our training process runs about 15,000 iterations to converge. In the other two cases, the test error is either higher or lower than that of direct learning as the number of transfer layers changes, which depends on the generality and specialization of different layers as well as the co-adaptation between neighboring layers. [39] Our results show that the frozen-none strategy is more robust than the other two: it achieves higher accuracy than direct learning without careful parameter tuning, indicating that it is more applicable to the spectral transfer target task.
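The three strategies differ only in how each layer is initialized and whether it is trainable. The sketch below encodes that bookkeeping as a per-layer plan, independent of any deep-learning framework. It assumes that the 11th layer is the output layer, which is always re-initialized because its size changes from 31 to 51; the function name `transfer_plan` and the string labels are hypothetical.

```python
def transfer_plan(strategy, k, n_layers=11):
    """Return one (initialization, trainable) pair per layer.

    strategy: "frozen-none", "copy-all", or "hybrid-transfer"
    k: number of leading layers taken as the "first k layers"
    n_layers: total layers; the last one is assumed to be the output layer
    """
    n_hidden = n_layers - 1
    if not 0 <= k <= n_hidden:
        raise ValueError("k must lie between 0 and the number of hidden layers")
    plan = []
    for i in range(n_hidden):
        if strategy == "frozen-none":
            # first k layers copied, the rest random; everything is fine-tuned
            plan.append(("copied" if i < k else "random", True))
        elif strategy == "copy-all":
            # all hidden layers copied; the first k are frozen
            plan.append(("copied", i >= k))
        elif strategy == "hybrid-transfer":
            # first k layers copied and frozen; the rest random and fine-tuned
            plan.append(("copied" if i < k else "random", i >= k))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    # output layer: dimension mismatch, so always random and trainable
    plan.append(("random", True))
    return plan
```

For example, `transfer_plan("frozen-none", 7)` marks the first seven layers as copied and leaves every layer trainable, which matches the configuration used for the target CDNN in the following paragraph.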

Specifically, we use the frozen-none method of copying the first seven layers to train the target CDNN and test the generalization performance on the test set. Several typical test examples are presented in Fig. 4(b). These examples illustrate that the predicted spectra are in good accordance with the simulated ones, both in regions where the spectrum varies gently and in regions where it fluctuates sharply. It is also worth mentioning that this process takes only several milliseconds to calculate the transmission coefficients of each meta-atom over the whole bandwidth under consideration. Such prediction accuracy and efficiency are crucial for the on-demand metasurface design in the inverse model, as will be discussed in the next section.

Hybrid Inverse Model for the On-demand Metasurface Design
The core objective of the spectral transfer-learning-based method is the efficient and accurate on-demand metasurface design in the band of interest. We divide the design task of the whole metasurface into an independent search for each meta-atom according to the phase and amplitude distribution dictated by the functionality. To identify the optimal structure parameters at each pixel of the metasurface while avoiding the exponential growth of the simulation time during the iterations, we propose a hybrid inverse model that combines the data-driven deep-learning method with the rule-driven global optimization algorithm. In this model, the trained target CDNN serves as an EM simulator that replaces the traditional EM simulation software and can precisely predict the transmission coefficients at an extremely high speed. The CAPSO algorithm acts as a fast generator and a powerful optimizer of the meta-atoms. Specifically, the geometrical parameters given by CAPSO are fed into the target CDNN to predict the corresponding complex spectrum tensor, from which the extracted phases and amplitudes at certain frequencies are evaluated by quantifying the discrepancy between the current results and the goals. The optimal parameters are successively updated in iterative runs to reduce the value of the loss function [as Eq. (2) shows] until the maximum epoch is reached, where n is the number of the target frequencies, φ and A are the phase and amplitude at a certain frequency, respectively, and the subscripts goal and optimal represent the target and the current optimum value. η is a customized preset weight factor, whose value depends on the role the amplitude plays in the functional device. Instead of constructing an inverse or tandem DNN commonly used in the metasurface inverse design, [29,40,41] we employ this hybrid model chiefly for two reasons. (i) Previous inverse networks take the entire spectrum tensor as the input, which is suitable for amplitude-functional devices in a continuous band, such as filters, absorbers, and resonators, whereas most phase-gradient metasurfaces only care about the EM responses at one or several frequencies, leaving the others unspecified. If the unspecified spectral values are assigned arbitrarily and deviate from the transmission spectra in the data set, the network can hardly output a reliable structure owing to nonuniqueness (a spectrum may be generated by several sets of structural parameters, or no set of structural parameters can produce such a spectrum).
(ii) The inverse model with the optimization algorithm can be readily modified to meet diverse design goals and restrictions. [42] For example, we can impose a constraint in the equation on the parameter gap to alleviate the coupling among adjacent meta-atoms. We can also design frequency-multiplexing devices with different numbers and values of target frequencies.
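The interplay between the optimizer and the forward model can be sketched as follows. This is a hedged illustration only: it uses a plain particle swarm optimizer rather than CAPSO, a hypothetical closed-form `toy_surrogate` in place of the trained target CDNN, and an assumed single-frequency loss of the form |Δφ| + η|ΔA| (the exact form of the loss in Eq. (2) is not reproduced here). The parameter bounds are those of the target data set; clipping to the bounds shows how a constraint such as a minimum gap could be imposed.

```python
import cmath
import math
import random

# Bounds for (eps, r, h, g), taken from the target data set ranges.
BOUNDS = [(10.0, 20.0), (5.0, 65.0), (3.0, 65.0), (5.0, 95.0)]


def toy_surrogate(d):
    """Hypothetical stand-in for the trained CDNN (arbitrary smooth function)."""
    eps, r, h, g = d
    phase = 2.0 * math.pi * (math.sqrt(eps) - 1.0) * h / 300.0 * (2.0 * r / (2.0 * r + g))
    amp = 1.0 - 0.5 * math.exp(-g / 50.0)
    return cmath.rect(amp, phase)


def loss(d, phi_goal, a_goal, eta=0.5):
    """Assumed loss: wrapped phase error plus eta-weighted amplitude error."""
    t = toy_surrogate(d)
    dphi = abs(cmath.phase(t * cmath.rect(1.0, -phi_goal)))  # in [0, pi]
    return dphi + eta * abs(abs(t) - a_goal)


def pso(phi_goal, a_goal, n_particles=20, n_iter=100, seed=0):
    """Plain PSO; returns (best parameters, best loss, best initial loss)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * len(BOUNDS) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [loss(p, phi_goal, a_goal) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    initial_best = g_val
    for _ in range(n_iter):
        for i in range(n_particles):
            for j, (lo, hi) in enumerate(BOUNDS):
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * rng.random() * (best_pos[i][j] - pos[i][j])
                             + 1.5 * rng.random() * (g_pos[j] - pos[i][j]))
                # clip to the allowed range (where a gap constraint could go)
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            val = loss(pos[i], phi_goal, a_goal)
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val, initial_best
```

Because the surrogate is evaluated in place of a full-wave simulation, every loss evaluation is cheap, which is what makes a population-based search per meta-atom affordable.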

Metadevice Design and Verification
To verify the efficacy of our proposed hybrid inverse model, we first design a focusing cylindrical metalens through this method. The ideal phase retardation provided by the metalens can be written as a function [Eq. (3)], where c is the light speed in vacuum, ω is the angular frequency, x is the distance between a meta-atom and the center of the metalens, and f is the focal length. In this design, the metalens, with a diameter of 12.15 mm and a focal length of 20 mm, is intended to focus the transmitted waves at 0.95 THz. Each column of the metalens is composed of 81 meta-atoms, and the dimension tensor D = [ε, r, h, g]^T of each meta-atom is optimized iteratively in turn according to Eq. (2), with φ_goal calculated by Eq. (3) and A_goal set as 0.5. Figures 5(a) and 5(b) show the target phase and amplitude profiles of the metalens as well as the phase and amplitude responses of the optimized meta-atoms selected by the hybrid inverse model at each pixel, which match each other fairly well. The detailed procedures and resulting parameters of this model are described in Secs. 3 and 4 of the Supplementary Material. The metalens is then investigated using CST Microwave Studio, where the metalens is placed on the z = 0 plane with its center at the origin. The optical axis is set as the z axis, with the open boundary condition adopted in the x direction and the periodic boundary condition adopted in the y direction. The metalens is illuminated by x-polarized plane waves. Figure 5(d) depicts the normalized far-field intensity distribution of this metalens, which is in good agreement with the theoretical calculation derived from the Rayleigh-Sommerfeld diffraction integral formula in Fig. 5(c). The results confirm the validity of the hybrid inverse model and further demonstrate the reliability of our CDNN-based transfer model.
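As a worked illustration of how the target phase at each pixel can be obtained, the sketch below evaluates the standard hyperbolic focusing profile φ(x) = −(ω/c)(√(x² + f²) − f), wrapped to [0, 2π), for the stated design (81 meta-atoms, 20 mm focal length, 0.95 THz). The hyperbolic form and its sign convention are assumptions chosen to match the symbols defined above, and the 150 μm pitch is inferred from the lattice size used elsewhere in the paper; variable names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def metalens_phase(x, f, freq_hz):
    """Assumed hyperbolic focusing phase, wrapped to [0, 2*pi)."""
    k = 2.0 * math.pi * freq_hz / C  # omega / c
    return (-k * (math.hypot(x, f) - f)) % (2.0 * math.pi)


N = 81            # meta-atoms per column
PITCH = 150e-6    # lattice size in m (assumed to apply to the metalens)
F = 20e-3         # focal length in m
FREQ = 0.95e12    # design frequency in Hz

# meta-atom centers, symmetric about the lens center
xs = [(i - (N - 1) / 2) * PITCH for i in range(N)]
phases = [metalens_phase(x, F, FREQ) for x in xs]
aperture = N * PITCH  # 81 x 150 um = 12.15 mm, the stated diameter
```

Each entry of `phases` would play the role of φ_goal for the corresponding meta-atom in the inverse model.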
To further verify the efficiency and versatility of our method, we design a 2D metadevice for generating a focused THz vortex beam based on the phase function of Eq. (4), where (r, θ) is the polar coordinate of a meta-atom and l is an integer representing the topological charge, which is related to the orbital angular momentum. The metadevice is designed to generate a second-order vortex beam (l = +2) operating at 0.95 THz and consists of 61 × 61 meta-atoms with a lattice size of 150 μm; each of them is automatically determined by the hybrid inverse model. Figure 6(a) shows the optimized results of the structure pattern, phase, and amplitude distributions for the designed metavortex generator. The whole prediction process takes only about 600 s, with an average time of <0.1 s for each meta-atom. Its performance is then evaluated using CST Microwave Studio, where the metavortex generator is placed on the z = 0 plane with the z axis as the optical axis. With the open boundary condition applied, linearly polarized Gaussian waves with a waist radius of 3 mm are incident on the metavortex generator. Figure 6(c) depicts the simulated results, including the phase, real part, and normalized intensity distributions, which are in good agreement with the theoretical calculations derived from the Rayleigh-Sommerfeld diffraction integral formula in Fig. 6(b). The deviations of the simulated results from the theoretical ones are likely caused by the finite hexahedral mesh, which is limited by the computer memory, and the unavoidable coupling among adjacent meta-atoms.
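A target phase map for such a device can be sketched by adding a helical term lθ to a focusing term. The form φ(r, θ) = −(ω/c)(√(r² + f²) − f) + lθ used below is the common textbook expression and is an assumption here, as is the focal length, which is not stated in the text for this device; the value 20 mm is purely a placeholder.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def vortex_lens_phase(x, y, f, freq_hz, l):
    """Assumed focusing-plus-helical phase, wrapped to [0, 2*pi)."""
    k = 2.0 * math.pi * freq_hz / C
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    return (-k * (math.hypot(r, f) - f) + l * theta) % (2.0 * math.pi)


N = 61                 # meta-atoms per side
PITCH = 150e-6         # lattice size in m
F_ASSUMED = 20e-3      # hypothetical focal length in m (not given in the text)
FREQ = 0.95e12         # design frequency in Hz
L_CHARGE = 2           # topological charge

coords = [(i - (N - 1) / 2) * PITCH for i in range(N)]
phase_map = {(ix, iy): vortex_lens_phase(x, y, F_ASSUMED, FREQ, L_CHARGE)
             for ix, x in enumerate(coords)
             for iy, y in enumerate(coords)}
```

For an even topological charge such as l = 2, the map is symmetric under inversion through the center, since θ → θ + π changes lθ by a multiple of 2π; this gives a quick sanity check on a generated layout.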

Discussion and Conclusions
We developed a spectral transfer-learning-based comprehensive methodology assisted by a CDNN that balances computational efficiency and prediction accuracy, to further help with the automated on-demand metasurface design in arbitrary wavebands. We explored the spectral transfer-learning strategy to ease the burden of large data volume requirements by migrating the learned knowledge from the original band to the target one, exploiting the commonality of the EM properties of all-dielectric meta-atoms with the same geometrical shape but in different spectral ranges. We proposed to utilize a CDNN to train the source model on the given cheap data set, which obtains a fairly low prediction error of 1.05 × 10⁻⁴ with an improvement of ∼30% compared to its real-valued counterparts. We demonstrated that transfer learning can improve the prediction accuracy by ∼26% in a quite short time compared to direct learning on the same small data set with only 1000 samples through the robust transfer-learning strategy named frozen-none. For proof of concept, we presented a focusing metalens and a metavortex generator working at 0.95 THz using the hybrid inverse model that combines the well-trained target CDNN with the CAPSO algorithm. The simulated results of these designed metadevices agreed well with the corresponding theoretical calculations, consequently validating the capability of the proposed comprehensive framework. In view of the underlying logic of our work, the deep-learning-based network is proposed to address the time and computing costs of the conventional simulation tools for metasurface design and modeling; i.e., our network targets the numerical simulation process to demonstrate that the spectral transfer method is an efficient alternative to resource-consuming EM simulation software and human experience. An experimental demonstration would then be an application of the well-trained feed-forward network. Our results show the potential to simultaneously improve both the efficiency and accuracy of the deep-learning-based method, which will facilitate fast and reliable metasurface design in arbitrary frequency bands, thus promoting the substantial application of deep learning in disciplines such as meta-optics, spectral recognition, and resolution enhancement.
Appendix: Python-CST Co-Simulation
The data set is collected through a Python interface to the CST Studio Suite, which allows controlling a running simulation or reading the results of project files. First, we need to start a CST environment and create a new CST project. Then we edit the VBA codes to set up the model, materials, simulation conditions, and so on. Dimensions of the meta-atoms are randomly generated within their respective ranges, and they are constructed on top of a fused silica substrate with a thickness of 120 μm and a fixed lattice size of 150 μm. Each meta-atom is simulated under an x-polarized plane wave by the frequency-domain solver with the tetrahedral mesh type. Periodic and perfectly matched layer (open) boundary conditions are used along the transverse (x and y) and longitudinal (z) directions, respectively, with respect to the propagation of light. A field probe is placed at z = 2000 μm to detect the transmission spectrum of the meta-atom, which can be derived from the 1D results. One loop is completed for each simulation run, and the number of loops is the size of the target data set.

Arrange the Layout of Metasurfaces
During the verification process, co-application of Python and CST automates the arrangement of the metasurface layout.
Similarly, we start a CST project and initialize the settings. Then we input the dimension parameters of each meta-atom into CST and place the meta-atoms at their corresponding positions through VBA codes to assemble them, in order, into a whole metasurface. The metadevice is simulated under an x-polarized plane wave by a time-domain solver with a time duration of 500 ps. The boundary conditions in the propagation direction are set as open, and those in the x and y directions are set as open (for 2D metadevices, such as metalenses, vortex generators, and holographic plates) or periodic (for 1D metadevices, such as cylindrical metalenses and deflectors), as needed. Field monitors are placed according to the location of the electric field to be observed. Corresponding codes are provided by the authors.
Yi Xu received her BS degree in optoelectronic information science and engineering from Tianjin University, Tianjin, China, in 2019. Currently, she is working toward her PhD in optical engineering at the Center for Terahertz Waves, Tianjin University, Tianjin, China. Her research interests focus on dielectric metasurfaces, terahertz photonics, and machine learning for the design and optimization of metasurfaces.
Jianqiang Gu is a full professor at Tianjin University, China. He received his BEng degree in electronic science and technology, his MEng degree in physical electronics, and his PhD in opto-electronics technology from Tianjin University, China, in 2004, 2007, and 2010, respectively. Up to now, he has published more than 100 peer-reviewed journal papers with total citations of ∼2700 (H index is 30). His current research interests focus on terahertz spectroscopy, photoconductive antennas, and terahertz subwavelength devices.
Biographies of the other authors are not available.

Fig. 2 Schematic of the source model. (a) Illustration of the architecture and the parameters of the CDNN. (b) Learning curves of the CDNN, showing the loss value as a function of the epoch. The smoothed train loss (blue curve) and test loss (red curve) are overlaid on the original learning curves (light gray). (c) Histogram of the MSE for the predicted complex transmission from the test set, where 95% of the data have an MSE < 3.5 × 10⁻⁴, as indicated by the gray dashed line.

Fig. 3 Schematic of transfer learning. (a) Diagram of the relation between the bands of interest of the source and target tasks and (b) illustration of spectral transfer learning. The top row is the architecture of the source model (blue), and the next rows are the target models (orange) based on three transfer strategies: frozen-none, copy-all, and hybrid-transfer, respectively. Blue blocks represent the layers copied from the trained source network, which are then fine-tuned during training. Orange blocks represent the fine-tuned layers with random initialization. The mosaic pattern represents the frozen state. Black rounded rectangles represent activation layers.
The target data set of cylindrical-shaped all-dielectric meta-atoms is established via the commercial software CST Microwave Studio under x-polarized normally incident light from 0.5 to 1.0 THz. In this home-built library, each meta-atom is determined by the same four parameters as those in the open-source data set, but within different ranges. The ranges of the four parameters are ε ∈ [10, 20], r ∈ [5, 65], h ∈ [3, 65], and g ∈ [5, 95] (lengths in μm). The complex transmission coefficients over the whole spectrum are sampled at 51 frequency points with an interval of 0.01 THz. The simulation details are described in Sec. 1 in the Appendix. Then we employ transfer learning to help train the target neural network.
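The sampling of this library can be sketched as follows. The snippet draws the four parameters within the stated ranges and builds the 51-point frequency grid; drawing each parameter independently and uniformly is an assumption (the text only says the dimensions are randomly generated within their ranges), and the full-wave simulation step that would label each sample is deliberately left out. Names such as `sample_meta_atoms` are hypothetical.

```python
import random

# Parameter ranges of the target data set (lengths in micrometers).
RANGES = {"eps": (10.0, 20.0), "r": (5.0, 65.0), "h": (3.0, 65.0), "g": (5.0, 95.0)}


def frequency_grid(start=0.5, stop=1.0, step=0.01):
    """Frequency points in THz, including both end points."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 10) for i in range(n)]


def sample_meta_atoms(n_samples=1000, seed=0):
    """Draw parameter sets uniformly and independently (an assumption)."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
            for _ in range(n_samples)]


freqs = frequency_grid()
samples = sample_meta_atoms()
```

Each sampled parameter set would then be passed to the solver, and the resulting 51 complex transmission coefficients stored as its label.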

Fig. 4 Results of the spectral transfer learning. (a) Learning curves of the three transfer strategies: frozen-none (red), copy-all (blue), and hybrid-transfer (orange). The colors from dark to light indicate the number of transfer layers from less to more. (b) Examples demonstrating the performance of the target CDNN using transfer learning.

Fig. 5 Characterization of the metalens designed by the hybrid inverse model. (a), (b) The target phase and amplitude profiles (blue lines) and the phases and amplitudes of the optimized meta-atoms at each pixel selected by our inverse model (red hollow circles). (c), (d) The theoretical and simulated normalized intensity distributions of the designed metalens along the propagation plane.

Fig. 6 Characterization of the metavortex generator designed by the hybrid inverse model. (a) The structure pattern, phase, and amplitude distributions of the designed metavortex generator output by the hybrid inverse model. (b), (c) The theoretical and simulated results for the phase, real part, and normalized intensity distributions along the x-y and y-z planes of the designed metavortex generator.