Evaluation of convolutional neural network for fast extreme ultraviolet lithography simulation using imec 3 nm node mask patterns

Abstract. Background: Mask 3D (M3D) effects distort diffraction amplitudes from extreme ultraviolet masks. In our previous work, we developed a convolutional neural network (CNN) that very quickly predicted the distorted diffraction amplitudes from input mask patterns. The mask patterns were restricted to Manhattan patterns. Aim: We verify the potential and the limitations of the CNN using imec 3 nm node (iN3) mask patterns. Approach: We apply the same CNN architecture as in the previous work to mask patterns that mimic iN3 logic metal or via layers. In addition, to study more general mask patterns, we apply the architecture to iN3 metal/via patterns with optical proximity correction (OPC) and to curvilinear via patterns. In total, we train five different CNNs: metal patterns with and without OPC, via patterns with and without OPC, and curvilinear via patterns. After the training, we validate each CNN using validation data with the above five different characteristics. Results: When the training and validation data have the same characteristics, the validation loss becomes very small. Our CNN architecture is flexible enough to be applied to iN3 metal and via layers. The architecture has the capability to recognize curvilinear mask patterns. On the other hand, using training and validation data with different characteristics leads to a large validation loss. The selection of training data is very important for obtaining high accuracy. We examine the impact of M3D effects on iN3 metal layers. A large difference is observed between the tip-to-tip (T2T) critical dimensions calculated by the thin mask model and the thick mask model. This is due to the mask shadowing effect at T2T slits. Conclusions: The selection of training data is very important for obtaining high accuracy. Our test results suggest that layer-specific CNNs could be constructed, but further development of the CNN architecture could be required.

There are two types of EM simulation methods. The finite-difference time-domain (FDTD) method solves Maxwell's equations in coordinate space, and the solution is near-field diffraction amplitudes. [3][4][5] The calculation times reported in the literature are 322 s for 500 nm × 500 nm [4] and 3.64 s for 1.5 μm × 1.5 μm. [5] Because the FDTD method calculates near-field diffraction amplitudes in coordinate space, the calculation needs to be repeated for every source point, and there are typically more than 100 source points.
Rigorous coupled-wave analysis [6] and the 3D waveguide model [7] solve Maxwell's equations in momentum (or frequency) space, and the solution is far-field diffraction amplitudes. These models solve coupled wave equations in momentum space, and all relations between the incident momentum and the outgoing momentum are calculated simultaneously. Therefore, these models do not need to repeat the calculation for different source points. The computation time is 122 s for 256 nm × 256 nm. [9] To speed up the EM simulations, several models that decompose a 2D mask pattern into 1D patterns have been proposed. [10][11][12] In these models, the EM field of a 2D mask pattern is approximated by superposing the EM fields of 1D patterns. These models are currently used in many EUV lithography simulators. [13][14][15] An implicit assumption of the pattern decomposition method is that the mask pattern is large and isolated. However, OPC masks are decorated with small patterns [serifs and assist features (AFs)], and the pattern densities are high. In addition, advanced OPC mask patterns are curvilinear. It is not clear whether the pattern decomposition method can be applied to OPC masks.
Recently, many attempts have been made to simulate the M3D effects using deep neural networks, such as convolutional neural networks (CNNs) or generative adversarial networks. They are classified into three types depending on the target: near-field amplitude on the mask, [16][17][18][19] far-field diffraction amplitude at the pupil, [20] and image intensity on the wafer. [21][22] In our model, [20] a CNN is used to predict the far-field diffraction amplitude from the input mask pattern. Although the training of the CNN takes a very long time (more than 1 day), the prediction time is very short: 0.05 s for 256 nm × 256 nm. [9] Our model is a natural extension of the optical simulation, in which the far-field diffraction amplitude [Fourier transformation (FT) of a mask pattern] is used to calculate the image intensity. As shown in Ref. 20, because our model is described in frequency space, it can easily incorporate the transmission cross coefficient method [23] and the sum of coherent systems model, [24] which are conventionally used in optical simulations to speed up the image intensity integration.
In this work, we apply the CNN architecture developed in the previous work to mask patterns that mimic imec 3 nm node (iN3) logic metal or via layers. 25,26 In addition, to study more general mask patterns, we train the CNN using iN3 metal/via patterns with OPC or curvilinear via patterns. In total, we develop five different CNNs using different mask patterns for the training data. We examine the potential and the limitation of these CNNs using five different mask patterns for the validation data.
As mentioned at the beginning of this section, M3D effects of EUV masks have a large influence on CD and EPE. The motivation of this work is to include M3D effects in EUV lithography simulation. We find that the iN3 metal layer is a good example to show large M3D effects. We verify the accuracy of CNN by calculating the image intensity of iN3 metal layer mask patterns.
In Sec. 2, we explain our model to calculate the diffraction amplitudes from EUV masks. In Sec. 3, we explain the architecture of our CNN. In Sec. 4, we examine the potential and limitation of CNN using iN3 mask patterns. In Sec. 5, we study M3D effects on the iN3 metal layer. Section 6 is the summary.

Diffraction Amplitudes from an EUV Mask
In optical lithography simulation, a thin mask model is conventionally used, and the diffraction amplitude is the FT of a mask pattern. However, an EUV absorber is thick, and the diffraction amplitudes from an EUV mask need to be calculated by rigorous EM simulations. Figure 1 shows the schematic view of the diffraction amplitudes $A(l, m, l_s, m_s)$ from an EUV mask. We show here the vector potential $\mathbf{A}$. Inside the vacuum, the vector potential is converted to the electric field $\mathbf{E}$ by Eq. (1), [20]
where $\mathbf{k}$ represents the wave vector. It can be easily shown that the electric field is perpendicular to the wave vector:
$$\mathbf{k} \cdot \mathbf{E} = 0. \tag{2}$$
The electric field depends on both the diffraction order $(l, m)$ and the source position $(l_s, m_s)$. This is the basic difference from the thin mask model, in which the diffraction amplitude (FT of a mask pattern) depends only on the diffraction order $(l, m)$. The image intensity on the wafer, $I$, is calculated by Abbe's theory as in Eq. (3), where $S$ and $P$ are the effective source and the pupil function, respectively. We use the 3D waveguide model [7] to calculate the diffraction amplitude $\mathbf{A}$ from an EUV mask. The model solves the coupled wave equations for $A_x$ and $A_y$, Eqs. (4) and (5), where $\epsilon$ is the complex dielectric constant of the patterned absorber. Gauge transformation freedom allows $A_z$ to be fixed at zero. [8] The two variables $A_x$ and $A_y$ correspond to two polarizations. Equation (1) indicates that the electric fields $\mathbf{E}$ of the $A_x$ and $A_y$ polarizations are almost parallel to the x and y axes because $k_x, k_y \ll k$ near the optical axis. [20] Figure 2 is an example of diffraction amplitudes calculated by solving Eqs. (4) and (5) (for the details, see Ref. 20). The result shows that the polarization change between the incident wave and the outgoing wave is very small. This is because the complex dielectric constant of the EUV absorber is close to 1. A similar phenomenon is known as the "weakly guiding approximation" in optical fibers, [27] in which two polarizations are decoupled. We therefore focus on the diffraction amplitudes in which both the incident and outgoing waves have $A_x$ polarization.
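As a rough illustration of the Abbe summation described above, the following NumPy sketch accumulates the image intensity over source points. It is a scalar sketch under stated assumptions, not the authors' implementation: the function name `abbe_intensity`, the array layout, the hard-edged circular pupil of radius `radius` (in diffraction-order units), and the sign convention of the plane-wave phase are all hypothetical choices, and demagnification and polarization are ignored.

```python
import numpy as np

def abbe_intensity(amp, orders, src, weights, radius, n_pix=64):
    """Scalar Abbe image of one mask period (illustrative sketch).

    amp[s, i, j] : complex amplitude for source point s and
                   diffraction order (orders[i], orders[j])
    orders       : 1D array of integer diffraction orders (used for l and m)
    src[s]       : source point (l_s, m_s) in diffraction-order units
    weights[s]   : effective-source weight S(l_s, m_s)
    radius       : pupil radius in diffraction-order units (assumed hard edge)
    """
    orders = np.asarray(orders, dtype=float)
    weights = np.asarray(weights, dtype=float)
    x = np.arange(n_pix) / n_pix                       # position in units of the period
    plane = np.exp(2j * np.pi * np.outer(orders, x))   # plane[i, p] = exp(2*pi*i*order_i*x_p)
    L, M = np.meshgrid(orders, orders, indexing="ij")
    image = np.zeros((n_pix, n_pix))
    for s, (ls, ms) in enumerate(src):
        # Assumed pupil function: pass orders whose shifted position lies inside the pupil.
        pupil = (L + ls) ** 2 + (M + ms) ** 2 <= radius ** 2
        # field[p, q] = sum over (l, m) of amp * pupil * exp(2*pi*i*(l*x_p + m*y_q))
        field = plane.T @ (amp[s] * pupil) @ plane
        image += weights[s] * np.abs(field) ** 2       # incoherent sum over source points
    return image / weights.sum()
```

Because the amplitude array carries a source index, the same loop serves a thick mask (amplitudes differ per source point) and a thin mask (the same amplitudes repeated for every source point).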
The diffraction amplitude $A_x(l, m, l_s, m_s)$ is divided into the thin mask amplitude $A_x^{\mathrm{FT}}(l, m)$ and the M3D amplitude $A_x^{\mathrm{3D}}(l, m, l_s, m_s)$ and is given as
$$A_x(l, m, l_s, m_s) = A_x^{\mathrm{FT}}(l, m) + A_x^{\mathrm{3D}}(l, m, l_s, m_s). \tag{6}$$

The thin mask amplitude $A_x^{\mathrm{FT}}$ is calculated by the FT of the mask pattern using the reflection coefficients of the absorber and the multilayer. It depends only on the diffraction order $(l, m)$.

The M3D amplitude $A_x^{\mathrm{3D}}$ is defined as the difference between the thick mask amplitude $A_x$ and the thin mask amplitude $A_x^{\mathrm{FT}}$. The M3D amplitude depends on the source position $(l_s, m_s)$, which causes incident-angle-dependent M3D effects.
As shown in Fig. 3, the contribution of the thin mask amplitude is dominant. The amplitude does not depend on the source position. The contribution of the M3D amplitude is small but not negligible.
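The split into thin mask and M3D amplitudes can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions: the function names are hypothetical, the reflection coefficients are treated as single complex scalars (any numerical values passed in are placeholders, not material data), and the diffraction orders follow NumPy's FFT index ordering and sign convention.

```python
import numpy as np

def thin_mask_amplitude(absorber, r_multilayer, r_absorber):
    """Fourier coefficients of a thin mask (illustrative sketch).

    absorber     : 2D boolean array, True where the absorber remains
    r_multilayer : assumed scalar reflection coefficient of the bare multilayer
    r_absorber   : assumed scalar reflection coefficient of the absorber-covered area
    Returns amplitudes indexed [l, m] in NumPy FFT ordering.
    """
    absorber = np.asarray(absorber, dtype=bool)
    reflectance = np.where(absorber, r_absorber, r_multilayer)
    return np.fft.fft2(reflectance) / reflectance.size

def m3d_amplitude(thick, thin):
    """M3D amplitude = thick mask amplitude minus thin mask amplitude.

    thick[s, l, m] depends on the source point s; thin[l, m] does not,
    so it is broadcast over the source axis.
    """
    return np.asarray(thick) - np.asarray(thin)[None, :, :]
```

Here `thick` would come from a rigorous EM solver, which is outside the scope of this sketch.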

CNN Architecture
The M3D amplitude gradually changes depending on the source position $(l_s, m_s)$. We parametrize the M3D amplitude at each diffraction order $(l, m)$ as a linear function of the source position $(l_s, m_s)$ (Fig. 4) as
$$A_x^{\mathrm{3D}}(l, m, l_s, m_s) \cong a_0(l, m) + a_x(l, m)\left(l_s + \frac{l}{2}\right) + a_y(l, m)\left(m_s + \frac{m}{2}\right), \tag{7}$$
where $a_0$ is the average of the amplitude and $a_x$ and $a_y$ are the slopes of the amplitude in the x and y directions, respectively. We call these three numbers the M3D parameters. In Fig. 4, we consider the area where the maximum effective source ($\sigma = 1$) and the projection pupil overlap. Only this area can contribute to the image intensity. The center of the overlapping area between the source and the pupil is $(l_s, m_s) = (-l/2, -m/2)$.
M3D parameters are determined by the mask pattern and the absorber. In this work, the absorber is assumed to be Ta with a 60 nm thickness. The input mask pattern has a 1024 nm × 1024 nm area. We construct a set of CNNs to predict the M3D parameters from the input mask pattern. Figure 5 shows the architecture of our CNNs. Six independent CNNs, CNN1-6, are used for the real and imaginary parts of the three M3D parameters, $a_0$, $a_x$, and $a_y$. The 1024 × 1024 binary data are averaged to 256 × 256 float data before being input to the CNNs. We repeat convolution/max pooling/batch normalization five times. The number of free fitting parameters for each CNN is 69.6 M. The six CNNs are merged into one model after the training.
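Two steps described above lend themselves to a short NumPy sketch: reducing the binary mask to float data by averaging, and extracting the three M3D parameters at one diffraction order. Both functions are assumptions made for illustration. The text does not say how the averaging blocks are laid out or how the linear parameters are obtained, so the sketch uses non-overlapping 4 × 4 blocks and an ordinary least-squares fit over the source points inside the source-pupil overlap, with a hypothetical pupil radius `radius` in diffraction-order units.

```python
import numpy as np

def block_average(mask, factor=4):
    """Average non-overlapping factor x factor blocks (e.g. 1024x1024 -> 256x256)."""
    mask = np.asarray(mask, dtype=float)
    n0, n1 = mask.shape
    if n0 % factor or n1 % factor:
        raise ValueError("mask size must be a multiple of factor")
    return mask.reshape(n0 // factor, factor, n1 // factor, factor).mean(axis=(1, 3))

def fit_m3d_parameters(l, m, src, amp, radius):
    """Least-squares fit of a0, ax, ay at diffraction order (l, m).

    src[n] = (l_s, m_s) source points, amp[n] = complex M3D amplitude at each
    point. Only points inside both the source circle and the shifted pupil
    circle (the overlap area) are used. The fit method is an assumption.
    """
    src = np.asarray(src, dtype=float)
    amp = np.asarray(amp, dtype=complex)
    ls, ms = src[:, 0], src[:, 1]
    in_source = ls ** 2 + ms ** 2 <= radius ** 2
    in_pupil = (ls + l) ** 2 + (ms + m) ** 2 <= radius ** 2
    keep = in_source & in_pupil
    if keep.sum() < 3:
        raise ValueError("need at least three source points in the overlap area")
    # Columns: constant term, (l_s + l/2), (m_s + m/2), centered on the overlap area.
    design = np.column_stack(
        [np.ones(keep.sum()), ls[keep] + l / 2, ms[keep] + m / 2]
    ).astype(complex)
    coef, *_ = np.linalg.lstsq(design, amp[keep], rcond=None)
    return coef  # array [a0, ax, ay]
```

Centering the regressors on $(-l/2, -m/2)$ means that, for source points sampled symmetrically about that center, the fitted constant term coincides with the average amplitude, in line with the description of $a_0$.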
One of the issues with CNNs is that they require a huge amount of training data, and preparing enough data to reach a high prediction accuracy takes a long time. We applied data augmentation to circumvent this issue. [9] Assuming a periodic boundary condition, the diffraction amplitude of a shifted mask pattern is obtained simply by multiplying the diffraction amplitude of the original mask pattern by a phase factor determined by the pattern shift. In this way, we do not need to repeat time-consuming EM simulations to calculate the diffraction amplitudes of shifted mask patterns. The number of original data for training is 2000, and the number of data for validation is 1000. With data augmentation, the original data are shifted in 103 nm increments in both the X and Y directions. Therefore, the data augmentation multiplies the number of training data by 100, to 200,000.
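The phase-factor trick is the Fourier shift theorem for a periodic pattern. The sketch below shows it with NumPy; the function name `shifted_amplitude` and the array layout are hypothetical, and the sign of the phase follows NumPy's FFT convention, which may differ from the convention used in the paper.

```python
import numpy as np

def shifted_amplitude(amp, orders_l, orders_m, dx, dy, period):
    """Diffraction amplitudes of a pattern shifted by (dx, dy).

    amp[..., i, j] : amplitude at diffraction order (orders_l[i], orders_m[j]);
                     leading axes (e.g. source points) are broadcast
    dx, dy, period : shift and mask period in the same length unit
    """
    orders_l = np.asarray(orders_l, dtype=float)
    orders_m = np.asarray(orders_m, dtype=float)
    phase = np.exp(
        -2j * np.pi * (orders_l[:, None] * dx + orders_m[None, :] * dy) / period
    )
    return amp * phase

# Check against a direct FFT of the shifted pattern (pixel units, period = n).
rng = np.random.default_rng(0)
n = 16
pattern = rng.random((n, n))
orders = np.fft.fftfreq(n, d=1.0 / n)          # integer orders in FFT ordering
original = np.fft.fft2(pattern) / pattern.size
direct = np.fft.fft2(np.roll(pattern, shift=(3, 5), axis=(0, 1))) / pattern.size
via_phase = shifted_amplitude(original, orders, orders, 3, 5, n)
print(np.allclose(direct, via_phase))
```

The factor of 100 quoted in the text is consistent with ten shifts along each axis (10 × 10 shifted copies per original pattern), although the exact shift grid is not spelled out.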
Without data augmentation, the training loss decreases during the training, but the validation loss does not. This is a typical overfitting phenomenon. With data augmentation, both the training loss and the validation loss decrease during the training.

Potential and Limitation of CNN
In the previous work, [9] the input mask patterns were Manhattan patterns, as shown in Fig. 5. In general, the accuracy of neural networks depends on their training data. The CNN trained on Manhattan patterns cannot be applied to general mask patterns. However, our CNN architecture contains about 70 M parameters, and the architecture itself could be applied to general mask patterns. In this work, we apply the same CNN architecture to mask patterns that mimic iN3 logic metal or via layers. The design rules of iN3 metal and via layers are as follows. [25][26] The minimum pitch of the metal 1 layer is 28 nm, and the minimum tip-to-tip (T2T) CD is 20 nm. The via layer has different pitches and CDs in the X and Y directions: the X/Y pitch is 42/36 nm and the X/Y CD is 26/18 nm. Figure 7(a) shows the metal pattern and the result of CNN training. We show here the result for CNN1 in Fig. 5, but similar results are obtained for CNN2-6. We use 200,000 random mask patterns and the corresponding M3D parameters for the training dataset. Both the training loss and the validation loss decrease rapidly as the training proceeds. The maximum value of the data for each diffraction order is normalized to 1 before training. The loss is calculated by averaging the mean square errors for all diffraction orders. The validation loss after 100 epochs is 0.0006. This is a very small number, and we expect the difference between the image intensity of the CNN prediction and that of the EM simulation to be small. We confirm the accuracy of the CD calculated by CNN in Sec. 5.
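The normalization and loss described above can be written compactly. The sketch below is one reading of that description, with hypothetical function names: "maximum value" is taken to mean the maximum absolute value over the dataset at each diffraction order, and the targets are real-valued arrays because each CNN handles either the real or the imaginary part of one M3D parameter.

```python
import numpy as np

def normalize_per_order(data):
    """Scale each diffraction order so its maximum absolute value is 1.

    data[n, i, j] : real-valued target for sample n at diffraction order (i, j)
    Returns the scaled data and the per-order scale factors.
    """
    data = np.asarray(data, dtype=float)
    scale = np.max(np.abs(data), axis=0)
    scale = np.where(scale > 0, scale, 1.0)   # leave all-zero orders untouched
    return data / scale, scale

def mean_mse_over_orders(pred, target):
    """Mean square error per diffraction order, averaged over all orders."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse_per_order = np.mean((pred - target) ** 2, axis=0)
    return float(mse_per_order.mean())
```

With this normalization every diffraction order carries the same weight in the loss, so weak high-order amplitudes are not drowned out by the strong low orders.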
To study more general mask patterns, rule-based OPC is applied to iN3 metal patterns. Figure 7(b) shows the metal pattern with AFs (5 nm) and hammer heads (HHs, 3 nm). Both the training loss and the validation loss decrease, but slightly more slowly than in Fig. 7(a). The training speed depends on the complexity of the mask patterns. The training loss at 10 epochs is 0.0027 for Fig. 7(a) and 0.0036 for Fig. 7(b). Figures 8(a)-8(c) show the iN3 via mask pattern, the via pattern with AFs (6 nm), and the curvilinear via pattern, respectively. In all cases, the training loss and the validation loss decrease during the training. Again, the training speed depends on the complexity of the mask patterns: the training loss at 10 epochs is 0.0011 for Fig. 8(a), 0.0035 for Fig. 8(b), and 0.0042 for Fig. 8(c). The CNN can also be trained on curvilinear via patterns, so our CNN architecture has the capability to recognize curvilinear mask patterns.
In general, CNN is only as good as the data we feed it. Figure 9 verifies this rule. When we use the same kind of training data and validation data, the validation loss becomes very small. Our CNN architecture is flexible enough to be applied to any iN3 metal or via layer. However, the validation loss is large when we use different kinds of training data and validation data. The selection of training data is very important for obtaining high accuracy. Real mask data, such as test element group (TEG) mask patterns, are desirable. The data could involve a diversity of patterns not included in this work.

M3D Effect on iN3 Metal Layer
In this section, we study the impact of the M3D effects on the iN3 metal layer, especially the shadowing effect on the T2T CD. We follow the design rule of the metal 1 layer. [25] In all cases, the line CD decreases and the T2T CD increases as the pattern pitch becomes larger. This is because the image contrast of small pitch patterns is high when dipole illumination is used. In the case of the line CD, the difference among EM, CNN, and FT is small. However, in the case of the T2T CD, the difference among the three models is large, especially between EM (or CNN) and FT. This can be explained by the shadowing effect at the T2T slits (Fig. 11). Oblique incident light casts a shadow in the Y direction. This effect is included in EM and CNN, but not in FT, because the M3D amplitudes that cause the shadowing effect are part of the diffraction amplitudes only in the first two models. The shadowing effect darkens the area between the two line ends and reduces the T2T CD when the image intensity threshold model is used. Therefore, the T2T CD of FT is larger than that of EM or CNN.
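The image intensity threshold model mentioned above turns an intensity cut into a CD by locating where the intensity crosses a fixed threshold. The following is a minimal sketch of that idea, not the authors' measurement code: the function name, the linear interpolation at the crossings, and the `bright` flag that selects the feature polarity are all assumptions.

```python
import numpy as np

def threshold_cd(x, intensity, threshold, center_index, bright=True):
    """Width of the feature that contains x[center_index] on a 1D intensity cut.

    bright=True  : the feature is where intensity > threshold
    bright=False : the feature is where intensity < threshold
    Edges are located by linear interpolation between neighboring samples.
    """
    x = np.asarray(x, dtype=float)
    sign = 1.0 if bright else -1.0
    g = sign * (np.asarray(intensity, dtype=float) - threshold)  # > 0 inside the feature
    if g[center_index] <= 0:
        return 0.0  # no feature at the requested position
    i = center_index
    while i > 0 and g[i - 1] > 0:          # walk to the left edge
        i -= 1
    j = center_index
    while j < len(g) - 1 and g[j + 1] > 0:  # walk to the right edge
        j += 1
    if i == 0 or j == len(g) - 1:
        raise ValueError("feature touches the border of the cut")
    left = x[i - 1] + (x[i] - x[i - 1]) * (-g[i - 1]) / (g[i] - g[i - 1])
    right = x[j] + (x[j + 1] - x[j]) * g[j] / (g[j] - g[j + 1])
    return right - left
```

Under this model, any intensity change near the threshold at the line ends shifts the two crossings and therefore the measured CD, which is how a shadowing-induced intensity change shows up in the T2T CD.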
We further study the impact of the M3D effect on iN3 metal patterns. We generate 100 random metal mask patterns as shown in Figs. 12(a) and 12(b). In each mask pattern, we put a T2T slit at the center of the mask and measure the T2T CD and line CD of the center line. In the case of the line CD, the root mean square deviation (RMSD) between the EM and FT CDs is 0.19 nm, and the RMSD between the EM and CNN CDs is 0.17 nm. Both numbers are negligibly small compared with the designed line width of 14 nm. However, in the case of the T2T CD (with a designed space width of 20 nm), the RMSD between the EM and FT CDs is 6.53 nm, whereas the RMSD between the EM and CNN CDs is 0.96 nm. The large deviation between the EM and FT CDs suggests the influence of the shadowing effect at the T2T slit. Figures 13(a) and 13(b) show the results for iN3 metal patterns with AFs and HHs. Similar results are obtained even when OPC masks are used.
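For reference, the RMSD used in this comparison is the square root of the mean squared CD difference over the mask patterns. A minimal sketch follows; the function name is hypothetical and the CD values in the usage lines are made-up numbers for illustration only, not results from the paper.

```python
import numpy as np

def rmsd(cd_a, cd_b):
    """Root mean square deviation between two sets of CDs (same unit, same order)."""
    a = np.asarray(cd_a, dtype=float)
    b = np.asarray(cd_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("CD arrays must have the same shape")
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Illustrative values only (nm): one CD per mask pattern from two models.
cd_model_a = [20.0, 21.0, 19.0]
cd_model_b = [20.0, 20.0, 20.0]
print(rmsd(cd_model_a, cd_model_b))
```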

Summary
We apply the CNN architecture developed for Manhattan patterns to mask patterns that mimic iN3 logic metal or via layers. In addition, we train CNNs using iN3 metal/via patterns with OPC and curvilinear via patterns. In all cases, the validation loss becomes very small. Our CNN architecture is flexible enough to be applied to any iN3 metal or via layer. CNNs can be trained even on curvilinear mask patterns, which shows that our architecture has the capability to recognize them. When the training and validation data have the same characteristics, the validation loss becomes very small. On the other hand, using training and validation data with different characteristics leads to a large validation loss. The selection of training data is very important for obtaining high accuracy. Real mask data, such as TEG mask patterns, are desirable, but they are difficult for us to obtain.
Mask pattern recognition by CNN is the key to fast EUV lithography simulation. The accuracy of a CNN depends on the quantity and quality of its training data. Our test results suggest that layer-specific CNNs could be constructed, but further development of the CNN architecture might be required because our architecture is a simple repetition of convolution/max pooling/batch normalization. It is a big challenge to build a universal CNN for general mask patterns.
This work is based on a prior SPIE proceedings paper. 28 The data supporting the findings of this study are available within the paper.