Data augmentation in extreme ultraviolet lithography simulation using convolutional neural network

Abstract. Background: In previous work, we developed a convolutional neural network (CNN) that reproduces the results of rigorous electromagnetic (EM) simulations in a small mask area. CNN prediction was 5000 times faster than EM simulation. We trained the CNN using 200,000 samples obtained by EM simulation. Although the CNN prediction time was very short, building such a large amount of training data took a long time. When the mask area is enlarged, the time needed to prepare the training data becomes unacceptably long. Aim: To reduce the calculation time needed to prepare the training data. Approach: We apply a data augmentation technique to increase the number of training samples from limited original data. The training data of our CNN are the diffraction amplitudes of mask patterns. Assuming a periodic boundary condition, the diffraction amplitudes of a shifted or flipped mask pattern can be easily calculated from the diffraction amplitudes of the original mask pattern. Results: Data augmentation increased the number of training samples 200-fold, from 2500 to 500,000. With this large amount of training data, the validation loss of the CNN was reduced. The accuracy of the CNN trained with augmented data is verified by comparing the CNN predictions with the results of EM simulation. Conclusions: The data augmentation technique is applied to the diffraction amplitudes of mask patterns, reducing the data preparation time by a factor of 200. Our CNN nearly reproduces the results of EM simulation. In this work, the mask patterns are restricted to line and space patterns. A remaining challenge is to build several CNNs for specific mask patterns or, ultimately, a single CNN for arbitrary mask patterns.


Introduction
High-aspect-ratio absorbers used in extreme ultraviolet (EUV) masks induce several mask three-dimensional (3D) effects, such as critical dimension (CD) and image placement errors. 1,2 It is therefore necessary to include mask 3D effects in EUV lithography simulation. Mask 3D effects can be calculated rigorously using electromagnetic (EM) simulators. 3–7 However, these simulators are too time-consuming for full-chip applications.
Recently, many attempts have been made to simulate mask 3D effects using deep neural networks (DNNs). These attempts can be classified into three models depending on the target of the DNN. Going from the mask plane to the wafer plane, the three possible targets are the near-field amplitude on the mask, the far-field amplitude (diffraction spectrum) at the pupil of the projection optics, and the image intensity on the wafer. In the first model, the target is the near-field amplitude on the mask calculated by EM simulation. 8–11 This model requires many DNNs to reproduce the different near-field amplitudes that arise for different source positions. In the second model, which is our model, 12 the target of the DNN is the far-field amplitude at the pupil of the projection optics. Because the far-field amplitudes are described in momentum (wave vector) space and the source position corresponds to the incident momentum in Koehler illumination, our model naturally parametrizes the source position dependence of the amplitude. The third model 13,14 uses the image intensity on the wafer as the target of the DNN. This model is more straightforward than the other models because the image intensity is what the subsequent resist simulation uses. However, the phase information is lost when the diffraction amplitude is converted to the image intensity, so the phase of the amplitude is not included in the targets of this model. Because the phase indirectly influences the focus dependence of the intensity, this model needs many intensity targets at different focus positions for each mask pattern.
*Address all correspondence to Hiroyoshi Tanabe, tanabe.h.af@m.titech.ac.jp
In our previous work, 12 we developed a CNN that reproduces the results of rigorous EM simulation in a small mask area. CNN prediction was 5000 times faster than EM simulation. We trained the CNN using 200,000 samples obtained by EM simulation; such a large amount of data was necessary to reduce the validation loss during training. Although the CNN prediction time was very short, building the training data took a long time (∼1 week). Creating the training data was feasible in that work because the mask area was small. However, when we enlarge the mask area for large-area optical proximity correction (OPC), the calculation time to prepare the training data becomes unacceptably long. In the large-area OPC process, the large mask area is clipped into many small mask areas. Each clipped area must be large enough that, at least near its center, it is not influenced by the surrounding mask pattern.
In this work, we apply data augmentation, a standard technique for DNNs, to our CNN. The technique increases the number of training samples without additional EM calculations, which significantly reduces the time to prepare the training data. In Sec. 2, we explain the details of our data augmentation technique. We focus on the application to metal layers and assume that the mask patterns are simple line and space patterns. In Sec. 3, we study the accuracy of our CNN predictions of CDs and edge placement errors (EPEs). Section 4 is the summary.

Data Augmentation for Large Mask Patterns
In the previous work, 12 we assumed a periodic mask pattern with a 720 nm × 720 nm mask area. When we clip out a small mask area from the mask data, we should not use the edges of the clipped area, because they are influenced by the neighboring mask pattern. According to Ref. 15, the optical interaction range $R_\mathrm{opt}$ is calculated by Eq. (1), where λ, σ, and NA represent the wavelength, coherence factor, and numerical aperture of the scanner, respectively. The wavelength of EUV light is 13.5 nm, and the numerical aperture of the current EUV scanner is 0.33. The coherence factor depends on the illumination setting; a typical value is 0.5. Inserting these values into Eq. (1) gives $R_\mathrm{opt}$ = 90 nm. This value is a length on the wafer; on the mask, it is multiplied by four, the reduction ratio of the projection optics. Therefore, the optical interaction range on the mask is $4 \times R_\mathrm{opt}$ = 360 nm. Figure 1 shows the usable mask area, which excludes the area influenced by the neighboring mask pattern. Because a 360-nm band must be excluded along each edge, the mask size L must be larger than 2 × 360 nm = 720 nm to leave any usable area. Therefore, there was no usable area for large-area OPC in our previous work.
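The border arithmetic above can be checked with a few lines of Python. This is a minimal sketch that takes $R_\mathrm{opt}$ = 90 nm as given rather than evaluating Eq. (1), assumes a 4× mask-to-wafer reduction, and assumes, as in Fig. 1, that a band of width $4 \times R_\mathrm{opt}$ is discarded along every edge of a square clipped area. The function name `usable_side_nm` is hypothetical.

```python
def usable_side_nm(mask_side_nm: float, r_opt_wafer_nm: float = 90.0,
                   reduction: float = 4.0) -> float:
    """Side length of the usable (central) region of a square clipped mask.

    A band of width reduction * r_opt_wafer_nm (the optical interaction
    range expressed on the mask) is removed along both opposite edges.
    Returns 0 when no usable region remains.
    """
    border_nm = reduction * r_opt_wafer_nm  # 4 x 90 nm = 360 nm on the mask
    return max(0.0, mask_side_nm - 2.0 * border_nm)


for side in (720.0, 1024.0):
    print(f"L = {side:.0f} nm -> usable side = {usable_side_nm(side):.0f} nm")
# L = 720 nm -> usable side = 0 nm
# L = 1024 nm -> usable side = 304 nm
```

For L = 1024 nm the usable region is 304 nm on a side, i.e., roughly 300 nm × 300 nm.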
In this work, we choose a 1024 nm × 1024 nm mask area. The usable mask area is then about 300 nm × 300 nm (1024 nm − 720 nm = 304 nm on a side). The usable area is not large, but the mask area cannot be enlarged freely because the EM simulation time depends strongly on it: the calculation for one 1024 nm × 1024 nm mask area takes 162 s on a Core i9-9900K CPU. In the simulation, we use the 3D waveguide model, 5–7 which solves coupled wave equations in momentum space. The calculation time depends strongly on the cut-off momentum. In this work, we include the momenta $(k_x, k_y)$ that satisfy Eq. (2); the cut-off value in Eq. (2) is six times larger than the pupil size $\frac{NA}{4}\frac{2\pi}{\lambda}$. With the momentum discretized in steps of $2\pi/L$, 2121 $(k_x, k_y)$ pairs satisfy Eq. (2). Because there are two polarizations, the matrix of the coupled wave equations is 4242 × 4242. The region defined by Eq. (2) is a quasi-hyperbola, which resembles the diffraction spectrum of mask patterns consisting of vertical and horizontal lines or holes. Mask patterns are conventionally designed on X-Y coordinates, and the minimum pattern pitch in the X or Y direction is small compared with that in the diagonal direction. Therefore, in momentum space, the diffraction amplitude decreases more rapidly in the diagonal direction than in the X or Y direction.
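The size and cost figures above can be restated numerically. The sketch below converts the pupil size $(NA/4)(2\pi/\lambda)$ into momentum steps of $2\pi/L$, derives the coupled-wave matrix size from the 2121 retained momentum pairs, and estimates the EM time for many masks. It assumes strictly serial runs on a single CPU at the quoted 162 s per mask; the constant and function names are illustrative, not taken from the original work, and Eq. (2) itself is not evaluated.

```python
import math

WAVELENGTH_NM = 13.5          # EUV wavelength
NA = 0.33                     # numerical aperture (wafer side)
MASK_SIDE_NM = 1024.0         # period L of the simulated mask
REDUCTION = 4.0               # mask-to-wafer reduction ratio
EM_SECONDS_PER_MASK = 162.0   # quoted EM time for one mask (Core i9-9900K)


def pupil_radius_in_grid_units(na=NA, wavelength_nm=WAVELENGTH_NM,
                               side_nm=MASK_SIDE_NM, reduction=REDUCTION):
    """Pupil radius (na / reduction) * (2*pi / wavelength) expressed in
    momentum steps of 2*pi / side; simplifies to na * side / (reduction * wavelength)."""
    pupil_k = (na / reduction) * (2.0 * math.pi / wavelength_nm)
    step_k = 2.0 * math.pi / side_nm
    return pupil_k / step_k


def coupled_wave_matrix_shape(n_momentum_pairs, n_polarizations=2):
    """One unknown per retained (kx, ky) pair and per polarization."""
    n = n_momentum_pairs * n_polarizations
    return (n, n)


def serial_em_days(n_masks, seconds_per_mask=EM_SECONDS_PER_MASK):
    """Wall-clock days to simulate n_masks one after another on one CPU."""
    return n_masks * seconds_per_mask / 86400.0


r = pupil_radius_in_grid_units()
print(f"pupil radius: {r:.2f} steps of 2*pi/L")                 # 6.26
print(f"6 x pupil   : {6 * r:.1f} steps")                       # 37.5
print(coupled_wave_matrix_shape(2121))                          # (4242, 4242)
print(f"{serial_em_days(200_000):.1f} days for 200,000 masks")  # 375.0
print(f"{serial_em_days(2_500):.1f} days for 2,500 masks")      # 4.7
```

The 375-day result for 200,000 masks is the "about a year" estimate for reusing the previous training-set size; with 2500 original masks the EM cost stays under five days.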
DNNs require a large amount of training data. In the previous work, we used 200,000 training samples. Computing the same number of samples for the mask area used in this work would take about a year. Data augmentation is a powerful deep-learning technique for increasing the number of training samples from limited original data. In our CNN, the input is the mask pattern, and the outputs are the far-field diffraction amplitudes $A(l, m, l_s, m_s)$, where $(l, m)$ is the diffraction order and $(l_s, m_s)$ is the source position (Fig. 2). In the 3D waveguide model, 5–7 not only the diffraction momentum but also the source position (or incident momentum) is discretized in steps of $2\pi/L$. As discussed in Ref. 12, assuming the largest σ value to be 1, the diffraction order and the source position are restricted by the pupil shape and the source shape. We write the diffraction amplitude as the sum of the Fourier-transform (thin-mask) amplitude $A_{FT}$ and the mask 3D amplitude $A_{3D}$:

$$A(l, m, l_s, m_s) = A_{FT}(l, m) + A_{3D}(l, m, l_s, m_s). \qquad (5)$$

Figure 4 shows the source position dependence of the mask 3D amplitude. The source positions at which the amplitude contributes to the image intensity are limited by the source shape and the pupil shape; only the overlapping area in Fig. 4 contributes to the image intensity. We approximate the mask 3D amplitude in this area by a linear function of the source position $(l_s, m_s)$:

$$A_{3D}^{x}(l, m, l_s, m_s) \cong a_0(l, m) + a_x(l, m)\left(l_s + \frac{l}{2}\right) + a_y(l, m)\left(m_s + \frac{m}{2}\right), \qquad (6)$$

where $a_0$ is the mask 3D amplitude at the center of the overlapping area, $(l_s, m_s) = (-l/2, -m/2)$, and $a_x$ and $a_y$ are the slopes of the amplitude in the X and Y directions on the source plane, respectively. We call these three quantities the mask 3D parameters. Equation (6) differs slightly from Eq. (11) in Ref. 12. We modified the equation for the following reason. The 3D waveguide model calculates the diffraction amplitudes at the grid points in Fig. 4.
The model solves coupled wave equations, so all the amplitudes are calculated simultaneously. The mask 3D parameters are derived by least-squares fitting to the amplitudes in the overlapping area. The larger the diffraction order $(l, m)$, the fewer grid points lie inside the overlapping area. If the number of grid points is too small, the overlapping area reduces to a line or just a point. In such cases, we approximate the amplitude in the area using only $a_0(l, m)$, taken as the average of the amplitudes, and do not use $a_x(l, m)$ and $a_y(l, m)$. Therefore, there are 457 diffraction orders with $a_0(l, m)$, whereas only 349 orders have $a_x(l, m)$ and $a_y(l, m)$.
We thus use two different methods to derive $a_0(l, m)$: least-squares fitting for small $(l, m)$ and averaging for large $(l, m)$. Using Eq. (6) instead of Eq. (11) in Ref. 12 reduces the CNN training loss for $a_0$. In Eq. (6), we consider only the x-polarized amplitude, because the difference between the diffraction amplitudes of x and y polarizations is very small and the polarization change caused by EUV mask diffraction is negligible, as shown in Ref. 12.
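The fitting procedure described above can be sketched in NumPy. Two assumptions are ours and are labeled as such: the source and pupil constraints are taken as two disks of radius $NA\,L/(4\lambda)$ in units of $2\pi/L$, namely $l_s^2 + m_s^2 \le r^2$ (σ ≤ 1) and $(l + l_s)^2 + (m + m_s)^2 \le r^2$, which places the overlap center at $(-l/2, -m/2)$ as stated for Eq. (6); and the "too few points" criterion is replaced by a simple rank test, since the actual criterion is not specified. Function names are hypothetical.

```python
import numpy as np


def overlap_points(l, m, radius):
    """Grid source points (ls, ms) inside both the source disk and the pupil
    disk seen from diffraction order (l, m).

    ASSUMPTION: ls^2 + ms^2 <= radius^2 and (l + ls)^2 + (m + ms)^2 <= radius^2,
    with radius = NA * L / (4 * wavelength) in units of 2*pi/L.
    """
    n = int(np.floor(radius))
    grid = np.arange(-n, n + 1)
    ls, ms = np.meshgrid(grid, grid, indexing="ij")
    ls, ms = ls.ravel(), ms.ravel()
    r2 = radius ** 2
    inside = (ls**2 + ms**2 <= r2) & ((l + ls) ** 2 + (m + ms) ** 2 <= r2)
    return ls[inside], ms[inside]


def fit_mask3d_parameters(l, m, ls, ms, amp):
    """Least-squares fit of Eq. (6): amp ~ a0 + ax*(ls + l/2) + ay*(ms + m/2).

    ASSUMPTION: the slopes are fitted only when the points span a plane
    (design matrix of rank 3); otherwise a0 is the mean amplitude and
    ax, ay are returned as None.
    """
    u = ls + l / 2.0
    v = ms + m / 2.0
    design = np.column_stack([np.ones_like(u), u, v]).astype(complex)
    if np.linalg.matrix_rank(design) < 3:
        return complex(np.mean(amp)), None, None
    coef, *_ = np.linalg.lstsq(design, np.asarray(amp, dtype=complex), rcond=None)
    return complex(coef[0]), complex(coef[1]), complex(coef[2])


RADIUS = 0.33 * 1024 / (4 * 13.5)  # pupil radius in 2*pi/L steps (about 6.26)

# Diffraction orders whose overlap area contains at least one grid point.
orders = [(l, m) for l in range(-13, 14) for m in range(-13, 14)
          if overlap_points(l, m, RADIUS)[0].size > 0]
print(len(orders))  # 457 under the assumptions above

# Fit synthetic amplitudes generated from known parameters at order (3, -2).
ls, ms = overlap_points(3, -2, RADIUS)
true = (0.02 + 0.01j, -0.003 + 0.001j, 0.002 - 0.004j)
amp = true[0] + true[1] * (ls + 1.5) + true[2] * (ms - 1.0)
a0, ax, ay = fit_mask3d_parameters(3, -2, ls, ms, amp)
print(np.allclose([a0, ax, ay], true))  # True
```

Under the assumed disk constraints, the number of orders with a non-empty overlap comes out at 457, the number of $a_0$ values quoted above. The rank test alone is likely more permissive than the authors' criterion, so it need not reproduce the 349 orders that keep slopes.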
The values of the mask 3D parameters are determined by the mask pattern and the absorber. In this work, we use a Ta absorber with a thickness of 50 nm. We construct a CNN model that predicts the mask 3D parameters from the mask pattern. Figure 5 shows the architecture of our CNN. Six independent CNNs are used for the real and imaginary parts of the three mask 3D parameters; the six CNNs are merged into one model after training. The input is a random line and space pattern with a mask area of 1024 nm × 1024 nm. The pattern size is randomly selected from 60 to 160 nm (15 to 40 nm on the wafer). Half of the training data are bright-field (BF) masks, and the rest are dark-field (DF) masks. The 1024 × 1024 binary data are averaged to 256 × 256 floating-point data before being input to the CNNs. Circular padding (Ref. 16) is used because we assume periodic boundary conditions for the input mask patterns. Figure 6 shows the loss functions of the training and validation data for $\mathrm{Re}(a_0)$ with and without data augmentation. The number of original training samples is 2500, and the number of validation samples is 1000. With data augmentation, the original data are shifted in 103-nm increments in both the X and Y directions and flipped along the Y axis. This gives 10 shift positions in each direction and two orientations (10 × 10 × 2 = 200), so data augmentation multiplies the number of training samples by 200, to 500,000.
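The shift-and-flip augmentation can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes the `np.fft` sign convention for diffraction orders, stores the mask 3D parameters in a dictionary keyed by $(l, m)$, and assumes that mirroring x → −x maps $A(l, m, l_s, m_s)$ to $A(-l, m, -l_s, m_s)$, i.e., that the illumination is symmetric under this mirror and polarization effects are negligible, as stated for Eq. (6). All function names are hypothetical.

```python
import numpy as np


def shift_sample(mask, params, dx, dy):
    """Periodically shift a mask by (dx, dy) pixels and update its parameters.

    mask   : 2D array indexed [y, x] covering one period L.
    params : dict {(l, m): (a0, ax, ay)} of complex values; ax/ay may be None.
    A shift multiplies order (l, m) by exp(-2*pi*i*(l*dx/Nx + m*dy/Ny)).
    The factor does not depend on the source position, so a0, ax and ay
    all receive the same phase.
    """
    ny, nx = mask.shape
    new_mask = np.roll(mask, shift=(dy, dx), axis=(0, 1))
    new_params = {}
    for (l, m), coeffs in params.items():
        phase = np.exp(-2j * np.pi * (l * dx / nx + m * dy / ny))
        new_params[(l, m)] = tuple(None if c is None else c * phase for c in coeffs)
    return new_mask, new_params


def flip_about_y(mask, params):
    """Mirror x -> -x (flip about the Y axis) on the periodic grid.

    ASSUMPTION: A(l, m, ls, ms) -> A(-l, m, -ls, ms). Inserting this into
    Eq. (6) gives a0'(l, m) = a0(-l, m), ax'(l, m) = -ax(-l, m),
    ay'(l, m) = ay(-l, m).
    """
    new_mask = np.roll(np.flip(mask, axis=1), 1, axis=1)  # x -> (-x) mod Nx
    new_params = {(-l, m): (a0, None if ax is None else -ax, ay)
                  for (l, m), (a0, ax, ay) in params.items()}
    return new_mask, new_params


def augment(mask, params, step_px=103):
    """Yield all shifted copies of the original sample and of its mirror image.

    For a 1024 x 1024 mask with 1-nm pixels, range(0, 1024, 103) has 10 shifts.
    """
    ny, nx = mask.shape
    for base in ((mask, params), flip_about_y(mask, params)):
        for dy in range(0, ny, step_px):
            for dx in range(0, nx, step_px):
                yield shift_sample(*base, dx, dy)


def to_cnn_input(binary_mask, factor=4):
    """Average factor x factor blocks, e.g. 1024 x 1024 binary -> 256 x 256 float."""
    ny, nx = binary_mask.shape
    blocks = binary_mask.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


# Thin-mask check: with the FFT spectrum stored as a0, the shifted parameters
# must equal the FFT of the shifted mask (orders l = kx, m = ky).
rng = np.random.default_rng(0)
mask = (rng.random((16, 16)) > 0.5).astype(float)
spec = np.fft.fft2(mask)
params = {(l, m): (spec[m % 16, l % 16], None, None)
          for l in range(-8, 8) for m in range(-8, 8)}
new_mask, new_params = shift_sample(mask, params, dx=3, dy=5)
new_spec = np.fft.fft2(new_mask)
print(all(np.isclose(new_params[k][0], new_spec[k[1] % 16, k[0] % 16])
          for k in new_params))  # True
```

Because the phase factor of a shift does not depend on $(l_s, m_s)$, the three mask 3D parameters of each order transform together, so the 199 extra samples per original mask need no EM calculation. In a PyTorch model, the circular padding mentioned above corresponds to `padding_mode="circular"` in `torch.nn.Conv2d`; whether the original model was built this way is not stated.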
Without data augmentation, the training loss decreases during training, whereas the validation loss does not. This is a typical overfitting phenomenon. With data augmentation, both the training loss and the validation loss decrease during training, and the validation loss after training is small. The mean square error per target after training is 1.52/457 = 0.0033. For each of the 457 targets, the maximum value of the input data is normalized to 1 during training. Figure 7 compares the mask 3D parameters at several diffraction orders for 100 test samples. The correlation between the parameters obtained by EM simulation and the CNN predictions is generally good. There are some exceptions, such as $\mathrm{Re}(a_x(0, 0))$, where the correlation is poor; however, this value is very small compared with the values of the other parameters.

CNN Prediction Accuracy
The accuracy of our CNN is verified by calculating the image intensities of test mask patterns. Whereas the training data in this work are random line and space patterns, standard line and space test mask patterns are used to confirm the accuracy of the CNN. Figure 8 compares the image intensities of a line mask pattern obtained by EM simulation, Fourier transformation (FT), and CNN prediction. In the calculations, we assume λ = 13.5 nm, NA = 0.33, and annular illumination with $\sigma_\mathrm{in}/\sigma_\mathrm{out}$ = 0.3/0.8. The bottom panels show the intensity differences between EM simulation and FT or CNN prediction. The difference between EM and CNN is much smaller than that between EM and FT. Figure 9 compares the CDs and EPEs of vertical (V) lines with several line widths, measured at the cut line across the V lines in Fig. 8. In addition to EM, CNN, and FT, we plot the result of the simulation using the linear approximation of the diffraction amplitude in Eq. (6) (LIN in Fig. 9). The difference between EM and LIN indicates the accuracy of the linear approximation in Eq. (6), and the difference between LIN and CNN indicates the accuracy of the CNN prediction. The agreement among EM simulation, the linear approximation, and CNN prediction is good. Figure 10 shows the results for horizontal (H) lines. The agreement between LIN and CNN is good, but there is a small difference between EM and LIN; adding higher-order terms to Eq. (6) may reduce this error. Figure 11 compares the image intensities of a space mask pattern, and Figs. 12 and 13 show the CDs and EPEs of V and H spaces with several space widths. The space patterns show results similar to those of the line patterns.
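To make the CD and EPE comparison concrete, the sketch below extracts both quantities from an intensity cut line such as the one in Fig. 8. The constant-threshold criterion, the linear interpolation between samples, and the function names are assumptions for illustration; the text does not state which resist model or threshold was used.

```python
import numpy as np


def threshold_crossings(x, intensity, threshold):
    """Positions where the intensity profile crosses `threshold`,
    found by linear interpolation between neighboring samples."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(intensity, dtype=float) - threshold
    idx = np.nonzero(np.signbit(d[:-1]) != np.signbit(d[1:]))[0]
    t = d[idx] / (d[idx] - d[idx + 1])  # fraction of the interval to the crossing
    return x[idx] + t * (x[idx + 1] - x[idx])


def line_cd_and_epe(x, intensity, threshold, target_left, target_right):
    """CD and edge placement errors of one feature in a cut line.

    ASSUMPTIONS: constant-threshold resist model; the cut line contains one
    feature (a dark line or a bright space) with exactly two threshold
    crossings. EPE = measured edge - target edge, for the left and right edge.
    """
    edges = threshold_crossings(x, intensity, threshold)
    if edges.size != 2:
        raise ValueError(f"expected 2 edges, found {edges.size}")
    left, right = edges
    return float(right - left), (float(left - target_left), float(right - target_right))


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
intensity = np.array([1.0, 0.0, 0.0, 0.0, 1.0])
cd, epe = line_cd_and_epe(x, intensity, 0.5, target_left=1.0, target_right=3.0)
print(cd, epe)  # 3.0 (-0.5, 0.5)
```

Applying the same function to EM, FT, LIN, and CNN intensities on the same cut line gives directly comparable CD and EPE values.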

Summary
The data augmentation technique was applied to the diffraction amplitudes of mask patterns. The diffraction amplitudes of shifted or Y-flipped mask patterns were calculated from the diffraction amplitudes of the original mask patterns. Data augmentation increased the number of training samples 200-fold, from 2500 to 500,000. With this large amount of training data, the validation loss of the CNN was significantly reduced compared with the validation loss without augmentation.
We verified the accuracy of our CNN by comparing the results of EM simulation with CNN predictions. Our CNN nearly reproduced the CDs and EPEs of line and space patterns.
In this work, the mask patterns are restricted to line and space patterns. We did not include hole patterns, patterns with serifs and assist bars, or curvilinear patterns in the training data, and we do not expect our CNN to predict images correctly for these patterns: a neural network is only as good as the data it is trained on. A remaining challenge is to build several CNNs for specific mask patterns or, ultimately, a single CNN for arbitrary mask patterns.
This work is based on the prior SPIE proceedings paper. 17