Ultra-fast aerial image simulation algorithm using wavelength scaling and fast Fourier transformation to speed up calculation by more than three orders of magnitude

Abstract. An ultra-fast aerial image simulation algorithm is proposed. The new algorithm uses the fast Fourier transform (FFT) throughout to calculate the aerial image intensity. The wavelength, 193 nm, is scaled to a power of 2 by scaling the mask with a scaling factor derived from the discrete Fourier transform (FT) format. The mask can then be transformed into its diffraction spectrum in spatial frequency using the FFT algorithm, and that spectrum can be inverse transformed to the aerial image using the inverse-FFT algorithm. The image amplitude is finally scaled back to that of the original wavelength and squared to obtain the image intensity. Compared with the original FT, the computation is 4000× to 5000× faster with only about 3% intensity deviation. This algorithm provides an efficient engine for lithography optimization.


Introduction
Simulations are used intensively in semiconductor lithography, both during process development and during mass production of IC chips. Computational lithography, generally consisting of optical proximity correction (OPC),1-7 source mask optimization (SMO),8-14 inverse lithography technology (ILT),15-17 and other simulation packages,18-20 is widely and accurately applied during different phases of process development. At the path-finding stage, new types of phase-shifting masks, resist materials, and resolution-enhancement techniques should be thoroughly studied and simulated to find opportunities for pushing the patterning resolution. At the process development stage, the process windows must be simulated to establish the design rules for each masking layer; before that, the optimum numerical aperture (NA) and partial coherence parameter σ of the exposure tools have to be evaluated. Moreover, modern lithography needs SMO, OPC, pattern splitting (for multiple patterning), and even ILT if the resolution scaling coefficient k1 is beyond the reach of the conventional lithography process.21,22 Even at the mass production stage, especially for foundries with product-layout diversity, sophisticated simulations are needed to retune OPC or SMO and remove the patterning hotspots that would otherwise degrade wafer yield.
There are two performance indicators for the simulations widely used in lithography: accuracy and speed. For cutting-edge semiconductor technology, the systematic critical-dimension (CD) error tolerance is less than a few nanometers. For a 12-nm gate length, the CD error specification is just ±1.2 nm, 10% of the patterning CD. Similar but looser CD tolerances are required for the metal and contact layers. The mask CD must sit at the center of the wafer's random CD-error distribution. A desirable OPC package renders the simulated mask CD accurately, using models and rules derived from all the predictable patterning effects, such as optical, resist, etching, and even chemical-mechanical polishing effects.23 In addition, since there are billions of transistors in a single chip, speedy simulation is necessary. For example, the 10-nm technology node requires the simulation software to complete the simulation and correction work in less than a week, and more than several tens of thousands of CPU cores may be involved when multiple products are handled simultaneously.
The algorithm presented in this paper speeds up the transformation between physical space and spatial-frequency space by making typical imaging simulations, which contain the wavelength as a parameter, fit the fast-Fourier-transform (FFT) scheme. Depending on the number of integrations needed, typical imaging simulations can be sped up by more than three orders of magnitude.

Basic Simulation
The basic lithographic simulation, whatever its purpose, must include aerial image simulation. In Fourier optics, the amplitude and phase distribution $M(x)$ transmitted through the mask is Fourier transformed, multiplied by a low-pass pupil function $P(k)$, and then inverse Fourier transformed back to the spatial domain. The aerial image intensity $I(x)$ of a coherent point source24-26 is defined as

$$I(x) = \left| \int P(k) \left[ \int M(x')\, e^{-2\pi i k x'}\, dx' \right] e^{2\pi i k x}\, dk \right|^2, \quad (1)$$

where $x$ is the spatial position, $k$ is the wave vector with amplitude $1/\lambda$, and $\lambda$ is the wavelength of the imaging light. For partially coherent light with distribution $\sigma(k')$, the final aerial image intensity is the sum of the individual intensities produced by each coherent source point $k'$ and is governed by Eq. (2), where Hopkins' approximation is used:

$$I(x) = \int \sigma(k') \left| \int P(k + k')\, A(k)\, e^{2\pi i k x}\, dk \right|^2 dk', \qquad A(k) = \int M(x')\, e^{-2\pi i k x'}\, dx'. \quad (2)$$

The low-pass pupil function $P(k)$ can be expressed as

$$P(k) = \begin{cases} 1, & |k| \le \mathrm{NA}/\lambda \\ 0, & \text{otherwise,} \end{cases} \quad (3)$$

where NA, the numerical aperture, is defined as $n \sin\theta$; $n$ is the refractive index of the imaging medium, and $\sin\theta$ is the maximum directional component of $k$ allowed to pass through the lens pupil and contribute to the aerial image. On the other hand, the illumination distribution $\sigma(k')$ can be expressed as

$$\sigma(k') = \begin{cases} 1, & |k'| \le \sigma\,\mathrm{NA}/\lambda \\ 0, & \text{otherwise,} \end{cases} \quad (4)$$

where $\sigma$ is the partial coherence parameter of the illumination system of an exposure tool. The illumination distribution in Eq. (4) is a conventional disk illumination. For advanced lithography, the illumination distribution is customized for lithography performance and can be an arbitrary distribution within unity, i.e., $\sigma = n \sin\theta_i / \mathrm{NA} < 1$.
Here, $n \sin\theta_i$ is the illumination NA, and $\theta_i$ is the angle between the incident beam and the mask normal. The intensity obtained from Eq. (2) assumes TE polarization, i.e., linear polarization with the electric field tangential to the wafer surface, so the diffraction amplitudes are summed as scalars. For TM polarization, Eq. (2) must be corrected with the vector summation of the individual amplitudes.19,24 The pupil function $P(k)$ in Eq. (3) is an ideal pupil function. Equation (5) shows the real pupil function $P'(k)$, which is the ideal pupil function multiplied by a wavefront error function $e^{i2\pi W}$:19,24

$$P'(k) = P(k)\, e^{i 2\pi W(k)}, \quad (5)$$

where

$$W = \sum_i Z_i F_i, \quad (6)$$

$Z_i$ is the Zernike coefficient and $F_i$ is the Zernike polynomial. Among these terms, $Z_3$ is the defocus term, which needs to be scaled along with the other terms under the new algorithm and will be discussed later. The computation of Eq. (2) is the core step of lithography simulation. Regardless of the application, be it phase-shifting masks, mask three-dimensional effects, aberration, defocus, or even ILT, a fast aerial image calculation is needed as the base on which to build advanced applications for the development of realistic semiconductor technology.
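To make Eqs. (1) and (3) concrete, the following is a minimal 1-D sketch in Python/numpy (standing in for the paper's Matlab implementation). The slit mask, grid, and all parameter values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Illustrative 1-D imaging parameters (assumed, not from the paper's tables)
wavelength = 193.0          # nm
NA = 0.9
n_pix = 256
dx = 5.0                    # nm per pixel
x = np.arange(n_pix) * dx

# Binary mask: a single 100-nm clear slit on an opaque background
mask = ((x > 600) & (x < 700)).astype(float)

# Mask spectrum A(k): discrete stand-in for the Fourier integral
A = np.fft.fftshift(np.fft.fft(mask))
k = np.fft.fftshift(np.fft.fftfreq(n_pix, d=dx))   # spatial frequency, 1/nm

# Ideal low-pass pupil P(k), Eq. (3): pass |k| <= NA / wavelength
P = (np.abs(k) <= NA / wavelength).astype(float)

# Coherent image amplitude and intensity, Eq. (1)
E = np.fft.ifft(np.fft.ifftshift(P * A))
I = np.abs(E) ** 2
```

The pupil acts purely as a frequency-domain mask here, which is why the whole coherent calculation reduces to one forward and one inverse transform.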

Numerical Calculation
There are several ways to numerically calculate Eq. (2). The direct calculations consist of the Abbe and the transmission cross coefficient (TCC) approaches.24,25,27 The former traces each illumination point source and incoherently adds up the intensities. The latter calculates the TCC matrix in advance; the TCC matrix is the integral of the product of the illumination distribution, the pupil function, and the complex conjugate of the pupil function shifted by the incident wave vector. The advantage of the TCC approach is that the TCC needs to be calculated only once after the illumination and pupil function are fixed, so the method is well suited to mask layouts that vary from simulation to simulation. However, direct computation is very time consuming because of the deeply nested FT loops.
Another approach is to diagonalize the TCC matrix in $k$ space or $x$ space to obtain its eigenvalues and eigenfunctions. The final aerial image intensity is then the sum of the convolutions of the eigenfunctions with the mask, weighted by the eigenvalues. This leads to the sum of coherent systems (SOCS) method.25,27 It is the approach most widely adopted in commercial software packages, especially OPC software.
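The Abbe approach described above can be sketched by extending the coherent example: each source point shifts the pupil, and the per-point intensities are added incoherently. All parameters, the mask, and the source sampling below are illustrative assumptions:

```python
import numpy as np

# Assumed illustrative settings (not the paper's)
wavelength, NA, sigma = 193.0, 0.9, 0.7
n_pix, dx = 256, 5.0
x = np.arange(n_pix) * dx
mask = ((x > 600) & (x < 700)).astype(float)

A = np.fft.fftshift(np.fft.fft(mask))
k = np.fft.fftshift(np.fft.fftfreq(n_pix, d=dx))

# Discrete source points inside the sigma disk (1-D cut of the illumination)
k_src = np.linspace(-sigma * NA / wavelength, sigma * NA / wavelength, 21)

I = np.zeros(n_pix)
for k0 in k_src:
    # An off-axis source point shifts the pupil by k0 (Hopkins approximation)
    P = (np.abs(k - k0) <= NA / wavelength).astype(float)
    E = np.fft.ifft(np.fft.ifftshift(P * A))
    I += np.abs(E) ** 2          # incoherent sum over source points
I /= len(k_src)
```

The loop over `k_src` is exactly the illumination scan that dominates the runtime discussed later in the paper: one inverse transform per source point.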

Fourier Transform and Fast Fourier Transform
The FFT is a well-developed methodology for the quick evaluation of the discrete FT. Since its invention in 1965,28 the scheme has been implemented in many applications and in several representations.29 The basic and original algorithm uses radix-2. Equations (6) and (7) give the discrete FT and inverse FT of $M$ and $A$, where $A(p,q)$ and $M(m,n)$ form a discrete FT and inverse-FT pair and $p$, $q$, $m$, $n$, and $N$ are integers:

$$A(p,q) = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} M(m,n)\, e^{-2\pi i (mp+nq)/N}, \quad (6)$$

$$M(m,n) = \frac{1}{N^2} \sum_{p=0}^{N-1}\sum_{q=0}^{N-1} A(p,q)\, e^{2\pi i (mp+nq)/N}. \quad (7)$$

If $N$ is a power of 2, $N = 2^r$, one of the series can be divided into two parts with even- and odd-order components. We use the one-dimensional (1-D) case for convenience of illustration:

$$A(p) = \sum_{n=0}^{N-1} M(n)\, e^{-2\pi i np/N} = C_p + e^{-2\pi i p/N}\, D_p, \quad (8)$$

where

$$C_p = \sum_{n=0}^{N/2-1} M(2n)\, e^{-2\pi i np/(N/2)}, \qquad D_p = \sum_{n=0}^{N/2-1} M(2n+1)\, e^{-2\pi i np/(N/2)} \quad (9)$$

are the two $N/2$-point sub-series of $A(p)$. Based on Eq. (8), we can derive $A(p + N/2)$ by substituting $p$ with $p + N/2$, remembering that $p$ and $p + N/2$ are both integers:

$$A(p + N/2) = C_p - e^{-2\pi i p/N}\, D_p. \quad (10)$$

Equation (8) thus saves half of the computation: once $A(p)$ is calculated, $A(p + N/2)$ is known from Eq. (10). If $N$ is a power of 2, the procedure can be executed iteratively until only two terms remain. If an FT needs $N \times N$ computations, the FFT reduces them to $N \log_2 N$.
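The even/odd split of Eqs. (8)-(10) maps directly onto a recursive radix-2 FFT. The following is an illustrative numpy sketch of that recursion, not the paper's code:

```python
import numpy as np

def fft_radix2(f):
    """Recursive radix-2 FFT implementing the even/odd split of Eqs. (8)-(10)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    if N == 1:
        return f
    assert N % 2 == 0, "length must be a power of 2"
    C = fft_radix2(f[0::2])                 # even-order sub-series C_p
    D = fft_radix2(f[1::2])                 # odd-order sub-series D_p
    p = np.arange(N // 2)
    twiddle = np.exp(-2j * np.pi * p / N)   # e^{-2*pi*i*p/N}
    # A(p) and A(p + N/2) reuse the same C and D, halving the work per level
    return np.concatenate([C + twiddle * D, C - twiddle * D])
```

Each recursion level halves the transform length, giving the $N \log_2 N$ operation count quoted in the text.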
To directly calculate the aerial image intensity distribution of Eq. (2) with 64 pixels in 1-D, 64^4 computations are needed for the two-dimensional (2-D) mask spectrum plus 64^6 computations for the final aerial image intensity, with an extra 64^2 computations for scanning the illumination. This is an enormous task if no other algorithm is used to boost the computation. This paper uses the FFT to compute the aerial image as an in-house engine for research purposes, at the expense of a slight loss of accuracy. When the wavelength is not a power of 2, it is scaled to such an integer to enable the FFT and then scaled back to the original value after all the FFT and inverse-FFT operations. This procedure induces the aforementioned slight loss of accuracy. For schools or research organizations with limited resources to purchase high-speed computation equipment or to access industrial simulation packages, this paper provides a powerful algorithm that converts the traditional FT to the FFT and saves enormous computational effort.

Fourier Transformation in Summation Form
It is advantageous to convert the FT from the integral form to the summation form. We use the 1-D case of the mask distribution $M(x)$ and the spatial frequency spectrum $A(k_x)$ to demonstrate the methodology:

$$A(k_x) = \int_{-\infty}^{\infty} M(x)\, e^{-2\pi i k_x x/\lambda}\, dx. \quad (11)$$

We convert Eq. (11) into the summation form with discrete sampling:

$$A(k_x) = \sum_{m=-\infty}^{\infty} M(m\Delta x)\, e^{-2\pi i k_x (m\Delta x)/\lambda}\, \Delta x. \quad (12)$$

Equation (6) cannot be directly used to calculate Eq. (12) for the following two reasons. First, the summation limit of Eq. (8) is $N$, whereas the summation limit of Eq. (12) is infinite. Second, the Fourier pair of Eq. (8) is symmetrical, but the pair of Eq. (12) is not. Because clear-tone masks, transparent patterns on an opaque background, are mostly used for EUV lithography, and Eq. (12) is multiplied by the pupil function that cuts off high-order frequencies, the infinite summation limit in Eq. (12) can be replaced by the finite range of interest. On the other hand, the sampling of $k_x$ and $x$ is very asymmetrical. For ArF exposure tools, the exposure wavelength $\lambda$ is 193 nm. When we set the sampling pixel number to 64 and the pixel size to 25 nm, the sampling range in the spatial coordinate $x$ is 1600 nm. In $k$ space, the scan range of $k_x$, the directional component of the wave vector with the wavelength $\lambda$ separated out, is normally $\pm 2$, considering that the maximum NA and the illumination partial coherence parameter are each at most $\pm 1$. But the scan variables $m, n$ and $p, q$ in Eq. (7) are integers ranging from 0 to $N - 1$, which are symmetrical. Moreover, the challenging part of using the FFT to boost the computation speed for this system is that $\lambda$ is 193, definitely not a power of 2. However, with $x = r\Delta x$ and $k_x = m\Delta k_x$, Eq. (12) can be rewritten as

$$A(m\Delta k_x) = \sum_r M(r\Delta x)\, e^{-2\pi i\, rm\, \Delta x\, \Delta k_x/\lambda}\, \Delta x = \sum_r M(r\Delta x)\, e^{-2\pi i\, rm/\beta}\, \Delta x, \qquad \beta = \frac{\lambda}{\Delta x\, \Delta k_x}. \quad (13)$$

Now, $r$ and $m$ are both integers ranging from 1 to 64, and Eq. (13) has the same format as Eq. (8).
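The exponent bookkeeping in Eq. (13) is easy to check numerically. A small Python sketch using the sampling stated in the text (λ = 193 nm, 64 pixels, Δx = 25 nm, k_x scan range ±2):

```python
import math

wavelength = 193.0      # nm, ArF exposure wavelength
dx = 25.0               # nm, spatial pixel size
n_pix = 64
dkx = 4.0 / n_pix       # k_x scan range of +/-2 sampled over 64 pixels

# Exponent in Eq. (13) is 2*pi*i*r*m/beta with beta = wavelength / (dx * dkx)
beta = wavelength / (dx * dkx)      # -> 123.52, not a power of 2
N = 2 ** round(math.log2(beta))     # nearest power of 2 -> 128
epsilon = N / beta                  # mask scaling factor used later in the flow
```

With β = 123.52, snapping to N = 128 is what lets the sum in Eq. (13) be evaluated by a radix-2 FFT.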
Consider two masks, $M(x)$ and $M'(x')$, illuminated with wavelengths $\lambda$ and $N$, respectively. If these two systems produce the same mask diffraction spectrum $A(k_x)$, what is the relationship between $M(x)$ and $M'(x')$? In the following, we omit the $\pm\infty$ limits of the integrals for clarity:

$$A(k_x) = \int M(x)\, e^{-2\pi i k_x x/\lambda}\, dx = \int M'(x')\, e^{-2\pi i k_x x'/N}\, dx'. \quad (14)$$

Inverse transforming the second form and substituting the first gives

$$M'(x') = \frac{1}{N} \int\!\!\int M(x)\, e^{2\pi i k_x (x'/N - x/\lambda)}\, dx\, dk_x. \quad (15)$$

After changing the order of integration of $dk_x$ and $dx$, Eq. (15) becomes

$$M'(x') = \frac{1}{N} \int M(x)\, \lambda\, \delta\!\left(x - \frac{\lambda x'}{N}\right) dx = \frac{\lambda}{N}\, M\!\left(\frac{\lambda x'}{N}\right), \quad (17)$$

where $\delta(x)$ is the delta function. Equation (17) indicates that the two masks differ by a coordinate scaling factor $\lambda/N$ and an intensity proportionality constant $C$. The foregoing is a basic concept of diffraction physics. Consider double-slit diffraction with slit distance $d$, wavelength $\lambda$, and diffraction angle $\varphi$, as shown in Fig. 1; the diffraction equation is

$$\sin\varphi = \frac{n\lambda}{d}. \quad (18)$$

Fig. 1 Double-slit diffraction showing the scaled wavelength and slit geometry that produce the same diffraction spectrum.
The right side of Eq. (18) represents the mask diffraction spectrum. To maintain the same mask spectrum, the slit distance must be scaled with the wavelength used. Figure 1 shows the scaling of the double slit that produces an identical diffraction spectrum.
In the same way, we now check the image formation after obtaining the mask spectrum. If the same mask spectrum $A(k_x)$ and pupil function $P(k_x)$ form images with different wavelengths and spatial dimensions, as shown in Eq. (19), what is the relationship between the image amplitude distributions $E(x)$ and $E'(x')$?

$$E(x) = \int P(k_x) A(k_x)\, e^{2\pi i k_x x/\lambda}\, dk_x, \qquad E'(x') = \int P(k_x) A(k_x)\, e^{2\pi i k_x x'/N}\, dk_x. \quad (19)$$

Inverting the second part of Eq. (19), we get

$$P(k_x) A(k_x) = \frac{1}{N} \int E'(x')\, e^{-2\pi i k_x x'/N}\, dx'. \quad (20)$$

Substituting Eq. (20) into the first part of Eq. (19) gives

$$E(x) = \frac{1}{N} \int\!\!\int E'(x')\, e^{2\pi i k_x (x/\lambda - x'/N)}\, dx'\, dk_x. \quad (21)$$

Following the same procedure as in the derivation of Eq. (17), we perform the integration over $dk_x$ before that over $dx'$ in Eq. (21), resulting in

$$E(x) = E'\!\left(\frac{N x}{\lambda}\right), \quad (22)$$

up to a proportionality constant. By squaring $E$ in Eq. (22), the image intensity distribution for wavelength $\lambda$ is the same as that obtained with wavelength $N$, except for a scaling factor that is the reciprocal of the one in Eq. (17). The intensity proportionality constant $C$ is $(N/\lambda)^2$, which will be discussed later. In this paper, the traditional FT is converted to the discrete FT format of Eq. (12). The discrete mask diffraction spectrum is then calculated by the FFT with the mask dimensions scaled by the wavelength ratio of Eq. (17). The mask spectrum obtained from the scaled mask is used to calculate the image amplitude by the inverse FFT, and this amplitude is finally scaled back to the actual image amplitude based on Eq. (22).
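The scaling relation of Eq. (17) can be verified numerically: evaluating the discrete spectrum of Eq. (12) at λ = 193 and again at the power-of-2 stand-in wavelength 128, with the mask coordinates stretched by the wavelength ratio, gives the same spectrum. The mask pattern below is an illustrative assumption:

```python
import numpy as np

lam, Nw = 193.0, 128.0          # actual wavelength and its power-of-2 stand-in
n_pix, dx = 64, 25.0
x = np.arange(n_pix) * dx
mask = ((x > 400) & (x < 600)).astype(float)   # illustrative 1-D clear feature

# Scaled-mask coordinates per Eq. (17): same sample values at x' = (Nw/lam) * x
x_scaled = x * Nw / lam

kx = np.linspace(-2, 2, 33)     # direction-cosine scan range of +/-2

# Discrete spectra of Eq. (12) evaluated directly at the two wavelengths
A_lam = mask @ np.exp(-2j * np.pi * np.outer(x, kx) / lam)
A_Nw = mask @ np.exp(-2j * np.pi * np.outer(x_scaled, kx) / Nw)
```

Because x'/Nw = x/lam exactly, the two spectra agree to floating-point precision, which is the invariance the wavelength-scaling algorithm exploits.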

Simulation Flow
This paper uses Matlab®, which is popular and commercially available, as the simulation development platform. The simulation flow approximates the conventional FT with the FFT format and algorithm. There are three key steps.

First step: determine the scaled wavelength. With off-axis illumination in both directions, the full range of $k_x$ is from $-2$ to $+2$. From Eq. (13), $2\pi i \frac{1}{\lambda}(r\Delta x)(m\Delta k_x) = 2\pi i \frac{rm}{123.52} = 2\pi i \frac{rm}{\beta}$; pick the power of 2 nearest to $\beta$. In this case, set $N$ to 128.
Second step: scale the mask. From the first step, the scaling factor is $\varepsilon = N/\beta = 128/123.52$. The mask $M(x)$ is scaled up by $\varepsilon$. Because the total pixel count of the mask distribution is $64 \times 64$ but $N$ is 128, zeros are padded to extend the rows and columns of the mask matrix to $128 \times 128$. The new mask matrix is then ready for the FFT.

Third step: scale back the image amplitude. After the pupil function is applied and the inverse FFT is executed, the image amplitude is scaled back to that of the original wavelength based on Eq. (22) and squared to obtain the intensity. Although the new algorithm has these two extra scaling steps, its computation speed is much faster than that of the conventional method.
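The first two steps can be sketched as follows. The mask pattern is an illustrative assumption, and the separable bilinear stretch below merely stands in for the paper's bilinear edge smoothing:

```python
import numpy as np

beta, N = 123.52, 128
epsilon = N / beta                      # scaling factor from the first step

# Illustrative 64x64 binary mask (not the paper's test pattern)
mask = np.zeros((64, 64))
mask[28:36, 20:44] = 1.0

# Stretch the mask by epsilon with a separable bilinear interpolation, then
# normalize the transmittance by 1/epsilon**2 to conserve flux per unit area
src = np.arange(64, dtype=float)
dst = np.arange(64) / epsilon           # sample positions in the original grid
tmp = np.array([np.interp(dst, src, row) for row in mask])
scaled = np.array([np.interp(dst, src, col) for col in tmp.T]).T / epsilon**2

# Pad with zeros to the 128x128 FFT grid
padded = np.zeros((N, N))
padded[:64, :64] = scaled

spectrum = np.fft.fft2(padded)          # mask diffraction spectrum via FFT
```

Zero padding does not change the spectrum's content, only its sampling density, so the 64×64 mask can ride on the 128-point radix-2 grid required by the scaled wavelength.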

Simulation Results
Two cases are used to verify the simulation: a line-and-space pattern and a hole pattern. The critical dimensions and simulation conditions are listed in Table 1.
We use a 193-nm wavelength, 64 pixels in 1-D, a $\Delta k_x$ pixel size of $4/64$, and a $\Delta x$ of 25 nm. The resulting artificial wavelength for the FFT is $N = 128$. Figure 3 shows the simulation results, including the original mask, the mask after scaling, and the comparison of the 2-D intensity contours and the 1-D intensity cutlines.
The comparison of the simulation results shown in Fig. 3 consists of the 2-D contours and the 1-D intensity cutlines evaluated with the conventional FT and the wavelength-scaled FFT. They are almost indistinguishable; there is only a slight deviation at the intensity peaks, which is not of the highest interest for lithography. The deviations are mostly from grid snapping.

Table 1 The simulation conditions of (a) the hole case and (b) the line/space case.

The mask width in Table 1(b) is 100 nm and the scaling factor $\varepsilon$ is 1.03463, so the mask width after scaling is about 103.46 nm; but the pixel size of the mask is 25 nm, so the snapped mask size can only be 100 or 125 nm. To reduce this error, the mask edge was smoothed with bilinear interpolation, so the transmission is no longer exactly 0 or 1 at the edge of a binary mask. Figure 4 shows the mask intensity distribution after scaling, in contrast to the original binary mask. It is relatively easy to increase the number of pixels to enhance the resolution of the new method. However, comparing the FT and the wavelength-scaled FFT then becomes prohibitive, because the deep loop of the conventional FT takes too long to complete, especially when the number of pixels exceeds 100 in each dimension. Table 2 lists the speed and intensity error evaluated with the conventional FT and the wavelength-scaled FFT for cases (a) and (b). The intensity error is defined by comparing the integrated intensities under the cutlines of the FT and the wavelength-scaled FFT, shown in the last part of Fig. 3. For the real pupil function $P'$, the pupil function with an aberrated wavefront, the Zernike coefficients need to be scaled under the new algorithm. Take $Z_3$, defocus, as an example:19,24

$$F_3 = 2\rho^2 - 1, \qquad \rho = \frac{|k|\,\lambda}{\mathrm{NA}}, \quad (23)$$

$$Z_3 = \frac{\mathrm{NA}^2\, \Delta\delta}{4\lambda}, \quad (24)$$

where $\Delta\delta$ is the focus shift. For the conventional FT, the aerial image intensity can be calculated by replacing the ideal pupil function $P$ in Eq. (2) with the real pupil function $P'$ of Eq. (5), using the Zernike coefficient $Z_3$ and polynomial $F_3$ of Eqs. (23) and (24). For the wavelength-scaling FFT, Eq. (22) is still valid, but the Zernike coefficient $Z_3$ must be scaled along with the wavelength. With this scaling, the wavelength-FFT calculation of the defocused image intensity matches the FT result well.
The computation times of the wavelength-scaling FFT method and the conventional FT are listed in Table 2. Although there are two extra steps, mask scaling and image amplitude scaling, the speed of the new method far exceeds expectations. In theory, with 64 × 64 pixels for a 2-D map, mask or intensity, the conventional FT needs 64^4 computations. Converted to the wavelength-scaled FFT, the count becomes (64 log2 64) × (64 log2 64), roughly a 100× runtime saving. The computation has two major portions, depicted in Fig. 2: steps A to C and steps C to E. The first portion is simply the FT of the mask. It takes 2.7 s with the conventional FT, whereas the wavelength-scaled FFT takes only 0.025 s despite the additional step of scaling up the mask; the new algorithm is roughly 108× faster. The second portion scans the whole illumination to sum up the intensities originating from the individual illumination pixels; the inverse FT is executed in this portion. In the FT method, the computation time is simply 130 × 2.7 s and 250 × 2.7 s for the two illumination settings of the two cases listed in Table 1. For the wavelength-scaling FFT, the repeated computation during the illumination scan quickly drops from 0.025 s to 5 × 10^-4 s, so the overall runtime becomes 130 × 5 × 10^-4 s and 250 × 5 × 10^-4 s plus the overhead runtime. This is possibly because the new method uses very few deep-loop computations, relying instead on matrix operations and Matlab® built-in functions such as fft2, ifft2, fftshift, and ifftshift. In conclusion, the conversion from FT to FFT saves roughly 100× runtime, and the coding efficiency of matrix operations saves an extra 40× to 50×. Converting the code to matrix operations is thus an extra benefit.
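The gap between deep loops and built-in matrix FFT routines is easy to reproduce. The following numpy sketch (standing in for Matlab's fft2) computes the same 2-D discrete FT both ways on a small illustrative grid; the nested loops mirror the O(n^4) direct evaluation, while the single `fft2` call is the O(n^2 log n) matrix route:

```python
import numpy as np

n = 16
rng = np.random.default_rng(1)
m = rng.standard_normal((n, n))

# Deep-loop discrete FT of Eq. (6): four nested loops, O(n^4) work
A_loop = np.zeros((n, n), dtype=complex)
for p in range(n):
    for q in range(n):
        for i in range(n):
            for j in range(n):
                A_loop[p, q] += m[i, j] * np.exp(-2j * np.pi * (i * p + j * q) / n)

# Built-in matrix-based FFT: O(n^2 log n) work in a single call
A_fft = np.fft.fft2(m)
```

Timing either version shows the loop cost exploding with n while the built-in call stays nearly flat, which is consistent with the extra 40× to 50× saving the text attributes to matrix-operation coding.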
From Table 2, the simulation speed is improved by 4000× to 5000× with the new wavelength-scaled FFT method, at the expense of an intensity deviation on the order of 3%. The proportionality constants in Eqs. (17) and (22) can be derived by requiring consistency of the flux passing through the mask per unit area. Since the mask is scaled by $\varepsilon$ in each dimension, the total transparent area of the mask increases by $\varepsilon^2$; to keep the mask spectrum consistent after scaling, the mask transmittance is therefore normalized by $1/\varepsilon^2$. The final aerial image intensity is normalized in the same way. Once the normalization constants are fixed, the results of the wavelength-scaled FFT need to be calibrated against those of the conventional FT only once.

Conclusion
This paper reports a new methodology that converts the aerial image calculation from the time-consuming traditional deep-loop method to one with a 4000× to 5000× improvement in computing speed, at the expense of a 2% to 3% intensity error due to grid snapping. The performance of the new algorithm was tested with several cases, and the improvement and error are consistent across them. Coding is easy and straightforward on popular, commercially available platforms. Although other methodologies, such as the widely used SOCS, can also greatly reduce the computation time, the method reported in this paper is more straightforward and more easily adopted: SOCS must set up and solve for the eigenvalues and eigenfunctions whenever a new illumination or pupil function is adopted, whereas this method needs to set up the conversion pixel and scaling factor only once. It is thus well suited as the running engine of optimization problems, and its potential for lithography process development is high.