## 1.

## Introduction

In attempts to push the limit of optical lithography, several resolution enhancement techniques (RETs) have been widely applied to improve the lithographic performance.^{1}2.^{–}^{3} Inverse lithography technology (ILT),^{4}^{,}^{5} as an active approach of RETs, is considered as an effective and economically viable way to meet various challenges in current and future technology nodes.^{6}^{,}^{7} To put ILT into practice, various methods have been proposed by academic as well as industrial communities on mask manufacturing rule constraints,^{8}^{,}^{9} pattern grouping strategy to accelerate computing,^{10} GPU-based hardware accelerated techniques,^{11} and mathematical solution methods for the inverse problem.^{4}^{,}^{5} In this paper, we focus on improvement of mathematical solution methods, where the computational efficiency is one of the most noteworthy issues. Currently, as critical dimension (CD) shrinks, the pattern density of integrated circuits gets much denser and lithographic process variations, such as lens-wafer defocus and exposure dose variation, become more pronounced. It, therefore, requires finer mask models and through process window compensated approaches, which in return significantly increase the computational burden.

Inverse lithography technique treats the mask design as an inverse mathematical problem that aims at synthesizing an input mask to deliver a desired output pattern on the wafer. Liu and Zakhor pioneered a mask design method based on the branch and bound algorithm and simulated annealing.^{12}^{,}^{13} But this is a time-consuming process. In order to reduce the computational complexity, iterative methods were proposed to solve the inverse problem via an optimization process,^{14}^{,}^{15} and they were further classified as linear, quadratic, and nonlinear optimization problems.^{16}^{,}^{17} Meanwhile, Poonawala and Milanfar designed the model-based optical proximity correction system and introduced the steepest descent algorithm for the optimization framework.^{18}^{,}^{19} Subsequently, the optimization framework was further generalized for phase-shifting masks^{20}^{,}^{21} and partially coherent imaging systems,^{22}23.24.25.^{–}^{26} and the optimization algorithm was improved with an active set method^{21} and with an augmented Lagrangian method.^{27} Most recently, Lv et al. further improved the computational efficiency of the mask synthesis by using the conjugate gradient and an optimal iterative step,^{28} and by introducing a mask filtering technique to enhance the regularity of the synthesized mask pattern.^{29}^{,}^{30} On the other hand, with ever-shrinking feature size, the printed feature becomes increasingly sensitive to the fluctuation of the fabrication process, which limits the yield in semiconductor industry. In order to synthesize masks that are robust to process variations, the average wafer performance is optimized via minimizing the expectation of the difference between the desired pattern and the output pattern with respect to process fluctuations.^{31}32.33.34.35.^{–}^{36} This approach takes into account process variations explicitly, and is well understood and easily accomplished. However, it further increases the computational complexity. It is noted that all these methods discretize mask as a raster image constituted by grids (pixels) and perform mask synthesis on a certain fine grid throughout the mask synthesizing process. Since the computational complexity of such iterative methods is $O[K\mathrm{log}(K)]$,^{14}^{,}^{17}^{,}^{18} where $K$ is total grid numbers used in the discretization of mask, the mask grid size strongly impacts the computational cost. Naturally, one would like to employ a coarser grid to model mask and, thus, to cut down the runtime. Unfortunately, such straightforward strategy usually leads to inaccurate simulation results and a poorly performing mask that fails to optimally utilize the imaging capability of the lithography scanner. There has been a tradeoff between the runtime and mask quality.

In this paper, a cascadic multigrid (CMG) algorithm^{37}^{,}^{38} is introduced to improve the computational efficiency of robust inverse mask synthesis, while maintaining the model accuracy and mask pattern quality. The fundamental idea of CMG algorithms is to perform more iterations on coarser grids and fewer iterations on fine ones, so that the overall runtime may be dramatically reduced. The CMG algorithm is a one-way multigrid method that requires no coarse grid correction and is, thus, easy to implement. Since the computational complexity of iterative methods is $O[K\mathrm{log}(K)]$, the CMG algorithm significantly reduces the runtime compared to the conventional methods, which accomplish mask synthesis on a fixed fine grid. More details of the CMG algorithm will be presented in Sec. 3.3. For theoretical convergence results and complexity analysis on CMG algorithms, interested readers may refer to Refs. 3738.39. to 40.

Moreover, we have recently developed an analytical circle-sampling technique for fast imaging simulation of partially coherent systems by decomposing the double-impulse response (DIR) function into analytical kernels.^{41} The DIR function,^{42} whose Fourier transform is the familiar transmission cross-coefficient (TCC), is projected onto the circle-sampling function (CSF)^{41} space and converted into a much smaller projected matrix. Singular value decomposition (SVD) is performed to the smaller projected matrix, and then we obtain analytical optical kernels with its eigenvectors and the CSFs. We have demonstrated that this method avoids directly performing SVD to the large TCC matrix; the simulation speed is, thus, dramatically improved. In particular, the derived optical kernels have analytical forms; as a result, the grid size of the kernels can be set to any desired value without re-preforming decomposition. Also, most recently we developed a metric called edge distance error (EDE) to guide mask synthesis.^{29}^{,}^{30} Compared to the commonly used metric pattern error,^{16}17.18.19.20.21.22.23.24.25.26.27.^{–}^{28} the metric EDE has a dimension of length and is independent of the simulation grid size. The analytical circle-sampling technique and EDE are both independent of grid size. Such nature is well suited for the CMG algorithm, which frequently changes the mask grid size during its synthesizing process. In this paper, we will apply the analytical circle-sampling technique to the forward lithography imaging modeling and employ EDE to guide mask synthesis.

The remainder of this paper is organized as follows. Section 2 details the forward lithography imaging model using an analytical circle-sampling technique. Section 3 introduces the robust inverse lithography problem and presents CMG algorithm in detail. Section 4 provides simulation results to demonstrate the validity and efficiency of the proposed method. Finally, we draw some conclusions in Sec. 5.

## 2.

## Forward Lithography Imaging Model

We apply an analytical circle-sampling technique to the forward lithography imaging simulation. The lithography imaging process is often decomposed into two parts, namely the optical image formation and the resist development. The optical image $I(\mathbf{r})$ generated by a partially coherent imaging system can be expressed by a bilinear transform in the spatial domain as^{43}

## (1)

$$I(\mathbf{r})=\int \int D(\mathbf{r}-{\mathbf{r}}_{1},\mathbf{r}-{\mathbf{r}}_{2})M({\mathbf{r}}_{1}){M}^{\u2020}({\mathbf{r}}_{2})\mathrm{d}{\mathbf{r}}_{1}\mathrm{d}{\mathbf{r}}_{2},$$## (2)

$$D({\mathbf{r}}_{1},{\mathbf{r}}_{2})=J({\mathbf{r}}_{1}-{\mathbf{r}}_{2})H({\mathbf{r}}_{1}){H}^{\u2020}({\mathbf{r}}_{2})$$^{42}Here, $J({\mathbf{r}}_{1},{\mathbf{r}}_{2})$ is the mutual intensity function that describes the coherence of the illumination source, and $H(\mathbf{r})$ is the point spread function (PSF) of the optical system that may be obtained by an inverse Fourier transform of the pupil function $\tilde{H}(\mathbf{f})$ as where $\tilde{H}(\mathbf{f})$ is the pupil function without any aberrations and $\mathbf{f}$ is the normalized pupil plane coordinate.

Since the DIR function is Hermitian and band-limited, it is possible to approximate the DIR function as a finite CSF series as^{41}

## (4)

$$D({\mathbf{r}}_{1},{\mathbf{r}}_{2})=\sum _{k=1}^{N}\sum _{l=1}^{N}{p}_{k,l}{\phi}_{k}({\mathbf{r}}_{1}){\phi}_{l}^{\u2020}({\mathbf{r}}_{2}),$$## (6)

$$D({\mathbf{r}}_{1},{\mathbf{r}}_{2})=\sum _{i=1}^{Q}{\mu}_{i}{\varphi}_{i}({\mathbf{r}}_{1}){\varphi}_{i}^{\u2020}({\mathbf{r}}_{2}),$$Substituting Eq. (6) into Eq. (1), the optical image is given by

## (8)

$$I(\mathbf{r})=\sum _{i=1}^{Q}{\mu}_{i}{|{\varphi}_{i}(\mathbf{r})\otimes M(\mathbf{r})|}^{2}.$$Subsequently, the optical image $I(\mathbf{r})$ goes through the resist development to form the printed image on the wafer. The resist effect is often approximated by a constant threshold resist model using the following logarithmic Sigmoid function:^{18}

Putting Eqs. (8) and (9) together, the lithography imaging equation can be formulated as

## (10)

$$\mathrm{Z}(\mathbf{r})=\mathrm{\Gamma}\{\mathrm{M}(\mathbf{r})\}=\mathrm{sig}[I(\mathbf{r})]=\mathrm{sig}\left[\sum _{i=1}^{Q}{\mu}_{i}{|{\varphi}_{i}(\mathbf{r})\otimes M(\mathbf{r})|}^{2}\right],$$## 3.

## Inverse Mask Synthesis

Ideally, the output pattern on the wafer is desired to match the input design intent that serves as the start point in most mask synthesis routines. However, the optical imaging system typically acts as a low-pass spatial frequency filter and cannot deliver the high-frequency components of the intended pattern on wafer.^{7} The band-limited system causes the output pattern to be a distorted version of the input intent. So, the objective of inverse lithography is to synthesize a mask pattern to precompensate the effect of frequency loss so as to deliver a wafer image closer to the desired pattern.

## 3.1.

### Evaluation Metrics

In order to evaluate the performance of the synthesized mask pattern, we introduce some evaluation metrics.

An EDE is introduced as a metric to evaluate the difference between the output pattern of the input mask and the desired pattern. Figure 2 depicts the pixel-based representation of a mask pattern and its output pattern on the wafer, where the red dots are discrete sampling elements (grids or pixels) of the patterns, ${\mathit{S}}_{\text{shadow}}$ denotes the absolute difference area between the desired pattern contour and the output pattern contour, $L$ is the perimeter of the desired pattern contour, and ${\delta}_{x}$ and ${\delta}_{y}$ are the lengths of the element along the $x$ and $y$ directions, respectively. EDE is defined as

Assuming that the element size is small enough in Fig. 2, the absolute difference area ${S}_{\text{shadow}}$ can be approximated by multiplying the total number of elements ${N}_{\mathrm{s}}$ in shadow and the element area as

Noticing that the value of the grid in the output pattern is either 0 or 1, the number ${N}_{\mathrm{s}}$ is approximately equal to the pattern error as

where $\Vert \xb7{\Vert}_{2}$ is the ${L}_{2}$ norm, ${Z}^{*}$ is the desired pattern, and ${\Vert \mathrm{\Gamma}\{M\}-{Z}^{*}\Vert}_{2}^{2}$ is usually called pattern error, which is often employed as a metric to evaluate the difference between the output pattern of the input mask and the desired pattern. However, pattern error is a dimensionless quantity and highly depends on mask feature and simulation grid size. Therefore, pattern error is inappropriate to act as a metric, especially when the simulation grid size is constantly changed. Substituting Eq. (13) into Eq. (12) and then into Eq. (11), we have the expression of EDE as## (14)

$$\mathrm{EDE}(M)=\frac{{\delta}_{x}\xb7{\delta}_{y}}{L}{\Vert \mathrm{\Gamma}\{M\}-{Z}^{*}\Vert}_{2}^{2}.$$It is noted that EDE has the dimension of length and, thus, is independent of the simulation grid size. More detailed physical description of EDE can be found in Ref. 30.

In addition, in order to quantify the manufacturability of the synthesized mask, mask quadratic error metric ${R}_{\mathrm{Q}}(M)$ and mask complexity metric ${R}_{\mathrm{TV}}(M)$ are adopted.^{18}^{,}^{19} Since we focus on the binary mask in this paper, the mask quadratic error metric ${R}_{\mathrm{Q}}(M)$ and the mask complexity metric ${R}_{\mathrm{TV}}(M)$ are expressed, respectively, as

## (16)

$${R}_{\mathrm{TV}}(\mathrm{M})={\int}_{\mathrm{\Omega}}(|\nabla {M}_{x}|+|\nabla {M}_{y}|)\mathrm{d}\mathbf{r},$$## 3.2.

### Formulation of Robust Inverse Lithography

From Sec. 3.1, there are, overall, three evaluation metrics that need to be optimized, i.e., the EDE, the mask quadratic error metric, and the mask complexity metric. In the literature, they are usually combined with certain proportions ${\beta}_{1}$ and ${\beta}_{2}$ to be stated as one single optimization objective as

Here, $F\{M\}$ is also called cost function. Therefore, the inverse lithography problem under the nominal condition is defined as

However, the formulation Eq. (18) of inverse lithography problem is under the assumption of an ideal imaging system without any process variations. However, in practice, process variations are significant and should not be ignored. In this paper, we take into account the two most important variations in lithography process, i.e., lens-wafer defocus and exposure dose variation, and employ a statistical strategy to optimize the average wafer performance via minimizing the expectation of the difference between the desired pattern and the output pattern with respect to process fluctuations. Specifically, exposure dose variation can be accounted for by varying the resist threshold $t$ in Eq. (9); the effect of defocus $h$ on the image intensity distribution is effectively an even-type lens aberration, leading to a defocused PSF function $H(\mathbf{r},h)$.^{32}

## (19)

$$H(\mathbf{r};h)=\mathrm{IFT}[\tilde{H}(\mathbf{f})\xb7\mathrm{exp}(j\pi h\frac{{\mathrm{NA}}^{2}}{\lambda}{\mathbf{f}}^{2}\left)\right],$$## (20)

$${\mathrm{EDE}}_{\mathrm{s}}(M)=\sum _{m}^{{N}_{h}}\sum _{n}^{{N}_{t}}[\alpha ({h}_{m})\xb7\zeta ({t}_{n})\xb7\mathrm{EDE}(M;{h}_{m},{t}_{n})],$$Now, the cost function in the robust inverse lithography is expressed as

## (21)

$$G(M)={\mathrm{EDE}}_{\mathrm{s}}(M)+{\beta}_{1}{R}_{\mathrm{Q}}(M)+{\beta}_{2}{R}_{\mathrm{TV}}(M).$$The robust inverse lithography problem is, thus, defined as

## 3.3.

### Cascadic Multigrid Algorithm

In this section, we develop a CMG algorithm to solve the robust inverse lithography problem. For the setup of CMG algorithm, we define a sequence of spaces

with the corresponding space grid sizeSince the grid is usually square, and we use only a parameter $\delta $ to describe the space grid size for simplicity, that means $\delta ={\delta}_{x}={\delta}_{y}$. ${\mathrm{\Omega}}_{C}$ denotes the finest grid space currently used for discretization of a mask. First, we solve the robust inverse lithography problem, i.e., Eq. (22), on the coarsest space ${\mathrm{\Omega}}_{1}$ and obtain a mask ${M}_{1}^{*}$. The obtained mask ${M}_{1}^{*}$ on the space ${\mathrm{\Omega}}_{1}$ is then embedded in the next space ${\mathrm{\Omega}}_{2}$ via an interpolation operation, and then the interpolated mask ${M}_{2}^{0}$ servers as the initial value on the space ${\mathrm{\Omega}}_{2}$. With the initial value ${M}_{2}^{0}$, it calculates a mask ${M}_{2}^{*}$ on the space ${\mathrm{\Omega}}_{2}$. The computation repeats in this manner until a mask ${M}_{C}^{*}$ on the finest space ${\mathrm{\Omega}}_{C}$ is obtained. The pseudo-code of CMG algorithm is described in Table 1.

## Table 1

The pseudo-code of cascadic multigrid (CMG) algorithm.

The procedure: |

Setup: Set C spaces {Ω,1Ω2,…,ΩC}, and initial mask M10=Z*,M10∈Ω1. |

Fork=1,2,…,Cdo |

1: Solve Mk*=argminMG(M),M∈Ωk with an initial value Mk0; |

2: Prolong to next space Ωk+1: Mk*→Mk+10; |

3: k+1→k. |

End For |

Output:MC* is the optimal mask. |

The CMG algorithm starts mask synthesis on the coarsest space ${\mathrm{\Omega}}_{1}$. Since the coarsest mask contains much less optimization variables (grids) than the finest mask on space ${\mathrm{\Omega}}_{C}$, it spends much less runtime to find a solution and is more likely to change each variable. From the perspective of signal processing, the coarse mask can be considered as the low-frequency portions of mask. The low-frequency portions are first optimized and then further corrected by adding high-frequency details on the refined spaces. By contrast, conventional methods perform mask synthesis directly on the finest space ${\mathrm{\Omega}}_{C}$, where it is expensive to change each variable and impossible to separate and treat efficiently the low-frequency information content of the mask. Therefore, the CMG algorithm can achieve a mask pattern by taking less runtime with better solution performance, compared to the conventional methods. This is the fundamental rationale behind the CMG algorithm, and simulation results below demonstrate its advantages.

## 3.4.

### Conjugate Gradient Method

In this section, we present a solution procedure of solving Eq. (22) on a certain space with a conjugate gradient (CG) method. First of all, we calculate the gradient of the cost function $G(M)$ with respect to mask $M$ as

## (25)

$$\nabla G(M)=\nabla {\mathrm{EDE}}_{\mathrm{s}}(M)+{\beta}_{1}\nabla {R}_{\mathrm{Q}}(M)+{\beta}_{2}\nabla {R}_{\mathrm{TV}}(M).$$Detailed expression of Eq. (25) is given in the Appendix. In CG method, the optimization direction ${v}^{k}$ in the $k$’th iteration is defined by

## (26)

$${v}^{k}=\{\begin{array}{ll}-{g}^{k}+{\eta}^{k}{v}^{k-1}& \mathrm{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k\ge 1\\ -{g}^{k}& \mathrm{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k=0\end{array}.$$Here, ${g}^{k}$ is a matrix representing the gradient $\nabla G(M)$ of cost function $G(M)$ at mask $M$, and ${\eta}^{k}$ is a factor that depends on different CG methods. When ${\eta}^{k}=0$, the evolution velocity is the negative gradient, and the CG method reduces to the steepest descent method. In this paper, we employ the Polak-Ribière-Polyak (PRP) CG method.^{28} In practical computations, the PRP CG method is generally believed to be one of the most efficient CG methods. It makes use of the velocity information in the previous iteration and can automatically adjust ${\eta}^{k}$ to avoid jamming. It essentially performs a restart when a bad direction occurs. In PRP CG method, ${\eta}^{k}$ is defined as

## (27)

$${\eta}^{k}=\frac{{\Vert {g}^{k}\Vert}_{2}^{2}-\sum _{}^{}{g}^{k}\xb7{g}^{k-1}}{{\Vert {g}^{k-1}\Vert}_{2}^{2}}.$$Now, the optimization procedure is described as follows:

**Iteration 0:**Since the mask value is bound-constrained to [0, 1], we use the following parametric transformation:Then given a desired output pattern ${Z}^{*}$($\mathbf{r}$), we compute the initial value ${\theta}^{0}$.

where ${\kappa}_{1}$ and ${\kappa}_{2}$ are parameters to adjust the initial mask value; for example, ${\kappa}_{1}=0.90$ and ${\kappa}_{2}=0.05$ in this paper. We do that because $M(i,j)=0$ or 1 would degrade the gradient of location ($i,j$) to 0, and therefore, the optimization freedom would be reduced.^{25}^{,}^{30}Finally, we calculate the initial gradient ${g}^{0}$ and the optimization direction ${v}^{0}$ based on Eq. (26). where**Iteration**$k$:**Step i:**Search the step length ${\gamma}^{k}$ in the direction ${v}^{k}$,**Step ii:**Update ${\theta}^{k+1}$,**Step iii:**Calculate the optimization direction for the next iteration.## (36)

$${g}^{k+1}={\nabla}_{{\theta}^{k+1}}G(M)=\frac{\partial G}{\partial {M}^{k+1}}\frac{\partial {M}^{k+1}}{\partial {\theta}^{k+1}},$$## (37)

$${\eta}^{k+1}=\frac{{\Vert {g}^{k+1}\Vert}_{2}^{2}-\sum _{}^{}{g}^{k+1}\xb7{g}^{k}}{{\Vert {g}^{k}\Vert}_{2}^{2}},$$**If**$\Vert {v}^{k+1}\Vert <\mathrm{\Lambda}$ or $k>\mathrm{\Psi}$, go to**Stop**.**Else**, return to**Step i**.

In the above procedure, the iteration is terminated when $\Vert {v}^{k+1}\Vert <\mathrm{\Lambda}$ or $k>\mathrm{\Psi}$, where $\mathrm{\Lambda}$ is defined as the minimum value of the norm of velocity and $\mathrm{\Psi}$ is the prescribed upper limit of the number of iterations. The termination criterion $\Vert {v}^{k+1}\Vert <\mathrm{\Lambda}$ means that the iteration stops when the gradient is zero or rather small.

## 4.

## Simulations

Simulations were performed on a partially coherent imaging system with a quasar source illumination (${\sigma}_{\mathrm{out}}/{\sigma}_{\text{in}}/\text{degree}=\phantom{\rule{0ex}{0ex}}0.9/0.6/45\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{deg}$). The wavelength $\lambda $ in the simulations was set at 193 nm and NA was 1.35. The resist effect was approximated by a Sigmoid function with $a=100$ and $t=0.4$. The mask quadratic error regularization factor ${\beta}_{1}$ and the mask complexity regularization factor ${\beta}_{2}$ were both set at 0.01. The parameter ${\kappa}_{1}$ and ${\kappa}_{2}$ of the initial mask in Eq. (29) were 0.90 and 0.05, respectively. The value of $\mathrm{\Lambda}$ was set at 0.3 in termination criterion $\Vert {v}^{k+1}\Vert <\mathrm{\Lambda}$. Instead of computing the step length ${\gamma}^{k}$ in Eq. (34) accurately, we directly set ${\gamma}^{k}$ at a small constant 0.3. All the simulations were carried out with in-house MATLAB® codes on an HPZ820 Workstation ($2\times 2$ GHz Intel Xeons with 8Cores/16Threads) using a Windows 7 (64 bit) operating system.

## 4.1.

### Results of Robust Inverse Lithography

Figure 3 depicts the synthesized mask patterns by nominal and robust ILT. Figure 3(a) shows the desired pattern consisting of $361\times 361$ pixels with a grid size of 2.5 nm, which is commonly encountered in logic circuit design. From Fig. 3, the synthesized mask patterns by robust ILT are quite different from that by nominal ILT. It is particularly interesting to note that subresolution assist features (SRAFs) are generated in Fig. 3(c), but not in Fig. 3(b). The reason might be that defocus from the nominal condition results in much worsened image contrast; thus SRAFs tend to be produced in order to enhance the image quality.

Moreover, we compare the exposure-defocus (E-D) trees of the synthesized mask patterns by setting the EDE to within $\pm 10\%$ of the CD target (45 nm in this case) in Fig. 4. As expected, the synthesized mask patterns by robust ILT have wider E-D windows. This is because the robust ILT further takes the process variations into consideration. As shown in Figs. 3 and 4, the robust ILT tends to generate SRAFs and has the capability of achieving wider E-D windows compared to that synthesized for the nominal condition only.

## 4.2.

### Results of Cascadic Multigrid Algorithm

In this case, we set three level spaces in total, ${\mathrm{\Omega}}_{1}$, ${\mathrm{\Omega}}_{2}$, and ${\mathrm{\Omega}}_{3}$, with the corresponding grid size of 15, 5, and 2.5 nm. A bilinear method is used to interpolate from a coarse grid space to a finer one. Figure 5 depicts the simulated images of the CMG algorithm. Since the grid size on the space ${\mathrm{\Omega}}_{1}$ is very coarse, the synthesized mask pattern, Fig. 5(b), on the space ${\mathrm{\Omega}}_{1}$ reveals a general pattern and consists of numerous gray-level SRAFs. These general pattern and gray-level SRAFs are difficult to manufacture in practice and require further correction on the fine spaces. Compared to the mask pattern synthesized by the conventional method [Fig. 3(c)], the mask pattern synthesized by CMG algorithm [Fig. 5(f)] is mostly the same but possesses more SRAFs.

Figure 6(a) illustrates the convergence comparison for the statistical EDE. The CMG algorithm takes 132 iterations to find an optimal mask pattern, Fig. 5(b), with a statistical EDE of 4.20 nm, and this mask pattern is then interpolated to serve as an initial value on the space ${\mathrm{\Omega}}_{2}$, as shown in Fig. 5(c). It is noted that the convergence history jumps to 5.34 nm here [shown in the blue circle in Fig. 6(a)]; this is because the synthesized coarse mask pattern [Fig. 5(b)] is not exactly accurate and, thus, required further correction. The CMG algorithm takes further 87 iterations on space ${\mathrm{\Omega}}_{2}$ and reaches a mask pattern, Fig. 5(d), with a statistical EDE of 4.44 nm. Finally, the CMG algorithm achieves a mask pattern, Fig. 5(f), after 20 iterations on the finest space ${\mathrm{\Omega}}_{3}$. From Fig. 6(a), it is observed that the CMG algorithm achieves a smaller statistical EDE of 4.46 nm compared to 4.51 nm by the conventional method, which directly synthesizes mask on the finest space ${\mathrm{\Omega}}_{3}$. Moreover, we also depict the convergence of pattern error in the CMG algorithm. It is observed that pattern error changed significantly when mask is interpolated from one grid space to another, and it is, thus, not intuitive to study on the convergence. Since pattern error is a dimensionless quantity and highly depends on mask feature and simulation parameters (such as simulation grid resolution), it is not suited for the CMG algorithm.

The runtime of the conventional method and the CMG algorithm on each space is shown in Table 2. Since the optimization variables (grids) are relatively less on the space ${\mathrm{\Omega}}_{1}$, it synthesizes an optimal mask pattern, Fig. 5(b), after 8.67 s compared to 620.03 s by directly synthesizing mask on the finest space ${\mathrm{\Omega}}_{3}$. Overall, the CMG algorithm takes 239 iterations using 135.35 s compared to 193 iterations using 620.03 s by the conventional method; in other words, the CMG algorithm is $4.6\times $ faster than the conventional method although taking more iterations. Moreover, we compare the E-D windows for the synthesized mask pattern by the conventional method and the CMG algorithm in Fig. 6(b). It is observed that the synthesized mask pattern by the CMG algorithm has slightly improved the E-D window. The reason might be that CMG algorithm generates more SRAFs.

## Table 2

Runtime of the conventional method and CMG algorithm on each space for a desired pattern Fig. 5(a).

CMG algorithm | Conventional method | ||||
---|---|---|---|---|---|

15 nm | 5 nm | 2.5 nm | Total | ||

Iteration num. | 132 | 87 | 20 | 239 | 193 |

Time (s) | 8.67 | 62.41 | 64.27 | 135.35 | 620.03 |

Since there are rather less optimization variables $61\times 61$ grids on the space ${\mathrm{\Omega}}_{1}$ than $361\times 361$ grids on the space ${\mathrm{\Omega}}_{3}$, it takes much less runtime and is more likely to change each variable to find a better solution. As expected, the synthesized result on the coarsest space ${\mathrm{\Omega}}_{1}$ contains a general pattern with numerous SRAFs. From the perspective of signal processing, these patterns can be considered low-frequency portions in mask, and these patterns are further corrected by adding high-frequency details on the fine space. The conventional method accomplishes mask synthesis directly on the fine space ${\mathrm{\Omega}}_{3}$; it, therefore, is expensive to change each variable and cannot handle the low-frequency portions in mask explicitly. As a result, the CMG algorithm has the capacity of achieving a mask pattern with smaller statistical EDE and wider E-D window compared to the conventional method.

We also performed simulations for a more complicated pattern in Fig. 7. Figure 7(a) shows the desired pattern, which is a metal layer of the benchmark clock gate circuit pattern within a simulation area of $2\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}\times 3.3\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \mathrm{m}$.^{44} As expected, the synthesized mask pattern, Fig. 7(b), by nominal ILT is quite different from that synthesized by robust ILT, Figs. 7(c) and 7(d), and the robust ILT has significantly improved the E-D window (see Fig. 8). On the other hand, compared to the synthesized mask pattern by the conventional method, CMG algorithm reaches a mostly same pattern but with more SRAFs; the E-D window has, thus, slightly improved. The runtime of the conventional method and the CMG algorithm on each space is shown in Table 3. It is revealed that the CMG algorithm is overall $4.3\times $ faster than the conventional method. This set of simulations further demonstrates that the CMG algorithm has big improvement in computation efficiency, along with a slightly better lithographic performance (wider E-D window).

## Table 3

Runtime of the conventional method and CMG algorithm on each space for a desired pattern Fig. 7(a).

CMG algorithm | Conventional method | ||||
---|---|---|---|---|---|

15 nm | 5 nm | 2.5 nm | Total | ||

Iteration num. | 154 | 106 | 22 | 282 | 247 |

Time (s) | 97.03 | 735.64 | 628.79 | 1.48×103 | 6.37×103 |

## 5.

## Conclusions

In this paper, we report a CMG method for robust inverse mask synthesis. This method synthesizes mask hierarchically. It first computes an initial mask pattern on a coarse grid and, therefore, spends less runtime and is more likely to change each variable to find a better solution. This initial mask is then interpolated to a fine grid and to be further corrected, which requires a small number of iterations on the fine grid. Overall, the CMG algorithm is able to run more than four times faster than conventional mask synthesis methods working on a fixed fine grid. In addition, we apply an analytical circle-sampling technique to the forward lithography imaging modeling and employ a novel EDE metric to guide mask synthesis. These two techniques are both independent of grid size and are well suited for the CMG algorithm. It is expected that the proposed method will provide a useful mask synthesis strategy in practical optical lithography.

## Acknowledgments

This work was funded by the National Natural Science Foundation of China (Grant No. 91023032, 51005091, 51121002), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120142110019), the National Science and Technology Major Project of China (Grant No. 2012ZX02701001), and the National Instrument Development Specific Project of China (Grant No. 2011YQ160002. It was also supported in part by the UGC Areas of Excellence project Theory, Modeling, and Simulation of Emerging Electronics, and by the State Key Lab of Digital Manufacturing Equipment and Technology under Project DMETKF2013003.

## Appendices

## Appendix:

### Derivation of Eq. (25)

The gradient of the cost function $G(M)$ with respect to mask $M$ is expressed in Eq. (25). In the following, we present the detailed expression of each term in Eq. (25), i.e., ${\nabla}_{M}{\mathrm{EDE}}_{\mathrm{s}}(M)$, ${\nabla}_{M}{R}_{\mathrm{Q}}(M)$, and ${\nabla}_{M}{R}_{\mathrm{TV}}(M)$. In the present content, $\mathbf{r}$ and $\rho $ denote the spatial coordinate ($x,y$).

According to Ref. 28, with a fixed defocus ${h}_{m}$ and dose variation ${t}_{n}$, the gradient of EDE with respect to mask $M$ is

## (40)

$${\nabla}_{M}\mathrm{EDE}(M;{h}_{m},{t}_{n})=\frac{\partial \mathrm{EDE}(M;{h}_{m},{t}_{n})}{\partial M}=a\xb7\frac{{\delta}_{x}\xb7{\delta}_{y}}{L}\phantom{\rule{0ex}{0ex}}\{\sum _{i=1}^{Q}{\mu}_{i}{\varphi}_{i}^{\mathrm{flip}}(\mathbf{r};{h}_{m})\otimes [(Z-{Z}^{*})\xb7Z\phantom{\rule{0ex}{0ex}}\xb7(1-Z)\xb7({[{\varphi}_{i}(\mathbf{r};{h}_{m})]}^{\u2020}\otimes M)]\}\phantom{\rule{0ex}{0ex}}+a\xb7\frac{{\delta}_{x}\xb7{\delta}_{y}}{L}\{\sum _{i=1}^{Q}{[{\mu}_{i}{\varphi}_{i}^{\mathrm{flip}}(\mathbf{r};{h}_{m})]}^{\u2020}\phantom{\rule{0ex}{0ex}}\otimes [(Z-{Z}^{*})\xb7Z\xb7(1-Z)\phantom{\rule{0ex}{0ex}}\xb7({\varphi}_{i}(\mathbf{r};{h}_{m})\otimes M)]\},$$## (41)

$${\nabla}_{M}{\mathrm{EDE}}_{\mathrm{s}}(M)=\frac{\partial \sum _{m}^{{N}_{h}}\sum _{n}^{{N}_{t}}[\alpha ({h}_{m})\xb7\zeta ({t}_{n})\xb7\mathrm{EDE}(M;{h}_{m},{t}_{n})]}{\partial M}\phantom{\rule{0ex}{0ex}}=\sum _{m}^{{N}_{h}}\sum _{n}^{{N}_{t}}[\alpha ({h}_{m})\xb7\zeta ({t}_{n})\xb7\frac{\partial \mathrm{EDE}(M;{h}_{m},{t}_{n})}{\partial M}]\phantom{\rule{0ex}{0ex}}=\sum _{m}^{{N}_{h}}\sum _{n}^{{N}_{t}}[\alpha ({h}_{m})\xb7\zeta ({t}_{n})\xb7{\nabla}_{M}\mathrm{EDE}(M;{h}_{m},{t}_{n})].$$The gradient of the mask quadratic error ${R}_{\mathrm{Q}}(M)$ with respect to mask $M$ is

## (42)

$${\nabla}_{M}{R}_{\mathrm{Q}}(M)=\frac{\partial {\int}_{\mathrm{\Omega}}\{1-{[2M(\mathbf{r})-1]}^{2}\}\mathrm{d}\mathbf{r}}{\partial M(\rho )}\phantom{\rule{0ex}{0ex}}=\frac{\partial \sum _{\mathbf{r}}\{-4{M}^{2}(\mathbf{r})+4M(\mathbf{r})\}}{\partial M(\rho )}=-8M+4.$$The gradient of the mask complexity ${R}_{\mathrm{TV}}(M)$ with respect to mask $M$ is

## (43)

$${\nabla}_{M}{R}_{\mathrm{TV}}(M)=\frac{\partial {\int}_{\mathrm{\Omega}}(|\nabla {M}_{x}|+|\nabla {M}_{y}|)\mathrm{d}\mathbf{r}}{\partial M(\rho )}\phantom{\rule{0ex}{0ex}}=\frac{\partial \sum _{\mathbf{r}}[|YM(\mathbf{r})|+|M(\mathbf{r}){Y}^{T}|]}{\partial M(\rho )}\phantom{\rule{0ex}{0ex}}={Y}^{T}\mathrm{sign}[YM]+\mathrm{sign}[M{Y}^{T}]Y,$$## References

## Biography

**Wen Lv** is currently a PhD candidate at Huazhong University of Science and Technology under the guidance of Professor Shiyuan Liu. He received his BS degree from the School of Mechanical Science and Engineering of the same university in 2011. His research involves various issues in optical lithography, including inverse lithography, fast optical image simulation, and mask writing technique. He is a student member of SPIE and IEEE.

**Edmund Y. Lam** received his BS (with distinction), MS, and PhD degrees in electrical engineering from Stanford University. He was a senior engineer at KLA-Tencor Corporation in San Jose, California, before joining the Electrical and Electronic Engineering Department at the University of Hong Kong, where he is currently an associate professor. He is also the founding director of its Imaging Systems Laboratory. His research interests include computational optics and imaging, particularly their applications in the semiconductor manufacturing process. He is a fellow of SPIE and OSA.

**Haiqing Wei** received his BS in physics from Peking University, China, MS in physics, and PhD in electrical and computer engineering from McGill University, Canada. He has been a research scientist and engineer for over a decade, working in various subjects of fiber-optic communications, optical imaging, nanophotonics, and computational lithography. He has over 20 patent applications in these subject areas and has published over 30 scientific articles in peer-reviewed journals. Since April 2012, he has been appointed as a guest professor at Huazhong University of Science and Technology.

**Shiyuan Liu** is a professor of mechanical engineering at Huazhong University of Science and Technology, leading his Nanoscale and Optical Metrology Group with research interest in metrology and instrumentation for nanomanufacturing. He also actively works in the area of optical lithography, including partially coherent imaging theory, wavefront aberration metrology, optical proximity correction, source mask optimization, and inverse lithography technology. He received his PhD in mechanical engineering from Huazhong University of Science and Technology in 1998. He is a member of SPIE, OSA, AVS, IEEE, and Chinese Society of Micro/Nano Technology. He holds 21 patents and has authored or coauthored more than 100 technical papers.