## 1.

## Introduction

With ever-shrinking feature sizes, the physical characteristics of the optics have a stronger impact on the imaging system. In particular, the band-limited imaging system causes the output pattern to be a warped version of the input mask.^{1} Several resolution enhancement techniques (RETs) have been developed to improve the performance of optical lithography.^{1–3} Optical proximity correction (OPC) is one of these RETs.^{4} Its objective is to synthesize an input mask that delivers a desired output pattern. The inverse lithography technique (ILT), an active approach to OPC, is considered an economically viable way to meet various challenges in future technology nodes. The computational efficiency of ILT is particularly important when handling a large-scale (or full-chip) optimization problem.

Generally, ILT treats mask synthesis as an inverse mathematical problem that aims at minimizing a cost function for the difference between the output and desired patterns. Various computation techniques have been proposed in the literature to deal with this inverse problem, such as the level-set method,^{5–8} the discrete cosine transform (DCT)-based method,^{9} and the gradient-based method.^{10–15} The level-set method treats a mask as a sophisticated continuum,^{5–8} and consequently, the boundary of the mask is iteratively evolved according to an optimization algorithm. The DCT-based method transforms a mask to the frequency space using a two-dimensional DCT.^{9} Only the low-frequency components of the mask are adopted, and the corresponding coefficients are iteratively changed in the optimization process. As a result, the synthesized mask possesses only low-frequency components and is therefore less complex. Both of these techniques can produce a smooth mask contour, but both are limited in their ability to search the whole solution space. The gradient-based method directly considers a mask as a raster image constituted by pixels, which is synthesized pixel-by-pixel by iterating along a search direction given by steepest descent, conjugate gradient, and so on.^{8,10–15} It is probably the most popular technique in the literature owing to its high flexibility and its ease of understanding and implementation.

Since the mask is discretized into pixels in ILT, this flexibility often causes the synthesized mask to be a gray-level image that may possess small, unwanted block objects, such as isolated holes, protrusions, and jagged edges, which cannot be fabricated in real manufacturing.^{10–15} To address these problems, regularization approaches are introduced to guarantee that the synthesized mask is binary and less complex. In the literature, almost all the regularization approaches treat the regularization terms as penalty functions incorporated into a cost function with corresponding weight parameters^{10–15} as

## (1)

$$F\{M\}={\Vert \mathrm{\Gamma}\{M\}-{Z}^{*}\Vert}_{2}^{2}+\sum _{i=1}^{n}{\lambda}_{i}{R}_{i}(M).$$

Here, the operator $\mathrm{\Gamma}\{\cdot\}$ implements the forward mapping from the input mask $M$ to the output pattern, ${Z}^{*}$ is the desired output pattern, and ${R}_{i}(M)$ are the regularization terms. Generally, regularization terms can be classified into two types: one is related to manufacturability, such as the quadratic penalty term,^{11,12} the total variation penalty term,^{11,12} and the wavelet penalty term;^{13,15} the other is related to the fabrication process, such as the image slope term^{14,16} and the mask error enhancement factor (MEEF).^{17,18} ${\lambda}_{i}$ is the weight of the corresponding regularization term ${R}_{i}(M)$. It should be noted that ${\lambda}_{i}$ plays a critical role in the optimization process; however, how and why the values of ${\lambda}_{i}$ are chosen is rarely discussed in the literature. In practice, it is usually set to a constant chosen from experience.^{10–15}

Here, we take the quadratic penalty term as an example to illustrate how the weight parameter $\lambda $ impacts the optimization process. In this case, both a low pattern error and a low mask quadratic error are desired, where the pattern error is calculated by ${\Vert \mathrm{\Gamma}\{M\}-{Z}^{*}\Vert}_{2}^{2}$ and the mask quadratic error is equal to the value of the quadratic penalty term $R(M)$. As shown in Fig. 1, a smaller weight parameter $\lambda $ for this quadratic penalty term results in rapid convergence of the pattern error. It is observed from Fig. 1(a) that the pattern error may meet the requirement after 44 iterations, while the mask quadratic error is still high at that point, so extra iterations are needed to reduce the mask quadratic error. On the other hand, a larger $\lambda $ results in good performance on the mask quadratic error while causing poor convergence of the pattern error, as Fig. 1(d) reveals. In other words, the convergence of the pattern error and that of the mask quadratic error are out of synchronization under such a regularization framework. Because a smaller $\lambda $ will not achieve the regularization effect, whereas a larger $\lambda $ may result in a larger pattern error, it is difficult to choose an appropriate constant value of $\lambda $. Moreover, the choice of $\lambda $ is closely related to the mask features and the simulation resolution. Mathematically, a sound approach is to adapt $\lambda $ at each iteration. However, this in turn increases the freedom of the design variables, and it is generally difficult to accomplish.

Most recently, we proposed an alternative regularization framework that regularizes the mask directly by using a mask filtering technique.^{19} In such a framework, the original cost function Eq. (1) is changed to Eq. (2) as

^{20,21} In this article, gray-level transitions and small, unwanted block objects in the mask are all interpreted as unwanted noise, and it is therefore natural to use filters to remove or prevent this noise in order to satisfy the manufacturing constraints. Section 2.4 details this mask filtering technique.

Moreover, we introduce a metric called edge distance error (EDE) to guide mask synthesis in the ILT framework and establish the correlation between pattern error and edge placement error (EPE) via EDE. EPE is widely used in polygon-based OPC to convey critical dimension (CD) information; it is essentially the CD error at one side.^{4} However, it is seldom used in an ILT framework because of its discrete form. One reason is that the gradient (or sensitivity) of EPE with respect to the mask, calculated by numerical differentiation, has a computational complexity of $O({K}^{2})$, where $K$ is the total number of mask pixels in the simulation area; this is significantly slower than an analytical gradient calculation with a computational complexity of $O[K\mathrm{log}(K)]$.^{10,11} Therefore, pattern error instead of EPE is applied in an ILT framework for its continuous expression and high computational efficiency.^{6–8,10–15} The pattern error employs an approximated, continuous resist model and is defined as the square of the ${L}_{2}$ norm of the difference between the output pattern of the input mask and the desired pattern, which makes pattern error continuous and explicitly differentiable with respect to the input mask.^{10,11} However, pattern error is a dimensionless quantity and depends strongly on the mask feature and simulation parameters, such as simulation area and simulation resolution. For this reason, pattern error is not popular in industry. In this paper, we therefore introduce the metric EDE, which has the same dimension as EPE and a continuous expression like pattern error. The detailed description of EDE is given in Sec. 2.2.

In addition, as the CD decreases, the printed dimension becomes increasingly sensitive to fluctuations of the fabrication process, which limits the yield in the semiconductor industry. Instead of using process penalty terms, such as the image slope term and the MEEF, a statistical strategy is applied: a cost function evaluated under different process variations, weighted by their statistical probability, is minimized to enhance the robustness of layout patterns.^{7,22–27} This method is directly related to the fabrication process and is well understood and easily accomplished, whereas using process penalty terms can be considered a roundabout regularization approach that requires a deeper understanding of mask topology and the imaging system.

The remainder of this paper is organized as follows. Section 2 details the proposed mask filtering technique. Section 3 provides the simulation results to demonstrate the validity and efficiency of the proposed method. Finally, we draw some conclusions in Sec. 4.

## 2.

## Methodology

## 2.1.

### Lithography Imaging Model

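In this section, we review the general lithography imaging model in ILT.^{10,15} Abstractly, the imaging process for optical lithography is mathematically described as

## (3)

$$Z(\mathbf{r})=\mathrm{\Gamma}\{M(\mathbf{r})\},$$

where $\mathrm{\Gamma}\{\cdot\}$ is the forward mapping from the input mask $M(\mathbf{r})$ to the output pattern $Z(\mathbf{r})$; it comprises the projection optics effect and the resist effect.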

The projection optics effect, namely the optical image in resist $I(\mathbf{r})$, can be modeled as a pupil function with a partially coherent illumination source.^{28} This is called the partially coherent imaging system,^{29} which can be approximated by the sum of the coherent systems method,^{4} the optimal coherent approximation approach,^{30} or the analytical circle-sampling technique^{31} as the superposition of several coherent systems

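## (4)

$$I(\mathbf{r})=\sum _{q=1}^{Q}{\mu}_{q}{|{h}_{q}(\mathbf{r})\otimes M(\mathbf{r})|}^{2}.$$

Here, ${h}_{q}(\mathbf{r})$ is the $q$th optical kernel, ${\mu}_{q}$ is the eigenvalue of the $q$th kernel with $Q$ kernels in total, and $\otimes $ denotes the two-dimensional convolution. The resist effect can be approximated by a constant threshold resist model using the following logarithmic Sigmoid function:^{11}

## (5)

$$\mathrm{sig}[I(\mathbf{r})]=\frac{1}{1+\mathrm{exp}\{-a[I(\mathbf{r})-t]\}},$$

where $a$ is the steepness of the Sigmoid function and $t$ is the threshold. Combining Eqs. (4) and (5), we can write the lithography imaging equation as

## (6)

$$Z(\mathbf{r})=\mathrm{sig}\left\{\sum _{q=1}^{Q}{\mu}_{q}{|{h}_{q}(\mathbf{r})\otimes M(\mathbf{r})|}^{2}\right\}.$$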

## 2.2.

### Edge Distance Error

Due to the low-pass nature of the optical imaging system, $Z(\mathbf{r})$ is typically a blurred version of $M(\mathbf{r})$. Generally, the ${L}_{2}$ norm is employed as a metric to evaluate the difference between the output pattern $Z(\mathbf{r})$ and the desired pattern ${Z}^{*}(\mathbf{r})$ as

## (7)

$$F\{M(\mathbf{r})\}={\Vert \mathrm{\Gamma}\{M(\mathbf{r})\}-{Z}^{*}(\mathbf{r})\Vert}_{2}^{2}.$$

Here, $F\{M\}$ is called pattern error or fidelity error. The only difference between the two is that fidelity error uses a Sigmoid function to characterize the resist effect, whereas pattern error uses a step function. Their values are almost the same provided that the steepness $a$ of the Sigmoid function is large enough. Therefore, in this paper, we call $F\{M\}$ pattern error without distinguishing between them. Note that pattern error is a continuous function, and hence the gradient of $F\{M\}$ with respect to the mask can be calculated analytically. However, this metric is not intuitive, because its magnitude is not directly related to the CD error and depends strongly on the mask feature and simulation parameters, such as the simulation grid size. In other words, different simulation parameters result in a different pattern error even for the same pattern.
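The forward model and the pattern error can be illustrated with a minimal, standard-library Python sketch. It assumes real-valued kernels (so that the squared modulus is a plain square), zero padding at the borders, and the logistic form of the Sigmoid with steepness `a` and threshold `t`; the function names and the tiny test mask are hypothetical and are not taken from the paper's MATLAB implementation.

```python
import math

def convolve2d_same(image, kernel):
    """Zero-padded 2-D convolution that keeps the image size (kernel centred)."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(krows):
                for v in range(kcols):
                    y, x = i - (u - cr), j - (v - cc)
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out

def sigmoid(x, a, t):
    """Logistic Sigmoid with steepness a and threshold t (overflow-safe)."""
    z = a * (x - t)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(mask, kernels, weights, a, t):
    """Output pattern: Sigmoid of the weighted sum of squared coherent fields."""
    rows, cols = len(mask), len(mask[0])
    intensity = [[0.0] * cols for _ in range(rows)]
    for h, mu in zip(kernels, weights):
        field = convolve2d_same(mask, h)
        for i in range(rows):
            for j in range(cols):
                intensity[i][j] += mu * field[i][j] ** 2  # real kernel assumed
    return [[sigmoid(v, a, t) for v in row] for row in intensity]

def pattern_error(output, target):
    """Squared L2 norm of the difference between output and desired pattern."""
    return sum((o - z) ** 2
               for orow, zrow in zip(output, target)
               for o, z in zip(orow, zrow))

# Hypothetical 4x4 mask; an identity kernel makes the optics a pass-through.
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
z = forward(mask, kernels=[[[1.0]]], weights=[1.0], a=100.0, t=0.5)
error = pattern_error(z, mask)
```

With the pass-through kernel the output reproduces the mask and the pattern error is essentially zero; any blurring kernel makes the error grow with the number of mismatched pixels, which is why its magnitude depends on the grid size.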

Therefore, we derive a metric from pattern error and explicitly relate it to the EPE commonly used in industry. This metric, EDE, should convey CD information and be independent of the mask feature and simulation parameters. Figure 2(a) depicts the pixel-based representation of a mask pattern and its output pattern on the wafer, where the red dots are discrete sampling elements (pixels) of the patterns, ${S}_{\text{shadow}}$ denotes the absolute difference area between the desired pattern contour and the output pattern contour, and $L$ is the perimeter of the desired pattern contour. EDE is defined as

## (8)

$$\mathrm{EDE}(M)=\frac{{S}_{\text{shadow}}}{L}.$$

This means that EDE has the dimension of length and thus has an intuitive physical meaning.

Assuming the grid size is small enough in Fig. 2(a), the absolute difference area ${S}_{\text{shadow}}$ can be approximated by multiplying the total number of elements in shadow by the element area as

## (9)

$${S}_{\text{shadow}}\approx N\cdot ({\delta}_{x}\cdot {\delta}_{y}).$$

Here, $N$ is the total number of red dots (elements) in shadow, and ${\delta}_{x}$ and ${\delta}_{y}$ are the lengths of the element along the $x$ and $y$ directions, respectively, as shown in Fig. 2(a). Since the value of each element in the output pattern is either 0 or 1, according to the definition of pattern error in Eq. (7), the number $N$ is approximately equal to the pattern error, namely, $N=F\{M\}$. So, the absolute difference area ${S}_{\text{shadow}}$ can be expressed as

## (10)

$${S}_{\text{shadow}}=N\cdot ({\delta}_{x}\cdot {\delta}_{y})=F\{M\}\cdot ({\delta}_{x}\cdot {\delta}_{y}).$$

Substituting Eq. (10) into Eq. (8), we have the expression of EDE as

## (11)

$$\mathrm{EDE}(M)=\frac{{S}_{\text{shadow}}}{L}=\frac{({\delta}_{x}\cdot {\delta}_{y})}{L}\cdot F\{M\}.$$

Note that Eq. (11) directly relates EDE to pattern error $F\{M\}$. The factor $({\delta}_{x}\cdot {\delta}_{y})/L$ is a constant determined by the simulation resolution and the desired pattern, and it gives the scaled pattern error the dimension of length, like EPE. This means that EDE is continuous like pattern error, and the computational complexity of EDE is the same as that of pattern error, i.e., $O(N)$, where $N$ is the total number of elements in shadow as shown in Fig. 2(a).
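The scaling in Eq. (11) is simple enough to check numerically. The sketch below is a hypothetical helper (the function name and the example numbers are illustrative, not values from the paper) showing that the same shadow area sampled on two different grids gives different pattern errors but the same EDE.

```python
def ede_from_pattern_error(pattern_error, dx, dy, perimeter):
    """Eq. (11): EDE = F{M} * (dx * dy) / L.

    pattern_error is dimensionless; dx, dy, and perimeter are lengths in the
    same unit (e.g., nm), so the result is a length in that unit.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return pattern_error * dx * dy / perimeter

# Illustrative numbers: a 1000 nm^2 shadow area on a contour of length 200 nm.
# On a 1 nm grid the shadow holds 1000 elements; on a 2 nm grid only 250.
ede_fine = ede_from_pattern_error(1000, 1.0, 1.0, 200.0)
ede_coarse = ede_from_pattern_error(250, 2.0, 2.0, 200.0)
```

Both calls return 5.0 (nm), although the pattern error differs by a factor of four between the two grids.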

Alternatively, the absolute difference area ${S}_{\text{shadow}}$ can be formulated as an integral of EPE taken along the closed desired pattern contour curve $C$:

## (12)

$${S}_{\text{shadow}}={\oint}_{C}\mathrm{EPE}(p)\mathrm{d}\ell ,$$

where $p$ denotes an infinitesimal segment on the desired pattern contour curve and $\mathrm{d}\ell $ is the corresponding segment length. When the pattern contour curve is discretized into a finite number of segments, the pattern is represented as multiple polygons, and this representation is widely used in polygon-based OPC. In this case, as shown in Fig. 2(b), the absolute difference area ${S}_{\text{shadow}}$ can be approximated as

## (13)

$${S}_{\text{shadow}}={\oint}_{C}\mathrm{EPE}(p)\mathrm{d}\ell =\sum _{i}\mathrm{EPE}({p}_{i}){l}_{i}.$$

Here, ${p}_{i}$ is the $i$th segment, and ${l}_{i}$ is the length of the segment ${p}_{i}$. Substituting Eq. (13) into Eq. (8), EDE can be alternatively expressed as

## (14)

$$\mathrm{EDE}(M)=\frac{{S}_{\text{shadow}}}{L}=\frac{1}{L}\sum _{i}\mathrm{EPE}({p}_{i}){l}_{i}.$$

Therefore, EDE may be interpreted as the mean EPE. Equations (11) and (14) establish the correlation between pattern error and EPE, and these two metrics are in this sense equivalent via EDE. Any of pattern error, EPE, or EDE can act as a metric (or cost function) to guide mask synthesis. However, since EDE has the same dimension as EPE and a continuous expression like pattern error, it is preferable to the other two.
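The reading of Eq. (14) as a length-weighted mean of EPE can be sketched as follows. This is a hypothetical helper, assuming that the segments cover the whole contour (so that their lengths sum to $L$) and that EPE enters as an unsigned magnitude, since ${S}_{\text{shadow}}$ is an absolute difference area.

```python
def ede_from_epe(epe_values, segment_lengths):
    """Eq. (14): EDE = (1 / L) * sum_i EPE(p_i) * l_i, with L = sum_i l_i.

    Assumptions of this sketch: the segments tile the full desired contour,
    and each EPE is taken as a magnitude (sign is discarded).
    """
    if len(epe_values) != len(segment_lengths):
        raise ValueError("one EPE value is needed per segment")
    perimeter = sum(segment_lengths)
    if perimeter <= 0:
        raise ValueError("total contour length must be positive")
    area = sum(abs(e) * l for e, l in zip(epe_values, segment_lengths))
    return area / perimeter

# Illustrative contour: a 10 nm segment with 2 nm EPE and a 30 nm segment
# with 4 nm EPE give a shadow area of 140 nm^2 over a 40 nm contour.
mean_epe = ede_from_epe([2.0, -4.0], [10.0, 30.0])
```

Here the result is 3.5 nm: longer segments contribute proportionally more to the mean.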

Furthermore, EDE can convey local CD information and can be weighted by adding metrology windows. In practice, customers are sometimes only concerned with particular locations (hotspots) in the resist. In this case, we add a window function $w(\mathbf{r})$ around the hotspots as shown in Fig. 3. The value inside the metrology windows is usually set to 1 and that outside to 0. The weighted (or local) area ${S}_{\text{shadow}}$ is expressed as

## (15)

$${S}_{\text{shadow}}={N}_{w}\cdot ({\delta}_{x}\cdot {\delta}_{y}).$$

Here, ${N}_{w}$ is the total number of elements in shadow inside the metrology windows, as shown in Fig. 3, and it is approximately equal to the weighted pattern error ${F}_{w}\{M\}$ as

## (16)

$${N}_{w}={F}_{w}\{M\}={\Vert \sqrt{w(\mathbf{r})}\cdot \{\mathrm{\Gamma}\{M\}-{Z}^{*}\}\Vert}_{2}^{2},$$

so that the weighted EDE is

## (17)

$${\mathrm{EDE}}_{w}(M)=\frac{({\delta}_{x}\cdot {\delta}_{y})}{{L}_{w}}\cdot {\Vert \sqrt{w(\mathbf{r})}\cdot \{\mathrm{\Gamma}\{M\}-{Z}^{*}\}\Vert}_{2}^{2},$$

where ${L}_{w}$ is the contour length associated with the metrology windows. For simplicity, $G(M)$ is used to represent the weighted EDE. Generally, $G(M)$ is treated as a cost function to guide mask synthesis under nominal conditions, i.e., with no defocus, dosage variations, etc. In order to enhance the process robustness, process variations should be taken into account in the mask synthesis process. Here, we use the expectation of the weighted EDE under different variations as a cost function as expressed by

## (18)

$$J(M)=\zeta \{G(M)\}=\sum _{\mathbf{v}}\psi (\mathbf{v})\cdot G(M;\mathbf{v}),$$

where $\zeta $ denotes the expectation operation over $\mathbf{v}$, $\mathbf{v}$ is a vector representing a combination of multiple process variations including, for example, defocus, exposure dosage variation, and lens aberrations, and $\psi (\mathbf{v})$ is the statistical probability of the corresponding process variations, which is defined by users and is usually obtained via experiments or measurements of lithographic tools. $J(M)$ is called the statistical EDE and is used as a cost function to guide the mask synthesis. The gradient of $J(M)$ with respect to mask $M$ will be used in the optimization process. According to Refs. 8, 13, and 15, the gradient of $J(M)$ with respect to mask $M$ is given as

## (19)

$$\begin{aligned}{\nabla}_{M}J={}&\sum _{\mathbf{v}}a\cdot \frac{({\delta}_{x}\cdot {\delta}_{y})}{{L}_{w}}\cdot \psi (\mathbf{v})\cdot \left\{\sum _{q=1}^{Q}{\mu}_{q}{h}_{q}^{\text{flip}}(\mathbf{r};\mathbf{v})\otimes [w\cdot (Z-{Z}^{*})\cdot Z\cdot (1-Z)\cdot ({h}_{q}^{\dagger}(\mathbf{r};\mathbf{v})\otimes M)]\right\}\\ &+\sum _{\mathbf{v}}a\cdot \frac{({\delta}_{x}\cdot {\delta}_{y})}{{L}_{w}}\cdot \psi (\mathbf{v})\cdot \left\{\sum _{q=1}^{Q}{[{\mu}_{q}{h}_{q}^{\text{flip}}(\mathbf{r};\mathbf{v})]}^{\dagger}\otimes [w\cdot (Z-{Z}^{*})\cdot Z\cdot (1-Z)\cdot ({h}_{q}(\mathbf{r};\mathbf{v})\otimes M)]\right\}.\end{aligned}$$

## 2.3.

### Inverse Lithography Problem Definition and Regularization

The objective of inverse lithography is to synthesize an input mask that delivers a desired output pattern. In order to guarantee the manufacturability of the synthesized mask, the mask quadratic error and complexity should be considered. The quadratic metric ${R}_{Q}(M)$ and the complexity metric ${R}_{\mathrm{TV}}(M)$, i.e., total variation, are usually adopted to quantify the corresponding performance. In this paper, we focus on the binary mask. So, the quadratic metric ${R}_{Q}(M)$ and the complexity metric ${R}_{\mathrm{TV}}(M)$ are expressed, respectively,^{11,12} as

## (21)

$${R}_{\mathrm{TV}}(M)={\Vert \frac{\partial M}{\partial x}\Vert}_{1}+{\Vert \frac{\partial M}{\partial y}\Vert}_{1}={\Vert DM\Vert}_{1}+{\Vert M{D}^{T}\Vert}_{1},$$

where

## (22)

$$D=\left[\begin{array}{ccccc}1& -1& & & 0\\ & 1& -1& & \\ & & \ddots & \ddots & \\ & & & 1& -1\\ 0& & & -1& 1\end{array}\right].$$

Therefore, combining the optimization objectives of the mask quadratic error, the complexity, and the statistical EDE, we state the inverse lithography problem as

Finding ${M}^{*}(\mathbf{r})$ to minimize: $J(M)$, ${R}_{Q}(M)$ and ${R}_{\mathrm{TV}}(M)$

subject to: $0\le M\le 1$.
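To make the complexity metric concrete, the total variation of Eq. (21) can be sketched in plain Python. The sketch assumes a square mask stored as a list of lists and builds $D$ exactly as laid out in Eq. (22), including its last row; the helper names are hypothetical.

```python
def difference_matrix(n):
    """First-difference matrix D of Eq. (22): rows (1, -1), last row (-1, 1)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        d[i][i] = 1.0
        d[i][i + 1] = -1.0
    d[n - 1][n - 2] = -1.0
    d[n - 1][n - 1] = 1.0
    return d

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def l1_norm(m):
    """Entrywise L1 norm (sum of absolute values)."""
    return sum(abs(v) for row in m for v in row)

def total_variation(mask):
    """Eq. (21): R_TV(M) = ||D M||_1 + ||M D^T||_1 for a square mask."""
    d = difference_matrix(len(mask))
    return l1_norm(matmul(d, mask)) + l1_norm(matmul(mask, transpose(d)))

flat = [[1.0] * 3 for _ in range(3)]            # uniform mask, no edges
speck = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # isolated pixel
tv_flat = total_variation(flat)
tv_speck = total_variation(speck)
```

A uniform mask has zero total variation, while a single isolated pixel already contributes 6 on this 3×3 grid, which is why small isolated features are penalized by ${R}_{\mathrm{TV}}(M)$.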

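Note that this problem has three competing minimization objectives. In the literature, they are usually combined with certain proportions ${\lambda}_{1}$ and ${\lambda}_{2}$ and stated as a single-objective minimization problem:^{10–15}

## (23)

$$\underset{M}{\mathrm{min}}\text{\hspace{0.17em}}J(M)+{\lambda}_{1}{R}_{Q}(M)+{\lambda}_{2}{R}_{\mathrm{TV}}(M),\phantom{\rule{2em}{0ex}}\text{subject to}\text{\hspace{0.17em}}0\le M\le 1.$$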

## 2.4.

### Mask Filtering Method

In this section, we propose an alternative method to solve this multiobjective minimization problem. We first interpret gray-level transitions and small, unwanted block objects, such as isolated holes, protrusions, jagged edges, or other layouts that cannot be fabricated, as unwanted noise in the mask, and then we design a specific filter $S[\cdot]$ to remove or prevent this noise, so that the filtered mask $\tilde{M}=S[M]$ satisfies the manufacturing constraints.

After the filtering process, the quadratic metric ${R}_{Q}(\tilde{M})$ and the complexity metric ${R}_{\mathrm{TV}}(\tilde{M})$ of the filtered mask $\tilde{M}$ are rather small. Then, we calculate the cost function of this filtered mask, $J(\tilde{M})=J(S[M])$.

Thus, the multiobjective minimization problem is converted into a simpler single-objective minimization problem: find ${M}^{*}(\mathbf{r})$ that minimizes $J(S[M])$ subject to $0\le M\le 1$.

We employ an iterative method to solve this problem. In the iteration process, we ensure that the statistical EDE, i.e., $J(S[M])$, of the filtered mask is iteratively decreasing. It is noted that each obtained mask is filtered and satisfies all the optimization objectives except for the statistical EDE; namely, it satisfies the manufacturing constraints. As a result, we only need to reduce the statistical EDE of this filtered mask. This approach is called the mask filtering technique.

The filter operator $S[\cdot]$ can be designed based on different mask manufacturing rules. The most basic filter should map the gray-level image to distinct values of 0 or 1 and guarantee that the mask is less complex. We, therefore, define a basic mask filter as

## (25)

$$S[M]=\mathrm{sig}(O\otimes M)=\frac{1}{1+\mathrm{exp}[-{a}_{S}(O\otimes M-{t}_{S})]}.$$

Here, the steepness of this Sigmoid function is ${a}_{S}$ and the threshold is ${t}_{S}$. $O$ is a Gaussian filter to relieve mask complexity,

## (26)

$$O(\mathbf{r})={\tau}^{-1}\cdot {e}^{-(1/2){(\Vert \mathbf{r}-{\mathbf{r}}_{0}\Vert /{\sigma}_{O})}^{2}},$$

with the normalization weight

## (27)

$$\tau ={\int}_{{\mathrm{\Omega}}_{1}}{e}^{-(1/2){(\Vert \mathbf{r}-{\mathbf{r}}_{0}\Vert /{\sigma}_{O})}^{2}}\mathrm{d}\mathbf{r}.$$

The gradient of the filter with respect to the mask is

## (28)

$${\nabla}_{M}S[M]={a}_{S}\cdot {O}^{\text{flip}}\otimes \{\mathrm{sig}[O\otimes M]\cdot [1-\mathrm{sig}(O\otimes M)]\}.$$

The detailed derivation of Eq. (28) is given in the Appendix.
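The two stages of the basic filter (Gaussian smoothing followed by a Sigmoid threshold) can be sketched in standard-library Python. The sketch assumes zero padding at the mask border, replaces the integral in Eq. (27) by a discrete sum over the kernel pixels, and uses the logistic form of the Sigmoid; the function names and the small 5×5 kernel are hypothetical choices for illustration (the simulations in Sec. 3 use a 21×21 kernel).

```python
import math

def gaussian_kernel(size, sigma):
    """Discrete version of Eqs. (26)-(27): Gaussian normalized to unit sum."""
    c = size // 2
    raw = [[math.exp(-0.5 * ((i - c) ** 2 + (j - c) ** 2) / sigma ** 2)
            for j in range(size)] for i in range(size)]
    tau = sum(sum(row) for row in raw)  # discrete stand-in for the integral
    return [[v / tau for v in row] for row in raw]

def convolve2d_same(image, kernel):
    """Zero-padded 2-D convolution that keeps the image size."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(krows):
                for v in range(kcols):
                    y, x = i - (u - cr), j - (v - cc)
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out

def sigmoid(x, steepness, threshold):
    """Overflow-safe logistic Sigmoid."""
    z = steepness * (x - threshold)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mask_filter(mask, kernel, a_s, t_s):
    """S[M] = sig(O (*) M): smooth the mask, then push values toward 0 or 1."""
    blurred = convolve2d_same(mask, kernel)
    return [[sigmoid(v, a_s, t_s) for v in row] for row in blurred]

kernel = gaussian_kernel(5, 1.0)
speck = [[0.0] * 9 for _ in range(9)]
speck[4][4] = 1.0                               # isolated single-pixel object
solid = [[1.0] * 9 for _ in range(9)]           # large uniform feature
filtered_speck = mask_filter(speck, kernel, a_s=300.0, t_s=0.5)
filtered_solid = mask_filter(solid, kernel, a_s=300.0, t_s=0.5)
```

In this toy case the isolated pixel is smoothed below the threshold and removed, while the interior of the large feature stays at 1; the borders of `solid` are attenuated only because of the zero padding assumed here.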

Combining Eqs. (19) and (28), the gradient of $J(S[M])$ with respect to $M$ is

## (29)

$${\nabla}_{M}J=\frac{\partial J}{\partial S}\frac{\partial S}{\partial M}={\nabla}_{S}J\cdot {\nabla}_{M}S.$$

With the gradient Eq. (29), we apply a steepest descent method to solve this problem.^{11} The optimization procedure is as follows.

**Iteration 0:** Since the value of the mask is bound constrained to [0, 1], we use the following parametric transformation:

## (30)

$$M=\frac{1+\mathrm{cos}(\mathrm{\Theta})}{2},\phantom{\rule{2em}{0ex}}\mathrm{\Theta}\in (-\infty ,\infty ).$$

Then, given a desired output pattern ${Z}^{*}(\mathbf{r})$, we compute the initial input mask ${\mathrm{\Theta}}_{0}$ from

## (31)

$${M}_{0}(\mathbf{r})={\kappa}_{1}\cdot [H(\mathbf{r})\otimes {Z}^{*}(\mathbf{r})]+{\kappa}_{2},$$

where ${\kappa}_{1}$ and ${\kappa}_{2}$ are parameters to adjust the initial value of the mask; for example, ${\kappa}_{1}=0.90$ and ${\kappa}_{2}=0.05$ in this paper. We do so because $M(i,j)=0$ or 1 would reduce the gradient at location ($i,j$) to 0, and therefore the optimization freedom would be reduced.^{15} $H(\mathbf{r})$ is a Gaussian function that makes the initial mask continuous so that the gradient with respect to the initial mask is smooth. $H(\mathbf{r})$ is defined as

## (34)

$$H(\mathbf{r})={\eta}^{-1}\cdot {e}^{-(1/2){(\Vert \mathbf{r}-{\mathbf{r}}_{0}\Vert /{\sigma}_{H})}^{2}},$$

## (35)

$$\eta ={\int}_{{\mathrm{\Omega}}_{2}}{e}^{-(1/2){(\Vert \mathbf{r}-{\mathbf{r}}_{0}\Vert /{\sigma}_{H})}^{2}}\mathrm{d}\mathbf{r},$$

where ${\mathbf{r}}_{0}$ is the center point and $\Vert \mathbf{r}-{\mathbf{r}}_{0}\Vert $ is the distance from $\mathbf{r}$ to ${\mathbf{r}}_{0}$. ${\mathrm{\Omega}}_{2}$ is the number of pixels in $H(\mathbf{r})$, and $\eta $ is the normalized weight. Finally, we calculate the initial gradient.

**Iteration k:**

**Step 1:** Search the step length ${\gamma}_{k}\in \mathbb{R}$ in the direction ${\nabla}_{{\mathrm{\Theta}}_{k}}J$.

**Step 2:** Update ${\mathrm{\Theta}}_{k+1}$, ${M}_{k+1}$, and ${S}_{k+1}$.

**Step 3:** Calculate the gradient for the next iteration,

## (42)

$${\nabla}_{{\mathrm{\Theta}}_{k+1}}J=\frac{\partial J}{\partial {S}_{k+1}}\frac{\partial {S}_{k+1}}{\partial {M}_{k+1}}\frac{\partial {M}_{k+1}}{\partial {\mathrm{\Theta}}_{k+1}}.$$

**If** $\Vert {\nabla}_{{\mathrm{\Theta}}_{k+1}}J\Vert <\mathrm{\Lambda}$ or $\Vert J({\mathrm{\Theta}}_{k+1})\Vert <\mathrm{\Xi}$ or $k>\mathrm{\Psi}$, go to **Stop**. **Else**, return to **Step 1**.

**Stop:** Obtain the optimized mask.

In the above procedure, the iteration is terminated when $\Vert {\nabla}_{{\mathrm{\Theta}}_{k+1}}J\Vert <\mathrm{\Lambda}$ or $\Vert J({\mathrm{\Theta}}_{k+1})\Vert <\mathrm{\Xi}$ or $k>\mathrm{\Psi}$, where $\mathrm{\Lambda}$ is the prescribed lower bound on the norm of the gradient (velocity), $\mathrm{\Xi}$ is the prescribed lower bound on the statistical EDE, and $\mathrm{\Psi}$ is the prescribed upper limit of the number of iterations. The termination criterion $\Vert {\nabla}_{{\mathrm{\Theta}}_{k+1}}J\Vert <\mathrm{\Lambda}$ means that the iteration stops when the gradient is zero or rather small.
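The structure of this procedure (cosine parametrization, chain-rule gradient, constant-step steepest descent, and the three stopping tests) can be sketched on a toy problem. The sketch below is not the paper's cost function: as a loudly stated assumption, it replaces the statistical EDE through the filter and the imaging model by a simple squared difference between the mask and a target, omits the Gaussian smoothing $H$ in the initialization, and works on a flat list of pixels. All names and tolerances are hypothetical.

```python
import math

def theta_to_mask(theta):
    """Eq. (30): M = (1 + cos(Theta)) / 2 keeps every pixel inside [0, 1]."""
    return [(1.0 + math.cos(t)) / 2.0 for t in theta]

def toy_cost(mask, target):
    """Stand-in cost (assumption): squared L2 distance to the target."""
    return sum((m - z) ** 2 for m, z in zip(mask, target))

def toy_gradient(theta, target):
    """Chain rule dJ/dTheta = dJ/dM * dM/dTheta, with dM/dTheta = -sin(Theta)/2."""
    mask = theta_to_mask(theta)
    return [2.0 * (m - z) * (-math.sin(t) / 2.0)
            for m, z, t in zip(mask, target, theta)]

def steepest_descent(target, step=0.3, kappa1=0.90, kappa2=0.05,
                     grad_tol=1e-6, cost_tol=1e-6, max_iter=200):
    # Iteration 0: initial mask strictly inside (0, 1) so the gradient is nonzero.
    m0 = [kappa1 * z + kappa2 for z in target]
    theta = [math.acos(2.0 * m - 1.0) for m in m0]
    history = [toy_cost(theta_to_mask(theta), target)]
    for _ in range(max_iter):                      # k > Psi
        grad = toy_gradient(theta, target)
        grad_norm = math.sqrt(sum(g * g for g in grad))
        if grad_norm < grad_tol or history[-1] < cost_tol:  # Lambda, Xi tests
            break
        theta = [t - step * g for t, g in zip(theta, grad)]  # constant step
        history.append(toy_cost(theta_to_mask(theta), target))
    return theta_to_mask(theta), history

target = [0.0, 1.0, 1.0, 0.0]
mask, history = steepest_descent(target)
```

The unconstrained variable $\Theta$ can be updated freely while the mask stays within its bounds, which is the reason for the parametrization in Eq. (30). Note also that the gradient with respect to $\Theta$ vanishes where the mask is exactly 0 or 1, which is why the initial mask is offset by ${\kappa}_{1}$ and ${\kappa}_{2}$.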

## 3.

## Simulations

Simulations were performed on a partially coherent imaging system with an annular source illumination whose outer radius was ${\sigma}_{\text{out}}=0.7$ and whose inner radius was ${\sigma}_{\text{in}}=0.4$. The wavelength in the simulations was set at 193 nm, and the numerical aperture (NA) was 1.35. The resist effect was approximated by a Sigmoid function with $a=100$ and $t=0.7$. The Gaussian filter $O(\mathbf{r})$ consisted of $21\times 21$ pixels with ${\sigma}_{O}=4$. The parameters of the Sigmoid function in the proposed filter $S[\cdot]$ were ${a}_{S}=300$ and ${t}_{S}=0.5$. The parameters ${\kappa}_{1}$ and ${\kappa}_{2}$ of the initial mask in Eq. (31) were 0.90 and 0.05, respectively; $H(\mathbf{r})$ consisted of $21\times 21$ pixels with ${\sigma}_{H}=2$. The window function $w(\mathbf{r})$ had the same size as the mask image, and all of its values were set at 1. Instead of computing the step length ${\gamma}_{k}$ in Eq. (38) accurately, we set ${\gamma}_{k}$ at a constant 0.3 in each iteration. Since this paper focuses on developing a new regularization framework, process variations are not taken into consideration in the simulations. That means $\mathbf{v}$ is the nominal process condition and therefore $\psi (\mathbf{v})=1$. All the simulations were carried out with in-house MATLAB codes on an HP Z800 (3.47 GHz Xeon) workstation running a Windows 7 (64 bit) operating system.

## 3.1.

### Edge Distance Error

Figure 4 depicts an example of a desired pattern and its output pattern on the wafer. In this case, the true absolute area between the desired pattern contour and its output pattern contour is $1.853\times {10}^{4}$ nm$^{2}$, the perimeter of the desired pattern is $1.70\times {10}^{3}$ nm, and therefore, the true EDE is 10.90 nm. Table 1 summarizes the relative error compared to the true EDE when using different pixel grid sizes. The EDE in Table 1 is calculated by Eq. (11) and the pattern error is calculated by Eq. (7). From Table 1, it is observed that the magnitude of pattern error varies with the pixel grid size, whereas EDE does not. When the pixel grid size is small enough (e.g., 0.5 nm), the EDE calculated by the proposed method is approximately equal to the true EDE. With the increase of pixel grid size, the accuracy of EDE remains acceptable. So, the EDE calculated by the proposed method can be used to guide mask synthesis.
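As a worked check of these figures, the true EDE follows directly from the definition of EDE as the shadow area divided by the perimeter, and the relative EDE error in Table 1 is taken here to be the absolute deviation from the true EDE divided by the true EDE (an assumption about how the last column is computed). For the 1 nm grid, for example:

$$\mathrm{EDE}_{\text{true}}=\frac{1.853\times 10^{4}\ \mathrm{nm}^{2}}{1.70\times 10^{3}\ \mathrm{nm}}\approx 10.90\ \mathrm{nm},\qquad \frac{|10.768-10.90|}{10.90}\approx 1.2\%.$$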

## Table 1

Results of pattern error and edge distance error (EDE) when using different pixel grid sizes.

Pixel grid size (nm) | Pattern error | EDE (nm) | Relative EDE error (%) |
---|---|---|---|
0.5 | $7.410\times 10^{5}$ | 10.897 | 0.28 |
1 | $1.831\times 10^{5}$ | 10.768 | 1.2 |
1.5 | $8.017\times 10^{4}$ | 10.610 | 2.7 |
2.5 | $2.816\times 10^{4}$ | 10.352 | 5.0 |
3 | $1.912\times 10^{4}$ | 10.124 | 7.1 |

## 3.2.

### Mask Filter

As shown in Eq. (25), the proposed mask filter consists of two portions: a Gaussian convolution operation and a Sigmoid (or thresholding) operation. Figure 5 demonstrates these filtering operations, where $O(\mathbf{r})$ is a Gaussian filter with a size of $21\times 21$ pixels and ${\sigma}_{O}=4$, $M$ is an intermediate mask pattern with a size of $321\times 321$ pixels and a grid resolution of 2.5 nm, of the kind commonly encountered during the optimization process of ILT, and $S[M]$ is the filtered pattern calculated by Eq. (25). As expected, the Gaussian convolution operation, $O\otimes M$, weakens the weight of the small details in $M$, and the Sigmoid operation leads to a sharper contour. As a result, the filtered pattern $S[M]$ has a lower complexity, and its mask quadratic error (denoted as QE in Fig. 5) reduces from $4.28\times {10}^{4}$ to 531.

Figure 6 presents another set of simulations for the proposed filter, where $M$ is an input mask pattern with a size of $321\times 321$ pixels and a grid resolution of 2.5 nm. Objects that are difficult to manufacture in practice were artificially introduced into this input mask pattern, including small isolated holes, protrusions, hollows, and irregular features, shown inside the red circles. From the perspective of signal processing, these details can be considered high-frequency noise in the mask and can be evaluated with total variation.^{11,12} As revealed in Fig. 6, the total variation (denoted as TV in Fig. 6) of $M$ reduces from 3080 to 2416 via the Gaussian convolution operation, which removes these small details. Subsequently, the Sigmoid operation leads to a close-to-binary mask with a total variation of 2677, a reduction of 13.1% compared to the original mask $M$. On the other hand, it is interesting to find that the EDE values of the output patterns of the mask $M$, the Gaussian filtered mask, and the filtered mask $S[M]$ are almost the same. That is because the optical lithography system, with its low-pass nature, does not deliver high-frequency details to the output pattern on the wafer. Similar to the optical lithography system, the mask filter acts as a low-pass filter that removes the details produced in ILT without distorting the output pattern on the wafer. As demonstrated in Figs. 5 and 6, the proposed filter reduces the mask complexity and achieves a close-to-binary mask, so that the filtered mask $S[M]$ can be fabricated in real manufacturing.

## 3.3.

### Results of Mask Filtering Technique

Figure 7 shows the simulated images obtained with the proposed method for a desired pattern with a CD of 45 nm. The optimization is terminated after 200 iterations. The desired pattern ${M}^{*}$, which is commonly encountered in the design of static random access memory circuits, consists of $321\times 321$ pixels with a grid resolution of 2.5 nm. As expected, the mask patterns optimized by the proposed method achieve a much smaller EDE than that obtained by simply using the desired pattern ${M}^{*}$ as the mask pattern. It is also observed that the optimized gray mask ${M}^{\mathrm{S}}$ is very close to the postprocessed mask ${M}^{\mathrm{P}}$ and reaches an almost identical output pattern and EDE. This demonstrates the validity of the proposed method in synthesizing a regular mask pattern and reaching a considerably low EDE.

Figure 8 presents some intermediate results obtained in the iteration process by the proposed method. $M\#0$ denotes the initial mask and is calculated by using Eq. (31); $M\#n$ denotes the mask obtained after the $n$th iteration and is calculated by using Eq. (41). The EDE reported is that between the output pattern of $M\#n$ and the desired pattern. Note that each intermediate mask obtained by the proposed method is very close to binary and has a low mask complexity. This demonstrates that the proposed method can filter (regularize) the mask to eliminate gray-level transitions and small, unwanted objects. For comparison, Fig. 9 shows some intermediate results obtained in the iteration process by the conventional regularization method. The conventional regularization method incorporates different penalty terms into the cost function with corresponding weights and then seeks the minimum of this weighted cost function. In this case, we take the quadratic term as an example, and the corresponding weight parameter $\lambda $ is set at 0.1. From Fig. 9, it is observed that the intermediate results of this method possess gray-level transitions. The EDE may satisfy a 5% CD error after 50 iterations, while the mask quadratic error is still high at that point; extra iterations are needed to reduce the mask quadratic error although the EDE has already reached the demanded result. This is one of the drawbacks of the conventional regularization method. Comparing Fig. 8 with Fig. 9, the intermediate masks of the proposed method have lower mask quadratic error and lower mask complexity, which is a clear improvement over the conventional regularization method. Since the convergence of EDE and that of the mask quadratic error are out of synchronization in the conventional regularization method, it needs additional iterations to achieve a low level of both EDE and mask quadratic error, whereas in the proposed method, the iteration (optimization process) can be stopped whenever EDE reaches the demanded result without worrying about manufacturability.

Figure 10 depicts the convergence properties of the different methods. The results of the conventional regularization method with different weighting parameters $\lambda $ demonstrate that a small weight gives fast convergence of the EDE but slow convergence of the mask quadratic error, whereas a large weight gives fast convergence of the mask quadratic error but ends at a higher EDE. In other words, a smaller $\lambda $ does not achieve the regularization effect, whereas a larger $\lambda $ may result in a large EDE. For this reason, it is difficult to choose a value of $\lambda $ that serves both goals. This is the second drawback of the conventional regularization method. In contrast, the EDE of the proposed method converges rapidly while the mask quadratic error remains at a low level, which demonstrates that all the intermediate masks satisfy the mask quadratic error constraints.
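The trade-off governed by $\lambda $ can be illustrated with a minimal sketch of a weighted cost. The quadratic penalty form $4M(1-M)$ summed over pixels is an assumption made here for illustration (the exact definition used for the mask quadratic error is given earlier in the paper), and the function names are hypothetical.

```python
import numpy as np

def quadratic_penalty(mask):
    """Assumed quadratic penalty: 0 for binary pixels, 1 per pixel at gray level 0.5."""
    return np.sum(4.0 * mask * (1.0 - mask))

def quadratic_penalty_grad(mask):
    """Pixel-wise derivative of the assumed penalty with respect to the mask."""
    return 4.0 * (1.0 - 2.0 * mask)

def weighted_cost(pattern_error, mask, lam):
    """Conventional regularization: pattern error plus lambda times the penalty."""
    return pattern_error + lam * quadratic_penalty(mask)

def weighted_grad(pattern_error_grad, mask, lam):
    """Gradient of the weighted cost; lam sets how strongly the penalty steers the step."""
    return pattern_error_grad + lam * quadratic_penalty_grad(mask)

binary_mask = np.array([[0.0, 1.0], [1.0, 0.0]])
gray_mask = np.full((2, 2), 0.5)
print(quadratic_penalty(binary_mask), quadratic_penalty(gray_mask))
```

With a small `lam` the update direction is dominated by the pattern-error gradient, and with a large `lam` it is dominated by the penalty gradient, which mirrors the behavior described for Fig. 10.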

Two further sets of simulations under different illumination conditions are shown in Fig. 11. The designed mask pattern ${M}^{1}$ is simulated on a partially coherent imaging system with annular source illumination (${\sigma}_{\text{out}}/{\sigma}_{\text{in}}=0.7/0.4$) and an NA of 0.85. The mask ${M}^{1\mathrm{S}}$ obtained by the proposed method consists of $401\times 401$ pixels with a grid resolution of 2.5 nm. The designed mask pattern ${M}^{2}$ is simulated on a partially coherent imaging system with quasar source illumination (${\sigma}_{\text{out}}/{\sigma}_{\text{in}}/\mathrm{deg}=0.9/0.6/45^{\circ}$) and an NA of 1.25. The mask ${M}^{2\mathrm{S}}$ obtained by the proposed method consists of $361\times 361$ pixels with a grid resolution of 2.5 nm. Figure 11 demonstrates that the proposed method can synthesize a mask pattern under different imaging conditions and can reach a considerably low EDE.

We also performed simulations for more complicated patterns by using the proposed method. Figure 12 depicts the results for a desired pattern ${M}^{*}$, which is a contact layer of the benchmark AND-OR-INVERT gate circuit layout,^{32} consisting of $601\times 1081$ pixels with a grid resolution of 2.5 nm, i.e., the simulation area is $1500\times 2700\ {\mathrm{nm}}^{2}$. The proposed method results in a smooth mask pattern with an EDE of 2.11 nm, compared to 8.37 nm obtained by simply using the desired pattern as the mask pattern. These results further demonstrate that the proposed method is capable of achieving a small EDE while ensuring the regularity of the synthesized mask.

Table 2 summarizes the average runtime of each iteration for the different methods and mask patterns. As revealed in Table 2, the average runtime of each iteration of the proposed method is almost the same as that of the conventional method. This is because the proposed method only adds the computation of Eq. (28), whose runtime is negligible compared with the total calculation time of the conventional regularization method. In other words, the proposed method enhances the mask manufacturability at an almost equal runtime per iteration. From this perspective, the proposed method is more efficient than the conventional regularization method.

## Table 2

The average runtime of each iteration by different methods with different mask patterns.

Mask pattern | Conventional method, runtime (s) | Proposed method, runtime (s) |
---|---|---|
${M}^{*}$ in Fig. 7 | 1.175 | 1.183 |
${M}^{1}$ in Fig. 11 | 1.944 | 1.957 |
${M}^{2}$ in Fig. 11 | 1.517 | 1.528 |
${M}^{*}$ in Fig. 12 | 6.672 | 6.717 |
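To make the "almost the same" claim concrete, the relative per-iteration overhead can be computed directly from the Table 2 values. The short sketch below uses only the numbers in the table; the dictionary layout and function name are illustrative.

```python
# Per-iteration runtimes in seconds, copied from Table 2:
# (conventional method, proposed method)
runtimes = {
    "M* in Fig. 7": (1.175, 1.183),
    "M1 in Fig. 11": (1.944, 1.957),
    "M2 in Fig. 11": (1.517, 1.528),
    "M* in Fig. 12": (6.672, 6.717),
}

def relative_overhead(conventional, proposed):
    """Extra runtime of the proposed method as a fraction of the conventional runtime."""
    return (proposed - conventional) / conventional

overheads = {name: relative_overhead(c, p) for name, (c, p) in runtimes.items()}
for name, value in overheads.items():
    print(f"{name}: {100.0 * value:.2f}% overhead per iteration")
```

For all four patterns the overhead is below 1% of the conventional per-iteration runtime.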

## 4.

## Conclusions

In this paper, we have demonstrated the application of a mask filtering technique and the metric EDE to solve the inverse lithography problem. The mask filtering technique interprets gray-level transitions and small, unwanted block objects as noise in the mask and employs a filter to remove this noise so that manufacturing constraints are satisfied. The proposed filter consists of two parts: a Gaussian convolution operation that weakens the weight of small details in the mask and a thresholding operation that produces a sharper contour. The advantage of this approach is that it enhances the manufacturability of each intermediate mask without raising the computational complexity and avoids the need to choose weighting parameters for various regularization terms.
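The two-part filter summarized above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the paper: the sigmoid follows the form in Eq. (44), while the kernel width `sigma`, the steepness `steepness`, the threshold `threshold`, the periodic (FFT-based) convolution, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(shape, sigma):
    """Periodic Gaussian kernel centred on pixel (0, 0), normalised to unit sum."""
    ny, nx = shape
    # Wrap-around distance of each index from the origin.
    y = np.minimum(np.arange(ny), ny - np.arange(ny))
    x = np.minimum(np.arange(nx), nx - np.arange(nx))
    g = np.exp(-(y[:, None] ** 2 + x[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def mask_filter(mask, sigma=2.0, steepness=25.0, threshold=0.5):
    """Gaussian convolution followed by a sigmoid (soft) thresholding step."""
    kernel = gaussian_kernel(mask.shape, sigma)
    # Circular convolution via the FFT (assumes periodic boundary conditions).
    blurred = np.real(np.fft.ifft2(np.fft.fft2(mask) * np.fft.fft2(kernel)))
    return 1.0 / (1.0 + np.exp(-steepness * (blurred - threshold)))

# Toy mask: one large block plus one isolated pixel acting as "noise".
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1.0
mask[5, 5] = 1.0
filtered = mask_filter(mask)
print(filtered[30, 30], filtered[5, 5])
```

In this toy case the interior of the large block stays close to 1, whereas the isolated pixel is suppressed to nearly 0, which is the qualitative behavior attributed to the filter.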

In addition, we introduce a metric called EDE to guide mask synthesis and establish the correlation between pattern error and EPE. EDE is defined as the absolute area between the desired pattern contour and its output pattern contour divided by the perimeter of the desired pattern. It can be interpreted as the mean EPE and can be approximated by the pattern error multiplied by a constant factor that depends only on the simulation resolution and the desired pattern. Therefore, EDE has the same dimension as EPE and, like pattern error, has a continuous expression. The mask filtering technique and the metric EDE are expected to have direct applications in mask optimization and synthesis for optical lithography in the semiconductor industry.
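The definition of EDE given above can be evaluated on pixelated patterns as in the following sketch. It assumes binary (0/1) desired and output patterns on the same grid, approximates the area between the contours by the number of differing pixels, and measures the perimeter by counting pixel edges along the boundary of the desired pattern. These discretization choices and the function names are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

def perimeter_nm(pattern, grid_nm):
    """Boundary length of a binary pattern, counted in pixel edges times the grid size."""
    padded = np.pad(np.asarray(pattern).astype(int), 1)  # zero border closes edge features
    edges = (np.abs(np.diff(padded, axis=0)).sum()
             + np.abs(np.diff(padded, axis=1)).sum())
    return edges * grid_nm

def ede_nm(desired, output, grid_nm):
    """Area between the two patterns divided by the perimeter of the desired pattern."""
    differing_pixels = np.logical_xor(desired, output).sum()
    area_nm2 = differing_pixels * grid_nm ** 2
    return area_nm2 / perimeter_nm(desired, grid_nm)

# Toy example: a 20 x 20 pixel square whose printed image is one pixel larger on every side.
desired = np.zeros((40, 40), dtype=bool)
desired[10:30, 10:30] = True
output = np.zeros((40, 40), dtype=bool)
output[9:31, 9:31] = True
print(ede_nm(desired, output, grid_nm=2.5))  # 2.625
```

The result, 2.625 nm, is close to the uniform 2.5 nm edge shift in this toy case (the small excess comes from the corners), consistent with reading EDE as a mean EPE.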

## Appendices

## Appendix:

### Derivation of Eq. (28)

To derive Eq. (28), we first give some useful intermediate results such as

## (44)

$$\frac{\partial\,\mathrm{sig}[x]}{\partial x}=\frac{\partial}{\partial x}\left[\frac{1}{1+{e}^{-a(x-t)}}\right]=a\cdot {\left[\frac{1}{1+{e}^{-a(x-t)}}\right]}^{2}\cdot {e}^{-a(x-t)}=a\cdot \mathrm{sig}[x]\cdot \{1-\mathrm{sig}[x]\},$$

## (45)

$$\frac{\partial [h(\mathbf{\rho})\otimes M(\mathbf{\rho})]}{\partial M(\mathbf{r})}=\frac{\partial \left[\sum _{{\mathbf{r}}_{1}}M({\mathbf{r}}_{1})h(\mathbf{\rho}-{\mathbf{r}}_{1})\right]}{\partial M(\mathbf{r})}=h(\mathbf{\rho}-\mathbf{r}),$$

## (46)

$$\begin{aligned}{\nabla}_{M}S[M]&=\frac{\partial \sum _{\mathbf{\rho}\in {\mathrm{\Omega}}_{1}}\mathrm{sig}[O(\mathbf{\rho})\otimes M(\mathbf{\rho})]}{\partial M(\mathbf{r})}\\&=\{{a}_{S}\cdot \mathrm{sig}[O\otimes M]\cdot [1-\mathrm{sig}[O\otimes M]]\}\cdot \frac{\partial \sum _{\mathbf{\rho}\in {\mathrm{\Omega}}_{1}}[O(\mathbf{\rho})\otimes M(\mathbf{\rho})]}{\partial M(\mathbf{r})}\\&=\{{a}_{S}\cdot \mathrm{sig}[O\otimes M]\cdot [1-\mathrm{sig}[O\otimes M]]\}\cdot \left[\sum _{\mathbf{\rho}\in {\mathrm{\Omega}}_{1}}O(\mathbf{\rho}-\mathbf{r})\right]\\&=\sum _{\mathbf{\rho}\in {\mathrm{\Omega}}_{1}}\left[{O}^{\text{flip}}(\mathbf{r}-\mathbf{\rho})\cdot \{{a}_{S}\cdot \mathrm{sig}[O\otimes M]\cdot [1-\mathrm{sig}[O\otimes M]]\}\right]\\&={a}_{S}\cdot {O}^{\text{flip}}\otimes \{\mathrm{sig}[O\otimes M]\cdot [1-\mathrm{sig}[O\otimes M]]\}.\end{aligned}$$

## Acknowledgments

This work was funded by the National Natural Science Foundation of China (Grant No. 91023032, 51005091, 51121002), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120142110019), the National Science and Technology Major Project of China (Grant No. 2012ZX02701001), and the National Instrument Development Specific Project of China (Grant No. 2011YQ160002).

## References

## Biography

**Wen Lv** is currently a PhD candidate at Huazhong University of Science and Technology under the guidance of Prof. Shiyuan Liu. He received his BS degree from the School of Mechanical Science and Engineering of the same university in 2011. His research involves various issues in optical lithography, including inverse lithography, fast optical image simulation, and mask writing technique. He is a student member of SPIE and IEEE.

**Qi Xia** is an associate professor of the School of Mechanical Science and Engineering at the Huazhong University of Science and Technology, China. He received his PhD degree in mechanical engineering from the Chinese University of Hong Kong (CUHK), China, in 2007. His current interests include structural and material design optimization for microelectromechanical sensors and actuators, mechatronics and automation. He is a member of IEEE.

**Shiyuan Liu** is a professor of mechanical engineering at Huazhong University of Science and Technology, leading his Nanoscale and Optical Metrology Group with research interest in metrology and instrumentation for nanomanufacturing. He also actively works in the area of optical lithography, including partially coherent imaging theory, wavefront aberration metrology, optical proximity correction, source mask optimization, and inverse lithography technology. He received his PhD in mechanical engineering from Huazhong University of Science and Technology in 1998. He is a member of SPIE, OSA, AVS, IEEE, and Chinese Society of Micro/Nano Technology (CSMNT). He holds 30 patents and has authored or coauthored more than 100 technical papers.