## 1. Introduction

Automatic defect classification (ADC) is a wafer fabrication process that classifies defects into predefined types, e.g., particle, scratch, etc. Figure 1 shows examples of a particle and scratches. By correctly classifying the defects, their cause can be analyzed, and this information is used to improve the process and, consequently, the yield. Hence, there have been many studies on ADC. For example, Kameyama and Kosugi^{1} proposed a method that exploits a hyperellipsoid clustering network (HCN) with radial basis function (RBF) and model switching. Also, smart beam search (SBS) using a support vector machine (SVM) was proposed for feature selection.^{2} However, it is difficult to apply conventional appearance-based pattern classification methods (e.g., techniques used in face detection^{3}) to ADC, because defects in the same class vary widely in shape. Also, since the defect regions occupy very small portions of the image, global feature statistics (e.g., frequency or filter bank responses) are of little use. To resolve these difficulties, we propose a new method based on classification-after-segmentation.

The most essential part of the process may be the correct segmentation of defects. Due to the shrinking bias problem,^{7} however, conventional state-of-the-art segmentation methods based on maximum a posteriori Markov random fields (MAP-MRF) with a Potts-model smoothness term^{4, 5, 6} often oversegment a scratch (a thin object) into several objects, and thus a scratch can appear as particles. To deal with this problem, we propose a new MAP-MRF-based segmentation method. Specifically, we develop a new energy function for the MAP-MRF scheme, based on the Retinex theory,^{8} to handle blurry scratches and color inconsistency. After segmentation, we define several features (shape, intensity, and so on) for each defect and train an AdaBoost classifier to tell whether the patch corresponds to a particle or to another defect (including a scratch).

In the experiments, we show that our segmentation method reduces the oversegmentation of scratches and thus keeps the false alarm rate low, even for high particle detection rates.

## 2. Segmentation

Among the many segmentation methods, the state-of-the-art MAP-MRF approach is adopted here.^{4, 5, 6} That is, segmentation is achieved by minimizing an energy function:

## Eq. 1

$$E(f)=\sum_{p\in P}V_{p}(f_{p})+\sum_{p\in P}\sum_{q\in N(p)}V_{p,q}(f_{p},f_{q}),$$

where $f_{p}$ is the label of pixel $p$, $N(p)$ is the neighborhood of $p$, $V_{p}$ is the data term, and $V_{p,q}$ is the smoothness term.

## Eq. 3

$$V_{p,q}^{C}(f_{p},f_{q})=\exp\left[-\frac{|I(p)-I(q)|^{2}}{\sigma^{2}}\right]\delta(f_{p},f_{q}),$$

## Eq. 4

$$\delta(f_{p},f_{q})=\begin{cases}1 & \text{if } f_{p}\ne f_{q}\\ 0 & \text{otherwise.}\end{cases}$$

The term $V_{p,q}^{C}(f_{p},f_{q})$^{5, 6} enforces the continuity of labels by penalizing label discontinuities. However, it is well known that, since $V_{p,q}^{C}(f_{p},f_{q})$ tries to minimize the length of the boundary (shrinking bias),^{9} this method often segments a long and thin object into several parts. Also, blurred boundaries deteriorate the performance of $V_{p}(\cdot)$ (especially when capturing thin objects). Hence, many long and thin objects in wafer images are frequently segmented into several parts, and they are often confused with particles.
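The shrinking bias can be checked numerically with a small sketch. The code below is an illustration under assumed settings, not the authors' implementation: it sums the contrast term of Eq. 3 over 4-connected neighbor pairs (each pair counted once) for a compact blob and for a thin line of the same area on a synthetic low-contrast image. The function name `contrast_smoothness`, the image size, the intensities, and `sigma` are all hypothetical choices.

```python
import numpy as np

def contrast_smoothness(image, labels, sigma):
    """Sum of V^C (Eq. 3) over 4-connected neighbor pairs, each pair counted once."""
    image = np.asarray(image, dtype=float)
    labels = np.asarray(labels, dtype=int)
    total = 0.0
    for axis in (0, 1):  # vertical neighbor pairs, then horizontal ones
        intensity_diff = np.diff(image, axis=axis)
        label_cut = np.diff(labels, axis=axis) != 0  # delta(f_p, f_q) of Eq. 4
        total += np.sum(np.exp(-intensity_diff ** 2 / sigma ** 2) * label_cut)
    return float(total)

# Assumed toy data: defects are only slightly brighter than the background
# (a blurred, low-contrast case), so every boundary pair costs almost 1.
background, defect, sigma = 0.5, 0.6, 0.5

blob = np.zeros((20, 20), dtype=int)
blob[8:12, 8:12] = 1      # compact 4x4 region, 16 pixels
line = np.zeros((20, 20), dtype=int)
line[10, 2:18] = 1        # thin 1x16 region, 16 pixels

cost_blob = contrast_smoothness(np.where(blob == 1, defect, background), blob, sigma)
cost_line = contrast_smoothness(np.where(line == 1, defect, background), line, sigma)
print(cost_blob, cost_line)
```

With equal area, the thin region has more boundary pairs than the compact one (34 versus 16 in this toy layout), so labeling it in full costs more. This is the pressure that tends to break scratches into several parts.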

To prevent oversegmentation, we develop a new function to be included in the smoothness term:

## Eq. 5

$$V_{p,q}^{R}(f_{p},f_{q})=\left\{1-\delta[L(p),L(q)]\right\}\times\delta(f_{p},f_{q}),$$

where $L(p)$ is based on the Retinex theory.^{8} To be specific, the $L(p)$ for our purpose is defined as

## Eq. 6

$$L(p)=\begin{cases}1, & \text{if } \dfrac{I(p)}{G(p)}<T\\ 0, & \text{otherwise.}\end{cases}$$

Therefore, in terms of the Retinex theory,^{8} $L(p)=L(q)$ and $p\in N(q)$ mean that $p$ and $q$ have the same intrinsic colors [even if $|I(p)-I(q)|\gg 0$], and the labels satisfying $f_{p}\ne f_{q}$ should be penalized. In summary, $V_{p,q}^{R}(f_{p},f_{q})$ alleviates the shrinking bias by penalizing the discontinuities occurring between pixels that have the same intrinsic colors. A similar idea can also be found in the binarization of document images.^{10}

Finally, the smoothness term in Eq. 1 is given by

## Eq. 7

$$V_{p,q}(f_{p},f_{q})=\lambda_{1}V_{p,q}^{C}(f_{p},f_{q})+\lambda_{2}V_{p,q}^{R}(f_{p},f_{q}).$$

## 3. Classification

Since each particle and each scratch is segmented into a single region by the proposed segmentation algorithm, we can use several features extracted from each segment to determine whether it is a particle defect or not.
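As a rough illustration of per-segment features of this kind, the sketch below computes two of the quantities described in Sec. 3.1 for a labeled segment: the mean intensity and the ratio of the eigenvalues of the pixel-position covariance matrix. It is a minimal sketch under stated assumptions, not the authors' code: the function name `segment_features`, the integer label map, the `eps` guard for one-pixel-wide segments, and the toy image are all hypothetical.

```python
import numpy as np

def segment_features(image, label_map, segment_id, eps=1e-12):
    """Return (mean intensity, lambda_max / lambda_min) for one labeled segment."""
    ys, xs = np.nonzero(label_map == segment_id)
    mean_intensity = float(image[ys, xs].mean())
    coords = np.stack([xs, ys]).astype(float)               # 2 x |S| pixel positions
    lam_min, lam_max = np.linalg.eigvalsh(np.cov(coords))   # eigenvalues, ascending
    elongation = float(lam_max / max(lam_min, eps))         # eps avoids division by zero
    return mean_intensity, elongation

# Assumed toy image with one compact segment and one elongated segment.
image = np.full((30, 30), 0.5)
label_map = np.zeros((30, 30), dtype=int)
label_map[3:8, 3:8] = 1       # 5x5 blob (particle-like)
label_map[20:23, 5:20] = 2    # 3x15 streak (scratch-like)
image[label_map == 1] = 0.9
image[label_map == 2] = 0.7

blob_mean, blob_elong = segment_features(image, label_map, 1)
streak_mean, streak_elong = segment_features(image, label_map, 2)
print(blob_mean, blob_elong, streak_mean, streak_elong)
```

The square blob gives an eigenvalue ratio close to unity, while the streak gives a ratio much larger than one, which is the behavior the shape descriptor relies on.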

### 3.1. Features

Let $S$ be the set of position vectors in a given segment, $|S|$ be the size of $S$, and $I(x,y)$ be the pixel intensity at $(x,y)\in S$. Then the features can be summarized as follows.

Mean intensity:

$$\frac{1}{|S|}\sum_{(x,y)\in S}I(x,y).$$

Shape descriptor:

$$\frac{\lambda_{\max}}{\lambda_{\min}},$$

where $\lambda_{\max}$ and $\lambda_{\min}$ are the two eigenvalues of the covariance matrix of $S$ (when the value is close to unity, the shape has no directional preference, and $\lambda_{\max}/\lambda_{\min}\gg 1$ means that the shape is thin and elongated).

Texture measure: computed from the eigenvalues $\lambda_{i}$ of the covariance matrix of the following vectors at points $(x,y)\in S$:

## Eq. 11

$$\left(\left|\frac{\partial I}{\partial x}\right|,\left|\frac{\partial I}{\partial y}\right|,\left|\frac{\partial^{2}I}{\partial x^{2}}\right|,\left|\frac{\partial^{2}I}{\partial y^{2}}\right|\right).$$

Measure of orientation bias (of edges): obtained from the histogram of edge orientations. By computing the orientation histogram and summing the sizes of the four dominant bins, we can measure the bias of the orientation distribution.

### 3.2. AdaBoost

For machine learning using the extracted features, we use the AdaBoost algorithm, where each weak classifier is based on the log-likelihood ratio test

## Eq. 13

$$h_{t}(x)=\ln\frac{p_{t}^{+}(x)}{p_{t}^{-}(x)}.$$

In the AdaBoost procedure,^{11} the weight $\alpha_{t}$ of each weak classifier is determined as a function of the current error rate $e$.
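The boosting step can be sketched as follows. This is a minimal illustration under several assumptions that the text does not confirm: $p_{t}^{+}$ and $p_{t}^{-}$ are taken to be class-conditional weighted histograms of a single feature, each round picks the feature with the lowest weighted error, the weak decision is the sign of $h_{t}$, and $\alpha_{t}$ uses the standard AdaBoost choice $\frac{1}{2}\ln\frac{1-e}{e}$. The function names (`fit_llr`, `weak_predict`, `train_adaboost`, `predict`) and the toy data are hypothetical.

```python
import numpy as np

def fit_llr(x, y, w, bins=8, eps=1e-6):
    """Estimate h(x) = ln(p+(x) / p-(x)) for one feature from weighted histograms."""
    edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]   # interior bin edges
    idx = np.digitize(x, edges)                             # bin index in 0..bins-1
    p_pos = np.bincount(idx, weights=w * (y == 1), minlength=bins) + eps
    p_neg = np.bincount(idx, weights=w * (y == -1), minlength=bins) + eps
    return edges, np.log(p_pos / p_pos.sum()) - np.log(p_neg / p_neg.sum())

def weak_predict(x, edges, llr):
    """Hard decision of a weak classifier: +1 where the log-likelihood ratio is >= 0."""
    return np.where(llr[np.digitize(x, edges)] >= 0, 1, -1)

def train_adaboost(X, y, rounds=5):
    n, d = X.shape
    w = np.full(n, 1.0 / n)                                 # uniform sample weights
    model = []
    for _ in range(rounds):
        candidates = []
        for j in range(d):                                  # one candidate per feature
            edges, llr = fit_llr(X[:, j], y, w)
            pred = weak_predict(X[:, j], edges, llr)
            candidates.append((np.sum(w[pred != y]), j, edges, llr, pred))
        e, j, edges, llr, pred = min(candidates, key=lambda c: c[0])
        e = min(max(e, 1e-10), 1 - 1e-10)                   # keep the logarithm finite
        alpha = 0.5 * np.log((1 - e) / e)                   # assumed standard AdaBoost weight
        model.append((alpha, j, edges, llr))
        w = w * np.exp(-alpha * y * pred)                   # emphasize misclassified samples
        w /= w.sum()
    return model

def predict(model, X):
    score = sum(a * weak_predict(X[:, j], edges, llr) for a, j, edges, llr in model)
    return np.where(score >= 0, 1, -1)

# Assumed toy data: label +1 = particle (low elongation), -1 = other defect.
X = np.column_stack([
    np.concatenate([np.linspace(1, 2, 50), np.linspace(5, 12, 50)]),  # elongation-like
    np.tile([0.3, 0.7], 50),                                          # uninformative
])
y = np.concatenate([np.ones(50, dtype=int), -np.ones(50, dtype=int)])
model = train_adaboost(X, y)
accuracy = float(np.mean(predict(model, X) == y))
print(accuracy)
```

On this deliberately separable toy set the first feature alone separates the classes, so the sketch only checks the mechanics; real wafer features would overlap and need more rounds and held-out evaluation.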

## 4. Experimental Results

The dataset for the experiment consists of defect images acquired by a $266\text{-}\mathrm{nm}$ bright-field inspection instrument ($12\text{-}\mathrm{in.}$ ⟨100⟩-oriented silicon wafer, magnification of more than $10^{4}$ times). From the dataset, we use 380 images containing particle defects and 150 images containing scratch defects as training samples. We then test on 200 images containing particle defects and 200 images containing no particle defects (these contain other defects, such as scratches). As can be seen in Fig. 1, the proposed term improves segmentation performance. The main purpose of ADC is to automatically distinguish particles from the other kinds of defects (mostly scratches). Hence, we evaluate the performance of ADC using the detection ratio (DR) of particle defects and the false alarm (FA) rate of other defects classified as particles, which are defined as

## Eq. 16

$$\mathrm{DR}=\frac{\text{the number of correctly detected particle defects}}{\text{the number of all particle defects}},$$

## Eq. 17

$$\mathrm{FA}=\frac{\text{the number of defects misclassified as particle}}{\text{the number of all scratch defects}}.$$

## 5. Conclusion

In this work, we propose a new approach to ADC based on the classification-after-segmentation framework. The wafer image is first segmented using the MAP-MRF approach, where a new energy function is designed to prevent a scratch from being fragmented into several regions. Then, an AdaBoost classifier is trained using the features extracted from the segments. According to the experimental results on a wide variety of particles, the proposed approach shows good classification performance.

## Acknowledgments

This research was supported by the Ministry of Culture, Sports and Tourism (MCST), and the Korea Culture Content Agency (KOCCA) in the Culture Technology (CT) Research and Development Program 2009.

## References

1. "Semiconductor defect classification using hyperellipsoid clustering neural networks and model switching," 3505–3510 (1999).
2. "Beam search for feature selection in automatic SVM defect classification," 212–215 (2002).
3. "Rapid object detection using a boosted cascade of simple features," 511–518 (2001).
4. "Fast approximate energy minimization via graph cuts," IEEE Trans. Pattern Anal. Mach. Intell., 23 (11), 1222–1239 (2001). https://doi.org/10.1109/34.969114
5. "Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images," 105–112 (2001).
6. "GrabCut: interactive foreground extraction using iterated graph cuts," 309–314 (2004).
7. "Graph cut based image segmentation with connectivity priors," 1–8 (2008).
8. "The retinex theory of colour vision," Scientific Am., 237 (6), 108–128 (1977). https://doi.org/10.1038/scientificamerican1277-108
9. "What metrics can be approximated by geo-cuts, or global optimization of length/area and flux," 564–571 (2005).
10. "A light-weight text image processing method for handheld embedded camera," 547–556 (2002).
11. "The boosting approach to machine learning: an overview," 149–172 (2002).