Training procedure for scanning electron microscope 3D surface reconstruction using unsupervised domain adaptation with simulated data

Abstract. Accurate metrology techniques for semiconductor devices are indispensable for controlling the manufacturing process. For instance, the dimensions of a transistor's current channel (fin) are an important indicator of the device's performance regarding switching voltages and parasitic capacities. We expand upon traditional 2D analysis by utilizing computer vision techniques for full-surface reconstruction. We propose a data-driven approach that predicts the dimensions, i.e., the height and width (CD) values, of fin-like structures. During operation, the method solely requires experimental images from a scanning electron microscope of the patterns concerned. We introduce an unsupervised domain adaptation step to overcome the domain gap between experimental and simulated data. Our model is further fine-tuned with a height measurement from a second scatterometry sensor and optimized through a tailored training scheme for optimal performance. The proposed method results in accurate depth predictions, namely 100% accurate interwafer classification with a root-mean-squared error of 0.67 nm. The R² of the intrawafer performance on height is between 0.59 and 0.70. Qualitative results also indicate that detailed surface features, such as corners, are accurately predicted. Our study shows that accurate z-metrology techniques can be viable for high-volume manufacturing.


Introduction
Over the years, semiconductor devices have continuously shrunk, which increases the demand for more sophisticated metrology techniques. The measurements are used to control the production process in order to optimize production yield. As the shrinking of dimensions continues, local measurements become necessary, since stochastic behaviors of physical processes become more dominant in describing the shape of structures [1]. Moreover, the third dimension becomes increasingly important, as vertically dominated structures (e.g., FinFETs, gate-all-around, and 3D NAND) have grown into popular building blocks for semiconductor devices. In this work, we zoom in on the dimensions of a fin, which is the transistor channel of a fin field-effect transistor (FinFET), as a key description parameter for the height of the physical structures (see Fig. 1). Currently, metrology devices that are suitable for high-volume manufacturing are optimized for 2D measurements. This work investigates whether accurate reconstruction of a 3D surface of the chip is possible by applying computer vision techniques.
Scanning electron microscopes (SEMs) are widely used in the production process to inspect defects and measure the dimensions of structures due to their ability to provide high-resolution, localized measurements at a relatively fast rate. Other tools offer more local and precise measurements (e.g., transmission electron microscopy [2] and focused ion beam SEM [3]), but compromise on practicality, since the structures first need to be cut into slices, which is a destructive and cumbersome operation. Alternatively, scatterometry tools [4] are nondestructive and faster than SEMs, but they are only suitable for global measurements on repetitive structures. On top of that, scatterometry suffers from long time-to-recipe and significant model complexity in thick stacks. In summary, various measurement tools are available, each with its own trade-offs in terms of accuracy and speed. In this work, we utilize a combination of these tools.
This research is focused on applying data-driven computer vision techniques to semiconductor systems. Most of these methods need a ground truth during training, which can be difficult to obtain for nanometer-sized structures due to the high cost and complexity of performing precise device measurements. This makes ground-truth information scarce from a machine learning perspective, so alternative solutions need to be explored.
One strategy to address the absence of topological ground truth for SEM images is to generate synthetic data by modeling an SEM as accurately as possible. Various electron particle simulators [5] are available, which simulate the scattering of particles in semiconductor devices. The input required for these simulators is a 3D surface model of the structure, which can also serve as a reliable reference for training a neural network. Although these simulators have their advantages, they are not without limitations. Important components of the SEM tool, such as the electron column, are not fully modeled, so that certain physical phenomena, such as spherical and chromatic aberration effects, which can affect resolution, are not represented in the simulated SEM images. Additionally, effects such as charging [6], contamination, and material shrinkage, which impact the accuracy of the tool, are not always incorporated either. Another point of consideration is that creating 3D meshes of on-chip structures requires significant effort.
Our goal in this research is to use a combination of different measurement tools to achieve optimal algorithm performance. We combine experimental and synthetic SEM images by utilizing an unsupervised domain adaptation method to bridge the gap between the two. Additionally, measurements from a scatterometry sensor are used to calibrate the height dimension. To achieve optimal results, we propose a novel, optimized training strategy and workflow. The proposed method is applied to line-space gratings, which serve as a proxy structure for FinFET transistor gates. The combination of the line-space gratings and FinFET structures indicates the relevance of this use case, i.e., the results of this experiment can be directly applied to other relevant transistor structures and constructs.
The contributions of this paper are as follows:
• A reliable algorithm for accurately predicting the surface of the wafer, utilizing a combination of data sources, such as experimental and synthetic SEM images as well as scatterometry data.
• The use of a tailored unsupervised domain adaptation method to bridge synthetic and experimental images, while preserving geometrical information, in the field of SEM nanometer-sized structures.

Related Work

Depth Estimation from SEM Images
In the past, various methods have been developed for estimating depth in SEM images. Most techniques are based on homography [7][8][9][10], which requires two images taken from different angles. This can be impractical in the semiconductor industry, where speed is a critical factor and using multiple images for measurements significantly lowers throughput. Physical constraints in the SEM column also restrict the angle and resolution of the electron beam. Other methods use multiple detectors [11,12], where depth is obtained from differences in signal strengths between detectors. However, these methods are restricted to smooth surfaces. Another approach is to exploit the focus change of the electron beam [13,14], similar to the depth-of-field effect in optical systems. Unfortunately, this technique is only feasible for the micrometer range and insufficient for most current applications in the semiconductor industry. Instead, in this work, we investigate methods that only require one SEM measurement during inference. These methods are mostly based on synthetic modeling [15-17].

Synthetic Data Generation
Synthetic databases are commonly used in machine learning, as label generation incurs little cost. Two examples of popular datasets in the field of computer vision are GTA5 [18] and FlyingThings3D [19]. Methods that rely on synthetic data have been applied to various computer vision tasks, such as semantic segmentation and depth estimation, in various application fields [20,21]. ... [22][23][24] However, these datasets often lack relevant ground-truth data for depth estimation, or have limited images available, making them unsuitable for deep learning techniques.

Synthetic-to-Real Transfer
Domain adaptation is a widely used technique for closing the gap between real and synthetic data, for which many methods have been proposed. These methods include reconstruction-based techniques [25][26][27] that use generative models to enhance synthetic data, adversarial-based methods that generate synthetic target data related to the source domain [28][29][30], and discrepancy-based methods that promote a domain-invariant feature space [31,32]. Non-learnable techniques [33] are also capable of bridging the gap, which can be useful when limited data are available. In this work, we use a reconstruction-based method, which is derived from a method proposed by Atapour-Abarghouei and Breckon [34].

Deep Learning for SEM Data
Deep learning has already been applied to various tasks relevant to SEM images, such as line edge roughness (LER) estimation [35], denoising [36,37], and defect inspection [38]. More specifically, similar to the method presented in this work, networks with cyclic losses have also been applied to SEM data. Examples of the latter approach aim at enhancing the quality of the SEM images [39] or mapping the chip layout designs to SEM images [40]. These works demonstrate that the use of deep learning can be effective in improving prediction results on SEM data.

Methodology
The proposed method builds upon previous work on depth estimation for SEM images [41], where it was demonstrated that the height of on-chip structures can be predicted using neural networks trained with simulated SEM images. This work makes several improvements to that method. First, the synthetic geometry generation is enhanced to generate more realistic structures. Second, an unsupervised domain adaptation method is added to account for the domain gap between synthetic and experimental images, which requires a training procedure that preserves the important information present in the images. Third, a more realistic use case is employed, where the design of experiments is significantly upgraded. Specifically, multiple wafers with different pattern-height setpoints are developed and measured, as described in Sec. 4.

Domain Adaptation
The goal of domain adaptation is to maximize the performance of a specific task (e.g., depth estimation) on a target domain (e.g., experimental SEM images), even when ground-truth data are not available, by leveraging well-labeled data from a source domain (e.g., synthetic SEM images). The domains are typically related but have dissimilarities, which is referred to as a domain gap in machine learning. Empirically, we have found that the inference performance of the network on experimental images is unsatisfactory (i.e., no realistic structures as output) when it is trained with synthetic images only. This effect can be explained by a large domain gap, as previous research [41] has shown that inference performance can be satisfactory when the domain gap is less pronounced.
As can be seen in Fig. 2, structural dissimilarities are present between the synthetic and experimental images. These differences arise from the fact that not all physical effects resulting from using SEMs are incorporated into the simulator. For example, there is some asymmetric edge blooming present in the images because the charge accumulates differently at downward and upward flanks relative to the scan direction. Also, a clear intensity drop in the middle of the line structure occurs in the experimental signal, which is caused by a mismatch in contrast and/or edge blooming with respect to the simulator. The simulator is verified with bulk yield measurements, but both effects are dependent on the geometry. Another discrepancy is a parabolic curvature effect appearing in the experimental data, suspected to be the result of how the electron beam propagates through the electron-optical column.
Since there are no dense height labels available in the experimental domain, an unsupervised domain adaptation technique is used to bridge the domain gap. In this work, we employ CycleGAN, an architecture introduced by Zhu et al. [42] The cyclic loss in this architecture bypasses the need for defining exactly matched training pairs in the dataset. CycleGAN consists of two generative adversarial networks (GANs), one responsible for translating images from domain A to B, and one for the inverse translation. Domain A is defined as the experimental domain and B as the synthetic domain. During the final training procedure, only one of the trained generators is used for inference.
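
For reference, the cyclic loss mentioned above can be written as follows. This is the standard cycle-consistency term of the CycleGAN formulation, expressed with the domain naming used here (A experimental, B synthetic). The symbols G_{AB}, G_{BA}, a, and b are notation assumed for this sketch and do not appear in the text.

```latex
\mathcal{L}_{\mathrm{cyc}}(G_{AB}, G_{BA}) =
  \mathbb{E}_{a \sim A}\left[ \left\| G_{BA}\big(G_{AB}(a)\big) - a \right\|_1 \right]
  + \mathbb{E}_{b \sim B}\left[ \left\| G_{AB}\big(G_{BA}(b)\big) - b \right\|_1 \right]
```

Because each image only has to be recovered after a round trip through both generators, no pixel-aligned pairs of experimental and synthetic images are needed.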

Steps of the Method
An overview of all important steps is shown in Fig. 3. First, a synthetic dataset is generated that matches the experimental data as closely as possible. A small portion of the experimental data (in this case a separate test wafer) together with the synthetic data is used to train the domain adaptation network. The synthetic data are employed to train a surface-prediction network, which is a modified version of the generator network of the work of Wang et al. [43] Then a pipeline is constructed from both networks, where an experimental image is first converted to a synthetic image with the domain adaptation network and then into a height map with the surface-prediction network. The resulting height map is then calibrated with a corresponding scatterometry measurement H_gt. After this, the final prediction network is trained, using the weights of the previously trained surface-prediction network as a starting point. We train with the original experimental data, paired with the output from the described pipeline, i.e., the calibrated height maps. This approach combines the experimental input with the final output into a single network, which is able to establish a correct information mapping and account for the possible loss of information in the domain adaptation step. The next section elaborates further on this. Another benefit of this approach, compared with the method presented by Atapour-Abarghouei and Breckon [34], is that the final inference path consists of a single network, which is beneficial for speed optimization during high-volume manufacturing.
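
The pairing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_finetune_pairs` and its three callable arguments are hypothetical stand-ins for the trained domain adaptation network, the surface-prediction network, and the height calibration.

```python
from typing import Callable, Iterable, List, Tuple

import numpy as np

Image = np.ndarray


def build_finetune_pairs(
    experimental_images: Iterable[Image],
    h_gt_values: Iterable[float],
    domain_adapter: Callable[[Image], Image],      # hypothetical: experimental -> synthetic-like
    surface_predictor: Callable[[Image], Image],   # hypothetical: synthetic-like -> height map
    calibrate: Callable[[Image, float], Image],    # hypothetical: rescale height map with H_gt
) -> List[Tuple[Image, Image]]:
    """Pair each experimental image with its calibrated pipeline output."""
    pairs = []
    for image, h_gt in zip(experimental_images, h_gt_values):
        synthetic_like = domain_adapter(image)
        height_map = surface_predictor(synthetic_like)
        # The ORIGINAL experimental image is kept as the network input, so the
        # final network learns the mapping in a single step.
        pairs.append((image, calibrate(height_map, h_gt)))
    return pairs
```

The resulting pairs would then serve as input and target for fine-tuning the final prediction network, so that inference needs only that one network.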

Geometrical Information Preservation
The pipelined steps that generate the height maps used for training the final prediction network are designed to preserve known geometrical properties. However, the domain adaptation network does not have complete control over the image translation. Therefore, we focus on feature preservation in both the 2D and height information, in the xy- and z-directions, respectively.

2D information
Fig. 3 Overview of the training method: First, the domain adaptation model (1) and surface-prediction model (2) are trained. Then experimental data are supplied into a pipeline of these models (3). The resulting depth maps are calibrated (4) with height H_gt from a scatterometry tool. Then a final prediction network is trained (6) with experimental data (3) in combination with the calibrated height maps (5). This network is used for the final inference of the 3D surface reconstruction of the chip (8).

We have observed that, when applying CycleGAN, edge information roughly coincides before and after domain transfer when learning from similar data in both domains. Therefore, we closely match the geometry edge-profile distribution of the synthetic images to the experimental images. This is achieved by generating the line-space roughness patterns using a statistical method, which is further explained in Sec. 4. However, in this case, we want to perfectly match the line-width (CD) information in the SEM image before and after domain adaptation. This is achieved by adding an extra edge-preserving loss to the network. The loss is placed over the generators (G) of the GAN because it uses the input image (I) and output image G(I). This cross-domain loss is defined as

$$\mathcal{L}_{\mathrm{edge}}(G, I) = \mathbb{E}\left[ \left\| W \odot \big( G(I) - I \big) \right\|_1 \right], \tag{1}$$

where

$$W = \exp\left( \frac{|\nabla \tilde{I}|}{\tilde{I}_{+}} \right).$$

Furthermore, Ĩ is the input image (I) convolved with a Gaussian kernel (σ = 3), and Ĩ+, which is used in the denominator of the exponent fraction, is the maximum value of the gradient matrix that normalizes the exponent. This loss can be interpreted as a weighted L1-loss, where similarity is enforced at edges present in the input image, which results in strict 2D information preservation. This loss L_edge is added as a weighted sum to the other losses present in the network.
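
A minimal NumPy sketch of such an edge-weighted L1 loss is given below. The weight is assumed to take the form exp(|∇Ĩ| / Ĩ+), which is one reading of the description above and not a confirmed formula. The function names, the kernel truncation at 3σ, and the reflective padding are choices made only for this illustration.

```python
import numpy as np


def gaussian_blur(image, sigma=3.0):
    """Separable Gaussian blur with reflective padding (kernel cut at 3 sigma)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def edge_weights(image, sigma=3.0):
    """Assumed weight map: exp(|grad(I_smooth)| / max|grad(I_smooth)|)."""
    smooth = gaussian_blur(np.asarray(image, dtype=float), sigma)
    grad_y, grad_x = np.gradient(smooth)
    magnitude = np.hypot(grad_x, grad_y)
    peak = magnitude.max()
    if peak == 0.0:
        return np.ones_like(magnitude)  # flat image: plain L1 loss
    return np.exp(magnitude / peak)


def edge_preserving_loss(input_image, output_image, sigma=3.0):
    """Weighted L1 distance between I and G(I), emphasized at edges of I."""
    weights = edge_weights(input_image, sigma)
    diff = np.abs(np.asarray(output_image, dtype=float) - input_image)
    return float(np.mean(weights * diff))
```

With this weighting, a deviation located on a line edge of the input image is penalized more strongly than the same deviation in a flat region, which is the behavior the text attributes to the loss. In a training framework, the same computation would be expressed with differentiable tensor operations.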

Height information
The height information is captured within the contrast of the image, which is texture information and is likely to be lost during the domain adaptation step. To correct this, we scale the resulting height maps in the z-direction with a value from a corresponding scatterometry measurement H_gt. Despite being an average value, this is valid to exploit, since the properties of a lithographic multilayer etch process imply that the structure height within the field of view of a scatterometry measurement is quite constant. The method used for this is part of pixel-wise fine-tuning [41], where the histogram of the height map is calculated. This results in two peaks at the bottom and top levels of the structure, which are scaled such that the distance between them equals H_gt.
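
The histogram-based scaling can be sketched as follows. This is an illustrative approximation, not the implementation of Ref. 41: the peak search (one peak in each half of the histogram), the bin count, and the function names are assumptions.

```python
import numpy as np


def histogram_peaks(height_map, bins=256):
    """Return the bottom and top levels as the histogram peaks in the lower
    and upper half of the value range (assumed peak-finding rule)."""
    counts, edges = np.histogram(height_map, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    half = bins // 2
    bottom = centers[np.argmax(counts[:half])]
    top = centers[half + np.argmax(counts[half:])]
    return bottom, top


def calibrate_height(height_map, h_gt, bins=256):
    """Rescale the map in z so that the peak-to-peak distance equals h_gt."""
    bottom, top = histogram_peaks(height_map, bins)
    if top <= bottom:
        raise ValueError("could not separate bottom and top levels")
    return (height_map - bottom) * (h_gt / (top - bottom))
```

After calibration, the bottom level sits near zero and the top level near H_gt, while the lateral (xy) content of the map is left untouched.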

Experiments
In the following sections, we outline the specific application, describe the methods used for data collection and processing, explain how the experiment was conducted, and provide implementation details of the algorithm.

Application
The following use case is considered. During the front-end-of-line, individual transistors are patterned onto the chip. Modern transistors (FinFETs) consist of fins, which are rectangular structures mapped onto the substrate. The height of these structures is an important parameter, as it partly defines the switching voltage, stray fields, and parasitic capacities between transistors. We examine after-etch line-space dielectric gratings, which serve as a proxy geometry for this use case.
The first step is to construct geometries on wafers, as depicted in Fig. 4. Initially, two layers of material are deposited onto the wafer. Then a lithography step is carried out, followed by an etching step, which yields the final structure. Several wafers are produced at around three different height levels with varying CD values.

Experimental Data
Initially, all wafers are measured at 60 or 240 different fields using an SEM tool. The measurement times of the sparsely and densely sampled wafers were around 40 min and 2 h, respectively. Furthermore, there is a gap of 12 days between the measurements of the sparsely and densely sampled wafers. Important SEM parameters are a landing energy of 800 eV, a 3.3 nm full-width at half-maximum (FWHM) spot size, and a resolution of 0.8 nm per pixel. We have also validated one structure with a cross-sectional SEM (XSEM) measurement, useful for tuning the parameters of the created geometry, which is discussed later. Subsequently, scatterometry measurements are captured at the same locations as the SEM measurements. Unfortunately, we could not obtain a converging parametric grating model on the raw pupil data coming from the machine. Instead, surrogate labels have been used, where the heights are measured of an open pad (part of the measured chip where only the SiCN material is present) and a closed pad (part of the measured chip where the full oxide layer is present). The total height label is then calculated by

$$H_{\mathrm{gt}} = H_{\mathrm{closed}} - H_{\mathrm{open}}. \tag{2}$$

By comparing this combination of measurements with the real grating height from a few TEM measurements, we have observed a small global offset of <1 nm, which can be compensated. We have also found that this offset is independent of the CD of the structure, which means that the measurements are correct relative to each other.

Synthetic SEM Data
After acquiring the experimental measurements, closely matching synthetic geometries are created. Upon examining the XSEM image, we noticed that the substrate layer (SiCN) is also partly etched. As a result, the parametric model displayed in Fig. 5 is used, where the etch height is divided into two parts, h_1 and h_2. The observed dimensions from the experimental data, along with the corresponding generation parameters derived from these, are listed in Table 1. We allow for a small sidewall angle (sloped edge profile) and top and bottom corner roundings. We introduce LER to the geometries along the lines by adopting the Thorsos method from Mack [44]. This method is based on the power spectral density (PSD), where the autocorrelation is approximated by

$$R(\tau) = \sigma^2 \exp\left[ -\left( \frac{|\tau|}{l_c} \right)^{2\alpha} \right]. \tag{3}$$

This approach is used to make the geometries as realistic as possible. Parameters l_c, α, and σ represent the correlation length, roughness factor, and standard deviation, respectively. All parameter values are sampled uniformly from the defined ranges in Table 1. Even though a Gaussian distribution would be a more realistic model for some parameter distributions, a uniform distribution yields a more diverse dataset, with more edge cases as input to the network. We have created two geometries for every height in the specified range, with steps of 0.2 nm. The total number of created geometries is 452, as depicted in Fig. 5.
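
To illustrate how a rough edge with a prescribed autocorrelation can be generated, the sketch below filters white Gaussian noise in the Fourier domain. It is a simplified spectral synthesis in the spirit of the Thorsos method, not Mack's exact algorithm. It assumes the stretched-exponential autocorrelation R(τ) = σ² exp[−(|τ|/l_c)^(2α)] and a periodic edge; the function name and the numerical values in the usage note are illustrative only.

```python
import numpy as np


def rough_edge(n, dx, sigma, lc, alpha, rng):
    """Generate n edge-position deviations (same unit as sigma) sampled at
    spacing dx, with target autocorrelation sigma^2 * exp(-(|tau|/lc)^(2*alpha))."""
    index = np.arange(n)
    lags = np.minimum(index, n - index) * dx       # periodic lag axis
    acf = sigma**2 * np.exp(-((lags / lc) ** (2 * alpha)))
    # The discrete PSD is the FFT of the (periodic) autocorrelation.
    psd = np.clip(np.fft.fft(acf).real, 0.0, None)  # guard tiny negative values
    white = rng.standard_normal(n)
    # Shape the white-noise spectrum with sqrt(PSD), then go back to real space.
    return np.fft.ifft(np.fft.fft(white) * np.sqrt(psd)).real
```

For example, `rough_edge(1024, 0.8, 1.0, 20.0, 0.5, np.random.default_rng(0))` would produce one edge realization at a 0.8 nm sampling pitch; two independent realizations offset by the CD would form one rough line.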
For the simulation part, Nebula [5] is utilized. The simulation parameters used include a landing energy of 800 eV, a pixel size of 0.8 nm, and a spot size of 3.3 nm FWHM, matched with the experimental SEM data. The applied materials are indicated in Fig. 5. Additionally, some preprocessing is applied on the SEM images prior to starting the training process. First, the image values are scaled between 0 and 255 to fully exploit the 8-bit dynamic range. Second, detector noise is added, which has a significant contribution to the image SNR in the present operating range. This noise is modeled as an additive Gaussian distribution and tuned to the experimental data.
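
The two preprocessing steps can be sketched as below. The function name, the rounding, and the final clipping to the 8-bit range are assumptions of this illustration; `noise_std` stands for the noise level that the text says is tuned to the experimental data, and no value for it is given in the source.

```python
import numpy as np


def preprocess_sem_image(image, noise_std, rng):
    """Scale a simulated SEM image to [0, 255] and add Gaussian detector noise."""
    image = np.asarray(image, dtype=float)
    lo, hi = image.min(), image.max()
    if hi == lo:
        raise ValueError("image has no contrast to scale")
    scaled = (image - lo) / (hi - lo) * 255.0                      # step 1: full 8-bit range
    noisy = scaled + rng.normal(0.0, noise_std, size=image.shape)  # step 2: additive noise
    # Clipping after the noise step is an assumption; it keeps the result 8-bit.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```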

Dataset Construction
This experiment utilizes six wafers covering three reference heights. For each height, a densely (210 fields) and a sparsely (60 fields) measured wafer are available. The densely measured wafers are selected for training, whereas the sparsely measured wafers are reserved for testing. The training and test sets are separated on a per-wafer basis to avoid data leakage in the model. Figure 6 represents an overview of the height and CD plot distributions in the dataset. It is important to note that there is no obvious correlation between height and CD in this dataset. This lack of correlation is crucial, as it prevents the network from using CD as a basis for height prediction, which is undesirable.

Table 1 Measured and simulated ranges of the parametric model. The simulated range is a superset of the actual range. Rounding is defined as a ratio of the CD: (2 CR / CD) × 100%. In simulations, the thickness of the SiCN layer was set to infinity, as the maximum penetration depth of the electrons is less than the minimum thickness of the layer.
Roughness factor (α): measured 0.5, simulated 0.5

Implementation Details
The domain adaptation network is implemented using the default settings of CycleGAN, with 64 generator filters. The identity loss is enabled with a weight of λ = 0.5. The weight of the reconstruction loss is set to λ = 10 and the weight of the edge-preserving loss is set to λ = 0.1. The network is trained for 400 epochs using 50 images from both domains. The learning rate is set to 0.0002 with a linear decay after 200 epochs. The surface-prediction network is a modified version of the Pix2PixHD algorithm [43] that uses six Resblocks and a one-layer encoder and decoder, which downsamples and upsamples the 265 × 265-pixel input to 128 × 128 pixels. Only the L1 loss is used as the optimization objective, while the discriminator is dropped. During pretraining, the network is trained for 150 epochs, with an initial learning rate of 0.0002 and a linear decay after 100 epochs. In the final fine-tuning step, the network is trained for 200 epochs with a learning rate of 0.0001 and a linear decay after 100 epochs. Random cropping is always used as a data augmentation technique. Random rotation and flipping are enabled only when training solely with synthetic data, because experimental data contain artifacts that depend on the scan direction. All networks are trained on a single Tesla V100 GPU.
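
The learning-rate schedules quoted above (a constant rate followed by a linear decay) can be written as a small helper. That the rate decays to exactly zero at the final epoch is an assumption of this sketch, since the text does not state the end value, and the function name is illustrative.

```python
def linear_decay_lr(epoch, base_lr, decay_start, total_epochs):
    """Constant learning rate until decay_start, then linear decay to zero
    at total_epochs (the zero end value is an assumption)."""
    if epoch < decay_start:
        return base_lr
    fraction = (epoch - decay_start) / (total_epochs - decay_start)
    return base_lr * max(0.0, 1.0 - fraction)
```

Under this assumption, the domain adaptation network (400 epochs, decay after 200, base rate 0.0002) would train at half the base rate at epoch 300.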

Results
This section first provides the results of the domain adaptation method. These results compare the CycleGAN method with and without the edge-preserving loss, to demonstrate the significance of this adaptation to the training procedure. Then the qualitative and quantitative results of the overall method are presented. Afterward, a comparison with related training procedures from the literature is given.

Unsupervised Domain Adaptation
Results of the domain adaptation network, after training with a separate wafer, are shown in Fig. 7. As can be observed, the domain gap is significantly reduced. The two main contributors, asymmetric edge blooming and the line-intensity drop, are no longer present. Since we use only the central 1024 × 1024 pixels of the image for further processing, the parabolic curvature effect is negligible. We have evaluated the preservation of 2D features during inference by applying a contouring algorithm designed for SEM applications on both the experimental and the corresponding translated images from the synthetic domain. We have compared two implementations of CycleGAN, one with the original loss functions and one with the edge-preserving loss included. Then we have measured the average CD values of the validation set (∼20 images) before and after conversion, without and with the edge-preserving loss. The measured average CD difference is 0.59 and 0.09 nm, respectively, indicating that the 2D preservation performance is improved by more than a factor of 6. The results of the best-performing method are presented in Fig. 8. It can be observed that the borders before and after domain adaptation coincide with sub-pixel accuracy. Additionally, by analyzing the PSD signal generated with the method described by Pu et al. [45], it is found that the noise floor only differs slightly.

Qualitative Results
Figure 9 shows an SEM image together with the height map from each wafer. Realistic line-space patterns are obtained. The height differences between the wafers (about 10 nm) are hardly noticeable in the SEM image by visual inspection. However, with the depth prediction, this can be clearly observed.
Figure 10 shows a comparison of a predicted geometry pattern, obtained by averaging in one direction, and a TEM measurement from a wafer cross section. The profile of the predicted pattern closely matches the reliable part of the image. Notably, the predicted profile is only adjusted through horizontal and vertical translation, without changes in size or shape. This indicates that detailed features, such as sidewall angle and corner rounding, will also be predicted realistically. This outcome highlights the value of using simulated data for pretraining and offers a promising perspective for detailed geometry reconstruction.

[Fig. 8, caption fragment: (c) zoomed version of (d); (e) PSD analysis of the edges. Blue is before and orange is after domain adaptation.]

Quantitative Results
Figure 11 presents the final prediction results of the average height of every SEM image measured on all three test wafers. The average height metric is obtained by calculating the distance between the peaks of the histogram of the resulting height map. The imposed height stems from the scatterometry measurement. Every SEM image is classified in the cloud of points of its own wafer, which makes the interwafer classification score 100%. It is evident that correlation is present within a wafer, as the R² scores are between 0.76 and 0.81. In addition, the average error is about 0.5 nm.
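
For clarity, the error metrics used in this comparison can be computed as sketched below. The text does not state which R² definition is used; this sketch assumes the coefficient of determination, 1 − SS_res/SS_tot, and the function names are illustrative.

```python
import numpy as np


def r_squared(imposed, predicted):
    """Coefficient of determination (assumed definition of R^2)."""
    imposed = np.asarray(imposed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((imposed - predicted) ** 2)
    ss_tot = np.sum((imposed - imposed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(imposed, predicted):
    """Mean absolute error between imposed and predicted heights."""
    return float(np.mean(np.abs(np.asarray(imposed) - np.asarray(predicted))))


def root_mean_squared_error(imposed, predicted):
    """Root-mean-squared error between imposed and predicted heights."""
    return float(np.sqrt(np.mean((np.asarray(imposed) - np.asarray(predicted)) ** 2)))
```

Computed per wafer, R² reflects the sensitivity to height variations within that wafer, whereas MAE and RMSE reflect the absolute accuracy.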
Figure 12 shows the final prediction result of the average CD of every SEM image in the dataset for the final method. The CD metric is obtained by imposing a threshold on the depth map at 60% and then dividing the number of pixels above the threshold by the total number of pixels. The 60% level is related to the threshold of common CD algorithms from the industry. The imposed CD comes from the CD label, which is derived from a CD analysis on an SEM image. As can be noticed, a clear correlation between imposed and obtained CD is visible, indicating that the CD information is preserved during the training process. However, there is a small offset present between wafers, which remains unresolved at this point.
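
The CD metric described above can be sketched as follows. The 60% level is interpreted here as a fraction of the height range of the depth map, which is an assumption. The conversion from the pixel fraction to nanometers through a `pitch_nm` argument is also a hypothetical addition, since the text only describes the pixel ratio.

```python
import numpy as np


def line_fraction(depth_map, level=0.6):
    """Fraction of pixels above `level` of the height range of the depth map."""
    depth_map = np.asarray(depth_map, dtype=float)
    lo, hi = depth_map.min(), depth_map.max()
    threshold = lo + level * (hi - lo)
    return np.count_nonzero(depth_map > threshold) / depth_map.size


def average_cd(depth_map, pitch_nm, level=0.6):
    """Average CD in nm for a line-space grating with the given pitch
    (hypothetical conversion: line fraction times pitch)."""
    return line_fraction(depth_map, level) * pitch_nm
```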

Performance of Other Methods
We compare five different methods:
(1) Pix2pixHD [43] trained on synthetic data, without any fine-tuning.
(2) Pixel-wise fine-tuning [41], as presented in the previous work.
(3) Pixel-wise fine-tuning with an additional domain adaptation step, as discussed by Atapour-Abarghouei and Breckon [34]. Here the method is supplemented with a default CycleGAN network, which is used during training and inference.
(4) The previous method (3), extended with the customized training procedure described above. The emphasis of this procedure is to preserve the height information.
(5) The proposed novel method. This procedure also adopts the edge-preserving loss in the domain adaptation network, which improves geometric information preservation in all directions. The results of this method have already been discussed in Figs. 9-12.
Figure 13 and Tables 2 and 3 present a comparison between the different methods, including the proposed technique. As can be observed, the first method, which does not employ fine-tuning, has very inaccurate height predictions. Additionally, since no domain adaptation method is used in the first two methods, realistic surfaces are only obtained in methods 3 and 4. For the other methods, it is visible that a correlation is present. However, because no domain adaptation is applied and shapes are unrealistic (method 2) and height information is lost in the domain adaptation step (method 3), results are not optimal. Method 3 seems to perform a classification per wafer, where the majority of the points are predicted correctly (lowest MAE), but this method lacks height sensitivity, as its R² statistics are poor. The final methods 4 and 5 show the best performance on height because their sensitivity (R²) is closest to unity, which is the most important metric for process control.
The CD results are interesting, since we observe a per-wafer bias when the final training step is directly performed on experimental data. This effect has been mitigated by incorporating an edge-preserving loss in the proposed method. However, we consider the results from method 3 remarkably good, although this method shows clearly higher error metrics in the vertical direction (covered by R²_1 to R²_3).

Conclusion
This paper extends 2D to 3D using computer vision for full-surface reconstruction.We have presented a data-driven method that predicts the dimensions, height and width (CD) values, of fin-like structures.The results show that we can handle situations where a significant domain gap occurs between data domains, which allows extracting heights from SEM images on more diverse experimentally obtained data.The obtained results are closing the gap between the synthetic and experimental SEM images.The intra-wafer height prediction results indicate that it is possible to estimate height values with a mean absolute error (MAE) of 0.5 nm, which is within specification for various modern applications.Also the height sensitivity is presented with the method proposed, as the R 2 values are between 0.55 and 0.70.The SNR values of SEM images and the calibration sensor may play a role in further improving the results.The resulting algorithm for interwafer height prediction, utilizing a combination of multimodal data sources, can produce realistic height values with a maximum absolute error of 2.06 nm and a MAE of 0.5 nm, which makes it a suitable contribution to the metrology application field.However, the available data are crucial as the prediction can become biased when the data samples are not homogeneously distributed, which is also partially presented in our case.We anticipate that when a more balanced dataset is available, the average error metrics could be further improved, or more general methods will be sufficient for solving the problem.
One important question that needs to be addressed is the generalizability of the proposed method across different scenarios and its impact on performance, as this algorithm is intended to generate accurate results under varying conditions. First, it is expected that this algorithm can handle morphological changes, especially when the simulated data are diverse enough. In fact, previous work (Ref. 41) demonstrated that the network can predict structures that were not included in the training set, such as defects. It is also known that the presented method is able to generalize across different wafers measured with multiple days in between. Nevertheless, a proper study of the robustness of this method to machine drift is still an open point of research.
As previously discussed, the reliance on calibration data is still present, mainly due to the domain shift between synthetic and experimental data. In this particular case, we use scatterometry, which limits the applicability of this method to repetitive structures. However, other local tools, such as TEM and AFM, could be employed, which would broaden the scope of the method considerably, but at the cost of speed and practicability. Another possible option is to examine the extent to which grating measurements close to nonrepetitive parts (e.g., logic) are representative, which would potentially reduce the need for local calibration. Additionally, it is postulated that by utilizing better forward models of the SEM process, the dependence on an external calibration source could be decreased and eventually eliminated.
Compared to a basic average height regression algorithm, the proposed method has the significant advantage that local information and feature details can be derived from a full-depth map. This implies that not only CD and height but also LER, SWA, and corner-rounding values as well as the dimensions of defects can be measured. This is illustrated with a qualitative comparison of a TEM image with a prediction result. Further experiments for quantitative comparisons would be useful for future work.
Tim Houben received his master's degree in electrical engineering from Eindhoven University of Technology. Currently, he is a PhD student at Eindhoven University of Technology and a visiting researcher in the Research Department of ASML. He works on 3D reconstruction for scanning electron microscopy.
Thomas Huisman received his master's degree in 2012 from the University of Twente and his PhD in 2016 from Radboud University, The Netherlands. In 2017, he joined ASML as a researcher.
Maxim Pisarenco earned his PhD in applied mathematics from Eindhoven University of Technology. Since 2011, he has been in the Research Department at ASML, where he is currently working as a senior scientist. His research interests include inverse problems, machine learning, model discovery, and holistic data- and physics-driven modeling.
Fons van der Sommen is an assistant professor at Eindhoven University of Technology and heads the Healthcare and High-Tech Cluster of the VCA Research Group. He has worked on a variety of image processing and computer vision applications, mainly in the medical domain, and strives to exploit signal processing and information theory methods to improve the robustness, efficiency, and interpretability of modern-day AI architectures.
Peter de With is a full professor and leads the VCA group at Eindhoven University of Technology. He has co-authored more than 70 refereed international book chapters and journal articles, more than 500 conference publications, and 40 international patents. He served as a technical committee member of the IEEE CES, ICIP, and SPIE and is a member of the Royal Holland Society of Academic Sciences and Humanities.

Fig. 1 Example drawing of fin structures on a wafer, indicating the main dimension of interest in this work.

Fig. 2 Qualitative visualization of the domain gap present between the images. An example of a synthetic and experimental SEM image is depicted. The noiseless SEM profiles, averaged in the direction parallel to the lines, are displayed below. The main contributing effects of the domain gap are specified and highlighted.

Fig. 4 Overview of the steps involved in placing a grating structure on the wafer.

Fig. 5 Geometry creation process. (a) The parametric model used to generate the synthetic data. (b) 3D crop of the line pattern constructed with the materials used. (c) 3D rendering of the geometry created with visible roughness. (d) Top view of the full geometry.

Fig. 6 Distributions of labels in the average CD and height variation in the dataset. Every dot represents a field.

Fig. 7 Example result of the domain adaptation step: (a) arbitrary synthetic image for comparison; (b) experimental image; (c) generated synthetic image conditioned on the experimental image; and (d) average SEM signal before and after domain adaptation.

Fig. 8 Experimental results on 2D information preservation during the domain adaptation step. (a)-(d) The SEM edge detection algorithm results projected over the artificial synthetic image. A red line (before domain adaptation) is plotted over the green line (after domain adaptation). The more the green line is overwritten, the better. (a) CycleGAN results (Ref. 42), (b) zoomed version of (a), (d) CycleGAN with the edge-preserving loss of Eq. (1), (c) zoomed version of (d), and (e) PSD analysis of the edges. Blue is before and orange is after domain adaptation.

Fig. 10 Comparison of a predicted profile (blue line) projected on a TEM measurement of a sliced wafer. The comparison is performed on the same section of the wafer, not strictly at the same place. The top (red) part cannot be compared reliably, because further processing steps, such as polishing, which are not considered in this work, have altered the top of the geometry after the SEM measurement.

Fig. 9 SEM image from each test wafer with the corresponding reconstructed surface.

Fig. 12 CD performance of the entire dataset. Every dot represents a measurement. The red line is a linear fit over all measurements. The blue line is exactly y = x, indicating perfect correspondence. The color bar indicates the height.

Fig. 11 Results of the three test wafers on the height metric, showing inter- and intrawafer performance. The red line is a linear fit per wafer. The blue dotted line is exactly y = x, indicating perfect correspondence. The color bar indicates the CD.

Table 2 Numeric comparison of all five methods on height metrics. MAE, mean error (ME), root-mean-squared error (RMSE), and maximum error (MAXE), all in nanometers. R² expresses the coefficient of determination of the linear fit of all measurements in the test set.
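For readers who want to reproduce these quantities on their own data, the listed metrics can be computed as in the following minimal sketch. It rests on assumptions that the caption does not state: the error is taken as prediction minus reference, MAXE is taken as the largest absolute error, and the linear fit regresses the predictions on the reference values. The function name `error_metrics` and the sample heights are hypothetical.

```python
import numpy as np


def error_metrics(reference, predicted):
    """Return MAE, ME, RMSE, MAXE (input units) and R^2 of a linear fit."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumed sign convention: prediction minus reference.
    error = predicted - reference

    # Least-squares line: predicted ~ slope * reference + intercept.
    slope, intercept = np.polyfit(reference, predicted, deg=1)
    residual = predicted - (slope * reference + intercept)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((predicted - predicted.mean()) ** 2)

    return {
        "MAE": float(np.mean(np.abs(error))),
        "ME": float(np.mean(error)),
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
        "MAXE": float(np.max(np.abs(error))),
        "R2": float(1.0 - ss_res / ss_tot),
    }


# Hypothetical heights in nanometers, for illustration only.
reference_nm = [40.0, 42.0, 44.0, 46.0]
predicted_nm = [40.5, 41.5, 44.5, 46.5]
metrics = error_metrics(reference_nm, predicted_nm)
print(metrics)
```

Note that R² of a linear fit measures how well the predictions follow the reference up to a slope and offset, so it reflects sensitivity rather than absolute accuracy; a constant bias changes ME and MAE but leaves R² untouched.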

Table 3 Numeric comparison of all methods on all CD metrics. MAE, ME, RMSE, and MAXE, all in nanometers. R² expresses the coefficient of determination of the linear fit of all measurements in the test set. Metrics R²₁, R²₂, and R²₃ express the fit per wafer.