Future lithography systems must produce chips with smaller feature sizes, while maintaining throughput comparable to
today's optical lithography systems. This places stringent data handling requirements on the design of any direct-write
maskless system. To achieve the throughput of one wafer layer per minute with a direct-write maskless lithography
system, using 22 nm pixels for 45 nm technology, a data rate of 12 Tb/s is required. In our past research, we have
developed a datapath architecture for direct-write lithography systems, and have shown that lossless compression plays a
key role in reducing throughput requirements of such systems. Our approach integrates a low complexity hardware-based
decoder with the writers to decode a compressed data layer in real time. In doing so, we have
developed a spectrum of lossless compression algorithms for integrated circuit rasterized layout data to provide a
tradeoff between compression efficiency and hardware complexity, the most promising of which is Block Golomb
Context Copy Coding (Block GC3). In this paper, we present the synthesis results of the Block GC3 decoder for both
FPGA and ASIC implementations. For one Block GC3 decoder, 3233 slice flip-flops and 3086 4-input LUTs are utilized in a Xilinx Virtex II Pro 70 FPGA, which corresponds to 4% of its resources, along with 1.7 KB of internal memory. The system runs at a 100 MHz clock rate, with an overall output rate of 495 Mb/s for a single decoder. The corresponding ASIC implementation results in a 0.07 mm² design with a maximum output rate of 2.47 Gb/s. In addition to the
decoder implementation results, we discuss other hardware implementation issues for the writer system data path,
including on-chip input/output buffering, error propagation control, and input data stream packaging. This hardware data
path implementation is independent of the writer systems or data link types, and can be integrated with arbitrary direct-write
Parameter-specific and simulation-calibrated ring oscillator (RO) inverter layouts are described for identifying and
quantitatively modeling sources of circuit performance variation from source/drain stress, shallow trench isolation (STI)
stress, lithography, etch, and misalignment. This paper extends the RO approach by adding physical
modeling/simulation of the sources of variability to tune the layouts of monitors for enhanced sensitivity and selectivity.
Poly and diffusion layout choices have been guided by fast-CAD pattern matching. The accuracy of the fast-CAD
estimate from the Pattern Matcher for these lithography issues is corroborated by simulations in Mentor Graphics
Calibre. Generic conceptual results are given based on experience preparing proprietary layouts that pass
DRC for a 45 nm test chip with ST Micro. Typical sensitivity improvements of twofold are possible with layouts
for lithography focus. A layout monitor for poly-to-diffusion misalignment based on programmable offsets shows a
0.8% change in RO frequency per 1 nm of poly-to-diffusion offset. Layouts are also described for characterizing stress
effects associated with diffusion area size, asymmetry, vertical spacing, and multiple gate lengths.
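The misalignment sensitivity quoted above (0.8% change in RO frequency per 1 nm of offset) can be turned into a rough offset estimate. The following is only a sketch: it assumes the response stays linear over the range of interest, and the function name, the reference frequency argument, and the sign convention are all illustrative choices that do not come from the paper.

```python
# Sensitivity quoted in the abstract: 0.8% RO frequency change per 1 nm offset.
SENSITIVITY_PER_NM = 0.008


def estimate_offset_nm(f_reference_hz: float, f_measured_hz: float,
                       sensitivity_per_nm: float = SENSITIVITY_PER_NM) -> float:
    """Estimate the poly-to-diffusion offset from an RO frequency shift.

    Assumes a linear response (an assumption, not a result from the paper).
    The sign of the returned value simply follows the sign of the frequency
    change; the abstract does not state which direction raises the frequency.
    """
    fractional_change = (f_measured_hz - f_reference_hz) / f_reference_hz
    return fractional_change / sensitivity_per_nm


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the formula.
    print(estimate_offset_nm(1.000e9, 1.016e9))
```

In practice the reference frequency would have to come from a monitor with a known programmed offset, which is why the abstract describes programmable offsets.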
In previous publications we have proposed a hierarchical variability model and verified it with 90nm test data. This
model is now validated with a new set of 45nm test chips. A mixed sampling scheme with both sparse and exhaustive
measurements is designed to capture both wafer level and chip level variations. Statistical analysis shows that the across-wafer
systematic function can be sufficiently described as parabolic, while the within-die systematic variation is now
very small, with no discernible systematic component. Analysis of pattern dependent effects on leakage current shows
that systematic pattern-to-pattern LEFF variation is almost eliminated by optical proximity correction (OPC), but stress-related
variation is not. An intentionally introduced gate length offset between two wafers in our dataset provides insight into
device parameter variability and sheds additional light on the underlying sources of process variation.
This paper applies process and circuit simulation to examine plausible explanations for measured differences in ring
oscillator frequencies and to develop layout and electronic circuit concepts that have increased sensitivity to
lithographic parameters. Existing 90nm ring oscillator test chip measurements are leveraged, and the performance
of the ring oscillator circuit is simulated across the process parameter variation space using HSPICE and the Parametric
Yield Simulator in the Collaborative Platform for DfM. These simulation results are then correlated with measured
ring oscillator frequencies to directly extract the variation in the underlying parameter. Hypersensitive gate layouts
are created by combining the physical principles in which the effects of illumination, focus, and pattern geometry
interact. Using these principles and parametric yield simulations, structures that magnify the focus effects have been
found. For example, by using a 90° phase-shift probe, parameter-specific layout monitors are shown to be five times
more sensitive to focus than an isolated line. On the design side, NMOS- or PMOS-specific electrical
circuits are designed, implemented, and simulated in HSPICE.
Achieving the throughput of one wafer layer per minute with a direct-write maskless lithography system, using 22-nm pixels for 45-nm feature sizes, requires data rates of about 12 Tb/s. In our previous work, we developed a novel lossless compression technique specifically tailored to flattened, rasterized, layout data called context copy combinatorial code (C4), which exceeds the compression efficiency of all other existing techniques including BZIP2, 2D-LZ, and LZ77, especially under a limited decoder buffer size, as required for hardware implementation. In this work, we present two variations of the C4 algorithm. The first variation, block C4, lowers the encoding time of C4 by several orders of magnitude, concurrently with lowering the decoder complexity. The second variation, which involves replacing the hierarchical combinatorial coding part of C4 with Golomb run-length coding, significantly reduces the decoder power and area as compared to block C4. We refer to this algorithm as block Golomb context copy code (block GC3). We present the detailed functional block diagrams of block C4 and block GC3 decoders, along with their hardware performance estimates as the first step of implementing the writer chip for maskless lithography.
Achieving the throughput of one wafer layer per minute with a direct-write maskless lithography system, using 22 nm pixels for 45 nm feature sizes, requires data rates of about 12 Tb/s. In our previous work, we developed a novel lossless compression technique specifically tailored to flattened, rasterized layout data called Context-Copy-Combinatorial-Code (C4), which exceeds the compression efficiency of all other existing techniques including BZIP2, 2D-LZ, and LZ77, especially under a limited decoder buffer size, as required for hardware implementation. In this paper, we present two variations of the C4 algorithm. The first variation, Block C4, lowers the encoding time of C4 by several orders of magnitude, concurrently with lowering the decoder complexity. The second variation, which involves replacing the hierarchical combinatorial coding part of C4 with Golomb run-length coding, significantly reduces the decoder power and area as compared to Block C4. We refer to this algorithm as Block Golomb Context Copy Code (Block GC3). We present the detailed functional block diagrams of Block C4 and Block GC3 decoders, along with their hardware performance estimates, as the first step of implementing the writer chip for maskless lithography.
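To illustrate the Golomb run-length coding mentioned above, here is a minimal generic sketch in Python. It is not the Block GC3 codec: it uses the power-of-two special case of Golomb coding (often called Rice coding), applies it to runs of zeros in a plain bit list, and represents the code as a string of "0"/"1" characters for readability. The parameter `k` and all function names are hypothetical, and the abstracts do not say which symbols Block GC3 actually run-length codes.

```python
def rice_encode(value: int, k: int) -> str:
    """Golomb code with divisor 2**k: unary quotient, then k-bit remainder."""
    quotient, remainder = value >> k, value & ((1 << k) - 1)
    remainder_bits = format(remainder, f"0{k}b") if k > 0 else ""
    return "1" * quotient + "0" + remainder_bits


def rice_decode(bits: str, k: int) -> list[int]:
    """Decode a concatenation of codewords produced by rice_encode."""
    values, i = [], 0
    while i < len(bits):
        quotient = 0
        while bits[i] == "1":          # unary part: count leading ones
            quotient += 1
            i += 1
        i += 1                         # skip the terminating zero
        remainder = int(bits[i:i + k], 2) if k > 0 else 0
        i += k
        values.append((quotient << k) | remainder)
    return values


def zero_runs(bitmap: list[int]) -> list[int]:
    """Lengths of zero runs; each run ends at a 1, the last at end of input."""
    runs, run = [], 0
    for bit in bitmap:
        if bit == 0:
            run += 1
        else:
            runs.append(run)
            run = 0
    runs.append(run)                   # trailing run (possibly empty)
    return runs


def expand_runs(runs: list[int]) -> list[int]:
    """Inverse of zero_runs."""
    out: list[int] = []
    for run in runs[:-1]:
        out.extend([0] * run + [1])
    out.extend([0] * runs[-1])
    return out


if __name__ == "__main__":
    bitmap = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    k = 2                              # hypothetical parameter choice
    code = "".join(rice_encode(r, k) for r in zero_runs(bitmap))
    assert expand_runs(rice_decode(code, k)) == bitmap
    print(code)
```

The decoder needs only a counter for the unary part and a fixed-width read for the remainder, which is in line with the reduced decoder power and area that the abstract attributes to replacing combinatorial coding with Golomb run-length coding.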
A future maskless lithography system that replaces traditional masks with an array of electro-mechanical mirrors relies on a very high-rate data interface to achieve wafer throughputs comparable to today's optical lithography systems. In order to write one layer per minute in the 45 nm technology node, a throughput of 12 Tb/s using 5-bit grayscale data is needed. With EUV light source flash rates limited to below 10 kHz, 240 million 1 μm × 1 μm micromirrors have to be integrated on the writer chip, each driven with 32 possible voltage levels.
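The 12 Tb/s and 240 million figures above can be reproduced with a short back-of-the-envelope calculation. This is a sketch under stated assumptions: the 300 mm wafer diameter and the 22 nm pixel size are not given in this abstract (the pixel size appears in the related abstracts in this list, the wafer size is our assumption), the whole circular wafer area is assumed to be written, and all names are illustrative.

```python
import math

WAFER_DIAMETER_M = 0.300     # assumption: 300 mm wafer (not stated in the abstract)
PIXEL_SIZE_M = 22e-9         # 22 nm pixels, taken from the related abstracts
BITS_PER_PIXEL = 5           # 5-bit grayscale, i.e. 2**5 = 32 levels
SECONDS_PER_LAYER = 60.0     # one wafer layer per minute
FLASH_RATE_HZ = 10e3         # upper bound on the EUV flash rate


def pixels_per_layer() -> float:
    """Number of pixels needed to cover the full circular wafer area."""
    wafer_area = math.pi * (WAFER_DIAMETER_M / 2) ** 2
    return wafer_area / PIXEL_SIZE_M ** 2


def data_rate_bps() -> float:
    """Uncompressed data rate needed to write one layer in the given time."""
    return pixels_per_layer() * BITS_PER_PIXEL / SECONDS_PER_LAYER


def mirrors_needed(rate_bps: float) -> float:
    """Mirrors needed if each mirror writes one pixel per flash."""
    pixels_per_second = rate_bps / BITS_PER_PIXEL
    return pixels_per_second / FLASH_RATE_HZ


if __name__ == "__main__":
    print(f"data rate: {data_rate_bps() / 1e12:.1f} Tb/s")
    print(f"mirrors at 12 Tb/s: {mirrors_needed(12e12) / 1e6:.0f} million")
```

Under these assumptions the data rate comes out slightly above 12 Tb/s, and dividing 12 Tb/s by 5 bits per pixel and a 10 kHz flash rate gives the 240 million mirrors quoted in the abstract.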
This paper explores the system design for various wafer throughputs, with or without data compression. In particular, the design tradeoffs for the mirror interface datapath, implemented on the same silicon die as the writers, are discussed. The design of the digital-to-analog converters (DACs) that compensate for the nonlinearity of the mirror transfer function and fit into the required datapath pitch is presented. Extrapolated data from the designs in 0.13 μm CMOS technology indicate that DACs will likely limit the throughput to about 30 wafers per hour in the 45 nm node.
Future maskless lithography systems require data throughputs on the order of tens of terabits per second in order to have comparable performance to today’s mask-based lithography systems. This work presents an approach to overcome the throughput problem by compressing the layout data and decompressing it on the chip that interfaces to the writers. To achieve the required throughput, many decompression paths have to operate in parallel. The concept is demonstrated by designing an interface chip for layout decompression, consisting of a Huffman decoder and a Lempel-Ziv systolic decompressor. The 5.5 mm × 2.5 mm prototype chip, implemented in a 0.18 μm, 1.8 V CMOS process, is fully functional at 100 MHz, dissipating 30 mW per decompression row. By scaling the chip size up and implementing it in a 65 nm technology, the decompressed data throughput required for writing 60 wafers per hour in 45 nm technology is feasible.