For the past four decades, cost and features have driven complementary metal-oxide semiconductor (CMOS) scaling. However, severe lithography and material limitations below the 20-nm node are challenging the fundamental premise of affordable CMOS scaling. Continuing to co-optimize only leaf-cell circuit and layout designs with process technology is not enough to address the challenges of sub-20-nm CMOS. For affordable scaling, it is imperative to work past sub-20-nm technology impediments while exploiting the features the technology offers. To this end, we propose to broaden the scope of design technology co-optimization (DTCO) to be more holistic by including microarchitecture design and computer-aided design, along with circuits, layout, and process technology. Furthermore, we have undertaken such a holistic DTCO for all critical design elements, such as embedded memory, standard cell logic, analog components, and physical synthesis, in a 14-nm process. Measurement results from experimental designs in a representative 14-nm process from IBM demonstrate the efficacy of the proposed approach.
Escalating manufacturing cost and complexity are challenging the premise of affordable scaling. With lithography accounting for a large fraction of wafer costs, researchers are actively exploring several cost-effective alternative lithographic techniques, such as directed self-assembly and self-aligned multiple patterning. However, most of these alternative techniques are restrictive, and it is important to understand the impact of such patterning restrictions on system-on-chip (SoC) design. To this end, we artificially restricted all layers in a 14 nm process to be pure gratings and observed that the pure-gratings approach results in an inefficient SoC design with several process integration concerns. To arrive at a technology definition that is mindful of designer requirements, it is essential to undertake a holistic design technology co-optimization (DTCO) that considers several SoC design elements, such as standard cell logic, static random access memory bitcells, analog blocks, and physical synthesis. Our DTCO on the IBM 14 nm process with additional 10- and 7-nm node-like pattern restrictions leads us to converge on a set of critical pattern constructs that are required for an efficient and affordable SoC design.
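To make "pure gratings" concrete, the sketch below checks whether a set of layer shapes forms a pure grating. It assumes the simplest reading of the term: every shape is a vertical line of one fixed width, with its centerline on a fixed-pitch track grid. `Rect`, `is_pure_grating`, and the dimensions are hypothetical illustrations, not the rule deck used in the study. Line ends and cut layers are not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned layout rectangle in nm (lower-left x0, y0; upper-right x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

def is_pure_grating(shapes, width, pitch, offset=0.0, tol=1e-6):
    """Return True if every shape is a vertical line of exactly `width`
    whose centerline falls on the track grid offset + k * pitch.

    Hypothetical check: line ends, cuts, and horizontal layers are not modeled.
    """
    for r in shapes:
        w = r.x1 - r.x0
        h = r.y1 - r.y0
        # Must be a vertical line (taller than wide) with the grating width.
        if abs(w - width) > tol or h <= w:
            return False
        # Centerline must sit on a track of the fixed-pitch grid.
        track = ((r.x0 + r.x1) / 2 - offset) / pitch
        if abs(track - round(track)) > tol:
            return False
    return True

# Illustrative values only: 32 nm wide lines on a 64 nm pitch, first track at x = 16 nm.
on_grid = [Rect(0, 0, 32, 1000), Rect(64, 0, 96, 1000)]
jogged = on_grid + [Rect(0, 1000, 96, 1032)]  # a horizontal jog breaks the grating
print(is_pure_grating(on_grid, width=32, pitch=64, offset=16))  # True
print(is_pure_grating(jogged, width=32, pitch=64, offset=16))   # False
```

Any jog, off-track line, or non-standard width fails the check. This is one way to see why a pure-gratings restriction removes layout constructs that designers normally rely on.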
Given the deployment delays for EUV, several next-generation lithography (NGL) options are being actively researched. Several cost-effective NGL solutions, such as self-aligned double patterning through sidewall image transfer (SIT) and directed self-assembly (DSA), in conjunction with process integration challenges, mandate grating-like pattern design. As part of the GRATEdd project, we have evaluated the design cost of grating-based design for application-specific integrated circuits (ASICs). Based on our observations, we have engineered fundamental changes to the primary ASIC design components to make scaling affordable and useful in deeply scaled sub-20 nm technologies: unidirectional-M1-based standard cells, application-specific smart SRAM synthesis, and statistical and self-healing analog design.
The 14 nm node is seeing the dominant use of three-dimensional FinFET architectures, local interconnects, multiple
patterning processes and restricted design rules. With the adoption of these new process technologies and design styles,
it becomes necessary to rethink the standard cell library design methodologies that proved successful in the past. In this
paper, we compare the design efficiency and manufacturability of standard cell libraries that use either unidirectional or
bidirectional Metal 1. In contrast to previous nodes, a 14 nm 9-track unidirectional standard cell layout results in up to
20% lower energy-delay-area product compared with the 9-track bidirectional standard cell layout. Manufacturability
assessment shows that the unidirectional standard cell layouts save one exposure on Metal 1, reduce process variability by 10%, and reduce layout construct count by 2-3X. As a result, the unidirectional standard cell layout could serve as a key
enabler for affordable scaling.
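The energy-delay-area product (EDAP) multiplies three figures of merit, so modest gains in each one compound. The sketch below uses made-up values normalized to a bidirectional-M1 baseline; these are not the measured 14 nm data. The helper names are hypothetical.

```python
def edap(energy, delay, area):
    """Energy-delay-area product (lower is better)."""
    return energy * delay * area

def relative_reduction(baseline, candidate):
    """Fractional improvement of `candidate` over `baseline` (0.2 means 20% lower)."""
    return 1.0 - candidate / baseline

# Made-up values normalized to a bidirectional-M1 baseline cell; not measured data.
bidirectional = edap(energy=1.00, delay=1.00, area=1.00)
unidirectional = edap(energy=0.95, delay=0.92, area=1.00)
print(f"{relative_reduction(bidirectional, unidirectional):.1%}")  # 12.6%
```

In this illustration, a 5% energy gain and an 8% delay gain at equal area together give about a 12.6% EDAP reduction.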
As the metal pitch continues to shrink, it becomes inefficient, if not impossible, to use traditional via redundancy
schemes at and below the 14 nm node. Double-cut vias and via bar connections will either block many adjacent routing
resources or be impossible to pattern at these advanced technology nodes. In this paper, we examine a scalable via
redundancy strategy based on local loops. We evaluate the yield and timing impact of local loops and use a 14 nm
standard cell library and functional block designs to assess the design cost of local loops. Furthermore, lithography
contours and process window simulations are used to demonstrate the manufacturability of this structure. With
supporting EDA tools and design-technology co-optimization (DTCO), local loops will become an important via
redundancy topology at sub-20nm nodes.
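The yield argument for via redundancy can be illustrated with a simple model. Assume via failures are independent. A single-via connection then fails with probability p. A connection with a redundant path fails only if both vias fail, with probability p². The sketch below makes these assumptions and ignores the extra wire and timing cost of a local loop, which the text evaluates separately. The function name and numbers are hypothetical.

```python
import math

def interconnect_yield(n_connections, p_via_fail, redundant=False):
    """Probability that all via connections in a design are good.

    Assumes independent via failures. With redundant=True each connection is
    modeled as two vias in parallel (as a local loop would provide), so it
    fails only if both vias fail. Extra wire and timing effects are ignored.
    """
    p_conn_fail = p_via_fail ** 2 if redundant else p_via_fail
    # log1p keeps precision when p_conn_fail is tiny.
    return math.exp(n_connections * math.log1p(-p_conn_fail))

# Hypothetical numbers: 1e8 via connections, 1e-9 failure probability per via.
print(round(interconnect_yield(10**8, 1e-9), 4))                  # 0.9048
print(round(interconnect_yield(10**8, 1e-9, redundant=True), 4))  # 1.0
```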
The traditional design rule paradigm of defining the illegal areas of the design space has been deteriorating at the
advanced technology nodes. Radical design space restrictions, advocated by the regular design fabrics methodology,
provide an opportunity to reshape the design/manufacturing interface by constraining the layout to a set of allowable
patterns. As such, this would allow for guaranteed convergence of source mask optimization (SMO) techniques and
complete validation of the legal design space during technology development and ramp. However, the number of
unique patterns generated by layouts that adhere to even simple gridded design rules makes this approach prohibitive.
Nevertheless, we have found that just 10% of the unique geometric patterns are sufficient to represent 90% of all layout
pattern instances. Furthermore, the overall number of layout patterns on Active, Contact, and Metal-1 design layers can
be reduced through modification of existing layout shapes in the final layout database and insertion of non-essential
layout features. Unlike the 'dummy fill' used for chemical mechanical polishing (CMP), the newly added shapes must
resemble the patterning of the functional design features and be inserted in close proximity to them. In this paper, we
evaluate the digital circuit performance impact of the additional layout parasitics introduced by these 'dummy' features.
In particular, we have found that a significant pattern count reduction can be achieved with minimal performance
penalty. These results have been used at PDF Solutions to enable a correct-by-construction layout style, such as the
templates and connectors-based layout methodology presented in the companion paper.
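The "10% of unique patterns cover 90% of instances" observation is a coverage calculation. Sort unique patterns by instance count and find how many are needed to reach the target share of all instances. The sketch below assumes patterns have already been extracted and canonicalized into hashable labels, which is the hard part and is not modeled here. The function name and sample data are hypothetical.

```python
from collections import Counter

def patterns_for_coverage(pattern_instances, target=0.9):
    """Return (k, k / n_unique): the smallest number k of unique patterns whose
    instances make up at least `target` of all pattern instances."""
    counts = sorted(Counter(pattern_instances).values(), reverse=True)
    total = sum(counts)
    covered = 0
    for k, c in enumerate(counts, start=1):
        covered += c
        if covered >= target * total:
            return k, k / len(counts)
    return len(counts), 1.0

# Hypothetical labels standing in for canonicalized layout patterns (100 instances).
instances = (["A"] * 50 + ["B"] * 30 + ["C"] * 12 + ["D"] * 3 + ["E"] * 2
             + ["F"] + ["G"] + ["H"])
print(patterns_for_coverage(instances))  # (3, 0.375)
```

A heavy-tailed count distribution is what makes a small fraction of unique patterns sufficient. The rare patterns in the tail are the candidates for removal by the layout modifications described above.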
The semiconductor industry has pursued a rapid pace of technology scaling to achieve an exponential reduction in component cost. Over the years, the goal of technology scaling has been distilled down to two discrete targets. Process engineers focus on manufacturing smaller dimensions while sustaining wafer costs, whereas design engineers work towards creating newer IC designs that can feed the next generation of electronic products. In doing so, the impact of process choices made by the manufacturing community on the design of ICs, and vice versa, was conveniently ignored. However, with the lack of cost-effective lithography solutions at the forefront, the process and design communities are struggling to minimize IC die costs by following these traditional scaling practices. In this paper, we discuss a framework for quantifying the economic impact of design and process decisions on the overall product by comparing the cost-per-good-die. We discuss the intricacies involved in computing the cost-per-good-die as we make design and technology choices. We also discuss the impact of design and lithography choices for the 32nm and 22nm technology nodes. The results demonstrate a strong volume dependence of the optimum design style and the corresponding lithography strategy. Most importantly, using this framework, process and design engineers can collaborate to define design style and lithography solutions that will lead to continued IC cost scaling.
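A minimal cost-per-good-die sketch shows how design density, yield, and volume interact. It uses two common textbook approximations, not the paper's framework: a gross-dies-per-wafer estimate that discounts edge dies, and a Poisson defect-limited yield. Mask cost is amortized over the wafer volume. All names and numbers are hypothetical.

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Textbook estimate of whole dies per wafer, discounting partial edge dies."""
    r = wafer_diameter_mm / 2
    return math.floor(math.pi * r * r / die_area_mm2
                      - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_mm2):
    """Simple Poisson defect-limited yield model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

def cost_per_good_die(wafer_cost, mask_set_cost, wafer_volume,
                      wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    """Recurring wafer cost plus mask cost amortized over the volume, per good die."""
    cost_per_wafer = wafer_cost + mask_set_cost / wafer_volume
    good_dies = (gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * poisson_yield(die_area_mm2, defects_per_mm2))
    return cost_per_wafer / good_dies

# Hypothetical design styles on 300 mm wafers: a relaxed layout (larger die, cheaper
# masks) versus a dense layout (smaller die, costlier patterning and masks).
for wafers in (100, 100_000):
    relaxed = cost_per_good_die(5000, 1e6, wafers, 300, 100, 0.001)
    dense = cost_per_good_die(5500, 3e6, wafers, 300, 80, 0.001)
    print(wafers, round(relaxed, 2), round(dense, 2))
```

With these made-up inputs, the relaxed style is cheaper at 100 wafers and the dense style is cheaper at 100,000 wafers. This is the kind of volume dependence the framework is meant to expose.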
The concept of template-based design-technology co-optimization as a means of curbing escalating design complexity
and increasing technology qualification risk is described. Data is presented highlighting the design efficacy of this
proposal in terms of power, performance, and area benefits, quantifying the specific contributions of complex logic gates
in this design optimization. Experimental results from 32nm technology node bulk CMOS wafers are presented to
quantify the variability and design-margin reductions as well as yield and manufacturability improvements achievable
with the proposed template-based design-technology co-optimization technique. The paper closes with data showing the predictable composability of individual templates, demonstrating a fundamental requirement of this proposal.
The unavailability of extreme ultraviolet lithography (EUVL) for mass production of the 22nm technology
node has created a significant void for mainstream lithography solutions. To fill this void, alternative
lithography solutions that were earlier deemed to be technically and economically infeasible, such as
double patterning technologies (DPT), source mask optimization (SMO), massively parallel direct-write e-beam
(MEBM) and interference-assisted lithography (Intf), are being proposed, developed and adopted to
ensure the timely deployment of the 22nm technology node. While several studies have been undertaken to
estimate the lithography process costs for volume production with the aforementioned technologies, these
studies have provided only a partial analysis since they have not taken into account the impact on design
density and product yield.
In this paper we use the cost-per-good-die metric in order to capture process costs as well as yield and
design density. We have developed a framework that estimates the lithography cost-per-good-die for
SRAM arrays and have applied it to evaluate the economic feasibility of the various lithography strategies
under consideration for the 22nm technology node. Specifically, we compare the cost-per-good-die for
different 32MB SRAM arrays, each optimized for a different lithography solution. Our analysis shows that
the selection of the best lithography strategy is both layer and volume specific. The use of DPT solutions is
recommended for Active and Contact layers. The use of Intf is recommended for layers such as Poly,
Metals and Vias in the case of low volume products. For medium to high volume products the use of SMO
is recommended for Poly, Metals and Vias. This paper quantifies the economic benefit of the
proposed lithography strategy.
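The finding that the best strategy is "both layer and volume specific" amounts to minimizing cost per good die independently for each layer at a given wafer volume. The sketch below assumes each option is summarized by three inputs: a one-time cost, a recurring per-wafer cost, and the good dies per wafer its SRAM layout allows. `LithoOption`, `best_per_layer`, the option names "A"/"B", and all numbers are hypothetical and do not correspond to the DPT/Intf/SMO results above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LithoOption:
    """One lithography choice for one layer; all fields are hypothetical inputs."""
    name: str
    fixed_cost: float           # one-time cost, e.g. masks for this layer
    cost_per_wafer: float       # recurring patterning cost per wafer for this layer
    good_dies_per_wafer: float  # reflects the SRAM density and yield this option allows

    def cost_per_good_die(self, wafers):
        return (self.cost_per_wafer + self.fixed_cost / wafers) / self.good_dies_per_wafer

def best_per_layer(options_by_layer, wafers):
    """Pick the cheapest option independently for each layer at a given wafer volume."""
    return {layer: min(opts, key=lambda o: o.cost_per_good_die(wafers)).name
            for layer, opts in options_by_layer.items()}

options = {
    "Contact": [LithoOption("A", 1e5, 200, 500), LithoOption("B", 5e5, 250, 500)],
    "Metal1": [LithoOption("A", 1e5, 300, 500), LithoOption("B", 2e6, 150, 520)],
}
print(best_per_layer(options, 100))      # {'Contact': 'A', 'Metal1': 'A'}
print(best_per_layer(options, 100_000))  # {'Contact': 'A', 'Metal1': 'B'}
```

In this toy setup, one layer keeps the same choice at every volume. The other switches to the high-fixed-cost option once its one-time cost is amortized over enough wafers.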
Cost and complexity associated with OPC and masks are rapidly increasing to the point that they could limit technology
scaling in the future. This paper focuses on demonstrating the advantages of regular design fabrics for OPC
simplification to enable scaling and minimize costs for technologies currently in volume production. The application of
such a simplified OPC flow results in much smaller mask data volumes due to significantly fewer edges compared to the
conventional designs and OPC flows. Moreover, the proposed approach enables reduced mask write times, hence lower mask costs.
We compare OPC performance and complexity on standard cell designs to that of layouts on a regular design fabric. We
first demonstrate advantages and limitations within an industrial model-based OPC solution. Then, a simplified rule-based
OPC solution is discussed for the Metal 1 layer. This simplified OPC solution demonstrates a 70X run time
improvement and an order of magnitude reduction in both the output edge count per unit shape and shot count per unit
shape while maintaining the printability advantages of regular design fabrics. The simplified OPC also demonstrates a
50% reduction in mask-write time. Finally, the benefit of regular design fabrics for OPC simplification and mask cost
reduction at a 32nm node is discussed.
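The simplified flow is described only as rule-based, so the sketch below shows the general idea of rule-based OPC rather than the paper's actual rules. Each edge of a gridded Metal 1 line gets an outward bias looked up from a table keyed by the spacing to its neighbor, and line ends get a fixed extension. Because every corrected shape remains a rectangle, the output edge count stays at four per shape. Model-based OPC typically fragments edges into many segments. The bias table, extension value, and function names are hypothetical, and spacing is computed only between x-adjacent lines in a single row.

```python
from bisect import bisect_right

# Hypothetical rule table: (spacing threshold in nm, outward edge bias in nm).
# An edge whose spacing to the next shape is >= threshold gets that bias.
BIAS_TABLE = [(0, 0.0), (40, 1.0), (80, 2.0), (160, 3.0)]
LINE_END_EXTENSION = 5.0  # hypothetical line-end extension in nm

def edge_bias(spacing):
    thresholds = [t for t, _ in BIAS_TABLE]
    return BIAS_TABLE[bisect_right(thresholds, spacing) - 1][1]

def rule_based_opc(lines):
    """Bias vertical Metal 1 lines given as (x0, x1, y0, y1), sorted by x0 and
    non-overlapping in x. Each output is still a rectangle (4 edges)."""
    corrected = []
    for i, (x0, x1, y0, y1) in enumerate(lines):
        # Spacing to the left/right neighbor; outermost edges are treated as isolated.
        left = x0 - lines[i - 1][1] if i > 0 else float("inf")
        right = lines[i + 1][0] - x1 if i + 1 < len(lines) else float("inf")
        corrected.append((x0 - edge_bias(left), x1 + edge_bias(right),
                          y0 - LINE_END_EXTENSION, y1 + LINE_END_EXTENSION))
    return corrected

print(rule_based_opc([(0, 32, 0, 1000), (64, 96, 0, 1000)]))
# [(-3.0, 32.0, -5.0, 1005.0), (64.0, 99.0, -5.0, 1005.0)]
```

A table lookup per edge is far cheaper than iterative model-based simulation. That difference is the intuition behind the run-time and edge-count reductions reported above, though the actual figures depend on the real rule set.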
The time-to-market-driven need to maintain concurrent process-design co-development, in spite of discontinuous
patterning, process, and device innovation, is reiterated. The escalating design rule complexity resulting from increasing
layout sensitivities in physical and electrical yield and the resulting risk to profitable technology scaling is reviewed.
Shortcomings in traditional Design for Manufacturability (DfM) solutions are identified and contrasted to the highly
successful integrated design-technology co-optimization used for SRAM and other memory arrays. The feasibility of
extending memory-style design-technology co-optimization, based on a highly simplified layout environment, to logic
chips is demonstrated. Layout density benefits, modeled patterning and electrical yield improvements, as well as
substantially improved layout simplicity are quantified in a conventional versus template-based design comparison on a
65nm IBM PowerPC 405 microprocessor core. The adaptability of this highly regularized template-based design
solution to different yield concerns and design styles is shown in the extension of this work to 32nm with an increased
focus on interconnect redundancy. In closing, the work not covered in this paper, focused on the process side of the
integrated process-design co-optimization, is introduced.
As the industry hits a roadblock with resolution enhancement techniques (RETs) that attempt to aggressively scale k1, we propose to extend the life of optical
lithography through a complete co-optimization of circuit choices, layout patterns and lithography. We demonstrate that
the judicious selection of a small number of layout patterns, along with the appropriate circuit topologies, enables not only
k1 relaxation but also efficient implementation of circuits. Additionally, in this paper, we discuss the use of
regular design fabrics to extend the life of current-generation lithography equipment. We introduce the front-end-of-line
(FEOL)-limited regular design fabric. The metal 1 patterns for this fabric are selected such that we can utilize a 1.2
NA 32nm metal 1 lithography process without any area penalty with respect to standard cells with conventional design
rules, which require a 32nm metal 1 process with a rather unrealistic k1 of 0.35 while using a more advanced 1.35 NA
tool. We also demonstrate simulation results on two-dimensional layout patterns. The results suggest that smart selection of
layout patterns can enable the extension of single exposure lithography to a 32nm production lithography process.
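The k1 figures quoted above follow from the Rayleigh resolution criterion. The worked arithmetic below makes two assumptions that the text does not state. First, λ = 193 nm, the ArF immersion wavelength used by 1.2 and 1.35 NA scanners. Second, CD refers to the metal 1 half-pitch. The roughly 50 nm half-pitch is therefore derived here, not quoted from the paper.

```latex
% Rayleigh resolution criterion (CD assumed to be the half-pitch)
CD = k_1 \frac{\lambda}{NA} \quad\Longleftrightarrow\quad k_1 = \frac{CD \cdot NA}{\lambda}

% Assumed \lambda = 193\,\text{nm}; half-pitch implied by k_1 = 0.35 at NA = 1.35:
CD = \frac{0.35 \times 193\,\text{nm}}{1.35} \approx 50.0\,\text{nm}

% The same half-pitch printed on a 1.2 NA tool:
k_1 = \frac{50.0\,\text{nm} \times 1.2}{193\,\text{nm}} \approx 0.31
```

Under these assumptions, conventional metal 1 patterns on the 1.2 NA tool would need an even lower k1 of about 0.31. This helps explain why the FEOL-limited fabric restricts the allowed metal 1 patterns instead.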
In the past, complying with design rules was sufficient to ensure acceptable yields for a design. However, for sub-100-nm designs, this approach tends to create patterns that cannot be reliably printed for a given optical setup, thus leading to hot spots and systematic yield failures. Recent challenges faced by both the design and process communities call for a paradigm shift whereby circuits are constructed from a small set of lithography-friendly patterns that have previously been extensively characterized and ensured to print reliably. We describe the use of a regular design fabric for defining the underlying layout geometries of the circuit. While the direct application of this methodology to the current application-specific integrated circuit (ASIC) design flow would result in unnecessary area and performance penalties, we overcome these penalties via a unique design flow that ensures shape-level regularity by reducing the number of required logic functions as much as possible as part of the top-down design flow. We show that with a small set of Boolean functions and careful selection of lithography-friendly patterns, we not only mitigate but essentially eliminate such penalties. Additionally, we discuss the benefits of using extremely regular designs constructed from a limited set of lithography-friendly patterns not only to improve manufacturability but also to relax the pessimistic constraints defined by design rules. Specifically, we introduce the basis to exploit the regularity in the layout patterns by using "pushed-rules" for logic design, as is commonly done for static random access memory (SRAM). This in turn facilitates a common optical proximity correction (OPC) methodology for logic and SRAM. Moreover, by taking advantage of this newfound manufacturability and predictability of regular circuits, we show that the performance of logic built on regular fabrics can surpass that of seemingly more arbitrarily constructed logic.
In the past, complying with design rules was sufficient to ensure acceptable yields for a design. However, for sub-100nm designs, this approach tends to create patterns that cannot be reliably printed for a given optical setup, thus leading to hot spots and systematic yield failures. The recent challenges faced by both the design and process communities call for a paradigm shift whereby circuits are constructed from a small set of lithography-friendly patterns that have previously been extensively characterized and ensured to print reliably. In this paper, we describe the use of a regular design fabric for defining the underlying silicon geometries of the circuit. While the direct application of this methodology to the current ASIC design flow would result in unnecessary area and performance overhead, we overcome these penalties via a unique design flow that ensures shape-level regularity by reducing the number of required logic functions as much as possible as part of the top-down design flow. It will be shown that with a small set of Boolean functions and careful selection of lithography-friendly patterns, we not only mitigate but essentially eliminate such penalties. Additionally, we discuss the benefits of using extremely regular designs constructed from a limited set of lithography-friendly patterns not only to improve manufacturability but also to relax the pessimistic constraints defined by design rules. Specifically, we introduce the basis for the use of "pushed-rules" for logic design, as is commonly done for SRAM designs. This in turn facilitates a common OPC methodology for logic and SRAM. Moreover, by taking advantage of this newfound manufacturability and predictability of regular circuits, we will show that the performance of logic built upon regular fabrics can surpass that of seemingly more arbitrarily constructed logic.