LogicVision's Complete RTL-to-GDS2 Flow
By: Jean-François Côté, Fadi Maamari, Benoit Nadeau-Dostie
Rising gate counts, signal integrity issues, and the migration to RTL-to-GDS2 and hierarchical design flows have created many challenges for Design for Test (DFT) methodology.
A major impediment is that testing must be implemented for logic, memory, IOs and other circuitry. But today's approach of relying on point tools for each of these circuit types imposes constraints on the design flow.
A unified flow, such as LogicVision's Embedded Test (ET), that can handle all DFT implementation regardless of circuit type will reduce complexity and speed product cycles. The DFT flow must, of course, be compatible with the three major physical design flows from Cadence, Magma and Synopsys. LogicVision's DFT flow meets this important criterion.
In this article, we will describe LogicVision's Embedded Test for DFT methodology. Using it maximizes self-testability of circuits. It has additional benefits, including reducing test costs by using Minimum Pin Count (MPC) technology.
The physical design flows of Magma, Synopsys and Cadence are essentially equivalent as far as LogicVision's Embedded Test additions are concerned. The three main steps are illustrated in Figure 1 and identified as Phys_Step1 to Phys_Step3.
The first step consists of coding the RT level description together with its corresponding functional SDC constraints file, which is used to drive the physical synthesis and optimization tools downstream.
A physical prototyping/floor planning tool is used to help validate functional logic timing. This allows quick iterations on the functional RTL description, making timing closure easier in subsequent steps.

Figure 1. Physical design flow
Physical synthesis tools can generate a functional netlist based on placed gates. Using this, specialized third-party tools can analyze and modify the netlist.
It is advantageous to reconcile this modified netlist with the data model of the physical synthesis tool because it preserves the ability to modify the circuit architecture if simple restructuring is not sufficient during the physical optimization step.
We will explain how this is done for Magma's flow later but equivalent capabilities are present for the Synopsys and Cadence flows.
During the physical optimization step, clock tree synthesis is performed as well as high fanout net buffering and detailed routing. The output is the final routed netlist. The ultimate representation of the circuit in GDS2 format is derived from this netlist.
In general, it is beneficial to insert test logic at the RT level for optimum area and timing results. Scan chain connections are an important exception because making them at the RT level would impose constraints on gate placement during physical synthesis. All physical optimization tools accept SCANDEF -- the industry standard for chain reordering -- so scan connections can be finalized after gate placement.
Test insertion should not modify the design hierarchy. For example, the addition of a test wrapper around a block should not require creating an extra level of hierarchy. Otherwise, all functional design constraints need to be adjusted to take the hierarchy change into account.
In hierarchical flows, it is important to preserve the footprint of all the blocks separately laid out. That is, both the RT and gate level view of a block should have the same input and output port list. This is applicable to all test features, including scan.
Several flows incorporating DFT have been documented [1-5] but none of them can implement all test features described in this article.
The remainder of this article is organized as follows.
- Description of the main test insertion flow
- Description of flow variations
- Additional considerations for hierarchical flows
- Implementation example
Test insertion flow description
The modified RTL to GDS2 flow including the main steps required to insert Embedded Test is shown in Figure 2. The test-dedicated steps are identified as ET_Step1 to 5.
ET_Step1 (Checker): This tool checks whether a design meets ET requirements. It consists of a fast synthesis engine that generates an unmapped gate-level representation of the circuit using complex primitives that can check all testability rules. All rules are verified in parallel before any test insertion. The tool analyzes violations related to improper clocking or reset configurations, for example, and proposes auto-corrections. It also proposes testability improvements such as inserting control/observation cells at the output/input of an untestable portion of the circuit.
All changes are automatically implemented in ET_Step3 once the designer has approved them. The designer can also decide to modify the RTL description. Iterations are quick because analysis is performed at the RT level and checks can be made for individual blocks.
The other crucial role of Checker is the extraction of all pertinent design information from the RTL description that will be used in the generation of all embedded test features. A complete map of the clock domains, for example, is generated for registers and memories. This information is used later on by all ET components (logic, memory, IOs).
ET_Step2 (Planner): This tool helps make ET trade-offs based on user-specified criteria and the information in the CheckerInfo file extracted from the circuit during the previous step. The user can specify a maximum for test time, power and area, as well as other parameters such as routing and timing. This is done using a configuration file automatically populated with defaults based on the type of ET components used in the circuit. A few examples:
- A user-specified maximum test time in the equation helps estimate the number and length of future scan chains per clock domain.
- A maximum power determines how many memories can be tested in parallel.
- A maximum area specification limits the number of test points inserted in the circuit.
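The first two trade-offs can be illustrated with a back-of-the-envelope model. The Python sketch below is not the Planner's actual algorithm, which the article does not detail. It assumes that test time is roughly the number of patterns multiplied by the longest chain length and divided by the shift frequency, and that every memory under test draws the same power. The function names, the pattern count, the shift frequency and the power figures are all hypothetical; only the flip-flop count is taken from the implementation example later in this article.

```python
def chains_needed(num_flops, num_patterns, shift_hz, max_test_time_us):
    """Smallest number of scan chains that fits an assumed test-time budget.

    Assumed model (not from the article):
        test_time ~= num_patterns * chain_length / shift_hz
    Integer arithmetic is used throughout to avoid rounding surprises.
    """
    # Longest chain that still meets the budget under the assumed model.
    max_chain_length = (max_test_time_us * shift_hz) // (num_patterns * 1_000_000)
    if max_chain_length < 1:
        raise ValueError("budget too small for even one shift cycle per pattern")
    num_chains = -(-num_flops // max_chain_length)   # ceiling division
    chain_length = -(-num_flops // num_chains)       # balanced chain length
    return num_chains, chain_length


def memories_in_parallel(power_per_memory_mw, max_power_mw):
    """How many memories can be tested at once under an assumed power cap."""
    return max_power_mw // power_per_memory_mw


# Hypothetical budget: 10,000 patterns, 50 MHz shift clock, 100 ms test time.
print(chains_needed(26105, 10_000, 50_000_000, 100_000))  # (53, 493)
# Hypothetical power figures: 30 mW per memory, 200 mW cap.
print(memories_in_parallel(30, 200))                       # 6
```

Tightening the test-time budget in this model raises the number of chains, which is the kind of trade-off the configuration file exposes to the user.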
Although not explicitly shown in Figure 2, the Planner can use information about the circuit floor plan to determine how to share memory test resources. It accepts standard P/DEF files and a user-specified parameter called "clusterSizeRatio".
This parameter is a percentage of the diagonal length of a circuit (or block). It defaults to 20%. It indicates the maximum distance between a test controller and any one of the memories in a cluster as shown in Figure 3. Clusters can also be explicitly defined to the Checker tool.
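As a worked example of this parameter, the Python sketch below computes the maximum controller-to-memory distance for a given block size and checks which memories fall within reach of a controller. It is only an illustration under stated assumptions: the article does not say how distance is measured, so straight-line (Euclidean) distance is assumed, and the function names, memory names and coordinates are hypothetical.

```python
import math


def max_cluster_distance(width_mm, height_mm, cluster_size_ratio_pct=20.0):
    """Maximum controller-to-memory distance implied by clusterSizeRatio."""
    diagonal = math.hypot(width_mm, height_mm)
    return diagonal * cluster_size_ratio_pct / 100.0


def memories_in_cluster(controller_xy, memories, limit_mm):
    """Names of memories within limit_mm of the controller.

    Euclidean distance is an assumption made for this sketch.
    """
    cx, cy = controller_xy
    return [name for name, (x, y) in memories.items()
            if math.hypot(x - cx, y - cy) <= limit_mm]


# A 2.6 mm x 2.6 mm circuit with the default ratio of 20%.
limit = max_cluster_distance(2.6, 2.6)
print(f"{limit:.3f} mm")  # 0.735 mm

# Hypothetical memory locations, in mm, relative to a controller at (0, 0).
memories = {"ram_a": (0.3, 0.4), "ram_b": (1.5, 1.5)}
print(memories_in_cluster((0.0, 0.0), memories, limit))  # ['ram_a']
```

With the default ratio, a memory more than about 0.74 mm away from the controller on a circuit of this size would need to belong to a different cluster.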

Figure 2. Flow with insertion at RT level

Figure 3. Memory BIST clusters
The Planner generates an environment that consists of a set of directories, each corresponding to a specific task (e.g. generation of memory test controller, logic test controller). It contains a configuration file and a README file. The configuration file contains the best defaults that could be determined from the circuit data extracted in the previous step.
ET_Step3 (Assemble): This tool inserts all ET controllers (logic, memory, PLL), including the Test Access Port (TAP) and boundary scan circuitry required for IOs, in one step. It also inserts testpoints that are determined by the Checker tool or the user.
All ET logic can be simulated to verify correct operation in the context of the entire circuit at this point. This verification is fairly fast because it is done at the RT level and is targeted at proving that all connections have been made. There are connections between the controllers and the circuit pins as well as connections that intercept functional signals. All testbenches that were generated in the previous step to verify the controllers in isolation are automatically retargeted to circuit pins. The functional operating mode can be simulated, or formal verification can prove that the circuit is functionally equivalent after test insertion. An automatically generated script asserts the reset input of the TAP, which in turn resets all test controllers so that the circuit is configured in functional mode.
In addition to the RTL description of the ET logic, all SDC timing constraints specific to the ET logic are put in a file that can be merged with the functional SDC file as shown in Figure 2.
Corresponding Static Timing Analysis (STA) scripts are also automatically generated. Most of the constraints in these scripts are false path declarations related to the ET logic, which introduces virtually no critical paths.
The primary exception is related to the distribution of the scan enable signal which is pipelined by design. Pipeline stages are automatically duplicated by the physical optimization tool later on.
Note that both the SDC and STA scripts refer only to names guaranteed to be in both the RT and gate level representations of the circuit -- the same scripts can be used at both levels. This essentially means that constraints are only put on registers and clock inputs.
Note also that the SDC file generated by the Assemble tool uses the SDC 1.4 format, which allows multiple clocks to be propagated in parallel on common nets so that both the functional and test modes can be constrained precisely.
Additional files (not shown) are generated to help automate the subsequent physical design steps:
- A very detailed README file of instructions and portions of scripts to be added to the user scripts and sourced in the physical synthesis and optimization tools.
- Other files containing variables and functions specific to the physical design tools environment are also generated.
The interfaces to all major physical design tools are based on Tcl, so only the names of variables and procedures vary slightly from vendor to vendor.
ET_Step4 (Scan): The function of this step is to substitute non-scannable flip-flops with scannable ones, if not already done, and connect the scan chains to the logic test controller, which already has the correct number of scan ports. The Planner pre-calculated that number based on the number of functional registers that the Checker found in the circuit and the user specifications (e.g. maximum test time, power, etc.).
The synthesized netlist is also analyzed to find random pattern resistant portions of the circuit. The locations of both control and observation testpoints are calculated as described in [10]. However, the implementation of control testpoints has been improved so that all timing paths from testpoint registers to any other register are false paths, which facilitates timing closure.
After scan chains and testpoints have been inserted, static timing analysis and simulations can be performed based on the preliminary timing information available in the circuit SDF file. Since the circuit is not completely optimized yet, the verification effort by simulation is again kept to a minimum. Formal verification can also be performed as explained before.

Figure 4. Test point optimization
A SCANDEF and a post-scan SDC file are generated and provided as input to the optimization step (Phys_Step3). The SCANDEF file describes the scan chains as initially inserted in the netlist but also contains information that allows the reordering of chains based on the final placement. The post-scan SDC file is concatenated to the pre-scan and functional ones to drive incremental optimization needed due to the insertion of testpoints.
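To illustrate why chain order is best finalized after placement, the Python sketch below reorders the flip-flops of one chain with a greedy nearest-neighbour heuristic and compares the total stitching distance before and after. This is only an illustration: physical optimization tools apply their own reordering algorithms to the SCANDEF data, and the cell names, coordinates and Manhattan-distance cost used here are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) placements."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order, placement):
    """Total distance covered by scan connections for a given cell order."""
    return sum(manhattan(placement[a], placement[b])
               for a, b in zip(order, order[1:]))


def reorder_chain(order, placement):
    """Greedy nearest-neighbour reordering that keeps the first cell fixed."""
    remaining = list(order[1:])
    new_order = [order[0]]
    while remaining:
        last = placement[new_order[-1]]
        nearest = min(remaining, key=lambda cell: manhattan(last, placement[cell]))
        remaining.remove(nearest)
        new_order.append(nearest)
    return new_order


# Hypothetical placement: the initial netlist order zigzags across the die.
placement = {"ff_a": (0, 0), "ff_b": (9, 9), "ff_c": (1, 0), "ff_d": (8, 9)}
initial = ["ff_a", "ff_b", "ff_c", "ff_d"]
optimized = reorder_chain(initial, placement)

print(initial, chain_length(initial, placement))      # length 51
print(optimized, chain_length(optimized, placement))  # length 18
```

The same set of flip-flops is scanned in both cases; only the order changes, which is why the connections can be left open until the final placement is known.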
The objective of the physical synthesis step was to minimize gate area (and power) while making sure that no path has negative slack. This results in circuits with a large number of paths with near-zero positive slack. These paths tend to include complex logic which is likely to be targeted during testpoint selection and cause some paths to have negative slack. Performing a simple incremental physical optimization restructures the testpoint logic with the rest of the functional logic to restore the positive slack as shown in Figure 4.
As indicated earlier, Magma's flow allows the reloading of a database (called Volcano) that was generated during the synthesis step along with the post Scan netlist so that even architectural changes are possible in addition to incremental optimization.
ET_Step5 (Final Signoff): Final verification of the at-speed embedded test implementation is performed during this step. At this stage, the post-layout SDF is used to perform static timing analysis and simulations. Again here, formal verification can be run. A database containing test and diagnostic information is generated. The information is used to generate test programs on a tester or to embed the block in a larger circuit.
Flow variations
Variations are available. In some situations, test insertion cannot be performed at the RT level. For some ASICs, for example, the vendor has access only to the gate level description supplied by the designers. In LogicVision's flow, this means the Planner and Assemble steps are deferred until a gate level netlist is available after the synthesis step.
However, the Checker step should still be performed on the RT level description for faster turn-around time. The Checker tool does not require any DFT knowledge and can be easily run by designers. In fact, the tool is based on Spyglass software from Atrenta that performs other types of RTL analysis for designers.
For situations when logic BIST is not used and the physical synthesis tool can handle scan insertion for ATPG, the Scan step can be eliminated and there is no need to generate an intermediate netlist.
Hierarchical flow
There are several reasons to justify a hierarchical flow.
- Several instances of the same block can be reused on the same circuit. It makes sense to design a single instance down to the physical level so all have identical performance. It also makes sense to include the embedded test features so that the test can be reused for all instances. This is a form of test compression when compared to approaches that don't take the hierarchical implementation into account.
- A block could be reused in other designs.
- Possibly most important, the design might exceed or approach the maximum capacity of the design tools. A divide-and-conquer approach is necessary in this case.
The flow described here can be applied on hierarchical designs in a bottom-up or top-down manner. In the top-down flow, the entire chip RTL can be loaded in the Checker tool and the designer specifies which blocks will be laid out separately. The rules can be checked for each block and the chip top level. An advantage of the top-down flow is that all boundary conditions for the lower-level blocks are known, which allows optimization of external test mode of operation.
The use of a Wrapper TAP (WTAP), as defined in the IEEE 1500 standard [11], in each block greatly facilitates the hierarchical flow. The WTAP controls all test controllers internal to that block as shown in Figure 5 and is added at the RT level. The footprint of the block remains constant even if the number of test controllers is changed due to different trade-offs or design modifications.
Figure 5. Chip with Embedded Test
Ports for "peripheral" scan chains used during the logic test of the top level block are also added to the block boundaries, again to preserve the footprint even after the actual scan chain insertion during the Scan step. Peripheral scan chains are composed of functional flip-flops which are used to provide a block wrapper. This wrapper is used for the internal test of the block and the test of the logic at the next level up the hierarchy.
This arrangement enables at-speed tests to be performed on most logic between major blocks because the flip-flops of the wrappers are clocked the same way as in the functional mode. See [9] for more detail on this type of block isolation called "shared isolation".
Usually, there is only a limited amount of logic between the flip-flops of the peripheral scan chains and the actual block boundaries. However, when the amount of logic is too large or if the number of peripheral flip-flops becomes too large compared to the number of block pins, the Checker proposes the placement of "dedicated isolation" cells at certain input block pins with large fanout or output block pins with large fanin to reduce the amount of peripheral logic and flip-flops. These dedicated isolation cells are automatically inserted by the Assemble tool.
Implementation example
We demonstrate the flow on a small communication circuit having the following characteristics:
13 clock domains
4 frequencies: 250MHz, 200MHz, 133MHz, 48MHz
16 memory instances of various sizes
26105 flip-flops
318613 logic gates (nand2 equivalent)
132 IOs (signal pins only)
2.6mm x 2.6mm

Figure 6. Example circuit
The RTL portion of the design flow (ET_Step1 to 3) takes a few minutes to execute. In particular, rules checking (Checker) on the RT level description takes about an order of magnitude less time than on the gate level description (<1 minute vs 9 minutes). This is useful when fixing the few rule violations that cannot be automatically fixed by the Checker.
Physical synthesis and optimization tools from Magma were run on a 32-bit, single processor, 2.5GHz Linux machine. The breakdown of the time required for the various sub-steps is shown in Table 1.
The additional time required to modify the netlist (i.e. insert scan chains and test points) and subsequently adjust the timing due to the modifications is relatively small compared to the rest of the flow. Also, no iteration was required despite the presence of 522 test points sprinkled throughout the design.
Step | Time (hh:mm:ss)
Load libraries, load RTL, fix RTL | 00:07:03
Fix netlist | 00:16:40
Load SDC, fix time | 00:29:00
Scan (rules checking, scan + TP insertion) | 00:11:47
Reconcile netlist, load scandef, reload SDC, trace chains | 00:02:54
Lite fix time | 00:14:43
Fix cell | 02:24:14
Remaining physical steps (including DRC/LVS) | 08:00:00+
Table 1. Execution time
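The share of the run time attributable to test insertion can be checked directly from the figures in Table 1. The Python sketch below sums the entries and computes that share. Which rows count as test-specific is an assumption made here (the Scan, reconcile and "Lite fix time" rows), and the last row is treated as exactly 8 hours even though the table gives it as a lower bound, so the result is an upper bound on the share.

```python
def to_seconds(hhmmss):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (step, time from Table 1, assumed to be test-specific?)
table_1 = [
    ("Load libraries, load RTL, fix RTL", "00:07:03", False),
    ("Fix netlist", "00:16:40", False),
    ("Load SDC, fix time", "00:29:00", False),
    ("Scan (rules checking, scan + TP insertion)", "00:11:47", True),
    ("Reconcile netlist, load scandef, reload SDC, trace chains", "00:02:54", True),
    ("Lite fix time", "00:14:43", True),
    ("Fix cell", "02:24:14", False),
    # Listed as "08:00:00+" in Table 1; the lower bound is used here.
    ("Remaining physical steps (including DRC/LVS)", "08:00:00", False),
]

total_s = sum(to_seconds(t) for _, t, _ in table_1)
test_s = sum(to_seconds(t) for _, t, is_test in table_1 if is_test)
share_pct = 100.0 * test_s / total_s

print(test_s, total_s, f"{share_pct:.1f}%")  # 1764 42381 4.2%
```

Under these assumptions the test-specific steps add a little under half an hour to a flow that runs for more than eleven hours, or roughly 4% at most.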
Conclusions
This test insertion flow's capabilities include:
- All testability rules are checked at the RT level for fast turn-around.
- Circuit information is automatically extracted to drive the generation of all test logic based on user-specified test objectives.
- A small number of test-specific timing constraints based on functional clocking are automatically generated and merged with the functional constraints so that all constraints can be satisfied simultaneously as part of the normal design flow.
- At-speed testing of circuit components is enabled.
- Test circuitry is merged into existing circuit modules without changing the design hierarchy level of any functional module.
- The footprint of hierarchical blocks is preserved by anticipating all test ports required during the RTL analysis.
In addition, the flow's flexibility accommodates various scenarios (i.e. test insertion at RT vs. gate level, top-down vs. bottom-up). The test-specific steps of the flow are shown to have little impact on the overall design time.
References
[1] M. Tegethoff, "Unified methodology enables full-chip test", EE Times, February 28th, 2005.
[2] Mentor Graphics, "Design flows using TestKompress", Design-for-Test white paper, September 2, 2005.
[3] DeFacto, "Why haven't EDA vendors given us DFT at the RT level?", SOC Central, May 1st, 2005.
[4] Syntest-Cadence, "One pass DFT and synthesis solution for ASICs", datasheet, 2005.
[5] Synopsys, "Achieving DFT closure - the next step in Design for Test", white paper, 2001.
[6] B. Nadeau-Dostie, "Design for at-speed test, diagnosis and measurement", Kluwer Academic Press, 2000.
[7] G. Aldrich, B. Cory, "Improving test quality and reducing escapes", Fabless Forum, vol. 10, no. 1.
[8] B. Nadeau-Dostie et al., "Structural test with functional characteristics", Current and Defect-Based Testing Workshop, Palm Springs CA, May 2005, pp. 57-60.
[9] S. Pateras, "Achieving at-speed structural test", Design & Test of Computers, September-October 2003, pp. 26-33.
[10] B.H. Seiss, P. Trouborst, M. Schulz, "Test point insertion for scan-based BIST", Proceedings of the 1991 European Test Conference (ETC91), pp. 253-262.
[11] IEEE 1500, "Standard Testability Method for Embedded Core-based Integrated Circuits", 2005.

