:: Strategically Speaking
When "good" design practice makes bad chips
Recently, the Yield Learning Report interviewed John Gallagher, Senior Director of ASIC Synthesis Marketing at Synplicity. The conversation focused on how 90nm and 65nm process technology has affected pre-existing design methodologies and practices, as well as how best to address the problems that have arisen.
What new yield-related problems are we seeing with ASICs at the 90nm node?
We still have all the challenges we've had in the past, but now we are seeing problems that really grow out of design methodology, such as guardbanding.
The bottom line is the more components or instances on the device the worse yield effects are going to become. At the backend, you have more instances that have to be routed and connected together. Congestion becomes worse and congested regions on the device tend to yield more poorly.
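The scaling argument above can be illustrated with a very simple independence model. This is a rough sketch, not a foundry yield model: it assumes every instance fails independently with the same small probability, and the value of `p_fail` and the instance counts are hypothetical numbers chosen only for illustration.

```python
def chip_yield(num_instances: int, p_fail: float) -> float:
    """Probability that every instance on a die works.

    Assumes independent, identically likely failures per instance,
    which is a simplification made for this sketch.
    """
    return (1.0 - p_fail) ** num_instances


if __name__ == "__main__":
    p_fail = 1e-7  # hypothetical per-instance failure probability
    for n in (1_000_000, 5_000_000, 10_000_000):
        print(f"{n:>10,} instances -> yield {chip_yield(n, p_fail):.1%}")
```

Under these assumptions yield falls off exponentially with instance count, which is why extra routing and extra cells are never free from a yield perspective.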
What about yield problems that arise at the boundaries between IP?
Everyone has his favorite hobby horse to ride on these issues and mine is power grid effects. From a yield perspective, power grid problems can cause issues like in-field device failure (through rail collapse), as well as uncertainty on what speed the device can run at--if you assumed 1.8 V along the net, for example, and it really is getting 1.6 V, you may see a significant timing impact.
A simple example would be that you have a design laid out that meets timing and maybe you have a power grid for it. Well, for whatever reason, you want to pop in a new core. All bets are off. You don't know if your power grid is going to support it, from both reliability and timing perspectives.
If you pop in a processor core that runs 100 MHz faster, you'll see serious problems. Devices typically draw power from the pad ring around the outside. But with the processor core smack in the center of the device, the distribution of power from the periphery will be quite different than you originally assumed.
It will affect more circuitry as it travels through that device to get to the core. Let's say that processor kicks on at 300 MHz and lots of other logic along that path fires up. That causes rail collapse--the power to run the device just isn't available.
Even if you aren't causing device failure, you are definitely affecting timing on that path, because your power grid has been modeled in a certain way and the timing of everything connected to the power network is in some form going to change.
Unless you are doing really heavy analysis with awfully expensive tools to find and correct for these problems you are taking risks.
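The 1.8 V versus 1.6 V example above can be put in rough numbers. The sketch below uses the alpha-power law delay approximation, where gate delay scales as Vdd / (Vdd - Vt)^alpha. This is a textbook model brought in here for illustration, not the analysis method described in the interview, and the threshold voltage `vt` and exponent `alpha` are assumed values.

```python
def relative_delay(vdd: float, vt: float = 0.4, alpha: float = 1.3) -> float:
    """Gate delay up to a constant factor, per the alpha-power law.

    vt (threshold voltage, volts) and alpha (velocity-saturation
    exponent) are assumed values, not figures from the interview.
    """
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vt) ** alpha


def slowdown(v_assumed: float, v_actual: float, **model) -> float:
    """Ratio of delay at the actual supply to delay at the assumed supply."""
    return relative_delay(v_actual, **model) / relative_delay(v_assumed, **model)


if __name__ == "__main__":
    ratio = slowdown(v_assumed=1.8, v_actual=1.6)
    print(f"Delay at 1.6 V is {ratio:.3f}x the delay assumed at 1.8 V")
```

With these assumed parameters, a 0.2 V droop costs several percent of path delay, which is enough to turn a path that just met timing into a failing one.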
What effect are these problems having on the design community--not to mention the industry?
A fundamental skill set and knowledge base are needed for 90nm and beyond that didn't exist in the ASIC designer community three years ago. So how many of the ASIC design teams that existed three years ago are going to be able to come up to speed and aggressively resolve these problems in their design flow? Not many. And that is where I think 90nm really causes a serious perturbation.
We have very clear data showing that over the past couple of years people have not migrated to the next deep submicron technology as aggressively as they have before. The time it took to go from 0.5µm to 0.35µm or from 0.35µm to 0.25µm was much shorter. A lot of companies have held back from going to 90nm simply because these effects are awfully hard to deal with.
We've talked about design methodology and yield. How about the process technology itself?
The best example is electrons firing into the oxide layers. Historically you had enough thickness to withstand that; it wouldn't cause failures in 100 years. Now we're talking about oxide layers a couple of atoms thick. So we have hot electron effects where those electrons break down the oxide layer.
Power grids have a role to play here as well. One of the technologies we invested in is power grid analysis to be sure you aren't allowing too many electrons to flow through too small an area over a period of time. That would be a defect--a failure in time. This breakdown process is greatly accelerated at 90nm.
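"Too many electrons through too small an area over a period of time" is a statement about current density and lifetime. For metal wires this is commonly modeled with Black's equation, where mean time to failure scales as J^(-n) * exp(Ea / kT). The sketch below is an illustration under assumed parameters (the exponent `n`, the activation energy `ea_ev`, and the wire dimensions are hypothetical), not a description of any particular power grid analysis tool.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def current_density(current_a: float, width_um: float, thickness_um: float) -> float:
    """Current density in A/cm^2 for a rectangular wire cross-section."""
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    return current_a / area_cm2


def relative_mttf(j: float, j_ref: float, temp_k: float, temp_ref_k: float,
                  n: float = 2.0, ea_ev: float = 0.7) -> float:
    """MTTF at (j, temp_k) divided by MTTF at (j_ref, temp_ref_k).

    Uses Black's equation; n and ea_ev are assumed, material-dependent values.
    """
    density_term = (j_ref / j) ** n
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / temp_ref_k))
    return density_term * thermal_term


if __name__ == "__main__":
    j_wide = current_density(1e-3, width_um=0.4, thickness_um=0.3)
    j_narrow = current_density(1e-3, width_um=0.2, thickness_um=0.3)
    ratio = relative_mttf(j_narrow, j_wide, temp_k=378.0, temp_ref_k=378.0)
    print(f"Halving the wire width changes expected lifetime by a factor of {ratio:.2f}")
```

The point of the sketch is the sensitivity: with an exponent near 2, modest increases in current density shorten expected lifetime disproportionately, which is why the power grid has to be checked for this.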
Quite an array of problems. How can you handle them?
You start early in the design process. Today most of these problems are dealt with in place-and-route. You almost always create your power grid and define most of your clock network in place-and-route. The problem is that you need to estimate the impact of the power and clock networks earlier in the design flow (like in floorplanning), and you will probably be more optimistic about these effects than you should be. And of course your synthesis tool doesn't know about these effects so it doesn't deal with them. Once you get to the backend the results can be quite disappointing because they are too far from reality and often require a lot of long iterations back to synthesis and floorplanning to solve.
We have tools now that can avoid this. You start with a floorplan that defines where the power grid will be. You perform physical synthesis so you know about placement with respect to the power grid. With this information the tools might know something about the power that will be delivered to the cell. It could correct on the fly what the real timing is.
So rather than the designer employing guardbanding, the tool automatically handles the problem better than the designer can. One distinct trend is that more detailed physical information is moving up in the design flow. It gives better accuracy and consistency as you go through the flow.
We're strong proponents of the belief that RTL matters. How it is coded is the most meaningful lever for improving design quality that designers have at their fingertips. And there is more physical information available to front-end designers to help them generate physically correct RTL code.
How do you get the RTL right?
We are at the beginning of the age of ESL (Electronic System Level) design, which means a lot more design work will take place before RTL. Gary Smith (at Dataquest) makes that prediction every year at DAC. I think it is beginning to happen. Prevention of yield effects is playing a big role in driving ESL.
Can you make ESL physically aware?
That is being worked on; however, that capability doesn't exist today. My guess is that in the next few years we'll start seeing timing-aware and physically aware ESL tools that C-based designers can use.
What's Synplicity's view on the best solutions available now?
Structured ASICs. As I've mentioned before, the problem with cell-based ASICs is that you use a lot of guesses. The beauty of the structured ASIC is that with most of the masks already done the designer doesn't have to guess. It's there and it's correct and it's accurate.
That's why we believe that structured ASICs are a very elegant solution to the problem and they don't take that much away from the designer. And you can go from RTL to parts in hand in 6 months for a multi-million gate 90nm ASIC.
For example?
Because the power grids are almost always embedded into the base metallization you know exactly what power grid is going into synthesis. You know what your maximum frequency will be. You know how your floorplans can be best aligned to that power grid.
Do structured ASICs have advantages for test as well?
They make aggressive use of built-in self test. Because it is built into the base layers--at least the scan chain is built in--it becomes another form of known routing.
A designer who distributes the scan chain poorly is putting in more routing, more congestion, more hot electron migration. That is going to decrease yield.
How quickly will the industry solve its yield problems?
That is a hard question to answer because 90nm yield problems have not been seen in previous geometries. They need new solutions that often are still in development.
Because few design teams can afford an additional million dollars or more in tool costs, or an extra six or eight months of risk in their design flow, these new solutions will be slow in migrating to mainstream design teams.
In the short term I think alternatives to cell-based ASICs will be the way most design teams will go to avoid these problems. Specifically, Structured/Platform ASICs or high-end FPGAs.
These are the only silicon choices that don't require a massive investment in money and time. In the long term, new yield issues will ideally be dealt with automatically within the design tools; that way you don't need to train and educate a new type of design engineer and you can keep tool costs low.
However, the potential is there for design teams of the future to have a "yield team" the way that there are separate verification teams or physical design teams.

