With this array of hardware/software RET approaches, how does manufacturing ensure that what ends up on the wafer is what the designers intended?
Good question. Going back to 130nm, and maybe a little bit of 90nm, that was a pretty big unknown, because we were doing things on the mask side and on the scanner side, but we were, I wouldn't say flying blind, but the techniques were rudimentary. They were like design rule checks, originally and collectively called ORC, for optical rule check. Our software, and other software, enabled simple, crude checks of the edge placement error, that is, the anticipated printed location of a feature versus the target. They were like DRC decks: pretty simple and crude, checking some very basic things. Over the last three or four years, the industry has evolved. We have products, and competitors have products, in this space that do complete simulation-based contour generation. That means we are no longer limited to isolated locations on the design where we double-check the printing on the wafer versus the target; we can now generate an entire contour and compare that to the target. We can get much more sophisticated and thorough in the way we interrogate the intersection of design and process variability. For instance, we can anticipate, if process variables in the fab like dose or exposure change, how the anticipated contour compares to the design, and we can highlight hot spots, potential areas where the design might be weakened. In short, these full-chip contour simulation checks have evolved and are available to enable exactly that: a check of the design target versus the predicted wafer image. In fact, when you compare actual wafer images with the predicted images, it is pretty remarkable how consistent the two are with one another.
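To make the idea concrete, here is a toy sketch (in Python, not any vendor's tool) of what such a design-versus-prediction check might look like: a stand-in simulator predicts each printed edge at several process corners, and any edge whose placement error exceeds a tolerance is flagged as a hotspot. The function names, the fake optical model, and all the numbers are illustrative assumptions.

```python
# Toy sketch of a design-versus-prediction check across process corners.
# The "simulator", tolerance, and corner values are illustrative
# assumptions, not any vendor's actual model or API.

TOLERANCE_NM = 3.0  # maximum allowed edge placement error (EPE)

def simulate_edge(target_nm, dose, focus):
    # Stand-in for a real lithography simulation: the printed edge is
    # modeled as the target plus a dose- and focus-dependent shift.
    return target_nm + 2.0 * (dose - 1.0) * 10 + 0.5 * abs(focus)

def find_hotspots(target_edges_nm, corners):
    """Flag edges whose simulated position deviates from the design target
    by more than TOLERANCE_NM at any (dose, focus) process corner."""
    hotspots = []
    for i, target in enumerate(target_edges_nm):
        for dose, focus in corners:
            epe = simulate_edge(target, dose, focus) - target
            if abs(epe) > TOLERANCE_NM:
                hotspots.append((i, dose, focus, round(epe, 2)))
    return hotspots

# Nominal dose/focus plus three perturbed corners.
corners = [(1.0, 0.0), (0.95, 0.0), (1.05, 0.0), (1.0, 50.0)]
hotspots = find_hotspots([100.0, 250.0], corners)
```

The point of the sketch is only the shape of the check: sweep anticipated process variation, compare prediction to target at every location, and report where the margin collapses.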
What is the design flow? If a design goes to these RET techniques, does it ever go back? Is there any feedback loop?
There are feedback loops on two levels. The furthest-upstream feedback loop in design is something we call litho-friendly design (LFD). The same engine that can do full-chip contour generation can be used early in the process, maybe when basic standard-cell libraries are being developed after the initial ground rules. Designers can have access to a sort of black-box representation of the anticipated manufacturing model, and they can see at their desktops, literally, if they are working with some design library, what the anticipated wafer print will be. They can look at anticipated hotspots and make changes to the polygons to enhance the process window. By doing this, the designers do not have to be litho experts. They can have, almost like standard DRC checks, LFD checks that will tell them very early in the process whether the design may not be manufacturable. Designers can make manipulations with awareness of the anticipated process.
The other feedback loop is at the time of manufacturing, when a design tapes out. Assuming it is design-rule clean and the libraries have already gone through LFD, you would anticipate very few problems for the fully assembled chip. After DRC, the OPC is applied and then this full-chip contour generation is done. If there are any hotspots, the OPC team and the design team can look at what happened and see whether we have to redo the OPC model or the OPC recipe. This is another opportunity, before committing to generating masks, to correct those problems.
How much computational horsepower is required to carry out the RET techniques? A laptop, a compute farm, …?
The LFD application, while it can run full chip, can typically be done with a desktop system a designer has access to, with 4, 8, or 16 CPUs, which is pretty standard; it is also standard to use Linux farms for low-cost computing. For the manufacturing tape-out in production, quite a bit more hardware is typically used. The standard industry benchmark is to be able to apply OPC to a given layer overnight, or within 24 hours. That is the typical turnaround expectation. The compute complexity for model-based OPC, and for the post-OPC simulation of contours to verify it, has been going up and up with each generation. For the first model-based OPC generation at 130nm or 90nm, the typical number of CPUs was 8. If you look at 32nm now, which is just starting to go into production, the number of CPUs needed to maintain 24-hour turnaround time can approach 1,000 or more. In order to contain the cost and the runtime, which impact time-to-market, we have developed new technology that goes beyond just farming out the compute to conventional Linux-based computers, albeit 500 or 1,000 of them, by sending specific portions of the compute to highly efficient Cell processors. We had a press release earlier in the year; this was another project where we worked with IBM. The complexity of the compute challenge is soaring geometrically. We are trying to contain it both by software improvements and by judicious use of a sort of hybrid compute platform that combines conventional Linux-based systems and these specialized processors.
You said 24 hours per layer. How long for a full chip?
This Cell technology, which we introduced with IBM, was first adopted at 45nm. At 45nm there are typically 25 to 30 layers that would use model-based OPC. Most logic designs now will have 7 or 8 metal layers getting OPC; it is quite a few. At 32nm there are over 40 layers that require model-based OPC. Many of those layers, especially with the Cell acceleration, can run in a matter of a couple of hours: one, two, three, or four hours. Typically metal 1 and poly are the most computation-intensive layers. Those will push the 24-hour limit in some cases.
If there is a design change, do you have to go start all over again?
There are a couple of different kinds of design changes. If there is an engineering change order that comes from the designers, sometimes that cell can be redone, re-OPC'ed, and merged back into the overall design. We call this re-OPC, and that way you do not have to redo the entire chip. The same thing can happen if a problem is found in manufacturing when you do this full-chip verification: if there is one location or one cell with a problem, the OPC on that location can be redone and then merged back into the overall design without having to redo the OPC everywhere else.
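As an illustration of that incremental re-OPC idea, here is a hypothetical sketch: each cell's layout is fingerprinted, and after an engineering change only the cells whose fingerprint changed are re-corrected and merged back into the cached result. The cache structure and the trivial stand-in for the OPC engine are my own assumptions for illustration, not Calibre's actual mechanism.

```python
# Illustrative sketch of incremental re-OPC after an engineering change
# order (ECO): only cells whose layout changed are re-corrected and
# merged back, instead of redoing OPC on the whole chip.
# The cell data and the trivial "correction" are made-up stand-ins.

import hashlib

def layout_hash(polygons):
    # Fingerprint a cell's layout so unchanged cells can be detected.
    return hashlib.sha256(repr(sorted(polygons)).encode()).hexdigest()

def run_opc(polygons):
    # Stand-in for a real model-based OPC engine: bias every rectangle.
    return [(x - 1, y - 1, w + 2, h + 2) for (x, y, w, h) in polygons]

def incremental_opc(design, opc_cache):
    """design: {cell_name: polygons}. opc_cache: {cell_name: (hash, result)}.
    Returns the corrected design and the list of cells actually re-run."""
    corrected, rerun = {}, []
    for cell, polys in design.items():
        h = layout_hash(polys)
        cached = opc_cache.get(cell)
        if cached and cached[0] == h:
            corrected[cell] = cached[1]      # reuse the previous OPC result
        else:
            result = run_opc(polys)          # re-OPC only this cell
            opc_cache[cell] = (h, result)
            corrected[cell] = result
            rerun.append(cell)
    return corrected, rerun

# First tape-out runs everything; an ECO touching only cell "B" re-runs "B".
opc_cache = {}
full, rerun_all = incremental_opc(
    {"A": [(0, 0, 10, 10)], "B": [(5, 5, 20, 20)]}, opc_cache)
eco, rerun_eco = incremental_opc(
    {"A": [(0, 0, 10, 10)], "B": [(5, 5, 21, 20)]}, opc_cache)
```

The design choice the sketch highlights is simply that correction is cell-local, so an unchanged fingerprint lets the previous result be merged back untouched.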
Under what circumstances would one have to redo the entire chip?
That varies quite a bit. We have customers that will do it both ways. In some cases customers will find, just for bookkeeping purposes, since turnaround time is less than 24 hours, that they have to keep it clean, and if there is any engineering change they will redo the whole OPC.
I am not sure I could give a strict guideline on when a customer should do it one way versus the other. I would say that the majority of the time to date, the practice has been to redo the entire layer if there is an engineering change.
Mentor recently announced a relationship with IBM. Is that a new relationship or a continuation of an existing relationship?
It is really a continuation of a long relationship we have had with IBM. We have been working with them on model-based OPC going all the way back to 130nm, so it is sort of a logical extension of that relationship. As I mentioned, last year our relationship took a new form with this Cell processor that IBM supports. So in many ways this announcement is a continuation of our relationship.
That’s the cell broadband engine (Cell/B.E.)?
Would you expand a little bit on that?
We looked at this compute challenge that was growing. In a typical OPC flow the full chip is simulated at multiple locations, I mean billions of locations across the full chip, to predict where the printed pattern will be. Then we distort the layout: we break the layout polygons into much smaller fragments and start moving edges in or out, in a manner consistent with the predicted profile, according to what the simulation says. Then we go back and re-simulate. This process is iterated 4, 5, maybe up to 10 times in order to get the entire chip to converge, so that the final predicted wafer image and the target are within some tolerance. That means you take billions of simulations and do them maybe 10 times. The thought was: let's find the most efficient compute platform to do that. So we looked at a wide variety of options. We looked at FPGAs and at special-purpose GPUs, and quickly came to the conclusion that for the type of simulations that are done, which are typically fast Fourier transforms, the Cell processor was uniquely positioned to be the most cost-effective compute platform. We worked with IBM and Mercury Computer Systems to port the Calibre simulation engine onto the Cell. The result is that our customers can reuse their existing farm of conventional Linux computers, typically hundreds or thousands of machines. By adding literally only a few dozen of these Cell broadband blades, the customer can mix and match a variety of combinations of existing conventional CPUs plus the Cell. That way they can continue to recoup the investment they have made in the Linux systems. We looked at using FPGAs and realized that would require hardware completely dedicated to the OPC job; we did not think that was the most cost-effective solution. With our solution, with the addition of a small number of Cell blades, users can still do their DRC, their xRC, and their LFD, all the different things that are needed to support the design and OPC.
They can use their existing investment in hardware plus a small incremental investment beyond that. By doing that, we can see that runtimes decrease by a factor of about four in typical configurations versus not using the cell hardware.
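The iterative simulate, fragment, and move-edges loop described above can be sketched as follows. The damped feedback and the stand-in optical model (which prints each edge at 80% of its mask position) are illustrative assumptions, but the structure, iterating until every edge placement error is within tolerance, typically within a handful of passes, mirrors the flow described in the answer.

```python
# Toy sketch of the iterative model-based OPC loop: simulate the printed
# edge, then bias each mask fragment in or out by a fraction of its
# error, repeating until the chip converges. The "simulator" and all
# constants are stand-ins for illustration, not a real optical model.

TOL = 0.25        # convergence tolerance on edge placement error
MAX_ITERS = 10    # the interview cites roughly 4 to 10 iterations

def simulate(mask_edges):
    # Stand-in optical model: printing pulls each edge to 80% of its
    # mask position, so the uncorrected print undershoots the target.
    return [0.8 * e for e in mask_edges]

def opc_iterate(target_edges):
    mask = list(target_edges)          # start from the design target
    for it in range(1, MAX_ITERS + 1):
        printed = simulate(mask)
        errors = [t - p for t, p in zip(target_edges, printed)]
        if max(abs(e) for e in errors) < TOL:
            return mask, it            # converged within tolerance
        # Move each mask fragment by a damped fraction of its error.
        mask = [m + 0.6 * e for m, e in zip(mask, errors)]
    return mask, MAX_ITERS

corrected, iters = opc_iterate([100.0, 45.0])
```

With these made-up constants the loop settles in eight passes, which is consistent with the 4-to-10-iteration range quoted above; the real cost comes from running billions of such simulations per pass across the full chip.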