EDACafe Editorial Roberto Frazzoli
Roberto Frazzoli is a contributing editor to EDACafe. His interests as a technology journalist focus on the semiconductor ecosystem in all its aspects. Roberto started covering electronics in 1987. His weekly contribution to EDACafe started in early 2019.

Machine Learning-Based Cerebrus for Intelligent Chip Design
August 6th, 2021 by Roberto Frazzoli
The new Cerebrus Intelligent Chip Explorer, recently announced by Cadence, is a machine learning-based tool that automates and scales digital chip design in combination with the Cadence RTL-to-signoff flow. It promises to improve engineering productivity by up to 10X versus a manual approach while also delivering up to 20% better power, performance and area (PPA) metrics. Rod Metcalfe, Product Management Group Director at Cadence Design Systems, described the key features of Cerebrus in the video interview he recently gave to EDACafe’s Sanjay Gangal; in addition, we asked Rod a few more questions on some specific aspects of the tool.
Key ingredients: reinforcement learning, distributed computing

As Metcalfe explained in the video interview, Cadence thinks Cerebrus represents the future of digital chip design. “First of all – he said – we’ve developed a unique reinforcement machine learning engine that really helps optimize the full flow of a digital design. This will allow chip designers to get better PPA more quickly, so it’s going to improve the productivity of the design teams. Now, this is an automated RTL to GDS full flow optimization, and it’s based on some distributed computing technology. It can either be on-premises compute or it can be cloud resources, but the idea is really that Cerebrus is very scalable. It can adapt to the bigger designs that design teams are doing today.”
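The scalable, distributed execution Metcalfe describes can be illustrated with a minimal sketch: independent flow scenarios are fanned out over a pool of workers, and each job is automatically restarted if it stops, in the spirit of the fault tolerance Metcalfe discusses later in this article. Everything here is hypothetical for illustration: the `run_flow` stand-in, the `effort` parameter and the toy scores are invented, since Cerebrus’ actual job scheduler and flow parameters are not public.

```python
from concurrent.futures import ThreadPoolExecutor

def run_flow(scenario):
    """Hypothetical stand-in for one full RTL-to-GDS flow run; a real system
    would launch synthesis and place-and-route jobs on on-premises or cloud
    machines instead of computing a toy score."""
    effort = scenario["effort"]
    return {"scenario": scenario,
            "ppa_score": {"low": 0.3, "medium": 0.6, "high": 0.9}[effort]}

def run_with_restart(scenario, attempts=3):
    """If a job stops for any reason, restart it automatically
    (up to a retry limit)."""
    for _ in range(attempts):
        try:
            return run_flow(scenario)
        except Exception:
            continue  # job failed; restart it
    raise RuntimeError("job failed after %d attempts" % attempts)

def run_scenarios(scenarios, max_workers=4):
    """Fan independent scenario runs out over a worker pool, as a
    distributed scheduler would fan them out over compute machines."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_with_restart, scenarios))
```

Because the scenarios are independent, the same structure scales from a few on-premises machines to a large cloud allocation simply by raising the worker count.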
Metcalfe then underlined the difficulties of advanced chip design, and the benefits introduced by machine learning: “Today [chip design] is a very sort of manual iterative process. (…) We take a design, we take some kind of existing flow, some new RTL from a new project, and then the chip design team will start running the flow, generating some results, looking at the results, and deciding sort of what to change next about the flow so they can meet their PPA goals. This is a pretty engineering-intensive task. It takes a lot of compute power too, and it generates a lot of data. This manual, sort of iterative approach, is really becoming a problem for design teams because they can’t get through the designs quickly enough. The machine learning approach takes the data that normally would be analyzed by the engineering team and applies machine learning analysis to that. This is where the reinforcement machine learning engine comes in. It’s able to use the design data, start modifying the flow, looking at the results, quickly converge on a better flow.” Another enabling factor is Cerebrus’ capability of using massively distributed computing: “The amount of compute power that engineers have access to is hugely more than a few years ago,” Metcalfe pointed out. He then highlighted the benefits of the reinforcement learning approach: “We don’t have to run the flow every time for every single scenario, we can run enough of the flow to learn from. Reinforcement machine learning is very good at that because it’s able to sample data as it goes, so it’s learning continually, and based on that sample data, it knows if something’s getting better or something’s getting worse. It’s a reward-based system: if something’s getting closer to the goals, that’s great, we continue in that direction. If it’s deviating away from the goals, then we don’t want to continue with that scenario, we want to go and try another one.
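The reward-based exploration described above can be sketched as a simple loop: sample a scenario of flow settings, run enough of the flow to score it against the PPA goals, and keep moving in the direction of higher reward. The parameter names and the toy reward function below are invented for illustration; they are not actual Cerebrus settings, and a real system would derive the reward from measured timing, power and area data.

```python
import random

# Hypothetical flow parameters; real Cerebrus parameters are not public.
PARAMETER_SPACE = {
    "synthesis_effort": ["low", "medium", "high"],
    "placement_density": [0.6, 0.7, 0.8],
    "clock_uncertainty_ps": [20, 50, 80],
}

def sample_scenario():
    """Pick one value for every flow parameter."""
    return {name: random.choice(values) for name, values in PARAMETER_SPACE.items()}

def run_partial_flow(scenario):
    """Stand-in for running enough of the flow to learn from.

    Returns a toy reward: higher means closer to the PPA goals.
    A real system would extract timing/power/area metrics instead."""
    reward = 0.0
    reward += {"low": 0.2, "medium": 0.5, "high": 0.8}[scenario["synthesis_effort"]]
    reward += 1.0 - abs(scenario["placement_density"] - 0.7)
    reward -= scenario["clock_uncertainty_ps"] / 1000.0
    return reward

def explore(budget=50):
    """Reward-guided search: score sampled scenarios and keep the best,
    so the search keeps moving toward the goals instead of running
    every possible flow to completion."""
    best_scenario, best_reward = None, float("-inf")
    for _ in range(budget):
        scenario = sample_scenario()
        reward = run_partial_flow(scenario)
        if reward > best_reward:  # getting closer to the goals: keep it
            best_scenario, best_reward = scenario, reward
    return best_scenario, best_reward
```

The key property Metcalfe points to is that learning happens continually from sampled data, so the budget of flow runs is spent where it is most informative.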
The flow can be stopped at any point depending on the convergence, but we don’t require that of the user; that is completely automated through Cerebrus. (…) We really do propose an RTL to GDS optimization because what happens in the synthesis can easily affect what happens later on during implementation.” Another aspect highlighted by Metcalfe is that Cerebrus does not force users to change their flow: “No change in methodology from tests or power vector generation, that can all be done through the verification tools as people do today,” he said. Moving to another side of Cerebrus optimization, Metcalfe discussed the tool’s floorplanning capabilities: “Cerebrus has the ability to do automated floorplan exploration,” he said. “We may take an initial floorplan from the customer, and then we will start exploring different floorplan configurations completely automatically.”

Design flow optimization: not only parameter setting

Answering some more questions complementing the video interview, Metcalfe clarified that design flow optimization definitely involves setting the right parameters on individual EDA tools, but it also includes more aspects. “Setting parameters is certainly part of it, and that is something Cerebrus does,” he said. “It will take all the different parts of the flow and optimize the tool parameters to ensure we get the best results. But over and above that, there’s also what we call system level optimization capabilities that Cerebrus has. And here we’re able to do much higher level optimization. So, for example, Cerebrus has the ability to optimize a floorplan. So this is no longer just tool options, now we’re really optimizing the design itself.”

The importance of floorplanning

One could say that in this view floorplanning has a special status, requiring special capabilities compared to previous tools.
“The existing Cadence tool has the ability to do the mixed placement where we concurrently place macros and standard cells together – Metcalfe recalled – and that allows you to do the optimization of the macros within the floorplan. But what Cerebrus allows us to do is also dynamically change the size and the shape of the floorplan and then reposition the macros inside this newly shaped floorplan. And, of course, we are tuning the flow based on this new floorplan.” The reason for including floorplanning exploration in Cerebrus is its influence on the whole design flow. “What we do in the floorplanning will affect synthesis,” Metcalfe pointed out. “The floorplan that is optimized as part of Cerebrus goes all the way back to Genus as part of the physical synthesis flow. So the value that Cerebrus can offer is allowing this floorplan to be used across the whole flow. It’s not purely just the place and route that’s using the floorplan anymore. So the floorplan itself will influence synthesis as well as the final place and route. (…) I think the days where the floorplan was restricted to just a place and route phase are over. The floorplan really does have to be part of the full flow, right from initial synthesis. And that’s what Cerebrus allows us to do.”

ML-inside or ML-outside?

One could argue that floorplanning exploration in Cerebrus challenges the distinction drawn by Cadence between ‘ML inside’ and ‘ML outside’, where ML-inside is described as an ‘under-the-hood’ function improving the results from one part of the flow, while ML-outside wraps around the tools to accelerate the entire design flow. In principle, ML-based floorplanning exploration could have been added ‘under the hood’ to an existing tool, replacing some older type of heuristics. “We still sort of view Cerebrus as a ML-outside kind of technology. But you are right.
It’s in the gray area where it goes between both, because the floorplan clearly is created as part of implementation, but it can be used also as part of synthesis,” Metcalfe conceded.

Efficient use of computing resources

The Cerebrus announcement stresses the tool’s capability of using distributed computing and cloud computing resources. From this, one could infer that Cerebrus needs some extra computing power to unleash the benefits of machine learning. However, this is not necessarily the case, according to Metcalfe. “That’s not generally what we see, actually,” he said. “What people are doing today when they do sort of manual flow optimization is they manually start a bunch of different runs, and then they look at the results and they decide which run is better. (…) So a lot of chip design teams are already using a lot of compute resources for these flow optimization experiments, but they’re doing it manually, and that’s pretty inefficient on the whole. (…) There may be some additional compute resource required using Cerebrus, but it’s not that dramatic.” According to Cadence, machine learning actually improves the efficiency of compute resource utilization. “Cerebrus is going to make much more efficient use of those compute resources,” Metcalfe said. “Because one of the nice things about reinforcement machine learning is that it learns as the data is generated. We’re continually sampling the data as the full flow runs progress, and if a scenario is not looking encouraging, we will stop that scenario and we will start a new one. (…) If you do it manually, you tend to let the whole flow run, and you only look at the data at the end, and you may have wasted many hours of compute time generating results that were not useful.”

Fault tolerance

In Cadence’s view, Cerebrus’ capability of taking advantage of distributed computing and cloud computing resources should rather be considered from the point of view of overall design flow management.
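The early-stopping behavior Metcalfe describes, abandoning runs that are not looking encouraging rather than letting every flow finish, resembles a successive-halving search. A minimal sketch under that assumption, with a caller-supplied `run_stage` function standing in for the partial results sampled as a flow progresses:

```python
def successive_halving(scenarios, run_stage, stages=3):
    """After each flow stage, stop the worst-scoring half of the scenarios
    instead of letting every run finish, so compute time is not wasted on
    results that will not be useful. `run_stage(scenario, stage)` returns
    a partial score for one scenario at one stage (higher is better)."""
    survivors = list(scenarios)
    for stage in range(stages):
        scored = [(run_stage(s, stage), s) for s in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [s for _, s in scored[: max(1, len(scored) // 2)]]
    return survivors[0]
```

Compared with letting all runs complete and inspecting the data only at the end, this spends most of the compute budget on the scenarios that are still improving, which is the efficiency gain Metcalfe is pointing at.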
“The reason we’ve emphasized distributed computing in terms of Cerebrus – Metcalfe explained – is [because] it’s a completely automated system. So it’s fault tolerant: if a job stops for any reason, it’s automatically restarted. (…) So it’s a much more sophisticated solution than designers just running their chip design in the cloud.” Management functions include the prediction of cloud resources needed for a specific design, Metcalfe pointed out. “The design teams have complete control over that. They specify how much compute resource they want to use, what type of machines, and then Cerebrus will manage those compute resources to make as efficient use of them as possible,” he said.

Process-specific flow optimization

Reinforcement learning enables Cerebrus to perform flow optimization without any previous training. The tool, however, does learn from its ‘experience’, so that it does not have to start from scratch when approaching a new design. “What Cerebrus has the ability to do is to create a machine learning model that captures the learning up to that point. That machine learning model would certainly be applicable to a similar design on the same process node, because what it learns is certainly going to be process node-specific, but that model itself can be reused across different designs. So if you’re doing different blocks in a system on chip, once you have the machine learning model generated by Cerebrus, you can then use that as a starting point for other blocks on that system on chip. And that can really help reduce the training time.
By capturing the previous learnings from the other blocks in the design, we’re able to converge on a more efficient flow much more quickly on other blocks from the design.” An important aspect here is that flow optimization is process-specific: for example, the best parameter settings for a given process node from a given foundry will arguably differ from the best parameter settings for another process node from another foundry. So, users designing chips targeted at multiple manufacturing processes will arguably need to use Cerebrus in such a way that the ‘experience’ gained on a certain process is only reused on new designs targeted at the same process. “Yes. I think something like that would be reasonable,” Metcalfe conceded. “You shouldn’t take a sort of 28-nanometer machine learning model and try to apply it to a 7-nanometer design. That’s not going to work, but similar 7-nanometer designs would definitely benefit from the reinforcement machine learning model that Cerebrus generates along the way.”

A design management cockpit

Cerebrus can also be considered as a cockpit from which designers can monitor and manage the whole design flow. “Cerebrus also has a pretty sophisticated graphical interface – Metcalfe pointed out – so you can look at the whole design across the whole flow. You can look at the underlying data that generates the machine learning behavior. So, you could consider it a complete chip design manager, as well as the flow optimization. Users certainly don’t have to use the GUI, it is completely automated, but we find that many customers do want to interact with the system, and that’s why we provide some graphical interface and some data analytics capabilities. And you get to see the whole flow that Cerebrus is optimizing.”

Future developments

Besides parameter setting and floorplanning exploration, there is more to flow optimization according to Cadence.
This is why the company is already planning future releases of Cerebrus equipped with additional features. “There’ll be lots of other things coming with the Cerebrus technology in the future. (…) We’re working on various other types of optimizations. We’re not really ready to talk about them publicly yet, but there is a lot of potential in this technology that can be taken in different directions. Cerebrus has the underlying infrastructure to be extended in many different ways, and we already have some good ideas about what the next generation of Cerebrus will be,” Metcalfe concluded.

Resources

To learn more about Cerebrus, readers can visit this page on the Cadence website, providing access to various resources: videos, a white paper, blog posts.

Categories: EDACafe Editorial, Video Interview