EDACafe Editorial Roberto Frazzoli
Roberto Frazzoli is a contributing editor to EDACafe. His interests as a technology journalist focus on the semiconductor ecosystem in all its aspects. Roberto started covering electronics in 1987. His weekly contribution to EDACafe started in early 2019.

Special report: EDA requirements in the design of AI accelerator chips
April 2nd, 2021 by Roberto Frazzoli
Innovative architectures, high performance targets, a competitive market: does this AI cocktail call for specially optimized EDA solutions? We asked Prith Banerjee (Ansys), Paul Cunningham (Cadence), Mike Demler (The Linley Group), Jitu Khare (SimpleMachines), Poly Palamuttam (SimpleMachines) and Anoop Saha (Siemens EDA).

Never before have silicon startups been as numerous as they are today, in this era of ‘silicon Renaissance’ driven by an insatiable hunger for neural network acceleration. Startups engaged in the development of AI accelerator chips are raising considerable venture capital funding – and attracting a lot of attention from the media, as technology champions at the forefront of innovation. Not surprisingly, most EDA vendors have updated their marketing messaging to emphasize product offerings specifically tailored to the design needs of these devices, and AI startups seem to enjoy a privileged status among EDA customers in terms of coverage from vendors’ blogs and press releases. It is therefore interesting to try to figure out whether AI accelerator chips really pose special design challenges calling for specially optimized EDA solutions.

AI chips: different or normal?

Apart from some notable exceptions – such as devices based on analog processing, or the wafer-scale chip from Cerebras – it seems fair to assume that the vast majority of the AI accelerators being developed are digital and have a ‘normal’ die size. Is there anything special in these chips that makes them different from other complex processors from an EDA standpoint?

“The short answer is no,” says Paul Cunningham, Corporate Vice President and General Manager at Cadence. “I don’t think there is anything really fundamental that makes an AI chip different from other kinds of chips. But an AI chip is usually a very big chip and it’s highly replicated. So you have a basic building block, some kind of floating point MAC, and it’s replicated thousands, tens of thousands, hundreds of thousands of times. The nature of the design will stress the scalability of EDA tools to handle high replication. So in this sense, yes, it is important to make sure that our EDA tools have good performance on this style of design, but if there was another type of design which was also highly replicated, it would stress the tools in the same way.”
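Cunningham’s point about replication stressing tool scalability comes down to a simple count. The sketch below uses purely hypothetical numbers (not taken from any chip mentioned in this article) to contrast what a flat flow must process against a hierarchy-aware flow that implements each unique block once.

```python
# Illustrative sketch (hypothetical numbers): why high replication stresses
# EDA tool capacity. A flat flow must process every leaf-cell instance;
# a hierarchy-aware flow can implement each unique block once and then
# stamp out the hardened result.

def flat_instance_count(cells_per_mac, macs_per_tile, tiles):
    # Total leaf-cell instances a flat tool run would have to handle.
    return cells_per_mac * macs_per_tile * tiles

def hierarchical_work(cells_per_mac, macs_per_tile, tiles):
    # Rough proxy for hierarchy-aware effort: implement one MAC once,
    # one tile once, then place the tile instances at the top level.
    return cells_per_mac + macs_per_tile + tiles

flat = flat_instance_count(cells_per_mac=500, macs_per_tile=1024, tiles=256)
hier = hierarchical_work(cells_per_mac=500, macs_per_tile=1024, tiles=256)
print(f"flat: {flat:,} instances")       # flat: 131,072,000 instances
print(f"hierarchical proxy: {hier:,}")   # hierarchical proxy: 1,780
```

The second function is only a crude proxy – real hierarchical flows still pay for top-level routing, clock distribution and timing closure across tile boundaries – but it captures why, as discussed below, a tiled architecture can actually ease the back-end flow.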
“It’s not like low power,” Cunningham continues. “With low power design, you really need to create new features, new tools, a new flow, new languages, new standards. So low power design really needs a vertical solution; an AI chip is not like that. Do we need really special features in our tools, new languages, new methods [to handle AI chips]? No, that’s not the case.”

Furthermore, not all these devices are big and complex. “The AI chips aren’t necessarily the most complicated things we see architecturally,” says Mike Demler, Senior Analyst at The Linley Group. “In some cases, they are nothing but large arrays [of processing elements]. Really the more complicated chips are still the SoCs, and especially the chips that are being designed for ADAS and autonomous vehicles, where you need to include not just the AI accelerators, but also CPUs and other functions.”

Benefits of tile-based architectures

A tile-based architecture could actually be considered an advantage from an EDA standpoint. “From the back-end perspective, one of the good things about AI chips is that they are extremely structured and very hierarchical, so once you build a basic building block – which is your compute tile – it generally gets replicated multiple times,” says Jitu Khare, VP Silicon & Systems Engineering at SimpleMachines (San Jose, CA), who took care of the back-end part of the design process in the development of the company’s first chip, called Mozart. “In our case,” he continues, “we essentially had one layer of hierarchy which replicated the block sixteen times, and then that was replicated four times, so the tiling really helped us. I think the extreme hierarchy that you get is probably unique to [AI chips], because if you take a regular SoC, you will not have this kind of hierarchy. And [in AI chips] you can take advantage of that by building your chip hierarchically.”

More benefits from a tile-based architecture can be envisioned in the longer term, considering the increasing use of machine learning in EDA tools (see our special report on this theme). Prith Banerjee, Chief Technology Officer at Ansys, points out that the replication of a single tile is the ideal situation for leveraging neural network-based EDA tools: “If you have a normal chip which has all kinds of random transistors, there is absolutely no pattern to it. Whereas if you have a RAM chip, it is very regular. There are things that we can do when you have regularity in a chip that you cannot do in an irregular chip design. A general purpose processor that has all kinds of control logic – instruction decoders, adders, multipliers – is very irregular, so there’s not a whole lot of AI that you can use. But the more regular the chip is, the faster the design process.” Banerjee also highlights the intriguing concept of AI-based EDA tools helping the design of more powerful AI chips, which in turn will enable more powerful AI-based EDA tools – a positive feedback loop, paving the way to further advancements.

Optimizing the chip architecture: HLS vs in-house tools

A step in which the EDA requirements posed by AI accelerator chips may differ from those of other devices is the architecture optimization taking place at the very beginning of the design process. The difference could consist in a stronger need for high-level synthesis, helping designers make several important choices at this point of the flow. However, opinions on this are not unanimous. According to Anoop Saha, Head of Strategy and Growth for the Catapult platform at Siemens EDA, architectural decisions pose special challenges to the AI startups: “They need to do things differently. They are building a chip that is optimized for specific tasks, specific workloads.
There are quantization requirements, and you don’t know what the right quantization model would be.” Similar decisions must be made between floating point and integer operations. “We are seeing customers who have changed from 8 bits to 10 bits, or from 10 bits to 16 bits. You need a more agile flow for developing these chips,” he continues. With HLS, Saha maintains, designers can explore the design space early on, make small changes and see the results in terms of PPA metrics.

Poly Palamuttam, VP Hardware at SimpleMachines – who took care of the front-end part of the design process in the development of the Mozart chip – explains the factors and considerations behind these architectural choices: “To a large extent, the decision of whether we want to support more than just integer data types is driven by market needs and requirements and customer feedback,” he says. In other words, ROI should be taken into account, as “you have to pay the penalty in terms of area to support floating point, for example.” According to Palamuttam, however, these considerations do not necessarily lead to the adoption of HLS tools for performing what-if analyses at this stage. In fact, SimpleMachines used a model developed in-house, based on C, C++ and Excel. They also used Chisel. “We didn’t actually play around with high-level synthesis,” Palamuttam says, “but I do want to add that both myself and Jitu actually came in slightly after the initial decision was made, so I can’t comment too much on that. But if I had my way and if I started from scratch, I’m not so sure that high-level synthesis would have helped me with making any of the decisions that we have made. We didn’t use any external tools, and my gut feeling would be that I would continue to use internal tools to do that. Part of the reason is, I don’t think we have the luxury of time to go and explore these tools from vendors. But maybe the next time around, I’ll probably survey what’s out there and see if any of [the HLS tools] would offer us any advantages.”

Reliability of pre-silicon performance predictions

Continuing to explore the aspects on which AI accelerator chips may have special EDA requirements, an important point is the need for early access to some sort of hardware model – long before silicon is available – to test the architecture in terms of performance. “What we tend to see quite often is that, because these startups are competing in a very hot space, their initial announcements may be based on simulations, or they may use prototyping platforms, developer software, hardware emulators,” says Demler. “That’s really the key for a lot of the startups: to test out their architecture with the limited funding a lot of them have, before they actually dedicate the investment to going for silicon.”

That’s where the reliability of the performance predictions based on EDA tools becomes critically important. “We’ve seen some AI startups succeed and some fail,” Demler observes. “If they’re depending on the tools to demonstrate their architecture – not just to themselves, but to potential customers – then the simulation results are extremely valuable and need to give them a good degree of confidence that they will be confirmed in the final product. Some AI startups have been developing their chip for five years or more, and still don’t have a real silicon product. Reasons may include manufacturing challenges, wrong design, or inaccurate indication of performance based on a simulation.”

Hardware-software co-design

The other reason for early access to some sort of a pre-silicon model of the chip is the need for hardware-software co-design. “Software continues to be the biggest challenge [for AI chips],” says Demler. “It’s great that you can build [so many] processing tiles on a die, but then the challenge is how do you program them. And we see a lot of startups stumble there.
Their hardware specification may look impressive because of the theoretical peak functionality, but you never get 100% utilization; in a lot of cases you get less than 50%, even much less than that, in terms of real throughput. This is why the smart startups do as much hardware-software co-design from the beginning as possible.”

Hardware-software co-design was certainly important for SimpleMachines, whose solution is based on two complementary elements: a chip architecture and a special compiler. “Software is a huge piece of the total solution, so the faster you can get your software team access to your actual hardware, the better off it is,” says Palamuttam. “That’s why, very early on in the design cycle, we decided to use both a hardware emulation and an FPGA platform. It was an off-the-shelf FPGA platform to which we ported part of our design, and we gave that to our software team so they could start to build the software applications and infrastructure.”

TOPS/Watt requirements

So far, we have reviewed some of the aspects on which one could expect AI accelerator chips to pose special challenges to EDA tools. But how do these aspects rank in the EDA wishlist compiled by AI startups? As for SimpleMachines, gradually moving toward the top EDA priorities we find the ability to handle power optimization. In fact, while power efficiency is a common requirement for many different categories of chips, TOPS/Watt performance is undoubtedly a key metric for AI acceleration devices. “Power efficiency, our TOPS/Watt, was a big criterion for us to differentiate from various AI accelerators,” says Palamuttam. “Trying to get a handle on what our power was going to be, early on, was one of the things that we had to pay a lot of attention to – using whatever RTL-level power analysis [we could], and also trying to get realistic vectors from the Veloce emulator by actually running an application.” As we will see shortly, SimpleMachines chose Siemens EDA for the front-end.
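Demler’s utilization caveat and the TOPS/Watt metric both reduce to simple arithmetic. The sketch below uses made-up numbers – not figures from any vendor quoted here – to show how far sustained efficiency can fall from the datasheet peak when utilization drops below 50%.

```python
# Back-of-envelope sketch (hypothetical numbers): peak TOPS vs. sustained
# throughput, and the resulting TOPS/Watt. Real utilization depends on the
# workload and on how well the compiler keeps the MAC arrays busy.

def sustained_tops(peak_tops, utilization):
    # Throughput actually achieved on a real workload,
    # as a fraction of the theoretical peak.
    return peak_tops * utilization

def tops_per_watt(peak_tops, utilization, power_watts):
    # The efficiency metric AI accelerator vendors compete on.
    return sustained_tops(peak_tops, utilization) / power_watts

peak = 100.0   # hypothetical datasheet peak, TOPS
power = 50.0   # hypothetical power draw, W
print(tops_per_watt(peak, 1.00, power))  # ideal: 2.0 TOPS/W
print(tops_per_watt(peak, 0.40, power))  # at 40% utilization: 0.8 TOPS/W
```

In this toy scenario, a chip marketed at 2 TOPS/W delivers 0.8 TOPS/W on a workload that keeps only 40% of the arrays busy – which is why realistic vectors from emulation, rather than peak-rate assumptions, matter for the early power estimates discussed above.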
Power efficiency was a key concern for SimpleMachines also from a back-end standpoint: “Floorplanning the chip with power in mind is extremely important,” says Khare.

Practical considerations: toolchain integration, EDA services, proven IP

While different AI startups may have different needs, it is interesting to note that the top criteria mentioned by SimpleMachines to explain their EDA choices are essentially practical considerations that could potentially apply to most ‘normal’ chips. “Mainly, my interest was in making sure that we have a set of tools that can integrate well, without having to take one tool from one vendor and another tool from another vendor,” says Palamuttam. “It becomes very cumbersome to do this mix-and-match business, so from that standpoint I wanted to pick a vendor that could provide us tools covering the entire gamut of everything that I was responsible for, which entails RTL verification, validation and emulation. We did look at multiple vendors, and one of the vendors that actually fit that criteria of mine was Siemens EDA.” “From a timeline perspective,” Palamuttam continues, “there was one more criterion that I looked at, which was that Siemens EDA had their services business, and since our team was very small, I wanted to leverage their services arm.”

The choice of a back-end EDA platform was based on practical considerations, too: “One of the main criteria for us was the selection of physical IP,” says Khare, referring to IP blocks such as the PCI and HBM interfaces. “And based on looking at different vendors, the best option we could come up with was Synopsys in the 16 nanometer technology. To de-risk our chip, we decided to go with these silicon-proven IPs. Having made those choices, it kind of made sense for us to look at bundling tools and IP, because that was going to give us the best bang for the buck.” SimpleMachines picked several Synopsys tools, including place & route, extraction and timing. For physical verification, however, they chose Calibre from Siemens EDA, considering it the golden signoff tool.

A unique combination of requirements

In conclusion, do AI accelerator chips pose unique design challenges that require specially optimized EDA tools? It is probably fair to say that the uniqueness of this specific category of devices lies in the combination of several different requirements, including some basic practical considerations – concerning development time, cost and risk – that rise to a high priority level due to the specific characteristics of the AI market and its players. By definition, AI startups are small companies, in many cases with limited resources; still, they play in one of the most competitive markets around, where they are expected to develop innovative, sophisticated products – often taking on Nvidia – within narrow time windows. While this scenario probably does not call for specially optimized EDA tools, it certainly requires a smart, comprehensive EDA approach.