Agnisys Automation Review Louie De Luna
Louie is the Director of Marketing and Sales at Agnisys, Inc. He has over 15 years of experience in FPGAs, ASICs and EDA industries. Louie was most recently the Director of Marketing for Aldec where he was instrumental in the development and execution of the strategy for the entire functional … More » Repurposing von Neumann Architecture with SRAM-based Register FilesAugust 16th, 2019 by Louie De Luna
By Louie De Luna, Agnisys Chief Product Evangelist The conventional von Neumann architecture has been the workhorse of computing for several decades, but with the advent of AI applications and big data the entire industry has put a spotlight on its limitations. Since massive amounts of data need to travel back and forth between the CPU and memory, the resulting latency and power consumption became major issues. One of the powerful convolutional neural networks (CNN), Alexnet, requires 68M total weights (parameters) and 724M total MACs for a single inference process – a mere average requirement compared to other CNNs such as VGGNet which requires 138M total weights and 15.5G total MACs. New chip architectures and technologies are now emerging to address these issues known as the “von Neumann bottleneck” or the “memory wall” problem. The Google TPU is based on systolic arrays that provides up to 420 Teraflops, the Graphcore IPU is based on Bulk Synchronous Parallel (BSP) technology that provides up to 125 Teraflops and IBM Zurich Lab is working on a new AI chip based on in-memory computing. But as the world of computing and AI wait for the new chip architectures to mature, the memory wall problem is still a real pain. Startups without the backing of deep pockets will need to come up with other ingenious ways in order to be competitive.
A popular strategy used for datacenter acceleration is to repurpose existing von Neumann architectures by implementing some of the register files as on-chip SRAM as opposed to standard flip-flop cells. This will free up a considerable area on the die that can be used to implement more functional blocks for acceleration. There are two base categories of register files in today’s SoCs: dynamic and static registers.
Benefits of SRAM-based Register Implementation Implementing register files as SRAM as opposed to standard flip-flops provide the following significant benefits:
Sample Implementation in IDesignSpec™ Implementing registers using SRAM is not an easy task as the creation of the structure and interface requires a lot of automation. The registers implemented as SRAM need to also connect with the registers implemented as flip-flops. An example of how this can be done in IDesignSpec is described below with several components shown in Figure 1.
Figure 1: Implementation in IDesignSpec using SRAM Wrapper A more specific example of a register file structure is shown in Figure 2 which consists of:
Figure 2: Register File Example Structure Figure 3 shows how the Dynamic and Static Registers can be defined in IDesignSpec using Excel Editor. Using properties {EXTERNAL=true; repeat=512} in the description column repeats the Static section 512 times and makes “Static_Section_0, Static_Section_1, … Static_Section_511” as address placeholders only where no physical registers are implemented inside the IDesignSpec generated block. Only ports are generated by IDesignSpec. Figure 3: Register File Example Definition in IDesignSpec Excel Once the specification is completed using IDesignSpec Excel Editor then the user simply needs to select the desired system bus (AMBA-APB, AHB, AXI, TileLink or Proprietary) and generate the required output code including RTL, UVM register model, C/C++ Headers, Python and documentation. All outputs are derived from the golden specification, a popular methodology employed by our customers to synchronize various SoC teams. Startups in AI can easily implement SRAM-based register files. It does not completely solve the von Neumann bottleneck but it’s an inexpensive counter-measure. As the race for AI supremacy continues, new-generation chips will be deployed, new research will be conducted and new requirements will arise. As a result new bottlenecks will naturally show up so favoring the least expensive solution to a given problem is a practical choice. Time will tell who will dominate in this race, and I’m for sure excited to see! If you’re in the Bay Area this month, see us at Hot Chips Symposium at Stanford Memorial Auditorium in Palo Alto, California on August 19-20, 2019. We can discuss more details about the SRAM-based register implementation. Tags: AI, hotchips, IDesignSpec, SoC, Verification, vonNeumann |