EDACafe Editorial

A quick look at the 2021 Linley Fall Processor Conference
November 18th, 2021 by Roberto Frazzoli

Roberto Frazzoli is a contributing editor to EDACafe. His interests as a technology journalist focus on the semiconductor ecosystem in all its aspects. Roberto started covering electronics in 1987. His weekly contribution to EDACafe started in early 2019.
This week EDACafe takes a quick look at the 2021 edition of the Linley Fall Processor Conference, organized by technology analysis firm The Linley Group at a physical venue in Santa Clara, CA, and followed by a virtual event. Besides updates on deep learning accelerators, the conference also covered ‘conventional’ processing solutions and some other types of IP. This article provides only a general overview of the event; full content can be accessed from the conference website by downloading the proceedings (presentation slides) for free.

AI trends: bigger training workloads, segmentation of the edge-AI market

In his opening keynote, Linley Group’s Principal Analyst Linley Gwennap reiterated the key concepts from last spring’s processor conference, adding updates on recent AI trends. Among them, the size of NLP models keeps growing: Google’s Switch Transformer has 1.6 trillion parameters. To train ever-larger neural networks, Cerebras and Tesla are using wafer-scale technology and other innovations. In the datacenter, Nvidia is facing tougher competition from the Qualcomm Cloud AI 100 and the forthcoming Intel Ponte Vecchio. According to Gwennap, Nvidia still leads in performance, but not in efficiency. As for edge AI, this market is fragmenting into high-end chips for camera-based systems and low-power chips for simple sensors.

The conference also saw the participation of TechInsights – the Canadian reverse engineering firm that has recently acquired The Linley Group – with a presentation on the performance gap between CPU and main memory. Among other findings, TechInsights analysts concluded that the SRAM cell size scaling trend is worse than that of logic standard cells, because the SRAM cell does not have DTCO (Design Technology Co-Optimization) scaling options.
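To put the model-size trend in perspective, here is a quick back-of-the-envelope sketch (my own, not from the keynote): even storing the weights of a 1.6-trillion-parameter model requires terabytes of memory. The 2-bytes-per-parameter figure is an assumption corresponding to FP16/BF16 weights.

```python
def param_memory_tb(num_params: float, bytes_per_param: int) -> float:
    """Raw memory needed just to store model weights, in terabytes."""
    return num_params * bytes_per_param / 1e12

# Google's Switch Transformer: 1.6 trillion parameters (cited in the keynote).
# 2 bytes/parameter (FP16/BF16 weights) is an assumption, not a figure
# from the talk:
print(param_memory_tb(1.6e12, 2))  # -> 3.2 TB of weights alone
```

Optimizer state and activations multiply this further during training, which helps explain why wafer-scale integration and other memory-centric innovations are being pursued.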
New AI acceleration architectures

The challenges posed by AI acceleration keep generating new and diverse computing architectures; at least four of them were proposed at this edition of the conference.

Startup Roviero (San Jose, CA) believes that software is the biggest challenge in edge-AI acceleration, as software is key to running any neural network at high utilization (>80%) and low memory usage. According to Roviero, feeding the three-dimensional neural network data through a traditional instruction set makes the compiler inefficient, so the company came up with a new architecture – a “natively graph computing processor for edge inference” – called CortiCore, whose instruction set reduces compiler complexity. Roviero maintains that this approach achieves >80% utilization with a 16x memory reduction (compared to currently available solutions) on all neural networks.

Japanese edge-AI startup ArchiTek has developed an architecture called AiOnIC, employing fixed-function accelerators that are quickly reconfigurable for different algorithms. This flexibility allows the replacement of GPUs, DSPs, image processors, and deep-learning accelerators in camera-based systems. The company is developing a chip called Chichibu, which targets peak AI performance of 3.6 TOPS using 8-bit integer (INT8) operations, enough to process high-definition images at 24 frames per second, with a TDP of 1.5W. ArchiTek also plans to license the AiOnIC architecture as IP.

Quadric (Burlingame, CA) shared its vision of generalizing the dataflow paradigm. Its first chip is a dataflow processor that can handle a wide range of tasks, from FFT to AI. Quadric’s architecture is based on a set of compute tiles controlled by a single instruction stream. This enables the software to employ dataflow principles to optimize the utilization of local memory and compute resources. All data movement is software controlled and deterministic; the design has no caches.
Quadric’s chip, which targets edge applications, has 256 cores and consumes less than 4W. It will be sold on an M.2 module and also licensed as IP.

Coherent Logix (Austin, TX) disclosed details of the HyperX HX40416, a massively parallel processor with 416 DSP cores connected by a reconfigurable fabric. This creates a dataflow architecture that – according to Linley Group analysts – fits in the ‘coarse-grain reconfigurable array’ (CGRA) category.

More AI acceleration updates

AI acceleration was also the subject of several presentations given by companies that took part in previous editions of the Linley conference and/or are already well known in this industry segment. These companies participated in the 2021 fall event to reiterate their key concepts or to provide updates on their product offerings. This applies, in different ways, to Edgecortix, Hailo, Flex Logix, Expedera, UntetherAI, Brainchip, Syntiant, and Deep.ai. Among the updates provided, Expedera described the concept of a ‘packet’ – “a contiguous fragment of a neural network layer with entire context of execution” – proposing a DLA co-designed to execute packets natively; and Syntiant focused on its newest chip targeting vision applications, the NDP200, based on the company’s second-generation deep-learning accelerator, with a power consumption of around 1mW.

CPU, DPU, DSP, FPGA updates

As usual, besides chips or IP specifically developed for deep learning acceleration, the conference offered updates on ‘traditional’ processing architectures: CPUs, DPUs, DSPs, and FPGAs. Intel focused on some of the innovations announced last August at its Architecture Day – Alder Lake, Sapphire Rapids, and the x86 cores optimized for performance or for efficiency – providing several details on the Thread Director. Arm discussed the Armv9 architecture and the role of the DynamIQ Shared Unit-110, as well as the SystemReady initiative.
Qualcomm provided updates on its Cloud AI 100 processor, including applications such as Foxconn’s Gloria AI edge box. Marvell described its Octeon 10 DPU, while Ceva recapped the features of its SensPro2 DSP. Cadence introduced HiFi 1, a new member of the Tensilica HiFi DSP IP family, targeted at audio streaming and sensor fusion in hearables and other wearable devices, offering lower power and smaller silicon area than HiFi 3.

As for Risc-V-based architectures, SiFive focused on its P550 product, which aims at “making high-performance Risc-V processors reality while maintaining class leading area and performance density”, and provided some details on its next-generation products; Esperanto described its low-voltage ET-SoC-1, which it claims is “the world’s highest performance commercial Risc-V chip”, employing custom ML vector/tensor extensions; Andes discussed its NX27V processor IP as a building block for datacenter accelerators. As for FPGAs in machine learning applications, Achronix focused on its Speedster7t family and the benefits of using a NoC, while Lattice described its product range and the sensAI stack.

NoCs, clock networks, chip telemetry, high speed SerDes

The conference also saw the participation of some companies offering IP other than processor cores. Arteris described the combined offering resulting from its acquisition of Magillem, enabling automated Hardware-Software Interface (HSI) creation using NoC interconnect and IP deployment technology. Movellus reiterated its concept of the intelligent clock network, providing examples of benefits achieved in different applications. ProteanTecs explained the functions of its chip telemetry solutions. Alphawave IP shared its vision of how to achieve 224Gbps connectivity data rates using PAM4 or PAM6 modulation, DFE and MLSD (Maximum Likelihood Sequence Detection) equalization, and advanced error correction.
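The trade-off between PAM4 and PAM6 comes down to bits per symbol: a PAM-N symbol carries log2(N) bits, so denser constellations lower the required symbol rate at the cost of reduced voltage margin between levels. A minimal sketch of the arithmetic (my own illustration, not from the Alphawave IP presentation; it ignores FEC and coding overhead, which push real baud rates somewhat higher):

```python
import math

def required_baud_rate_gbd(data_rate_gbps: float, pam_levels: int) -> float:
    """Symbol rate (GBd) needed to carry a given data rate with PAM-N signaling.

    Each PAM-N symbol encodes log2(N) bits; coding/FEC overhead is ignored.
    """
    bits_per_symbol = math.log2(pam_levels)
    return data_rate_gbps / bits_per_symbol

# 224 Gbps links, as discussed by Alphawave IP:
print(required_baud_rate_gbd(224, 4))            # PAM4: 2 bits/symbol -> 112.0 GBd
print(round(required_baud_rate_gbd(224, 6), 1))  # PAM6: ~2.58 bits/symbol -> 86.7 GBd
```

The roughly 25% lower symbol rate is what makes PAM6 attractive at 224G, since channel loss grows steeply with frequency; the price is tighter eye openings, which is where DFE/MLSD equalization and strong error correction come in.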