EDACafe Editorial
GPT-3; TSMC 5nm customers; ML-enhanced simulation; processor updates
August 24th, 2020 by Roberto Frazzoli

Roberto Frazzoli is a contributing editor to EDACafe. His interests as a technology journalist focus on the semiconductor ecosystem in all its aspects. Roberto started covering electronics in 1987. His weekly contribution to EDACafe started in early 2019.
Catching up on some recent news after a two-week summer break, let’s start by briefly reporting about GPT-3, the new language model from OpenAI. Other updates concern EDA, processors, and more.

Natural language processing with 175 billion parameters

San Francisco-based OpenAI has developed GPT-3, an autoregressive language model with 175 billion parameters – ten times more than Microsoft’s Turing Natural Language Generation model. As explained in a paper, GPT-3 achieves strong performance on many NLP (natural language processing) datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation. GPT-3 can also generate samples of news articles which human evaluators can hardly distinguish from articles written by humans. As for energy usage, the researchers explained that “training the GPT-3 175B consumed several thousand petaflop/s-days of compute during pre-training, compared to tens of petaflop/s-days for a 1.5B parameter GPT-2 model.” But they also added: “Though models like GPT-3 consume significant resources during training, they can be surprisingly efficient once trained: even with the full GPT-3 175B, generating 100 pages of content from a trained model can cost on the order of 0.4 kW-hr, or only a few cents in energy costs.”

TSMC 5-nanometer customers

According to a report quoted by Gizmochina, so far the 5-nanometer manufacturing capacity from TSMC has been mainly divided among eight major customers: Apple, Qualcomm, AMD, Nvidia, MediaTek, Intel, Bitmain, and Altera (this last one being listed in the report as a company by itself, separate from Intel).
Gizmochina adds that Apple’s demand – “40,000 to 45,000 5nm process capacity in the first quarter of 2020” – has concerned its upcoming A14 and A14X Bionic chips and MacBook processors, while Qualcomm intends to use the 5nm process for its next flagship Snapdragon 875 processors, and MediaTek for the next generation of its Dimensity chips.

EDA updates: Cadence, S2C, Mirabilis

The Cadence Xcelium Logic Simulator has been enhanced with machine learning technology, called Xcelium ML, to increase verification throughput. Using this ML technology that directly interfaces to the simulation kernel – in combination with computational software – Xcelium ML learns iteratively over an entire simulation regression. It analyzes patterns hidden in the verification environment and guides the Xcelium randomization kernel on subsequent regression runs to achieve matching coverage with fewer simulation cycles. This way, according to Cadence, Xcelium ML enables up to 5X faster verification closure on randomized regressions.

S2C and Mirabilis Design have announced a collaboration and the resulting hybrid SoC architecture exploration solution, which reuses available RTL-based blocks to accelerate model construction and speed up very complex simulations. According to the two companies, the collaboration enables design projects that deploy a model-based design methodology to further reduce the time and effort spent on creating complicated custom models of legacy designs. As part of this collaboration, Mirabilis Design’s VisualSim architecture exploration solution now integrates S2C’s FPGA-based Prodigy Logic System as a functional block. The integration allows an FPGA prototype to act as a sub-model and provide accurate simulation responses during system exploration. These new solutions address the challenge of modeling custom blocks, as well as the large increase in functional design errors related to SoC specifications seen in recent years.
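Cadence has not published the internals of Xcelium ML, but the coverage argument behind it can be illustrated with a toy coupon-collector model: purely random stimulus keeps re-hitting coverage bins that are already covered, while randomization biased toward not-yet-covered behavior closes coverage in far fewer runs. Everything below – the bin count, the idealized "oracle" guidance standing in for a learned model – is an illustrative assumption, not Cadence's algorithm:

```python
import random

def run_regression(num_bins=50, guided=False, seed=0):
    """Count randomized tests needed to hit every coverage bin at least once.

    guided=False: plain constrained-random stimulus (uniform over bins).
    guided=True:  an idealized stand-in for an ML model that steers
                  randomization toward not-yet-covered behavior.
    """
    rng = random.Random(seed)
    covered = set()
    tests = 0
    while len(covered) < num_bins:
        tests += 1
        if guided and covered:
            # perfect guidance: only sample bins still uncovered
            bin_hit = rng.choice(sorted(set(range(num_bins)) - covered))
        else:
            bin_hit = rng.randrange(num_bins)  # uniform random stimulus
        covered.add(bin_hit)
    return tests

baseline_tests = run_regression(guided=False)
guided_tests = run_regression(guided=True)
print(baseline_tests, guided_tests)  # guided closes coverage in exactly 50 tests
```

With perfect guidance, every test after the first adds a new bin, so coverage closes in exactly `num_bins` tests; the unguided run needs several times more (the coupon-collector effect). Real ML guidance sits somewhere between these two extremes, which is the gap Cadence's 5X claim lives in.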
The new IBM Power CPU

IBM has recently revealed the next generation of its Power CPU family, called Power10. Designed to meet the needs of enterprise hybrid cloud computing, the new generation promises up to 3x greater processor energy efficiency, workload capacity, and container density than the Power9 processor. Innovations introduced by Power10 include the use of a 7-nanometer process; support for multi-petabyte memory clusters through a new technology called Memory Inception, which allows any of the Power10 processor-based systems in a cluster to access and share each other’s memory; new hardware-enabled security capabilities, including transparent memory encryption and four times as many AES encryption engines per core as Power9; and new processor core architectures with an embedded Matrix Math Accelerator, which is extrapolated to provide 10x, 15x and 20x faster AI inference per socket for FP32, BFloat16 and INT8 calculations respectively, compared to the Power9 processor.

Big claims from Tachyum

August was an important month for Tachyum, a Silicon Valley-based semiconductor startup with an R&D center in Bratislava, Slovakia. On August 4th the company announced that its Prodigy Universal Processor – aimed at large datacenters, with production scheduled for 2021 – has successfully completed software emulation testing across x86, ARM and RISC-V binary environments, demonstrating its ability to run legacy applications transparently. And on August 11th Tachyum announced that it had successfully completed a demonstration showing its Prodigy Universal Processor running faster than any other processor or HPC/AI chip, including ones from Nvidia and Intel. The demonstration was based on a Verilog simulation of the Prodigy post-layout hardware.
According to Tachyum, Prodigy outperforms the fastest Xeon processors at 10x lower power on data center workloads, as well as outperforming Nvidia’s fastest GPU on HPC, AI training and inference. Tachyum also claims that Prodigy is “the world’s first and only universal processor”, capable of replacing the different types of processors – GPUs, CPUs, TPUs and other accelerators – currently used in datacenters to run different workloads.

Image capture at 12.5Gbps on a single coax cable

According to Microchip, “Until the 12.5Gbps CoaXPress 2.0 interface standard was ratified last year, machine-vision image-capture solutions had replaced conveyor belts as the primary roadblock to achieving faster production-line throughput.” Supporting the new technology – which uses regular 75 ohm coax cable – Microchip has developed a family of single-chip physical-layer interface devices implementing the CXP 2.0 standard, with integrated clock data recovery at all speed levels and a camera-side clock. Microchip expects that its CoaXPress 2.0 family will have an equally transformational effect on other vision applications, including traffic monitoring, surveillance and security, medical inspection systems and embedded vision solutions.

Biometric payment cards with integrated fingerprint sensor

Infineon and Sweden-based Fingerprint Cards have jointly developed a solution for biometric payment cards with an integrated fingerprint sensor. With this solution, the contactless card remains in the hands of the cardholder throughout the entire payment transaction, while eliminating the need for PIN entries or signatures to authorize even high-value payments.
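As an aside on the CoaXPress figure above: a quick back-of-the-envelope sketch shows what a 12.5 Gbps CXP-12 link means for cameras, assuming the 8b/10b line coding used at earlier CoaXPress speeds also applies (so 80% of the line rate is payload) and taking an illustrative 5-megapixel, 8-bit camera; both assumptions are mine, not from Microchip's announcement, and protocol overhead beyond line coding is ignored:

```python
# Rough throughput math for a single CXP-12 coax link.
# Assumptions: 8b/10b line coding (as at earlier CoaXPress speeds),
# an illustrative 5 MP camera with 8 bits per pixel, and no protocol
# overhead beyond the line coding.

line_rate_bps = 12.5e9
payload_bps = line_rate_bps * 8 / 10   # 8b/10b: 10 Gbit/s of payload
payload_gbytes = payload_bps / 8 / 1e9  # 1.25 GB/s

pixels_per_frame = 5_000_000
bits_per_pixel = 8
frame_bits = pixels_per_frame * bits_per_pixel

max_fps = payload_bps / frame_bits
print(f"payload: {payload_gbytes:.2f} GB/s, max ~{max_fps:.0f} fps")
```

Under these assumptions a single coax cable carries roughly 1.25 GB/s of image data, i.e. on the order of 250 frames per second from a 5 MP monochrome sensor, which is why the standard is pitched at high-speed production-line inspection.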