Catching up on some recent news after a two-week summer break, let’s start with a brief report on GPT-3, the new language model from OpenAI. Other updates concern EDA, processors, and more.
Natural language processing with 175 billion parameters
San Francisco-based OpenAI has developed GPT-3, an autoregressive language model with 175 billion parameters – ten times more than Microsoft’s Turing Natural Language Generation model. As explained in a paper, GPT-3 achieves strong performance on many NLP (natural language processing) datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation. GPT-3 can also generate sample news articles that human evaluators find hard to distinguish from articles written by humans. As for energy usage, the researchers explained that “training the GPT-3 175B consumed several thousand petaflop/s-days of compute during pre-training, compared to tens of petaflop/s-days for a 1.5B parameter GPT-2 model.” But they also added: “Though models like GPT-3 consume significant resources during training, they can be surprisingly efficient once trained: even with the full GPT-3 175B, generating 100 pages of content from a trained model can cost on the order of 0.4 kW-hr, or only a few cents in energy costs.”
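As a quick sanity check on that inference figure, the short sketch below multiplies the quoted 0.4 kWh per 100 pages by an assumed electricity price of roughly $0.12 per kWh – a typical US rate chosen for illustration, not a number from the paper – and indeed lands at a few cents.

```python
# Back-of-the-envelope check of the quoted GPT-3 inference cost.
# The 0.4 kWh per 100 pages figure comes from the paper; the
# electricity price below is an assumption (roughly a typical US rate).

energy_per_100_pages_kwh = 0.4          # quoted in the GPT-3 paper
electricity_price_usd_per_kwh = 0.12    # assumed, not from the paper

cost_per_100_pages_usd = energy_per_100_pages_kwh * electricity_price_usd_per_kwh
print(f"~${cost_per_100_pages_usd:.3f} per 100 generated pages")
# Prints ~$0.048, i.e. "only a few cents", consistent with the authors' claim.
```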
TSMC 5-nanometer customers
According to a report quoted by Gizmochina, the 5-nanometer manufacturing capacity from TSMC has so far been divided mainly among eight major customers: Apple, Qualcomm, AMD, Nvidia, MediaTek, Intel, Bitmain, and Altera (the last listed in the report as a company in its own right, separate from Intel). Gizmochina adds that Apple’s demand – “40,000 to 45,000 5nm process capacity in the first quarter of 2020” – concerns its upcoming A14 and A14X Bionic chips and MacBook processors, while Qualcomm intends to use the 5nm process for its next flagship Snapdragon 875 processors, and MediaTek for the next generation of its Dimensity chips.