
EDACafe Industry Predictions for 2025 – Alif Semiconductor

 
January 9th, 2025 by Sanjay Gangal

By Reza Kazerounian and Camron Kazerounian


AI at the edge is too slow and power-hungry today: 32-bit microcontrollers need a major AI upgrade
Gigascale computing has enabled astonishingly fast advances in the capabilities of AI, dramatically changing the way that professionals perform functions as varied as scientific research, business report writing, and medical diagnosis.

These capabilities today depend on the computing power available in the cloud’s data centers, but embedded device manufacturers are eager for their products to take advantage of AI as well. In the embedded world, devices could use AI to respond instantly to the user’s voice commands, to predict when a machine may require servicing to avoid unplanned downtime, or to listen to and recognize the immediate sonic environment – with all AI operations performed locally, without reference to the cloud.

But if this vision is to become a reality, manufacturers of microcontrollers (MCUs) will need to re-engineer their products, tailoring them for the power, cost, and size constraints common in the embedded world.

AI: the tool for handling growing data volumes at the edge

The number of devices deployed at the edge is growing, and the amount of data that each produces is also growing fast. This is driving the need for on-device AI processing. As the global datasphere continually expands – it is forecast to quadruple to close to 600 zettabytes (ZB) by 2030 – the sheer volume of data threatens to overwhelm traditional cloud computing systems.


This means that AI and machine learning (ML) are set to play a crucial role, interpreting the data to produce meaningful insights. In fact, this will create a feedback loop, generating more data from the insights that AI software produces. The amount of data is so large that we are seeing a shift in how embedded systems process data, towards an emphasis on local processing at the edge.

Fig. 1: a smart watch can use edge AI to provide health insights on the basis of vital sign data. (Image credit: Jens Mahnke)

Because embedded devices have highly constrained resources, there are big advantages to performing AI processing locally, at the edge:

  • Lower latency: a time delay occurs when data has to be sent to the cloud for processing and the inference result returned to the device. In many applications, this latency is unacceptable. For instance, a health monitoring device needs to respond immediately when it detects an abnormal heart rhythm, and cannot wait for the cloud to perform inferencing (see Figure 1).
  • Bandwidth and cost: transmitting huge volumes of data to the cloud puts network stability at risk, and the cost of network access can be prohibitive. Processing AI workloads at the edge reduces the need for network bandwidth.
  • Privacy and security: privacy is a vital requirement of applications such as health monitoring or surveillance. By performing AI operations locally, device manufacturers avoid exposing data to the risk of eavesdropping as it crosses the network and is stored in the cloud.
  • Power consumption: transmission and reception of data consume a large proportion of a typical embedded device’s power budget. Edge AI conserves energy, extending the time that a battery-powered device can run between charges (a back-of-envelope sketch follows this list).
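To put rough numbers on that last point, here is a toy Python calculation comparing the energy needed to transmit one second of raw accelerometer data to the cloud with the energy for a single local inference. Every figure in it is an assumption chosen for round numbers, not a measurement of any real device or radio.

    # Toy comparison: energy to transmit one second of raw sensor data
    # vs. energy for one local inference. All figures are assumptions.
    SAMPLE_RATE_HZ = 100          # 3-axis accelerometer sampled at 100 Hz
    BYTES_PER_SAMPLE = 3 * 2      # three 16-bit channels
    RADIO_UJ_PER_BYTE = 0.5       # assumed BLE-class radio energy cost
    LOCAL_INFERENCE_UJ = 50.0     # assumed NPU cost of one small-CNN inference

    window_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE   # 600 bytes of raw data
    tx_energy_uj = window_bytes * RADIO_UJ_PER_BYTE    # 300 uJ to ship it

    print(f"transmit 1 s of raw data: {tx_energy_uj:.0f} uJ")
    print(f"one local inference:      {LOCAL_INFERENCE_UJ:.0f} uJ")
    print(f"advantage of local:       {tx_energy_uj / LOCAL_INFERENCE_UJ:.1f}x")

Under these assumptions, the radio costs six times more energy than the inference, and the gap widens further for richer data such as audio or images.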

By executing AI inferencing operations faster at the edge, embedded devices can enable some applications that would be impossible if reliant on cloud servers. Personal health monitoring products, for instance, that detect heart arrhythmia or that recognize changes in blood pressure and other vital signs need to perform rapid data analysis if they are to alert users to potential health risks in time for them to take effective action. Latency introduced by cloud processing could prevent the device from providing the warning that would have led to early and successful intervention. The same need for an immediate response with very low latency applies to predictive maintenance systems that use AI in industrial machinery.

In the consumer world, augmented reality (AR) smart glasses have the same need for immediate response. The user notices if the display of on-demand or context-aware information about the environment – such as identifying landmarks or translating text in real-time – is not immediate.

These examples show why real-time, secure, and low-power AI data processing needs to take place at the edge.

The types of endpoint AI applications and models

In response to the move to bring AI processing to the endpoint, AI scientists have accelerated the development of ML model optimizations aimed at shrinking the memory footprint of models without impairing the accuracy of inference: the techniques include model pruning, compression, and quantization (a minimal quantization sketch appears after Figure 2). Moreover, developers are creating special models tailored for specific applications running at the edge. These include:

  • Image classification, segmentation and object detection: Convolutional Neural Networks (CNNs) are particularly useful for analyzing visual data. This means that they are widely deployed in applications such as face recognition, object detection, and gesture and posture recognition.
  • Generative AI and Natural Language Processing (NLP): transformers had a dramatic effect on the implementation of Large Language Models (LLMs) in generative AI systems such as ChatGPT. When adapted for lightweight Small Language Models (SLMs), transformers can run locally on edge devices. Applications for SLMs include voice commands, real-time translation, and voice-activated interfaces – with huge impact on the user’s experience of interacting with smart devices.
  • Predictive maintenance and anomaly detection: Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models are ideal for time-series data. They are particularly applicable to industrial machinery and vehicles, where they are used with sensors which monitor physical parameters such as vibration, temperature, and pressure. RNNs or LSTM models analyze the sensor data and detect anomalies, enabling operators to perform maintenance before an unexpected failure occurs (see Figure 2).

Fig. 2: edge AI powers predictive maintenance to reduce downtime and accelerate repairs of industrial equipment
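To make the quantization technique mentioned above concrete, here is a minimal sketch of post-training full-integer quantization using the TensorFlow Lite converter. It assumes a TensorFlow/Keras toolchain, and the layer sizes, input shape, and class count are illustrative placeholders rather than any particular product’s model.

    import numpy as np
    import tensorflow as tf

    # A small CNN of the kind used for on-device image classification.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(96, 96, 1)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])

    # The converter calibrates activation ranges from a representative
    # dataset; random data stands in here for real calibration samples.
    def representative_data():
        for _ in range(100):
            yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    # Force full-integer ops so the model can run on integer-only NPUs.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())  # INT8 weights: ~4x smaller than float32

On an NPU-equipped MCU, the resulting integer model is typically compiled further with the silicon vendor’s tooling before deployment.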

The ambition of embedded device manufacturers goes further, embracing the potential of LLMs for natural language processing in applications such as voice interfaces: the MCUs in edge devices will one day need to support this technology.

LLMs would also support ambitious features such as real-time translation of foreign-language text or speech – a highly valuable application for devices such as smart glasses. Making such devices capable of recognizing voice commands in natural speech or providing context-aware interpretation of places or objects could be similarly transformative.

The added value available to embedded device manufacturers is, then, clear to see. But how to implement these new kinds of AI at the edge without also scaling up power consumption and cost?

How new MCUs should adapt to the requirements of edge AI

The reality is that today’s MCUs fail to meet the requirements even of the ML models now in use in embedded devices at the edge, let alone the increased demands that will be placed on them as ML models evolve. Embedded device manufacturers are missing out on opportunities to exploit the full potential of AI because today’s MCUs and systems-on-chip (SoCs) consume too much power, are too large, and perform inferencing too slowly.

Some progress is already being made: pioneers including Alif Semiconductor have introduced new families of MCU that include a neural processing unit (NPU) and other features intended for AI inferencing. Products such as the Ensemble MCUs and the Balletto Bluetooth MCUs provide high-performance, low-power operation in voice, image, and motion analysis, particularly when the application deploys CNNs (see Figure 3).

But looking beyond today’s implementations of AI at the edge, there is huge untapped potential in the use of new transformer-based SLM models, as well as in newer types of neural network including Graph Neural Networks (GNNs) and Spiking Neural Networks (SNNs). The implementation of these is beyond even the best of today’s AI-optimized MCUs.

Fig. 3: the Balletto wireless MCU enables sophisticated voice and audio AI functions in wearable devices

So what is needed in a new generation of MCUs that is missing today? MCUs for AI will evolve in three particular directions.

The first is the integration of advanced NPUs, providing a dedicated hardware resource that is optimized for AI inferencing. To exploit the potential of transformer-based models and emerging types of neural network, embedded devices will in future need to offer performance better than 1 tera-operation per second (TOPS), while keeping power consumption so low that small battery-powered products such as earbuds can operate for at least a day between charges.
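The arithmetic below, a sketch resting entirely on assumed figures for battery capacity, duty-cycled AI load, and budget split, shows how demanding that combination is: sustaining even a tenth of 1 TOPS all day on an earbud-class battery implies an NPU efficiency in the tens of TOPS/W.

    # Toy energy budget for an "all-day" earbud. All figures are assumptions
    # made for the sake of the arithmetic, not product specifications.
    BATTERY_MWH = 60.0      # assumed earbud battery capacity
    TARGET_HOURS = 16.0     # "at least a day" of active use
    AI_SHARE = 0.5          # assume half the budget for AI, half radio/audio
    SUSTAINED_TOPS = 0.1    # assumed average load, duty-cycled from 1 TOPS peak

    ai_budget_mw = BATTERY_MWH / TARGET_HOURS * AI_SHARE   # ~1.9 mW for AI
    required_tops_per_w = SUSTAINED_TOPS / (ai_budget_mw / 1000.0)

    print(f"AI power budget:         {ai_budget_mw:.2f} mW")
    print(f"required NPU efficiency: {required_tops_per_w:.0f} TOPS/W")  # ~53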

The second is the provision of memory to support AI operations in an MCU. As AI systems running models such as SLMs evolve, they will need to draw on much more memory capacity, both on-chip and external. In response, MCU manufacturers must provide external memory interfaces that are faster and operate at lower power, and new MCU architectures will support new, faster pipelines for moving data within the chip.
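As an illustration of the scale involved, consider a hypothetical SLM of roughly GPT-2-small size; every figure below is an assumption chosen only to show the order of magnitude. Even with INT8 weights the model exceeds typical on-chip MCU memory many times over, and a dense decoder must stream the full weight set from external memory for each generated token.

    # Rough sizing of a small language model (SLM) on an MCU.
    # All figures are assumptions chosen to illustrate orders of magnitude.
    PARAMS = 125e6          # assumed SLM parameter count (GPT-2-small scale)
    BYTES_PER_PARAM = 1     # INT8 weights after quantization
    TOKENS_PER_S = 5        # assumed generation rate usable for a voice UI

    weights_mb = PARAMS * BYTES_PER_PARAM / 1e6        # 125 MB of weights
    # A dense decoder touches every weight once per generated token.
    bandwidth_mb_s = weights_mb * TOKENS_PER_S         # 625 MB/s sustained

    print(f"weight storage:   {weights_mb:.0f} MB (beyond on-chip memory)")
    print(f"weight bandwidth: {bandwidth_mb_s:.0f} MB/s from external memory")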

The third is broader integration: MCUs are able to operate as SoCs because of the wide range of functions they integrate, but that integration needs to be extended to support AI systems with more sensors, while keeping the footprint of an edge node as small as possible.

A new vision for personal embedded devices

If the MCU industry can rise to the challenge of AI at the edge, it could profit from a sea change in demand for compute power, as people’s experience of AI operations shifts from high-energy CPU- and GPU-powered hubs to personal digital devices based on an MCU.

This new era of pervasive personal computing will see AI-driven functions integrated seamlessly into devices operating with low latency at the edge, providing enhanced audio and augmented reality features in products such as smart glasses and earbuds. Technology will be experienced as an integral part of the user’s life, rather than a continual series of distractions and notifications from a smartphone or PC.

This turns upside down today’s assumptions about how people interact with technology, reducing their dependence on smartphone apps and internet services provided by a PC. Instead, more functions will be provided by a multitude of smart devices which are embedded in the environment and which communicate with each other to provide services such as health monitoring, energy management, and security and surveillance. The decentralized intelligence provided by many embedded devices at the edge will replace the centralized model in which most technology services are provided via the smartphone.

To realize this new vision, the MCU industry needs to provide a new generation of products based on architectures optimized for AI and ML functions operating with low latency and at low power.

About the Authors:

Reza Kazerounian is an innovative leader who has been recognized for building and managing multibillion-dollar businesses in the Automotive and Industrial sectors, as well as IoT solutions.

He is well known for his work in the areas of microcontrollers, embedded processing, and connectivity. Prior to co-founding Alif Semiconductor, Reza was the Senior Vice President and General Manager of the Microcontroller and Connectivity Business Unit at Atmel Corporation until its acquisition in 2016.

Prior to Atmel, Reza served as SVP and GM of Freescale’s Automotive, Industrial, and Multi-Market Solutions Product Groups with annual revenue exceeding $2.7B, accounting for over 65% of corporate revenue. Under Reza’s responsibility, this business consisted of the Microcontroller, Connectivity, MEMS sensors, and Analog Product divisions.

Prior to Freescale, Reza held key appointments at STMicroelectronics. He served as CEO of the STM Americas region until March 2009. Prior to this, he was the Group Vice President and GM of the Smart Card Security and Programmable Systems Memory Divisions at STM corporate, based in Geneva, Switzerland.

Reza holds a B.S. degree from the University of Illinois, Chicago and a Ph.D. from the University of California, Berkeley, both in Electrical Engineering and Computer Sciences.

Camron Kazerounian




