Gary Smith EDA Research: Sub-Optimal Processing CPU vs. GPU / NPU

A general-purpose processor (CPU) is, by definition, sub-optimal for any specific application. Its strength is its ability to run many different applications. That trade-off worked well until we ran into the power wall a few years ago; we can no longer afford the luxury of a sub-optimal solution. Jem Davies of ARM and I have been discussing this recently, and he wrote a great blog post you'll find interesting: "CPUs Have Been Doing GPU Computing Badly for Years."

Multi-Core vs. Many-Core

CPUs also don't do well in many-core solutions. Because you don't know in advance which application you will be running, you must assume the application contains more than ten percent serial code, and you are therefore stuck with Amdahl's Law. In practice that limits you to about four processors, or four sets of four: multi-core, rather than many-core, processing. There have been attempts to reach thirty-two cores, but no successes yet. This is a long way from the proposed world of many-core processing.
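The diminishing returns that Amdahl's Law imposes can be sketched in a few lines of Python. The ten-percent serial fraction is the figure assumed above; the function name is ours, not from the article:

```python
def amdahl_speedup(n_procs, serial_fraction):
    """Maximum speedup on n_procs when serial_fraction of the work
    cannot be parallelized (Amdahl's Law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# With 10% serial code, adding cores quickly stops paying off:
for n in (2, 4, 16, 64, 1024):
    print(f"{n:5d} cores -> {amdahl_speedup(n, 0.10):.2f}x speedup")
```

Even with a thousand cores the speedup never reaches 10x, while four cores already deliver roughly a 3x speedup, which is why general-purpose designs tend to stall at small core counts.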


If you look at it from the SMP (Symmetric Multi-Processing) and AMP (Asymmetric Multi-Processing) view, CPUs run into similar problems, or the same problems seen from a different direction. SMP means you are using multiple copies of the same processor, while AMP means you are using different processors.

Once you commit to a closely coupled SMP architecture, you can enter the world of many-core processing only for specific "embarrassingly parallel" applications. I'm sure some engineer out there has used, or will use, a CPU for a many-core embarrassingly parallel architecture; it would, however, be a sub-optimal solution. Once you find an embarrassingly parallel problem of significant market size, you might as well optimize your processor for it. Network Processors (NPUs) and Graphics Processors (GPUs) are present-day examples.
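What makes a workload "embarrassingly parallel" is that every output element depends only on its own input element, so cores never need to communicate. A minimal sketch, using a hypothetical per-pixel brightness kernel of the kind a GPU runs across thousands of cores at once (the function and data here are illustrative, not from the article):

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel, gain=1.5):
    # Each pixel is computed from itself alone -- no shared state,
    # no ordering constraints between pixels.
    return min(255, int(pixel * gain))

pixels = [10, 100, 200, 250]

# Serial reference result.
serial = [brighten(p) for p in pixels]

# The identical map, distributed across workers. On a GPU this would
# be one hardware thread per pixel; threads are used here only to
# show that the work partitions with no coordination needed.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(brighten, pixels))

assert serial == parallel
```

Because there is no inter-core communication, the problem scales to as many cores as you can build, which is exactly why an optimized many-core part beats a CPU here.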

AMP processing, on the other hand, gives you the opportunity to use a processor optimized for each application. That way, the use of the CPU can be minimized.

The Future of Many-Core Processing

Now that we are in the era of billion-gate SoCs, we need to look at computer architecture differently than we have in the past. Consider the availability of multiple application-specific processors augmented by hard-wired accelerators. These processors can be deployed as a multi-core or many-core implementation, depending on the application. There may be four, five, or six separate banks of different application processors, depending on the system design. They will probably be augmented with four to sixteen CPUs to extend the SoC's capabilities.

In Conclusion

CPUs have served us well in the past and will always be a big part of our design resources. Unfortunately, we can no longer afford their sub-optimal performance in a growing percentage of our designs. The development of a highly optimized set of application-specific processors is vital to keeping the power problem under control.

To view entire paper, download the PDF here