EDACafe: EDACafe Editorial - A closer look at Intel’s oneAPI

EDACafe Editorial

Roberto Frazzoli
Roberto Frazzoli is a contributing editor to EDACafe. His interests as a technology journalist focus on the semiconductor ecosystem in all its aspects. Roberto started covering electronics in 1987. His weekly contribution to EDACafe started in early 2019.

A closer look at Intel’s oneAPI

January 9th, 2020 by Roberto Frazzoli

Last November Intel introduced oneAPI, described as “a single, unified programming model that aims to simplify development across multiple architectures – such as CPUs, GPUs, FPGAs and accelerators”. This week we take a closer look at oneAPI with the help of Herb Hinstorff, Director of Marketing, Software Developer Products Division at Intel.

Image credit: Intel

Before proceeding to our Q&A session, let’s briefly summarize what oneAPI is about. As described in Intel’s fact sheet, oneAPI has two components: an industry initiative and an Intel beta product. The oneAPI initiative’s cross-architecture development model is based on industry standards and an open specification, to enable broad ecosystem adoption. The Intel oneAPI beta product is Intel’s implementation of oneAPI. It contains the oneAPI specification components for direct programming (Data Parallel C++) and for API-based programming (a set of performance libraries), along with advanced analysis and debug tools and other components. Developers can test their code and workloads today in the Intel DevCloud for oneAPI on multiple types of Intel architectures, including Intel Xeon Scalable processors, Intel Core processors with integrated graphics, and Intel FPGAs (Arria, Stratix).

The development flow devised for FPGAs, a target that has traditionally required Verilog or VHDL skills (as noted in Kevin Morris’ article on EEJournal), is shown in the following diagram.

Image credit: Intel

And now, our Q&A session with Herb Hinstorff.

EDACafe: Except for the recent addition of new AI accelerator chips from startups, today’s range of silicon processing platforms is no different from what it was years ago: CPUs, GPUs and FPGAs. Would it be correct to say that AI is the major factor behind the need for a unified programming model?

Hinstorff: It’s really a combination of factors that crystallized this developer need. In addition to the array of dedicated AI chips, the use of FPGAs as compute accelerators has grown, and more vendors (including Intel) are now fielding GPU compute accelerators. Without a shared development flow, developers face a worst-case scenario of X different vendors fielding Y different architectures, each requiring a separate development environment. This limits the opportunity for code reuse across vendors and architectures and requires developers to invest in learning many different toolsets.

EDACafe: Why did Intel choose to use DPC++ instead of a pre-existing framework such as OpenCL, for example?

Hinstorff: DPC++ is largely based on SYCL, which, like OpenCL, is being developed under the Khronos Group organization. SYCL benefits from ten years of industry learnings from OpenCL, which has seen mixed success in the market. One of the advantages of SYCL over OpenCL is that SYCL is entirely based on C++, so it leverages developers’ experience with that popular high-performance language. Also, SYCL works at a higher level of abstraction, so less of the code is tied closely to the target architecture. This increases the opportunity for code reuse.
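To make the “entirely based on C++” point concrete, here is a minimal sketch of the single-source style, assuming only a standard C++17 compiler. The `parallel_for` helper below is hypothetical: a serial CPU stand-in that mimics the shape of SYCL’s `parallel_for` (an iteration range plus a C++ lambda acting as the kernel), not the SYCL API itself. In SYCL 2020 syntax the kernel would be launched roughly as `q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) { c[i] = a[i] + b[i]; });` on a `sycl::queue q` bound to a CPU, GPU or FPGA device, with `a`, `b` and `c` pointing to device-accessible memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-only stand-in for a data-parallel launch. It is NOT the
// SYCL API; it only mimics the shape of a SYCL parallel_for, which takes an
// iteration range and a C++ lambda (the "kernel").
template <typename Kernel>
void parallel_for(std::size_t n, Kernel kernel) {
    // A SYCL runtime would compile the lambda for the selected device and run
    // the work-items concurrently; this stand-in simply loops serially.
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

// Element-wise vector addition written as a single-source C++ kernel.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size() && "inputs must have the same length");
    std::vector<float> c(a.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();
    // The kernel body is ordinary C++: a lambda capturing raw pointers by value,
    // the same style SYCL kernels use with unified shared memory.
    parallel_for(c.size(), [=](std::size_t i) { pc[i] = pa[i] + pb[i]; });
    return c;
}
```

For example, `vector_add({1.0f, 2.0f, 3.0f}, {10.0f, 20.0f, 30.0f})` yields `{11, 22, 33}`. The design point being illustrated is that host code and kernel code share one C++ source file and one type system, whereas OpenCL kernels are typically written in OpenCL C and kept separate from the host program.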

EDACafe: Can oneAPI be described as an additional level of abstraction? If so, should programmers worry about oneAPI adding overhead? Can you comment on the following note from the oneAPI Programming Guide: “Not all programs can benefit from the single programming model offered by oneAPI”? How can a programmer assess the benefits for a specific program?

Hinstorff: It is a level of abstraction, but the goal is to enable the native performance offered by the underlying hardware. This is no different from how C++ is a level of abstraction above the hardware: with efficient programming and an efficient compiler, native performance is achievable. The note in the programming guide you referenced is mainly about how not all programs benefit from using accelerators. Since oneAPI is primarily focused on accelerator programming, programs that are bottlenecked by random access across a large memory region, such as a database, may not benefit.
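Hinstorff’s point that not every program benefits from an accelerator can be illustrated with a crude, Amdahl-style estimate. The sketch below is a hypothetical model assumed for illustration (it is not how oneAPI or Intel’s tools evaluate code): only the offloaded region gets faster, and moving data to and from the device adds a fixed cost. The struct and function names are invented for this example.

```cpp
#include <cassert>

// Hypothetical back-of-envelope inputs (all times in seconds). This is an
// illustrative assumption, not the model of any Intel tool.
struct OffloadEstimate {
    double host_serial;     // time in code that stays on the host
    double host_parallel;   // host time of the region considered for offload
    double kernel_speedup;  // assumed accelerator speedup on that region
    double transfer;        // assumed cost of moving data to/from the device
};

// Amdahl-style estimate of whole-program speedup after offloading.
// A result below 1.0 means offloading would make the program slower.
double estimated_speedup(const OffloadEstimate& e) {
    assert(e.kernel_speedup > 0.0);
    const double before = e.host_serial + e.host_parallel;
    const double after =
        e.host_serial + e.host_parallel / e.kernel_speedup + e.transfer;
    return before / after;
}
```

With 2 s of host-only code, an 8 s candidate region, a 10× kernel speedup and 1 s of transfers, the estimate is 10 / 3.8, about 2.6×. If the region is bound by random memory access and the device runs it no faster (speedup 1.0), the same transfer cost makes the whole program slower: 10 / 11, about 0.91×.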

EDACafe: One of the assumptions behind oneAPI is that “no single architecture is best for every workload”. Does oneAPI provide a way to compare the performance achievable on different hardware targets for a given program?

Hinstorff: In general, it’s still the developer’s task to compare implementations across different architectures. We are building a capability known as Offload Advisor that will help developers run what-if experiments to assess the performance of their code on GPU architectures before investing time in a complete experimental implementation. In the future, we’d like to extend this capability to other architectures, but we’re not there yet.

EDACafe: Startups keep on introducing new AI accelerator chips based on very diverse architectures (ranging from “tensor native” to “graph native”, for example). Will it be possible to use oneAPI to target these chips, too?

Hinstorff: Yes, we’re working with AI architecture experts to evolve the DPC++ language, the AI libraries, and the Level Zero low-level accelerator API portions of the oneAPI specification to address the requirements for a range of dedicated AI architectures. One of the big reasons why we’ve released draft specifications and early beta tool implementations at this time is to enable that sort of collaboration.

EDACafe: The DPC++ toolkits include a migration engine that transforms CUDA applications into standards-based DPC++ code. Will migrating CUDA code to DPC++ make it possible to move a given application from Nvidia GPUs to Intel GPUs?

Hinstorff: We are providing a DPC++ Compatibility Tool that will assist developers with migrating their CUDA code to DPC++. It won’t port 100% of the code, but we believe it will help automate the majority of the drudgery for developers. Developers then apply their expertise to complete the DPC++ code and tune it for whichever Intel accelerator platform they choose.
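Part of the mechanical work in such a migration is mapping equivalent concepts. For example, a CUDA kernel usually computes its element index as `blockIdx.x * blockDim.x + threadIdx.x`; SYCL’s `nd_item` exposes the same quantities as work-group id, local range and local id, and `get_global_id(0)` returns their combination when no global offset is used. The sketch below is plain C++ with hypothetical helper names: it only checks on the CPU that this index formula visits every element of a 1-D launch exactly once, and it does not reproduce the code the DPC++ Compatibility Tool actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CUDA-style flat index: blockIdx.x * blockDim.x + threadIdx.x.
// In SYCL terms: group id * local range + local id (no global offset).
std::size_t flat_index(std::size_t block_idx, std::size_t block_dim,
                       std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Hypothetical helper: enumerates every (block, thread) pair of a 1-D launch
// in order and records the flat index each pair would compute.
std::vector<std::size_t> covered_indices(std::size_t num_blocks,
                                         std::size_t block_dim) {
    assert(block_dim > 0 && "a block / work-group needs at least one item");
    std::vector<std::size_t> out;
    out.reserve(num_blocks * block_dim);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            out.push_back(flat_index(b, block_dim, t));
        }
    }
    return out;
}
```

Calling `covered_indices(3, 4)` returns the indices 0 through 11 in order, one per work-item, which is why the CUDA expression and the SYCL global id can be treated as interchangeable when the launch geometry is translated one-to-one.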

Image credit: Intel

EDACafe: From the point of view of gaining a competitive advantage, does Intel attach more importance to the unified programming environment or to the libraries?

Hinstorff: The oneAPI libraries and DPC++ compiler together comprise the unified development environment. We encourage the use of libraries wherever possible because the included functions are pre-optimized for the target architecture and don’t require tuning like DPC++ code does. This is true today for CPU libraries as well, but the developer benefit is multiplied when the libraries are pre-optimized for multiple architectures.

EDACafe: Do you see any similarities between Intel’s oneAPI and Vitis, Xilinx’s unified software platform?

Hinstorff: We believe the purpose of oneAPI is quite different from that of Xilinx Vitis. oneAPI’s primary goal is to deliver a single cross-architecture programming model. Vitis is entirely focused on programming Xilinx FPGAs. Perhaps the biggest similarity is that both use high-level synthesis rather than the traditional Verilog and VHDL register-transfer-level languages.

This concludes our Q&A session with Herb Hinstorff, Director of Marketing, Software Developer Products Division at Intel. EDACafe will continue to inform readers about oneAPI and other developments in the area of unified programming models.

Image credit: Intel

The oneAPI logo. Image credit: oneAPI Initiative

This entry was posted on Thursday, January 9th, 2020 at 9:59 am.
