Sanjay Gangal
Sanjay Gangal is a veteran of the electronics design industry with over 25 years of experience. He previously worked at Mentor Graphics, Meta Software, and Sun Microsystems, and has been contributing to EDACafe since 1999.

Intel Gaudi2 AI Accelerator Breaks Barriers with 2x Performance Surge on GPT-3 Using FP8 Software

 
November 10th, 2023 by Sanjay Gangal

In the ever-evolving world of artificial intelligence, performance and efficiency are paramount. The ability to train and deploy AI models quickly and cost-effectively has become a competitive advantage for organizations across various industries. Intel, a pioneer in the field of semiconductor technology, continues to push the boundaries of AI performance with its Intel Gaudi2 accelerator and 4th Gen Intel Xeon Scalable processors. In a recent development, Intel has achieved a remarkable 2x performance leap on the GPT-3 benchmark by implementing FP8 software. This achievement, validated through the industry-standard MLPerf training v3.1 benchmark, underscores Intel’s commitment to providing competitive AI solutions that can be deployed anywhere.

The Milestone Announcement

On November 8, 2023, Intel announced its results for the MLPerf training v3.1 benchmark for training AI models. The submissions covered Intel Gaudi2 accelerators and 4th Gen Intel Xeon Scalable processors equipped with Intel Advanced Matrix Extensions (Intel AMX). The standout result came from Intel Gaudi2, which demonstrated a 2x performance improvement on the v3.1 GPT-3 training benchmark thanks to the implementation of the FP8 data type. This accomplishment reaffirms Intel’s dedication to making AI accessible and efficient for a wide range of applications.

Sandra Rivera, Intel’s Executive Vice President and General Manager of the Data Center and AI Group, highlighted the significance of this achievement, stating, “We continue to innovate with our AI portfolio and raise the bar with our MLPerf performance results in consecutive MLCommons AI benchmarks. Intel Gaudi and 4th Gen Xeon processors deliver a significant price-performance benefit for customers and are ready to deploy today. Our breadth of AI hardware and software configuration offers customers comprehensive solutions and choice tailored for their AI workloads.”

(Image credit: Intel Corporation)

Why It Matters

The latest MLCommons MLPerf results build on Intel’s strong track record in AI performance, surpassing the benchmarks Intel set in June. Notably, the Intel Xeon processor remains the only CPU for which MLPerf results have been reported, underscoring Intel’s leadership in this domain. Moreover, Intel Gaudi2 is one of only three accelerator solutions represented in these benchmark results, and one of only two that are commercially available. This achievement reinforces Intel’s position as a key player in the AI hardware and software ecosystem.

Intel Gaudi2 and 4th Gen Xeon processors have demonstrated exceptional AI training performance across various hardware configurations. This versatility allows them to address the diverse computational needs of customers, catering to the growing demands of AI workloads.

Intel Gaudi2 Results

Intel Gaudi2 has emerged as a formidable alternative to NVIDIA’s H100 for AI computing requirements. The MLPerf results for Gaudi2 showcase its remarkable training performance:

  1. 2x Performance Leap with FP8: By implementing the FP8 data type on the v3.1 training GPT-3 benchmark, Intel Gaudi2 reduced the time-to-train by more than half compared to the June MLPerf benchmark. The training was completed in an impressive 153.58 minutes, utilizing 384 Intel Gaudi2 accelerators. The Gaudi2 accelerator supports FP8 in both E5M2 and E4M3 formats, with the option of delayed scaling when necessary (a plain-NumPy sketch of this scaling idea follows this list).
  2. Efficient Training with BF16: Intel Gaudi2 demonstrated its capabilities by completing training on the Stable Diffusion multi-modal model with 64 accelerators in just 20.2 minutes, using the BF16 data type. This highlights its efficiency in handling various AI workloads.
  3. Expanding FP8 Support: While FP8 was initially used only in GPT-3 for this MLPerf training submission and in GPT-J for the previous inference submission, Intel is actively expanding FP8 support in Gaudi2 Software and tools to include additional models for both training and inference.
  4. Competitive Performance: In addition to GPT-3, Intel Gaudi2 demonstrated excellent benchmark results for other AI models as well. Notably, BERT and ResNet-50 achieved training times of 13.27 and 15.92 minutes, respectively, using the BF16 data type on eight Intel Gaudi2 accelerators.
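
To make the FP8 items above concrete, here is a minimal sketch, in plain NumPy, of scaled FP8 quantization with delayed scaling: the scale is derived from absolute-maximum (amax) values recorded on earlier training steps, and the E4M3 and E5M2 formats trade mantissa precision for dynamic range. The constants below are the standard OCP FP8 maxima; the function names are illustrative assumptions and are not Intel Gaudi software APIs.

    import numpy as np

    # Largest finite magnitudes of the OCP FP8 formats (E4M3 carries more
    # mantissa precision, E5M2 more exponent range).
    FP8_MAX = {"E4M3": 448.0, "E5M2": 57344.0}

    def fp8_scale_from_history(amax_history, fmt="E4M3", margin=1.0):
        # Delayed scaling: the scale comes from amax values observed on
        # previous iterations, so the current step needs no extra pass
        # over the tensor before quantizing.
        amax = max(amax_history) if amax_history else 1.0
        return FP8_MAX[fmt] / (amax * margin)

    def quantize_dequantize_fp8(x, scale, fmt="E4M3"):
        # Scale into the FP8 range, clip to the format's largest finite
        # value, then scale back. Real hardware also rounds the mantissa;
        # clipping alone is enough to show the dynamic-range trade-off.
        limit = FP8_MAX[fmt]
        q = np.clip(x * scale, -limit, limit)
        return q / scale

    # Usage: quantize activations with a scale computed from earlier steps.
    history = [2.1, 1.9, 2.4]          # amax values from prior iterations
    scale = fp8_scale_from_history(history, fmt="E4M3")
    x = np.random.randn(4, 4).astype(np.float32)
    x_q = quantize_dequantize_fp8(x, scale, fmt="E4M3")
    print(abs(x - x_q).max())

In practice, a training framework keeps such an amax history per tensor and refreshes the scale periodically; the benefit of delaying the scale update is that quantization never has to wait on a fresh amax measurement of the current step.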

4th Gen Xeon Results

Intel’s 4th Gen Xeon processors, known for their reliability and versatility, continue to impress with their MLPerf results:

  1. Diverse Model Support: Intel submitted results for ResNet-50, RetinaNet, BERT, and DLRM dcnv2, highlighting the processor’s strong performance across a range of AI models.
  2. Continued Excellence: The results for ResNet-50, RetinaNet, and BERT were consistent with the strong out-of-the-box performance demonstrated in the June 2023 MLPerf benchmark, reaffirming the reliability and consistency of Intel’s CPU-based AI solutions.
  3. New Model Performance: DLRM dcnv2, a new addition since the previous submission, showed impressive results, with a time-to-train of 227 minutes using only four nodes. This highlights the scalability and efficiency of Intel’s CPU offerings (a brief sketch of the BF16/AMX path on Xeon follows this list).
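
For context on how CPU-side mixed precision is typically exercised, the following is a minimal sketch, assuming a standard PyTorch environment, of running a model under BF16 autocast on a Xeon CPU. oneDNN, which backs PyTorch’s CPU kernels, can dispatch BF16 matrix work to Intel AMX tile instructions when the hardware supports them. The model here is a placeholder, not one of the MLPerf workloads, and this is not the MLPerf submission code.

    import torch
    import torch.nn as nn

    # Placeholder network; the actual MLPerf submissions use full workloads
    # such as ResNet-50, RetinaNet, BERT, and DLRM dcnv2.
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.ReLU(),
        nn.Linear(4096, 1000),
    ).eval()

    x = torch.randn(32, 1024)

    # CPU autocast runs eligible ops (linear layers, convolutions, matmuls)
    # in bfloat16; oneDNN selects AMX-backed kernels when they are available.
    with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = model(x)

    print(y.dtype)  # bfloat16 output from the autocast region

The same autocast pattern applies to CPU training as well; the point is that the AMX acceleration the article attributes to 4th Gen Xeon is typically reached through the framework’s normal mixed-precision path rather than through workload-specific code.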

Implications and Future Prospects

The performance achievements of Intel Gaudi2 and 4th Gen Xeon processors in the MLPerf training v3.1 benchmark are a testament to Intel’s dedication to advancing AI capabilities. With AI becoming increasingly integral to a wide range of industries, including healthcare, finance, and automotive, the ability to train AI models quickly and efficiently is crucial.

Intel’s commitment to continuous improvement is evident, as they anticipate more advancements in AI performance results through software updates and optimizations. These developments will be reflected in future MLPerf benchmarks, providing customers with even more choices for AI solutions that can meet dynamic requirements for performance, efficiency, and usability.

Conclusion

Intel’s recent achievement of a 2x performance leap on the GPT-3 benchmark with FP8 software is a significant milestone in the field of artificial intelligence. This accomplishment reaffirms Intel’s position as a leader in AI hardware and software solutions, offering customers the performance and versatility they need to excel in AI-driven applications. As AI continues to shape the future of technology, Intel remains at the forefront, providing innovative solutions that empower organizations to harness the full potential of AI.

Category: Intel
