Habana launches new chips for training and inference, claiming a 2X performance advantage over the last-generation NVIDIA A100
Habana launched its first chip, the Goya inference processor, at the inaugural Kisaco AI Hardware Summit in 2018, with excellent performance and efficiency. Unfortunately for Habana, there was not much interest in a market that didn’t yet exist: AI accelerators for running, not training, AI neural net models. Nonetheless, Intel acquired Habana in 2019 for $2B, shortly after Habana launched the Gaudi training chip, which had enough performance to earn a coveted design win at Amazon AWS.
Intel has wisely left the Habana team alone since the acquisition, buffering it from well-meaning but meddlesome Intel design teams and from media attempts to gauge its progress. Those efforts have finally paid off, with a new Goya and a new Gaudi generation being launched today at the Intel Vision event. To sum it up, the new Habana chips will appeal to those looking for an alternative to NVIDIA who are willing to adapt their software to run AI models well. But NVIDIA has already launched its new-generation Hopper GPU with a dedicated Transformer Engine for accelerating NLP models, which should keep NVIDIA in the performance lead, at least in that high-growth segment of the AI market.
What Did Habana Launch?
In what was a surprise to me, Habana launched new chips for both inference and training. I had speculated that the Goya inference line would be shelved, given its limited traction and the AI enhancements Intel has made to the new Sapphire Rapids Xeon CPU, but the Habana team obviously decided the market was ripe for a second run, called Greco, which will sample later this year. While Intel claims over 80% share in inference processing with Xeon CPUs, complex inference workloads such as natural language processing benefit greatly from dedicated accelerators, both in data centers and at the edge.
Notably, the Gaudi2 training platform now supports 96GB of high-bandwidth HBM2E memory, which undoubtedly improves performance, albeit at a higher cost. (The NVIDIA A100 and H100 support 80GB.) The new Habana chips are manufactured on TSMC's 7nm process, and Gaudi2 includes 24 100GbE Ethernet links to enable massive scale-out deployments using standard networking, one of the key benefits we highlighted when the original Gaudi was launched. The multi-die platform consumes 600 watts TDP. Gaudi2 is sampling now to select customers and will become generally available later this year, presumably on the AWS cloud as well as from server OEMs. Pricing, of course, was not disclosed.
Habana has also made significant progress in improving its software development environment. The addition of a custom kernel library to the stack should help Intel attract more developer contributions to optimizing AI models, the lack of which has hampered Intel's ability to compete more effectively on MLPerf benchmarks. Habana now offers full support for PyTorch and TensorFlow, along with the training and tools needed to deploy and optimize models on AWS Gaudi instances.
Obviously, Intel is in it to win it. The investment in Habana was wise, and leaving Habana to its own devices has surely insulated the team from others with “good ideas.” The architecture is appealing from both performance and scalability standpoints. I will reserve judgment on how well both platforms perform against NVIDIA Hopper, but I am skeptical that Habana will fare well against Hopper's new Transformer Engine.
Perhaps more important is the progress Habana continues to make in the software realm; users want choice, and Habana's new chips certainly give them an option they will consider if they can easily try it out on the AWS cloud. I still recommend that Habana work closely with AWS to create a very low-cost or free access route for qualified developers. To paraphrase a former candidate for the US presidency, “It's the software, stupid.”