Swense Tech

Nvidia Rides The Generative AI Wave At GTC

This year’s Nvidia GPU Technology Conference (GTC) could not have come at a more auspicious time for the company. The hottest topic in technology today is the artificial intelligence (AI) behind ChatGPT, other related Large Language Models (LLMs), and their use in generative AI applications. Underlying all this new AI technology are Nvidia GPUs. Nvidia CEO Jensen Huang doubled down on support for LLMs and the generative AI future built on them, calling it “the iPhone moment for AI.” Using LLMs, AI computers can learn the languages of people, programs, images, or chemistry. Drawing on that large knowledge base, they can respond to a query by creating new, unique works: this is generative AI.

Jumbo-sized LLMs are taking this capability to new levels, most recently GPT-4, which was introduced just prior to GTC. Training these complex models takes thousands of GPUs, and applying the trained models to specific problems requires still more GPUs for inference. Nvidia’s latest Hopper GPU, the H100, is known for training, but it can also be divided into multiple instances (up to seven), a feature Nvidia calls MIG (Multi-Instance GPU), allowing multiple inference models to run on one GPU. It is in this inference mode that the GPU transforms queries into new outputs using trained LLMs.
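To make the MIG idea concrete, here is a toy bookkeeping sketch in Python. It is purely illustrative; real MIG partitioning uses fixed hardware profiles managed through nvidia-smi or NVML, and the 80 GB memory figure assumes the 80 GB H100 variant.

```python
from dataclasses import dataclass, field

MAX_INSTANCES = 7     # an H100 can be carved into at most seven MIG instances
H100_MEMORY_GB = 80   # assumes the 80 GB H100 variant, for illustration only

@dataclass
class MigGpu:
    """Toy model of Multi-Instance GPU bookkeeping: each slice gets its own
    isolated chunk of memory so independent inference models can share one GPU."""
    instances: list = field(default_factory=list)  # memory (GB) per slice

    def create_instance(self, memory_gb: int) -> int:
        if len(self.instances) >= MAX_INSTANCES:
            raise RuntimeError("MIG limit reached: at most 7 instances per GPU")
        if sum(self.instances) + memory_gb > H100_MEMORY_GB:
            raise RuntimeError("not enough free GPU memory for this slice")
        self.instances.append(memory_gb)
        return len(self.instances) - 1  # instance id

# Split one GPU into seven 10 GB slices, one per inference model
gpu = MigGpu()
ids = [gpu.create_instance(10) for _ in range(7)]
```

The hard isolation is the point: each inference workload sees only its own slice, so one model cannot starve another of memory or compute.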

Nvidia is using its leadership position to build new business opportunities by being a full-stack supplier of AI, including chips, accelerator cards, systems, software, and even services. The company is opening up a services business in areas such as biology. Its pricing might be based on usage time, or it could be based on the value of the end product built with its services.

The Promise of LLMs

Using these large training sets allows AI computers to better understand natural language. For example, a trained AI computer can take a request to write a program and use its knowledge of programming constructs to build the requested program. There is still a significant gap between AI’s best efforts and properly constructed, verified, and documented programs. Even so, this approach is already democratizing programming by enabling better “no-code” applications and making it easier for people to build their own applications using human language, without the need for a deep understanding of programming constructs.

However, there are limits to the AI’s knowledge of ground truths. AI models can only work from the data used to train them. It’s not yet possible to stream real-time information from the world and retrain on this steady stream of new information on the fly.

Scaling LLMs to new levels can give them new power: the bigger “brain” gives them more capabilities. But there is still a need to incorporate ethical structures, avoid biases, and add guardrails; a lot more work in AI ethics remains to be done.

Nvidia views training as the production of intelligence – creating an “AI factory.” Inference takes that knowledge and makes it actionable and deployable, and it is scalable from cloud to end devices. Nvidia’s service model allows companies to take and customize the pre-trained foundational models that Nvidia has created. Companies can add guardrails to the inference results and incorporate internal or proprietary data. Training is in the cloud, and Nvidia offers its DGX computer cloud services directly or through hyperscaler clouds. This training-as-a-service is tailored to enterprises that don’t have these capabilities in-house.

LLM Deployments

Today’s capabilities are already shaking things up, and startups are jumping on the bandwagon. Not to be left out, the key hyperscalers are all developing strategies around LLMs. Microsoft built the datacenter for OpenAI, which developed ChatGPT on the A100 GPU, the predecessor to the H100. Microsoft is incorporating the AI technology into Bing for search and chat, and into Microsoft 365 as its Copilot assistant for work tasks. All of these services run on Nvidia GPUs, and Microsoft Azure cloud services are already using the new Nvidia H100 GPUs in private preview.

Nvidia lined up multiple cloud vendors for its H100 cloud in addition to Microsoft Azure. Oracle Cloud Infrastructure offers the H100 in limited availability, and it is generally available from Cirrascale and CoreWeave. Nvidia said that AWS’s H100 cloud will be available in limited preview in the coming weeks. In addition, Google Cloud, along with Nvidia cloud partners Lambda, Paperspace, and Vultr, plans to offer H100 cloud services.

Nvidia is offering three new LLM services, one per modality:

1. For human language, there is NeMo.

2. For images (including video), there is Picasso.

3. And for biology, the language of proteins, the service is BioNeMo.

BioNeMo can be used to teach AI the language of proteins for biology and drug research. Nvidia’s Huang expects this will lead the way to pervasive use of AI in medicine and drug discovery. AI has many applications in data filtering, drug trial projections, and other aspects of drug discovery; use cases include predicting 3D protein structures, molecular docking, and more. With the BioNeMo service, companies can build custom models based on proprietary data, reducing model training times from about six months of computation to only about four weeks. Amgen is an early Nvidia partner, and Nvidia is also partnering with Medtronic on intelligent medical devices. AI will prove invaluable for automating routine tasks and reducing complexity, which appeals to medical professionals.

Building a Digital Twin World

Another key Nvidia announcement was its Omniverse platform. This digitalization of the real world has many applications. In particular, the automotive and other industrial manufacturing industries are embracing these digital representations of real-world machines (so-called digital twins).

Highlighting auto manufacturers in the keynote, Huang noted that Omniverse is being used everywhere from factories to the customer experience, accelerating entire workflows from design and manufacturing to customer engagement. GM uses digital twins of automotive designs to help it model aerodynamics. Toyota, Mercedes, and BMW use Omniverse for designing factories, and Lucid uses a 3D VR car to entice customers. At GTC it was also announced that Microsoft Azure will offer Omniverse services.

Nvidia also debuted its Isaac Sim platform designed to enable global teams to remotely collaborate to build, train, simulate, validate and deploy robots.

New Datacenter Hardware

To support advanced generative AI, Nvidia also debuted four inference platforms: the Nvidia L4 for AI video, the Nvidia L40 for 2D/3D image generation, the Nvidia H100 NVL for deploying large language models, and the Grace Hopper “superchip,” which connects the Arm-based Grace CPU and Hopper GPU over a high-speed coherent interface and is now sampling. Grace Hopper targets recommendation systems that work from very large datasets.

The low-profile L4 PCIe card is capable of up to 120x more AI-powered video performance than CPUs and is also far more energy efficient. Google Cloud, which announced G2 virtual machines in private preview, is the first cloud services provider to offer Nvidia’s L4 Tensor Core GPU.

The L40 PCIe card serves as the engine of Omniverse. The company states it offers 7x the inference performance on Stable Diffusion and 12x the Omniverse performance of the previous-generation Nvidia product. The L4 and L40 are based on the Ada Lovelace GPU architecture.

The H100, based on the Nvidia Hopper GPU computing architecture, has a built-in Transformer Engine and was optimized for developing, training, and deploying generative AI, large language models (LLMs), and recommender systems. Part of the H100’s improvement over the A100 comes from its FP8-precision math, which can speed up AI training by up to 9x and AI inference on LLMs by up to 30x versus the prior-generation A100.
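To illustrate how coarse FP8 really is, the sketch below rounds values onto the FP8 E4M3 grid (4 exponent bits, 3 mantissa bits, max finite value 448), one of the FP8 formats Hopper’s Transformer Engine supports. The quantizer is a simplified illustration, not Nvidia code; among other shortcuts, it saturates out-of-range values rather than producing NaN.

```python
import math

MAX_E4M3 = 448.0   # largest finite value in FP8 E4M3
MIN_EXP = -6       # smallest normal exponent (bias 7)

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3-representable value (simplified:
    saturates out-of-range magnitudes instead of producing NaN)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0.0 else 1.0
    mag = min(abs(x), MAX_E4M3)            # saturate at the format's max
    exp = max(math.floor(math.log2(mag)), MIN_EXP)
    step = 2.0 ** (exp - 3)                # 3 mantissa bits => 8 steps per octave
    return sign * round(mag / step) * step

# With only 3 mantissa bits, nearby values collapse onto a coarse grid:
print(quantize_e4m3(0.3))    # 0.3125
print(quantize_e4m3(1.1))    # 1.125
print(quantize_e4m3(500.0))  # 448.0
```

That coarse grid is exactly why FP8 is fast: it roughly halves memory and bandwidth versus FP16, with the Transformer Engine managing per-layer scaling so models retain accuracy despite the reduced precision.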

The Nvidia H100 NVL is aimed at deploying massive LLMs like ChatGPT at scale. The new H100 NVL has 94GB of memory, and with Transformer Engine acceleration it delivers up to 12x faster inference performance on GPT-3 compared to the prior-generation A100 at data center scale. The H100 NVL PCIe card has two H100 GPUs connected over the NVLink coherent bus and is expected in the second half of the year.

Using GPUs to Build Better Chips

Another application of Nvidia’s GPU technology is computational lithography. In chipmaking, the GPU compensates for optical diffraction (blur) when generating the optical masks used to project photolithographic layers onto the silicon. Nvidia created the cuLitho library, which allows optical and EUV mask makers to speed the calculations by about 40x over traditional CPU computation using the new Hopper H100 GPU. This accelerates mask making and, consequently, the entire process of bringing new chip designs into production. One of the first partners is Nvidia’s long-time foundry partner TSMC; Synopsys and ASML are also partnering.


Nvidia is riding the wave of advanced AI and digital twins. Its first-mover advantage has not let up since the earliest days of accelerated AI processing. The company is expanding its business model to cloud services, even if it’s through other companies’ clouds. Huang described it as a new model where Nvidia is a cloud within the cloud.

The key to Nvidia’s success is that it keeps moving its business further up the value chain, building on many years of software development and the industry’s most mature AI platform and widest ecosystem. Somehow, the company always seems to be at the right place, at the right time.

Tirias Research tracks and consults for companies throughout the electronics ecosystem from semiconductors to systems and sensors to the cloud. Members of the Tirias Research team have consulted for AMD, Arm, Intel, Nvidia, Qualcomm, SiFive and other companies throughout the CPU and GPU IP ecosystems.