The News: Today, NVIDIA held its annual GTC event where CEO Jensen Huang delivered the keynote outlining the company’s new technologies that will shape the business’s future.
Analyst Take: Today’s keynote served as an important inflection point for NVIDIA and for the AI discussion. While the keynote ranged widely across NVIDIA’s portfolio, what caught my attention most were the advancements the company is making in enterprise AI, so I am going to focus on three key announcements in that area.
NVIDIA Advances Recommendation Engines With Merlin
While some may not realize it, the recommender engine is one of the most important AI models and represents the largest data pipeline in the world. Processing, learning and inferring what consumers and users are looking for requires a system sophisticated enough to start cold, take in user data, filter it, compare it to similar users and deliver useful recommendations. We experience this with Amazon, Netflix and now hundreds of other commerce platforms that we engage with online.
These systems need to be able to look at billions of users, queries and preferences along with trillions of objects concurrently, and in the future this will be the key to a good internet experience. It will essentially be the only way people can connect to useful information. Improving a recommender system by even 1% can deliver outsized value to a business.
NVIDIA launched Merlin at GTC 2020 to enable more rapid deployment of its recommender framework. With as little as 50 lines of code, enterprises will be able to write state-of-the-art recommender models. This will be invaluable for CSP acceleration in the datacenter, enabling businesses to iterate faster on existing recommender systems and/or to create new ones.
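To make the "take user data, filter it, compare it to similar users" flow concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is a toy illustration only: the ratings, user names and function names are hypothetical, and it does not represent Merlin's API or how any production recommender is built.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); illustrative data only.
RATINGS = {
    "ana": {"laptop": 5, "mouse": 4, "monitor": 5},
    "ben": {"laptop": 5, "mouse": 5, "keyboard": 4},
    "cat": {"novel": 5, "lamp": 4, "mouse": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        # Compare the user to every other user.
        similarity = cosine_similarity(seen, other_ratings)
        if similarity <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in seen:  # filter out items the user already has
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", RATINGS))
```

Here "ana" is most similar to "ben", so the item only "ben" has rated ranks first. Real systems replace these loops with learned embeddings and far larger data pipelines, which is where GPU acceleration comes in.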
NVIDIA Goes Multimodal in Conversational AI with Jarvis
The next most important workload for AI after recommender engines is going to be the next wave of conversational AI. We have seen chatbots and simpler uses of AI to drive interactions between humans and machines, but the next frontier will be greater interactivity and more human-like engagement.
At GTC 2020, NVIDIA announced Jarvis. Recognizing that the challenge is no longer the models but the latency, Jarvis assembles the entire end-to-end pipeline for conversational models and optimizes it with TensorRT on NVIDIA GPUs. This means dozens or even hundreds of models can be optimized for a more seamless interaction, taking latency from as long as 25 seconds down to just 3 ms.
As we advance to more human interactions, Jarvis was also designed to work in 3D. This means it can use sensor technology to make eye contact and read lips with its face animation model. The model can mimic and learn facial responses by watching humans.
All of this can be deployed with a single line of code and works with various pre-trained models that can be applied to an existing domain. This has powerful implications for collaboration, call centers, smart speakers, retail assistance and in-vehicle infotainment/assistants.
Ampere Delivers for Scale Out, Scale Up and Delivers Strong Economics
The third major enterprise AI focus from GTC 2020 came in the datacenter with the announcement of the Ampere A100. Huang called it “the largest chip,” housing 54 billion transistors and 3rd generation tensor cores with the ability to sustain data rates over 1 TB/s. It is fast and powerful.
This powerhouse supports multi-instance GPU (MIG) operation, in which a single machine could legitimately be shared out 56 ways (eight GPUs in a DGX A100, each partitioned into as many as seven instances), and according to NVIDIA it represents a 10x performance improvement over the previous generation.
Essentially, this technology is a significant upgrade over the previous generation. The A100 also has the capacity to serve almost all datacenter acceleration workloads, including analytics, training, inference, graphics, rendering and video processing, for scale up and scale out, all in a single GPU.
For me, what caught my attention more than just the power of the A100 platform was the economics presented. Today’s datacenter for AI would house:
600 CPU systems for inference
630 kW of power consumption
and the full 25-rack deployment would cost around $11 million.
With the A100, the new economics look like this:
5 DGX A100 systems
A single rack
28 kW of power consumption
and this single rack would cost around $1 million.
See what happened there? That is a more than 90% improvement on cost alone. Look at the reduction in power consumption, and then consider that this one system is more powerful than the aforementioned 25-rack system.
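For readers who want to check the arithmetic, the short sketch below works through the percentage reductions using only the figures quoted in this comparison. It assumes the $11 million figure covers the whole 25-rack deployment; the variable and function names are illustrative.

```python
# Figures quoted in the comparison above.
# Assumption: $11M is the cost of the entire 25-rack deployment.
legacy = {"cost_usd_m": 11.0, "power_kw": 630.0, "racks": 25}
a100 = {"cost_usd_m": 1.0, "power_kw": 28.0, "racks": 1}


def reduction_pct(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


savings = {
    key: round(reduction_pct(legacy[key], a100[key]), 1) for key in legacy
}

print(savings)
# {'cost_usd_m': 90.9, 'power_kw': 95.6, 'racks': 96.0}
```

On these numbers, the power and floor-space reductions are even larger than the cost reduction, before accounting for the performance gain NVIDIA claims.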
Overall Impressions of NVIDIA GTC 2020 Announcements
NVIDIA’s announcements at GTC 2020, particularly those related to enterprise AI, showed the company’s strong leadership and ambitions to continue to drive the AI narrative forward. This is why almost every major cloud, including AWS, Azure and Google, utilizes NVIDIA GPUs in its AI/ML architecture.
With Merlin and Jarvis, the company is propelling forward critical AI workloads for conversation and recommendation engines. These technologies are pivotal, and that importance is being amplified with more people than ever before engaging in e-commerce.
From the datacenter standpoint, Ampere is a disruptive force for datacenter scale acceleration, bringing analytics, training, inference, graphics, rendering and video processing for scale up and scale out into a single GPU.
It will be important to watch volume metrics for revenue as NVIDIA is certainly betting big on a volume of datacenter investment to offset the economics of the new, more cost-efficient hardware.
Having said all of that, the company is incredibly well positioned with arguably some of the most advanced hardware, software and frameworks to accelerate AI adoption in the cloud, at the edge and for everyday consumers looking to ubiquitously interact with AI as part of a technology experience including shopping, gaming and collaborating.
Futurum Research provides industry research and analysis. These columns are for educational purposes only and should not be considered in any way investment advice.
Daniel Newman is the Principal Analyst of Futurum Research and the CEO of Broadsuite Media Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.