AWS Launches Inf2 Instances for High-Performance Generative AI

The News: Amazon Web Services (AWS) is announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances, which deliver high performance at the lowest cost for generative AI models including large language models (LLMs) and vision transformers. See the full announcement from Amazon here.

Analyst Take: Generative artificial intelligence is a rapidly evolving field, with the pace of innovation seemingly reaching new heights every day. It has already enabled applications such as text summarization, code generation, video and image generation, speech recognition, and personalization. However, running inference on large and complex deep learning models such as LLMs and vision transformers demands high performance, low latency, and cost efficiency.

AWS has announced the general availability of Amazon EC2 Inf2 instances, which are powered by AWS Inferentia2, the latest AWS-designed deep learning accelerator. Inf2 instances are designed to deliver high performance at the lowest cost for generative AI inference.

What Are Inf2 Instances?

Inf2 instances are inference-optimized instances that support scale-out distributed inference with ultra-high-speed connectivity between accelerators. They are powered by up to 12 AWS Inferentia2 chips, each with two second-generation NeuronCores and delivering up to 190 tera floating point operations per second (TFLOPS) of FP16 performance. Inf2 instances offer up to 2.3 petaflops of deep learning performance and up to 384 GB of total accelerator memory with 9.8 TB/s bandwidth.
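
As a quick sanity check on those headline numbers, the back-of-the-envelope arithmetic below assumes the largest 12-chip configuration and uses only the per-chip figures quoted above; it is illustrative, not an official AWS calculation.

```python
# Rough arithmetic for the largest Inf2 configuration (12 Inferentia2 chips),
# using the per-chip figures quoted above. Illustrative only.
chips = 12
tflops_fp16_per_chip = 190          # FP16 TFLOPS per Inferentia2 chip
total_accelerator_memory_gb = 384   # total accelerator memory per instance

total_pflops = chips * tflops_fp16_per_chip / 1000
memory_per_chip_gb = total_accelerator_memory_gb / chips

print(f"Aggregate FP16 compute: {total_pflops:.2f} PFLOPS")        # ~2.28 PFLOPS, i.e. "up to 2.3 petaflops"
print(f"Accelerator memory per chip: {memory_per_chip_gb:.0f} GB") # 32 GB per chip
```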

Inf2 instances are the first inference-optimized instances in Amazon EC2 to introduce NeuronLink, a high-speed nonblocking interconnect that enables efficient deployment of models with hundreds of billions of parameters across multiple accelerators. Compared to the previous-generation Inf1 instances, Inf2 instances deliver up to four times higher throughput and up to 10 times lower latency. They also offer up to three times higher throughput, up to eight times lower latency, and up to 40% better price performance than other comparable Amazon EC2 instances.
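
To see why pooling accelerator memory over NeuronLink matters, here is a rough sizing sketch. The 175-billion-parameter figure is an illustrative model size (not from the announcement), and the estimate covers FP16 weights only, ignoring activations and KV cache.

```python
# Rough FP16 weight-memory estimate for a large language model sharded across
# one Inf2 instance's accelerators. Model size is illustrative, weights only.
params_billions = 175            # hypothetical LLM size, in billions of parameters
bytes_per_param_fp16 = 2
weights_gb = params_billions * bytes_per_param_fp16   # ~350 GB of weights

per_chip_memory_gb = 32          # approximate memory per Inferentia2 chip (384 GB / 12)
chips = 12

print(f"FP16 weights: ~{weights_gb} GB")
print(f"Fits on one accelerator? {weights_gb <= per_chip_memory_gb}")                 # False
print(f"Fits across all {chips} accelerators? {weights_gb <= per_chip_memory_gb * chips}")  # True (weights only)
```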

Inf2 instances are also energy-efficient, offering up to 50% better performance per watt compared to other comparable Amazon EC2 instances. This helps customers meet their sustainability goals while running generative AI inference at scale, and lets them scale up easily when they need more power.

How Can Enterprises Use Inf2 Instances?

Enterprises can use Inf2 instances to run popular applications such as text summarization, code generation, video and image generation, speech recognition, personalization, and more. They can also run large, complex models such as GPT-J or Open Pre-trained Transformer (OPT) language models on Inf2 instances.

To get started with Inf2 instances, enterprises can use the AWS Neuron SDK, which integrates natively with popular machine learning frameworks such as PyTorch and TensorFlow. AWS Neuron helps customers optimize models for AWS Inferentia accelerators and run inference applications with minimal code changes, as sketched below. Enterprises can also use AWS Deep Learning AMIs, AWS Deep Learning Containers, or managed services such as Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker.
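
The sketch below gives a rough feel for that workflow: a small PyTorch model is compiled for Inferentia with torch_neuronx.trace, part of the Neuron SDK's PyTorch integration. The toy model is a stand-in of my own choosing; a real deployment would target a model such as GPT-J or OPT, and exact package versions and environment setup are assumptions here.

```python
# Minimal sketch: compile a PyTorch model with the AWS Neuron SDK (torch-neuronx)
# and run inference on an Inf2 instance. Assumes torch and torch-neuronx are
# installed (e.g. via a Neuron Deep Learning AMI); the toy model is a placeholder.
import torch
import torch_neuronx

model = torch.nn.Sequential(          # placeholder model; swap in your own
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example_input = torch.rand(1, 128)

# Trace/compile the model for NeuronCores; minimal code change from plain PyTorch.
neuron_model = torch_neuronx.trace(model, example_input)

# The compiled artifact is a TorchScript module and can be saved and reloaded.
torch.jit.save(neuron_model, "model_neuron.pt")
restored = torch.jit.load("model_neuron.pt")

output = restored(example_input)      # executes on the Inferentia2 accelerator
print(output.shape)
```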

The Pros and Cons of Amazon EC2 Inf2 Instances

Amazon EC2 Inf2 instances are purpose-built for deep learning inference. Powered by AWS Inferentia2, the second-generation AWS-designed deep learning accelerator, they are ideal for large and complex models such as large language models and vision transformers. Here are some of the pros and cons of using Inf2 instances for your inference workloads:

Advantages of Inf2 Instances

High performance and throughput. Inf2 instances deliver up to 4x higher throughput and up to 10x lower latency than Amazon EC2 Inf1 instances. They also offer up to 3x higher throughput, up to 8x lower latency, and up to 40% better price performance than other comparable Amazon EC2 instances.

Scale-out distributed inference. Inf2 instances are the first inference-optimized instances in Amazon EC2 to support scale-out distributed inference with ultra-high-speed connectivity between accelerators. Customers can efficiently deploy models with hundreds of billions of parameters across multiple accelerators on a single Inf2 instance; a conceptual sketch of this kind of model sharding follows this list.

Native support for ML frameworks. AWS Neuron SDK lets enterprises optimize models for AWS Inferentia accelerators and run inference applications with minimal code changes. AWS Neuron integrates natively with popular ML frameworks such as PyTorch and TensorFlow.

Energy efficiency. Inf2 instances offer up to 50% better performance per watt compared to other comparable Amazon EC2 instances. This helps you meet your sustainability goals while running generative AI inference at scale.
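
To make the distributed-inference point above more concrete, here is a framework-agnostic sketch of column-wise tensor parallelism in plain PyTorch: one linear layer's weights are split across several hypothetical accelerators and the partial outputs are gathered back together. This is a conceptual illustration only, not the Neuron SDK's actual sharding API.

```python
# Conceptual sketch of tensor (column) parallelism: one linear layer's weight
# matrix is split across N hypothetical accelerators, each computes a slice of
# the output, and the slices are concatenated. Plain PyTorch, runs on CPU.
import torch

torch.manual_seed(0)
num_shards = 4                              # stand-in for multiple NeuronCores/chips
layer = torch.nn.Linear(1024, 4096, bias=False)
x = torch.rand(2, 1024)                     # a batch of activations

reference = layer(x)                        # unsharded result for comparison

# Split the weight matrix along the output dimension, one shard per accelerator.
weight_shards = torch.chunk(layer.weight, num_shards, dim=0)

# Each "accelerator" computes its slice of the output independently.
partial_outputs = [x @ w.t() for w in weight_shards]

# Gathering the slices reproduces the full output; this communication step is
# where a fast interconnect such as NeuronLink matters in a real deployment.
combined = torch.cat(partial_outputs, dim=1)

print(torch.allclose(reference, combined, atol=1e-5))   # True
```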

Limitations of Inf2 Instances

Limited availability. Inf2 instances are currently available only in four regions: U.S. East (N. Virginia), U.S. West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo). Customers that want to serve workloads outside those regions may need to factor in data transfer costs and added latency.

Limited instance types. Inf2 instances are available in only four sizes, ranging from 4 vCPUs and one Inferentia2 chip to 192 vCPUs and 12 Inferentia2 chips. Enterprises whose workloads need a different balance of compute, memory, and accelerators may not find an optimal fit.

Limited storage options. Inf2 instances do not support local NVMe SSD storage or EBS-optimized performance. Customers with demanding storage requirements may need to rely on external services such as Amazon S3 or Amazon EFS.

Looking Ahead

Amazon Web Services (AWS) is committed to innovating across chips, servers, and software so customers can run large-scale deep learning workloads. The launch of EC2 Inf2 instances powered by AWS Inferentia2 chips gives customers a high-performance, low-cost, and energy-efficient option for running generative AI inference on Amazon EC2.

I expect announcements such as this from AWS to be replicated by the likes of Azure and GCP, among others, as enterprises look to make generative AI a more common part of their overall workload mix. The fact that AWS is early to market is not surprising.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

AWS Further Invests in the Australian Market

Southwest Airlines Adopts AWS Cloud to Enhance IT Operations

Marvell Boosts Cloud EDA Cause with AWS Selection

Author Information

Regarded as a luminary at the intersection of technology and business transformation, Steven Dickens is the Vice President and Practice Leader for Hybrid Cloud, Infrastructure, and Operations at The Futurum Group. With a distinguished track record as a Forbes contributor and a ranking among the Top 10 Analysts by ARInsights, Steven's unique vantage point enables him to chart the nexus between emergent technologies and disruptive innovation, offering unparalleled insights for global enterprises.

Steven's expertise spans a broad spectrum of technologies that drive modern enterprises. Notable among these are open source, hybrid cloud, mission-critical infrastructure, cryptocurrencies, blockchain, and FinTech innovation. His work is foundational in aligning the strategic imperatives of C-suite executives with the practical needs of end users and technology practitioners, serving as a catalyst for optimizing the return on technology investments.

Over the years, Steven has been an integral part of industry behemoths including Broadcom, Hewlett Packard Enterprise (HPE), and IBM. His exceptional ability to pioneer multi-hundred-million-dollar products and to lead global sales teams with revenues in the same echelon has consistently demonstrated his capability for high-impact leadership.

Steven serves as a thought leader in various technology consortiums. He was a founding board member and former Chairperson of the Open Mainframe Project, under the aegis of the Linux Foundation. His role as a Board Advisor continues to shape the advocacy for open source implementations of mainframe technologies.
