We are LIVE! Talking with AWS’s Dave Brown on re:Invent Updates to Graviton, Inferentia & Trainium – Six Five Insiders Edition
On this episode of The Six Five – Insiders Edition Patrick Moorhead and I are joined by Dave Brown from AWS for a conversation around the announcements out of re:Invent. As always, it was an awesome conversation and one you should definitely check out.
Our conversation with Dave revolved around the following:
- A look at the latest Graviton offerings
- A recap of the Graviton Challenge
- The third party support coming for Graviton
- What made AWS create first-party silicon for AI and Machine Learning
- Real world use cases for Inferentia
If you’d like to learn more about the announcements from AWS re:Invent, listen to the full episode below. While you’re at it, don’t forget to subscribe to The Six Five podcast so you never miss an episode.
Watch our interview with Dave here:
Listen on your favorite streaming platform here:
Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we do not ask that you treat us as such.
Daniel Newman: Everybody, welcome back to another edition of The Six Five Podcast Insider Edition. I’m Daniel Newman your host, principal analyst at Futurum Research. Joined by my esteemed co-host partner, Patrick Moorhead. Patrick, how are you doing today?
Patrick Moorhead: Daniel, I’m doing great. It is AWS re:Invent time. I mean, if you are in any way, shape, or form interested in the cloud, you are paying attention to this amazing event, here in Vegas.
Daniel Newman: Absolutely Pat, so exciting to be here. Getting all of the news out of this huge event and by the way, being here, just saying that, being here at the event, getting all the news. So, that is such a big change. Got really comfortable in doing all the events remote, but having the chance to get out to one of the most important events, as you mentioned in cloud, but obviously now, so much more than just cloud. I’m just ecstatic and there’s going to be so much to cover and we’ve got an amazing guest today.
Patrick Moorhead: I know. Yes we do. And I would like to welcome Dave Brown. Dave, how are you doing?
Dave Brown: I’m doing well, thanks Pat. How are you?
Patrick Moorhead: Good, huge day. Adam had an amazing keynote here and he hit a lot of different elements, but a lot of things regarding EC2 Silicon. And Amazon has been just a huge player, not only in other people’s Silicon, but as of late your own Silicon. I mean, we have next generation Graviton 3, we’ve got Trainium, we’ve got Inferentia. Can you tell us about what’s going on here? Let’s start with the latest Graviton offerings.
Dave Brown: Oh, absolutely. And Pat, it’s really great to be back again. It’s a big day for us. As you know, we continue to push the price performance envelope on our Silicon offerings. Last time I was on the Six Five keynote with yourself, we were talking about Graviton 2 and the processor and just what we’ve been able to do there. And over the last little while, we’ve actually launched 12 EC2 instances now, powered by AWS Graviton 2 processors. And so the adoption there has just been great, both across just the breadth of workloads, everything from compute-intensive to memory-intensive workloads, even storage-intensive workloads, but also just the customers that are adopting us, from small startups to large enterprises. And as you just mentioned, in the keynote earlier today, we announced three new instances based on Graviton 2.
Two of them are our storage optimized instances, the Im4gn and the Is4gen. And then also our very first GPU-based Graviton 2 instance, our G5g, which actually brings Graviton 2 together with an NVIDIA GPU. So we’re pretty excited about that. And then we also announced a new addition to our Nitro family, called the Nitro SSD, which is actually used by the two new Graviton storage instances, to bring even better IO performance and lower latency. And what’s really important for customers is just minimal latency variability. So just real consistent, low latency for SSD performance. And so, just excited about all the announcements as you called out. And then obviously, the announcement of our new Graviton 3 chip as well, server chip, which is really exciting to have out there.
Daniel Newman: Yeah. And we’re going to give you a chance here today, Dave, to talk about all of these. I think we’re going to want to break these down a little bit, piece by piece, so that you have a chance to dive in a little bit. For instance, one of the exciting additions beyond the new EC2 instances was that we’re seeing additional managed service offerings from AWS supporting Graviton 2. Can you tell everyone a little bit more about that?
Dave Brown: Yeah, absolutely. So, we’ve been really focused on making Graviton 2 easy for our customers to use. And one of the best ways for us to do that is to offer Graviton 2 with other services. And so today managed services for databases, analytics, even in the serverless space, with Lambda, Amazon Aurora, ElastiCache, EMR, Fargate, they all actually support Graviton 2 today for customers. And so it’s very, very simple for our customers to simply launch an RDS database or Aurora database or create a Lambda function, and they get the price performance benefit that comes with using Graviton 2 directly in the managed service that they’re using.
Patrick Moorhead: Yeah, it’s great stuff. And it’s just so awesome to see Graviton 2 moving up the food chain, from where you started with Graviton 1, kind of in a limited function. And then now it’s like a full service instance at this point, and you’re also trying to make it easier for customers to move workloads or even try workloads out. At the Six Five Summit this year, thank you for announcing the Graviton Challenge. And thank you for letting me be a judge for it, as well. That was a lot of fun, but how did it go? What can you share about it?
Dave Brown: Well, Pat, it has been really fun, and it was really great to announce the Graviton Challenge at the Six Five Summit, and the response has just been amazing. Now, we’ve been on this journey with Graviton 2, as you said, for about three years now. And initially we honestly didn’t know how Graviton or even Arm in the cloud was going to go. And the response has just been incredible to see. And the Graviton Challenge was really designed to help developers move their first workload to a Graviton-based instance. And what it does is provide sort of a blueprint or a step-by-step guide on what they need to do to actually adopt it. We wanted to make it fun as well for developers. And so, we ran this little contest and hackathon, which, thanks for being a judge in that as well.
Where these developers could compete for prizes, by building and migrating their applications to run on Graviton 2 based instances. And so we had more than a thousand developers actually take part. Again, it was from large enterprises to small startups, and we even had a few individual developers and open source developers as well take part in the challenge. And so, we’ve also recently just announced the winners of that at re:Invent. And just to call out a few, the best adoption in the enterprise category went to VMware, a team internally they called the vRealize IT team.
And they migrated 60 microservices and saw up to 48% latency improvement with a 22% cost savings for their workload. And then in the startup category, we had an entry from Kasm Technologies, and they saw 48% better performance and 25% potential cost savings for their container streaming platform. So just incredible to see. And even though the contest is over and we just announced the winners, we’re actually going to keep the Graviton Challenge going. And that’s a four-day plan to help developers move to Graviton. So the plan’s still there. Obviously the excitement and buzz around the competition is now over, but really good to see.
Daniel Newman: Yeah, we were so proud to be part of that and to have the Six Five Summit be the kickoff for that. Now Dave, we’re going to have to ask you to start thinking about what the next big launch might be at our summit next year. That gives you a few months to think about it. And of course, Pat and I would always ask. But you know what, in addition to customers, and you mentioned a lot of things that were exciting about the adoption, you guys have seen really strong, widespread support for Graviton from third party software providers as well. Can you share a little bit more about what’s going on there?
Dave Brown: Well, that’s right. And it comes back to what Pat was saying earlier about making Graviton easy to use. And we always knew before we put a Graviton chip out there, we had to make sure that there was great ecosystem support for Arm-based applications on our Arm-based processors. And that was actually the reason why we put out the initial Graviton processor back in 2018, was really to spark that ecosystem support. And so, to date we’ve actually seen an incredible response from that, with all of the major Linux-based operating systems and software providers, for things like containers, monitoring and management, security, even development software, supporting across the board really what our customers are looking for. They’re all offering support for Graviton-based instances.
In fact, it’s fairly uncommon for me today to run into a customer that’s still looking for something key before they can migrate. The vast majority of it is there. And actually at re:Invent, we’re also announcing what we call the AWS Graviton Ready program, which is really targeted at software partners, for them to actually be able to offer certified solutions to our Graviton customers, and for our Graviton customers to know which applications out there are Graviton Ready. And so we’re excited to launch that program as well.
Patrick Moorhead: Yeah, it’s funny, the way that I tell the story about Arm instances was that, okay, we were about 80% before you got on board. And I would say within a year, you went from 80 to 100. And the ease, I was pretty surprised, particularly when it was a modern web app, sorry, a modern cloud app, at how quickly you could make the transition. I actually went through all the documentation. I’m a little bit nerdy like that, but pardon me. So let’s get back to the announcements. One of the biggest announcements at re:Invent was the Graviton 3 processor. Can you talk a little bit about how Graviton 3 compares with Graviton 2?
Dave Brown: Absolutely. And this has been a huge announcement for us, and we’ve obviously been working on this for some time and are really excited, not only to see the processor working and the instances coming together, but also to be able to announce it at re:Invent here today. So, we announced Graviton 2 back in 2019, and that brought with it about a 40% price performance improvement over what you could do in the cloud at that time. And Graviton 3 is continuing to push the performance envelope even higher. What we’re announcing today is that Graviton 3 has 25% higher compute performance when compared to Graviton 2. And so, you think about the benefits in price performance that customers have seen just with Graviton 2, and Graviton 3 gives you an additional 25% performance on that. So we’re continuing this journey of really pushing performance for our customers.
Some of the more detailed numbers as well: we’ve seen 2x higher floating point performance, which is really important for scientific and some machine learning or media encoding workloads. We’ve seen two times faster performance for cryptographic workloads as well, which has been an area of focus for us with Graviton 3. And it also delivers three times the performance of Graviton 2 for machine learning workloads, by supporting bfloat16 as well. And then obviously, as with Graviton 2, there’s been a big focus on reducing our carbon footprint. And so with Graviton 3, we’re actually now using 60% less energy to achieve the same level of performance as you would get with some other comparable EC2 instances. And then also, it’s the first instance in the cloud to offer DDR5 memory.
So, as you know, most instances today use DDR4. With DDR5 we actually get 50% better memory bandwidth. So the processor is able to talk to the memory 50% faster than it was able to do with DDR4. And then we are always focused on security, and Graviton 3 brings some additional enhancements in the security space, including support for pointer authentication, as well as 256-bit DRAM encryption, to ensure that all memory is encrypted at all times.
Daniel Newman: Yeah, you absolutely hit the, I want to say tri-fecta, but it’s like the five-fecta of performance. The overall performance of course, but also the energy efficiency, which is very important to many companies, including AWS, which has recently made several announcements, stepping up to do more to address and help companies be accountable for their carbon footprints. Helping companies do more with the memory that they have access to. So, it’s really a strong set of upgrades and updates. And of course that doesn’t take anything away from Graviton 2.
Now, from my standpoint, I always like taking the technical to the practical. And so my question’s pretty simple. As people are thinking about migrating and upgrading and adding and spinning up more workloads on AWS, what are the workloads that you’re going to be really pushing and recommending that customers build out from the ground up, and then of course, move over to Graviton 3?
Dave Brown: Yeah. Well, the first instance we are launching with Graviton 3, that we just announced at re:Invent, is called our C7g instance. And that is in the category of what we call our compute optimized instances. And so it’s really designed for compute-intensive workloads. And while many customers will try it out, I’m sure, for any application, from basic web servers to anything where they may want to see how Graviton 3 works for them, which it is perfectly fine to use it for, it really is targeted at some of the higher performance workloads, such as HPC or high performance computing, gaming, video encoding, or even CPU-based machine learning inference. That is what we expect to see customers use it for. Again, pretty much anything that needs a Graviton 3 CPU with a relatively small amount of memory is what these compute optimized instances are targeted at.
Patrick Moorhead: It’s exciting stuff. I’ll tell you what, you’ve definitely taken it to the nth degree. Some of those workloads that may not have been the perfect fit for Graviton 2, you’ve opened that up. And I really like that not only did you improve core general purpose compute, but also used your fixed function accelerators to attack encryption and things like machine learning. So, great stuff. Let’s move to AI and ML. And I think the best way to start this is, what was the driving force that really drove you to get into making your own first party Silicon for AI and ML?
Dave Brown: Yeah. It’s a great question. I mean, the way it really started is just that the adoption, and what our customers are starting to do with AI and ML, has been increasing year over year. It was just three or four years, maybe five years ago, where you’d speak to customers who had never, never tried ML, never tried AI, within their company. Today, I mean, I hardly ever speak to a customer that isn’t doing some form of AI and ML, from financial services to healthcare, manufacturing, retail, it’s really across the board. And they’re realizing that they need to be in the AI, ML, and even the deep learning space, to be able to remain competitive and provide their customers with a really enhanced experience. One of the big challenges as you get into AI and ML is really the cost involved.
And so there’s really two sides to AI and ML. One is the training, where you actually train models, prepare models to be deployed to the real world. And then, inference is the part of ML or deep learning where you actually apply those models. And the cloud itself has just been an amazing enabler for AI and ML to really take place. I mean, I think we wouldn’t be where we are today, had our customers not had access to high performance computing, the high speed networks, the vast amount of storage that we offer in the cloud, and just being so easily available in an on demand fashion and globally, has really allowed it to really just explode.
And so when we look at customers running AI and ML, one of the common things we see from them is they say, I’d love to be able to do more of it. It’s made such an impact to my business, to my customer experience. I need to be able to do more, and the thing that’s really slowing them down in some cases is just the cost of actually running these training models or running inference. And so, that was the place we looked at first. We said, how do we actually, allow our customers to do more AI and ML by improving performance and reducing cost. And that’s when we started looking at, is there something we could do in the custom Silicon space?
Daniel Newman: So I have to ask you then, and this is pretty straightforward. I think you set us up really nicely, Dave. But, how are you solving this? Because you’re exactly right, this is the same problem we hear about on the regular, we’ve written about it endlessly. And it hasn’t been entirely solved yet, but it sounds like you’re on track.
Dave Brown: Yeah. And so we started this journey, again, it was back in 2020 at re:Invent, just prior to that, we actually announced our first machine learning chip, which we called Inferentia. And it was targeted at inference. And as I said, inference is really that process of real-time analysis of incoming data, and then being able to respond to that. And we started there because about 90% of the cost of ML actually goes into performing inference, for a lot of our customers. Not all of our customers, some have a little bit of a different weighting. But that’s where we see a lot of the cost involved. We thought if we could build a chip that could reduce that, we could make a significant impact. And so Inferentia was designed to deliver the high performance and throughput that’s really needed for machine learning inference, and it can do that at actually significantly lower cost than what you could do with GPU-based instances on EC2.
And in some cases actually giving up to 2.3 times higher performance, and actually even at 70% lower cost than you could do with other options. And so it was just a really, really great one for us, to bring that to AWS. So that was a custom chip that actually had 4 neuron cores, and was just very, very good at that matrix math that needs to be done to do ML compute operations. So we pulled all of that together, very happy with the performance that we saw. We wrapped up the actual Inferentia chip in an instance called the Inf1, which allows you to use 16 Inferentia chips inside a single instance, and perform up to 128 trillion operations per second, which again is bringing that performance, and then obviously at a lower cost as well.
Again, just like with Graviton, we spoke a little bit about making it easy for customers to use. And we built our… We delivered our SDK called the AWS Neuron SDK, which makes it really simple for developers to go from a GPU-based inference model to Inferentia, using frameworks like TensorFlow and PyTorch, which utilize the AWS Neuron SDK. So the migration’s actually been relatively simple for developers to move over, and then they can immediately begin to reap the benefits of both the performance and the cost reduction that Inferentia provides.
Patrick Moorhead: You had some really interesting conversations with what could be the world’s largest installation of machine learning inference, and that’s Alexa. And I just love the fact that you’re using your own technology to do that. I was fascinated with their conversation of how they moved it over, over time. Why they chose you, and it was kind of funny. They even pointed out, we don’t have to move over there, we’re moving over there because we’re saving a ton of money and getting great performance out there. Aside from the Alexa folks, can you talk a little bit about how customers are using Inferentia?
Dave Brown: Yeah, absolutely. Alexa has been a great customer for us and they’re right. We treat them like any other customer internally, and so we really do have to win them over, which has been really good. We did have another internal customer, Amazon Prime Video, and they actually deployed their computer vision applications, for sort of image classification in live video, using Inferentia. And they actually saw four times higher throughput after going to Inferentia, at 40% lower cost. So it was an enormous win for them internally, but we’ve had a lot of external customers using Inferentia as well. Airbnb is a really big customer of ours and they actually deployed their community support platform, which delivers exceptional service to their guests and hosts around the world. And what they were looking for was ways to improve the performance of their language processing using the BERT model, for their chat bot application.
And we were very pleased to see that by migrating their applications from previously GPU-based instances to using our Inf1 instance, they were able to see a 2x improvement in throughput. And then we also had Sprinklr AI. They have a unified customer experience management system that really allows companies to gather and translate real-time customer feedback across many, many channels. And then what it does is turn them into actionable insights. And by using Inf1, they were able to get significantly better performance for one of their natural language processing, so NLP, models, and improved performance of one of their computer vision models. So it really has been across the board. It’s always fun to see these stories of customers who start out, move over relatively quickly, and then begin to share some of these wins with us. We’re always excited to hear.
Daniel Newman: Yeah. And I want to turn this now, because it’s great hearing about how you’re putting inference to work on Inferentia. But of course you guys have some exciting announcements too, around training, an area where, of course, I think the market’s very open to more solutions and seeing more opportunities. You mentioned the size of training workloads is growing, the complexity is growing. So talk a little bit about how Trainium-based Trn1 instances are helping customers with challenges that they have with model training.
Dave Brown: Yeah, that’s right. And as you say, Adam announced the preview of that with the EC2 instance powered by Trainium, our new instance called the Trn1, to sort of complement our Inf1 instance. And that is powered by the Trainium ML chip. We did pre-announce Trainium at re:Invent last year, but this is the first time we’re seeing customers actually get their hands on that Silicon, and they’re very excited. Just like inference, ML training can be very expensive for customers. And it’s really an iterative process, often together with data scientists, where they have a model, they have some data that they want to be able to train the model on. But then, they really need a lot of high performance compute and to do a whole lot of parallel processing on this data to really get that model to the place that it needs to be.
And so it normally takes some time. So time to train is a very important metric for our customers. And obviously the cost to train is important as well. And one of the things we’re seeing is customers are constantly gathering new data. And so when you get a hold of new data, you want to be able to go back and retrain that model. And if that’s going to take you a long time to do, or it’s going to cost you an enormous amount of money, well then your model over time sort of degrades, because it’s not taking into account new data and it’s not learning. And so with Trainium-based Trn1 instances, we’re actually delivering the best price performance for the training of deep learning models in the cloud. And so we also have the ability to speed up highly parallelized math operations, with the highest amount of compute power to train these ML models.
And we’re actually also doubling the networking throughput. So we’ve gone from what’s typically been 400 gigabits per second on our GPU-based instances, and with Trainium, we are offering 800 gigabits per second, which is giving you ultra-high-speed throughput, both on the network and also on the interconnect between the Trainium chips. So really trying to bring down the latency and provide the fastest ML training available in the cloud. And with that high speed networking, one of the things we’ve seen is customers create what we call ultra-clusters, where they actually take tens of thousands of these Trainium accelerators and bring them together using this 800 gigabit networking to create a petabit-scale, non-blocking cluster, essentially with an enormous amount of networking.
And so these are essentially mini supercomputers that can dramatically further reduce the time to train some of these complex models. And then, as with Inferentia, we’ve put all of this behind the Neuron SDK. And so now you can use the Neuron SDK. You can also use some of the common libraries like Megatron-LM or DeepSpeed for efficient distributed training of these models. So pretty exciting what we’ve been able to put together.
Patrick Moorhead: I’ll tell you, I saw people’s heads explode virtually when you announced Trainium because, quite frankly, there are a lot of companies trying to do this. And during the announcement when that number came out, everybody wrote that down, and you’re reiterating that same performance and cost number here. And what I’ve been really impressed with is that, over time, when you’ve made these commitments, you’ve stuck with them. I went back and looked at notes on the commitments you had made around Graviton, wondering.
Okay, is this going to be true just for launch? And then I checked on Inferentia, just for launch, that 40% number, and sure enough, as other solutions decreased in price, so did yours. So I am glad to see that it is not just for marketing, but you’re sticking to your positioning. I know how hard that is, being a prior product person.
So, we’ve talked about general compute with Graviton and how you’ve upgraded that. And by the way, how you’ve upgraded Graviton 2, as well. Really making it that first class citizen for general purpose. We’ve talked about Inferentia and how much progress you’ve made there. And we’re talking about Trainium here for this preview. Net net, you’re talking to our listeners, how do they get started with AWS Silicon innovations?
Dave Brown: Well Pat, you’re completely right. I mean, our focus in this area is to continue to innovate on behalf of our customers. And there’s nothing that drives us more across my teams in EC2 than making sure that we can provide increased performance at a lower cost. And we really call that the price performance equation. And that’s what’s got us into the Silicon space. You saw us do that with Nitro, all the way back in 2012, when we started to do our own cards, our Graviton processors, and then obviously with ML chips now as well. The simplest thing for customers to do to get started is really just to try it out. I speak to a number of customers all the time that say, Hey, I’d love to move to Graviton, but I’m not sure if my application can support it.
And this is where the Graviton Challenge came from. I said, well, give it a try. And it’s just great to see the progress they’re making, and the same is true on the ML side. If you’ve built something with one of those frameworks today, like TensorFlow or PyTorch, give it a try, just move it over, let it run on Inferentia or try Trainium out now on the Trn1 instance. And I think you’ll be pleasantly surprised at how simple it is and how quick it is to actually go and improve performance and then save a whole lot of money as well, while you’re doing it. So, that’s the biggest message. Just take a few days, take a few engineers.
Patrick Moorhead: Yeah.
Dave Brown: And see what you can actually do on this new Silicon.
Daniel Newman: Dave, that’s a great way to wrap up this show. At least the interview. We know that this week probably has a massively full slate for you. So we’re extraordinarily grateful that you took the time. And of course, Pat used the phrase making people’s heads explode. I think we regularly get the drink from the fire hose. But I really do have to say that AWS probably is the ultimate example of an event where the announcements are that fire hose, in terms of so much. So being able to synthesize this, Dave, take this down to 30 minutes, get the biggest announcements, put some practical use cases around it, and give people a way to move forward is all that we could ever hope for and ask for, and exactly why we try to do these Six Five interviews. So thank you so much.
Patrick Moorhead: Yeah. Appreciate that.
Wow. You know, it’s so funny, just when people thought silicon was getting boring, like nine years ago, right? Remember it was software’s eating the world, and it’s like, no. It has to run on something. And I’m a huge believer that you commoditize yourself. Okay. So in other words, there are things you can do as an industry and a company to de-commoditize yourself. And that’s what I love about what Amazon has done with their own Silicon. And if you want merchant Silicon, they have that too.
Daniel Newman: Yeah, I’ve always been really fond of companies that understand the economics of choice. And that’s something that AWS has done extraordinarily well. It’s really… You understand what you need, we will build it and give you a lot of different options and flexibility. And depending on what you’re trying to accomplish, we’re going to accommodate that. And this is just one more example of diversification in the product line to enable enterprises, service providers, cloud scale, whatever it is you’re doing, to be able to achieve the outcomes that you’re trying to build. And in a world where we talk a lot about hybrid and multi-cloud, we’re also in a world of hybrid and multi-silicon.
And that’s what we’re seeing here: companies are diversifying, and AWS continues to prove that your and my thesis that silicon will eat the world is actually more accurate than software will eat the world, because software has to run on something. But wow, Dave Brown, what a great interview, Pat, and what a great opportunity for everybody that either wasn’t at the event or is just looking for a way to synthesize all of the announcements, especially around AWS’s chips, to be able to do so in a really meaningful way, and in an efficient way as well.
Patrick Moorhead: Yeah. So I want to thank everybody for tuning in, here at the Six Five Insider with Dave Brown, from Amazon, at AWS re:Invent.
Daniel Newman is the Chief Analyst of Futurum Research and the CEO of The Futurum Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.