The Six Five at Cloudera Evolve 2022: How an Iceberg-powered Data Lakehouse is Disrupting Analytics
by Daniel Newman | October 31, 2022

The Six Five “On The Road” at Cloudera Evolve NYC. Hosts Daniel Newman and Patrick Moorhead are joined by Ram Venkatesh, CTO & Bill Zhang, Sr. Director, Product Management, Iceberg at Cloudera. They discuss Apache Iceberg and its benefits in a hybrid cloud-native environment, specifically with analytics.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we do not ask that you treat us as such.

Transcript:

Daniel Newman: Hey, everyone, welcome back to another Six Five on the Road here at Evolve, New York City, brought to you by Cloudera, Intel, IBM. Excited for another great conversation. Joined by my co-host and keynote speaker extraordinaire at today’s event, Patrick Moorhead. Hey, Pat.

Patrick Moorhead: Daniel, you’re too kind. No, it’s been a fun day, the content’s been great. Just talking to Rob one on one in our interview that we did, or two on one, I’m fascinated how far the company has come. Also, it’s been great to meet with some customers, get real insights. We’re live, we’re back, we’re not remote, we are here right now.

Daniel Newman: Yeah, it was interesting, though, because you and I as industry analysts spent what? We did 47, 48 weeks a year on the road in 2019, and then we went through the 2021. I saw them ask the question about how many people were back for the first time at a live event and half the room raised their hands. You and I have probably been to like 50 events since…

Patrick Moorhead: Exactly.

Daniel Newman: … the pandemic was somewhat marked as responsibly okay to be back out in public. So we’ve seen this sort of transition, but a lot of people are just getting out for the first time. This event is great, you might hear a little bit of the energy in the background, but very excited. We’ve got Ram and Bill joining us here to talk a little bit about Iceberg, talk about data. So just a quick introduction from each of you, Ram, I’ll start with you. Welcome to the show.

Ram Venkatesh: So absolutely. First of all, thanks for having me on the show. Ram Venkatesh, CTO here at Cloudera. So I’m responsible for our technology vision and strategy. You saw today earlier about the announcements we made in the hybrid space. So we are pretty excited by what we can do together in this space with data in the context of hybrid multi-cloud data management.

Daniel Newman: I love it.

Bill Zhang: So this is Bill Zhang. I’m a senior director and the product manager for Iceberg. So very excited to be here and to talk about Iceberg.

Patrick Moorhead: Now it’s nice to meet both of you. Great conversation in the green room or the green area, I guess, getting ready for this. But I have to ask, why do we need yet another tool like Iceberg? Don’t we have enough data? Something data X out there?

Ram Venkatesh: It’s a good question. Look, the way I think about this is over the last decade or so we’ve been taking individual parts of a monolithic stack and separating them out. The biggest one that everybody knows about of course is storage and compute. Why did we do that? Because then you could go scale each of these pieces independently and get value by doing that. There was a piece in the middle, that’s the piece that says, “Here’s the relational shape or definition for all of your data.” This is the table format. Previously this piece was always tied up with one particular engine or one particular store or one particular vendor technology. Now, this was always considered so niche to making things work that there was no way to separate it out. With Apache Iceberg, this is the piece that we have decided is going to be a first-class thing. I like to think of it as a disaggregation journey where by making this piece standalone, we can unlock a tremendous amount of value. That’s the reason why we are focused so much on Iceberg.

Daniel Newman: That’s interesting. Disaggregation path. It seems to come up across every part of tech. You and I uniquely don’t focus as analysts on just one thing. We work within the semi space, we work in app space and we work on everything in between. We just kind of continuously hear this, everything goes… It’s like a, yeah, it’s a… What do you call it? An accordion?

Patrick Moorhead: An accordion.

Daniel Newman: It’s the accordion. We’re back in the era of disaggregation. I think a lot of this has to do with the migration to hybrid. Because you can’t… What works for public doesn’t necessarily work for private and hybrid means you need to take kind of best of everything and in some cases architect completely new structures. Is that what Iceberg is? Is it really the tool for ushering in the hybrid era?

Ram Venkatesh: Yeah, Iceberg is the linchpin really that makes this cloud native hybrid architecture go. One of the cool things I think about this space is that hybrid is not a least common denominator. You don’t take the same thing and run it everywhere in the same way and hope that it all works out. That’s just too expensive, too slow, too clunky. So if you really want to be cloud native, I think that’s where having Iceberg be the thing that actually helps us decouple the table definitions from the underlying storage, lets us be hybrid, sort of have our cake and eat it too, that’s the thing that the promise of Iceberg really unlocks for us.

Patrick Moorhead: So Bill, what are some differentiators on your approach? I think you have a lot of firsts out there. I think we talked that you’re the first hybrid commercialized solution, but what else? Prior Iceberg drops have been single engine as an example. What are you doing across those lines to differentiate your product?

Bill Zhang: Yes, that’s actually a great point. First of all, we have integrated Iceberg into our Cloudera Data Platform. As you know, there are many compute engines within our Cloudera platform: Hive, Spark, Impala, NiFi, Flink. What we’re doing is actually enabling all those engines to access and process Iceberg tables simultaneously. So this is one of the strongest points that I want to share with you and share with our customers.

Patrick Moorhead: Yeah, that’s definitely unique and a real testament to the overall architecture of CDP. Almost like they thought of that when they architected it.

Daniel Newman: Yeah, there’s-

Patrick Moorhead: Yeah.

Daniel Newman: There’s so many different potential layers of abstraction to get value, but in the end, that’s what this is all about. So maybe as you’re sort of positioning this and I’ll let you both answer this at your own, because obviously working as CTO and leading product, it might be a little bit different, but I’m guessing that you’re out and about, you’re getting feedback from partners, you’re getting feedback from customers. What’s the reception been as you’re kind of out there introducing the Iceberg plus the whole hybrid approach? Are you getting a warm reception? Are you feeling people are going to be quick to adopt? What’s kind of the initial reaction?

Ram Venkatesh: It’s been overwhelmingly positive both in the context of hybrid and also the modern data architecture narrative. Customers really love the fact that now we are describing the solution in terms of the problem that they have. They have multiple engines, they want to run on premise, they want to run in the cloud. With these realities they’re looking for ways to make this be seamless, transparent, the lower cost footprint and that’s what Iceberg does for them. So, Bill, I want you to chime in on some of the customer stories that we’ve already seen in the short while that Iceberg is in GA on our stack.

Bill Zhang: Yeah, sure. What I’d like to do is actually talk about three of the early adopters that we have. One of the customers is actually our referenceable customer, Tarnet. So what Tarnet does is that they want to provide near real-time analytics for their analysts. So previously they were dumping the database log from their on-prem database to the cloud, but they could only do that once a day. Now, they want to actually increase the frequency of that transaction log dumping to the cloud, so with our Iceberg technology they’re doing that once an hour. Going from once a day to once an hour, they’re now actually providing near real-time analytics for their analysts. They’re also using some of the advanced capabilities such as partition evolution. So they can change the partition scheme to a much finer grain so that they’re providing much better performance based on the data sets that they have. So this is the Tarnet example.

The other one that I want to share with you is with a global pharmaceutical company, and what they want to do is actually provide forensic analysis of who actually accessed data sets at a certain point. Now, one of the key capabilities we’re providing through Iceberg and our CDP integration is what we call “time travel.” So through the time travel capability they will know exactly who had accessed what at a certain point. So that’s the point-in-time forensic analysis.
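
Time travel in Iceberg means reading a table as it existed at an earlier snapshot. The idea can be pictured with a minimal sketch in plain Python. Note that `ToyTable`, `Snapshot`, `commit`, and `as_of` are hypothetical names chosen for illustration, not the Iceberg or CDP API, and the access-log rows are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable record of the full table contents at commit time (toy model)."""
    snapshot_id: int
    timestamp_ms: int
    rows: Tuple[Dict[str, str], ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table that keeps every committed snapshot."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, timestamp_ms: int, new_rows: List[Dict[str, str]]) -> Snapshot:
        # Each commit creates a new snapshot; earlier snapshots are never modified.
        previous = self.snapshots[-1].rows if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, previous + tuple(new_rows))
        self.snapshots.append(snap)
        return snap

    def as_of(self, timestamp_ms: int) -> List[Dict[str, str]]:
        # "Time travel": return the rows of the latest snapshot at or before the given time.
        eligible = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        if not eligible:
            raise ValueError("no snapshot at or before the requested time")
        latest = max(eligible, key=lambda s: s.timestamp_ms)
        return list(latest.rows)


if __name__ == "__main__":
    access_log = ToyTable()
    access_log.commit(1000, [{"user": "alice", "dataset": "trials"}])
    access_log.commit(2000, [{"user": "bob", "dataset": "trials"}])
    # Reading as of a time between the two commits sees only the first row.
    print(access_log.as_of(1500))
```

In this toy model, a point-in-time question such as the one in the pharmaceutical example becomes a read against an older snapshot rather than a reconstruction from backups.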

The third example that I want to share with you is with a global automotive company. They have all those different vehicles, and each vehicle is actually sending different logs to their database, and they want to do some deep analysis to help them do predictive maintenance and customer behavior analysis. So all that information is actually in the logs that are transmitted from the vehicles. So they want to actually do near real time as well, because there’s a lot of advantage in being real time. So the feature they’re looking for is, again, the partition evolution that I just mentioned earlier: they can actually change the partitioning to a much finer granularity so that it improves the performance. So from their early analysis they’re seeing that they can improve the performance tenfold, a tremendous performance improvement, and that they can provide the real-time analytics for their use cases: predictive maintenance, customer behavior analysis.
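
Partition evolution, as mentioned in these customer stories, lets a table's partitioning change without rewriting data that was already written: existing data keeps the layout it was written with, and new data uses the new one. A minimal sketch of that behavior in plain Python follows. `ToyPartitionedTable`, `evolve_spec`, `day_partition`, and `hour_partition` are hypothetical names for illustration only, not Iceberg's API, and the day-to-hour change is an assumed example of moving to a finer grain.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


def day_partition(ts: datetime) -> str:
    """Coarse partition value: one partition per day."""
    return ts.strftime("%Y-%m-%d")


def hour_partition(ts: datetime) -> str:
    """Finer partition value: one partition per hour."""
    return ts.strftime("%Y-%m-%d-%H")


class ToyPartitionedTable:
    """Hypothetical table that remembers which partition spec each record was written under."""

    def __init__(self, spec_name: str, transform: Callable[[datetime], str]) -> None:
        self.specs: Dict[int, Tuple[str, Callable[[datetime], str]]] = {0: (spec_name, transform)}
        self.current_spec_id = 0
        # Each entry: (spec_id, partition_value, payload)
        self.files: List[Tuple[int, str, str]] = []

    def evolve_spec(self, spec_name: str, transform: Callable[[datetime], str]) -> None:
        # Only metadata changes here; records already written are left untouched.
        self.current_spec_id += 1
        self.specs[self.current_spec_id] = (spec_name, transform)

    def append(self, event_time: datetime, payload: str) -> None:
        # New records are partitioned with whichever spec is current at write time.
        _, transform = self.specs[self.current_spec_id]
        self.files.append((self.current_spec_id, transform(event_time), payload))


if __name__ == "__main__":
    table = ToyPartitionedTable("day", day_partition)
    ts = datetime(2022, 10, 31, 14, 5, tzinfo=timezone.utc)
    table.append(ts, "vehicle-log-1")
    table.evolve_spec("hour", hour_partition)
    table.append(ts, "vehicle-log-2")
    for spec_id, partition, payload in table.files:
        print(spec_id, partition, payload)
```

The design point the sketch tries to capture is that the older record stays in its daily partition while newer records land in hourly ones, so the change in grain does not require a rewrite of historical data.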

Patrick Moorhead: Yeah. What’s really… Well first of all, I’m impressed because most announcements, the way they go is you announce one year and then you have POC, maybe two or three customers that show up and then you go GA. You’re GA on Iceberg, right?

Bill Zhang: Yes.

Ram Venkatesh: That is correct. In fact-

Patrick Moorhead: That’s good. Very different by the way, from what we’re used to.

Ram Venkatesh: I think the pandemic helped us a little bit in this case. Because we’ve been working on this technology now for about 18 months. We’ve been active in the Iceberg community. That’s one of the things that we should talk about. Iceberg is not so much a… It’s not a Cloudera thing. The fact that there is a community, an ecosystem of partners building together, this all helps. So there’s momentum there where we invest our development resources into Iceberg. But there’s other large players in the community.

Patrick Moorhead: Well, announced them in a big announcement yesterday.

Ram Venkatesh: Exactly.

Patrick Moorhead: A big public cloud provider.

Ram Venkatesh: Correct.

Patrick Moorhead: Then? Right.

Ram Venkatesh: Exactly. Right. So this lets us sort of bring this innovation faster to the market. Now, what we can do as being good stewards to the community is that we can bring our customer use cases, we can bring the enterprise readiness that they need to go with this correctness. All of the things that are sort of implicit in some of the customers, like the ones that Bill is talking about, for them to deploy at scale. So being able to do that in the context of a community, that’s been helpful in bringing this technology to the market as fast as we could.

Patrick Moorhead: Yeah, for sure.

Daniel Newman: So what are some of the… As we tie off this conversation… By the way, you guys, great job, really appreciate, this is hard stuff to explain. I mean, theoretically everybody hears the anecdotal, “What can we do with data?” But now you really are getting into the guts of all the complexity, multiple systems, sources, structured, unstructured, edge, cloud… This is a hard…

Patrick Moorhead: There’s a lot to pick in, yeah.

Daniel Newman: What is the risk? Where are you finding…? Are customers…? What are they most apprehensive about in this migration to say Iceberg? Are you running into…? What are the hurdles that are coming up?

Ram Venkatesh: So, pretty simply, there’s three things that we want to be very mindful of. The first one is around correctness. With any data system you have to be able to trust the results that come out of that system. If you cannot do that, the rest of it is moot. So with Iceberg, this is where having someone like, say, Netflix, which has been running Iceberg at scale, hundreds of petabytes for the last four years, is huge. The right metaphor for this topic would be the stuff below the waterline, where somebody is running at scale in production where they can rely on the data, the results that you’re getting from that system. That’s the fundamental risk that’s usually associated with bringing a new technology like this to the market. I think that this is where the way Iceberg came together, we have a lot of confidence in the fact that it’s had a lot of road miles. It’s had a lot of proof points that we can point to and say, “This is why we think this is a good solution.”

Daniel Newman: By the way, very consistent with some of my earlier interviews. You weren’t here for one I had, but we actually talked about… I used the metaphor of a skyscraper because I basically said, “All that plumbing, you see these buildings here in New York come out of the ground but they’re built… Their foundation has to be deep.”

Patrick Moorhead: They are built deep into the ground.

Ram Venkatesh: Yeah.

Daniel Newman: And that’s the data management and all the hygiene and the compliance and residency and sovereignty and architectures. You can’t go to Iceberg if you haven’t done that foundational stuff correctly. I think so many people want to get to that end state. Everybody wants to, “How do we quickly get insights from our data?” Well, the companies that are doing it well, most of them have been doing it for decades. They’ve had good behaviors and habits and of course companies like yours can help them speed that up and help them do better and give them best practices. But a company that has a mess of a data ecosystem can’t just be like, “Oh, we’re just going to go over to Iceberg and throw some stuff in the public cloud and we’ll be fine.” That’s not how it works.

Ram Venkatesh: Yeah, that’s not a recipe for sustainable success. That’s the thing is that I think the effects of being a data driven business, they really start to pay off if you can build on top of the successes you have had, it’s like compound interest. So if you have two use cases today and you get to five the next year and ten or a hundred the year after, that’s when you really start to see this exponential value. The automobile customer that Bill is talking about, they started off with one single customer 360 use case seven years ago. Right now they have more than a hundred. So I think that is where we see the power of the platform and being able to have additional ways to derive value from data faster. That’s really key for companies to be able to monetize their data effectively.

Daniel Newman: Love compound interest.

Patrick Moorhead: No, compounding interest is great.

Daniel Newman: I know.

Patrick Moorhead: I think this is a good place to close off the conversation here. Ram and Bill, really appreciate your time here. And by the way, congratulations on being first to market with a hybrid Iceberg solution and also on this multi-engine. I think that’s great and it’s always good to be first to market, but first to market and bulletproof and industrial strength. Because we see a lot of people who do press releases and it’s three years out. So literally customers can do Iceberg today as long as they’ve got good hygiene and a good data architecture, right?

Bill Zhang: That’s right.

Patrick Moorhead: Okay.

Bill Zhang: Exactly.

Patrick Moorhead: Okay, good.

Bill Zhang: You got it.

Patrick Moorhead: Your heads are nodding. My head’s nodding. This is great. So I want to thank you for coming on. Really exciting journey, educating so many people about Iceberg and we’d love to have you on in a little bit to get a pulse on how it’s doing and about the growth of it.

Ram Venkatesh: Absolutely. Thanks for having us on the show.

Patrick Moorhead: Thanks.

Bill Zhang: Thank you.

Patrick Moorhead: So this is Pat Moorhead with the Six Five signing off with my incredible co-host Daniel Newman and incredible Bill and Ram here. We hope you like the show. If you like what you heard, hit that subscribe button. We are signing off here from New York City, Evolve 2022. Have a great morning, lunch, evening wherever you are. Take care.

About the Author

Daniel Newman is the Principal Analyst of Futurum Research and the CEO of Broadsuite Media Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.