Cloudera Apache Iceberg
by Daniel Newman | July 13, 2022

The Six Five team discusses Cloudera Apache Iceberg.

If you are interested in watching the full episode you can check it out here.

Disclaimer: The Six Five Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors and we do not ask that you treat us as such.


Patrick Moorhead: So Daniel, you wrote a nice article, or one of your analysts did, Ron did, on exactly what’s going on here. So let me first talk about what Iceberg is. Iceberg is open source. Basically, it’s an open table structure that allows engines such as Spark, Trino, Flink, Presto, Hive, and Impala to work on the same tables. So think of this as an industry standard table methodology that you can plug all those different tools on top of.

While there have been bigger announcements this year by Cloudera, I like this one because it’s pure to their strategy, which is to take open source technologies, I would say, put enterprise-grade quality and stability behind them, and bring them to the biggest enterprises with the most data out there. So it’s true to the strategy, and this is all about data, which is what Cloudera is all about. And whether you want to manage that data end to end on prem, or in the public cloud, or even manage data that comes through a SaaS application, Cloudera’s doing a good job pulling it all together. Kind of a one-stop shop for data management.

Daniel Newman: Yeah, I think that’s exactly right, Pat. I mean, this is the hot new era of Apache Iceberg, and we’re seeing it talked about quite a bit if you’re in the data space. And Cloudera’s got this ecosystem and openness approach that it’s focused on. And right now, as competition comes at Cloudera and all the legacy and traditional big data warehouse and data lake players, it’s important for Cloudera to continue to innovate, Pat, and to deliver both openness and interoperability, and I think that’s what they’re doing. I mean, they’re working across the Apache portfolio, and they’re focused on, like I said, openness and the evolving requirements that their customers are seeing. The integrations with the data warehouses are significant. They’re working with Oracle, they’re working with IBM, Netezza, and Teradata, and they basically are functional in the multi-tenant environment that most companies are running in.

So the Cloudera challenge is that, after it went private, it’s been a little quieter, a bit out of the center of the news. But again, going back, Pat, to when they made the decision to go private, I believe that decision was fundamentally made so the company could reorganize, recalibrate, work on innovation, and work at a pace that fit the company’s long-term strategy. And I think that they’re doing that. Is that going to happen overnight? No. I mean, it’s going to take some time. It’s going to take some work. But I do think that what CDP is building is going to basically get them over a hump if they continue to push forward with that hybrid mentality, and with that more SaaS-based approach that they’re trying. They’re trying to make Cloudera more digestible.

They’ve already won that top of the market data world. But what they’re trying to do is say, hey, how do we compete with the hyperscale cloud data offerings? How do we compete with some of the born-on-cloud data warehouse and data lake solutions? And I think that’s what Cloudera is doing. Again, we aren’t going to have as much evidence as we used to have because we’re not going to get the same reporting metrics as we once did, but I like what they’re doing. I like that they’re aligning with the most important technologies for data tech, data warehouses, and data lakes, and I like that they’re open and integrate with all the big data warehouses on a global basis. So good move forward. A lot to watch here. Again, we’re going to have to read between the lines with Cloudera going forward, but progress seems to be moving in the right direction. And we’ll keep talking about that here, Pat, as that becomes a bit more evident with, hopefully, the customer wins and other data that you get when you don’t get the earnings data.

Patrick Moorhead: Yeah. And by the way, one thing I mistakenly glossed over was that this is the first open data lakehouse that’s available, if you define an open data lakehouse as having an open table set. Firsts are important in the industry, if nothing else to reinforce your leadership. So Cloudera Apache Iceberg goes GA.

About the Author

Daniel Newman is the Principal Analyst of Futurum Research and the CEO of Broadsuite Media Group. Living his life at the intersection of people and technology, Daniel works with the world’s largest technology brands exploring Digital Transformation and how it is influencing the enterprise.