The News: IBM announced a new Lakehouse as part of watsonx.data at its Think conference.
IBM Lakehouse as Part of watsonx.data for AI and Analytics
Analyst Take: IBM has announced the IBM Lakehouse as an evolution beyond the first-generation lakehouses in use today.
For those not familiar, a lakehouse is a mashup of a data lake and a data warehouse, used as an analytics repository. More than a database and more than a data lake of unstructured file data, a lakehouse brings a schema to data for fast, selective access to potentially massive data sets. Lakehouses are accessed primarily by query engines such as Presto and Apache Spark. A lakehouse serves a large and growing analytics market for Artificial Intelligence (AI) and Machine Learning (ML), where adoption is driving the need for ever more data with faster access.
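The core idea of "bringing a schema to data" can be illustrated with a minimal, hypothetical Python sketch (this is not IBM's implementation, and the file contents, column names, and functions below are invented for illustration): raw files sit in cheap storage, and a declared schema is applied at read time to enable typed, selective access.

```python
import csv
import io

# A lakehouse keeps raw files in inexpensive storage and applies a
# declared schema at query time. Hypothetical example data and schema:
SCHEMA = {"order_id": int, "region": str, "amount": float}

RAW_FILE = """order_id,region,amount
1,EMEA,120.50
2,APAC,75.00
3,EMEA,200.00
"""

def read_with_schema(raw, schema):
    """Parse raw file data into typed records using the declared schema."""
    for row in csv.DictReader(io.StringIO(raw)):
        yield {col: cast(row[col]) for col, cast in schema.items()}

def select(records, predicate):
    """Selective access: return only the records matching the predicate."""
    return [r for r in records if predicate(r)]

# A query engine can now filter and aggregate over typed columns.
emea = select(read_with_schema(RAW_FILE, SCHEMA),
              lambda r: r["region"] == "EMEA")
total = sum(r["amount"] for r in emea)
print(len(emea), total)  # 2 320.5
```

In a real lakehouse, the schema and file listing live in a table format's metadata (such as Apache Iceberg's), and engines like Presto or Spark do this parsing and filtering at scale.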
Key specifics of the IBM Lakehouse announcement include:
- Can be deployed in less than 10 minutes
- Will work in public cloud or on-premises
- Also offered as an on-premises integrated appliance, the IBM Storage Fusion HCI, which packages:
  - OpenShift on bare metal
  - Standard x86 servers
  - Nvidia GPUs
  - IBM Storage
The IBM Lakehouse supports the Apache Iceberg format for query engines. This enables:
- SQL access for large data sets
- Concurrent access by multiple query engines
- Support for ACID transactions (Atomicity, Consistency, Isolation, Durability)
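Atomicity, the "A" in ACID, is the guarantee that a multi-statement change either fully commits or leaves no trace. The sketch below illustrates that behavior using Python's built-in SQLite as a stand-in; Apache Iceberg provides the same guarantee for lakehouse tables through atomic snapshot commits rather than SQLite, and the table and values here are invented for illustration.

```python
import sqlite3

# Illustrative sketch of transaction atomicity using SQLite as a
# stand-in (Iceberg achieves this via atomic metadata snapshot swaps).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 100.0)")
conn.commit()

try:
    with conn:  # transaction scope: all statements commit, or none do
        conn.execute("INSERT INTO orders VALUES (2, 50.0)")   # valid insert
        conn.execute("INSERT INTO orders VALUES (1, 75.0)")   # duplicate key -> error
except sqlite3.IntegrityError:
    pass  # the whole transaction rolled back, including the valid insert

rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rows)  # 1 -- the failed transaction left no partial writes behind
```

Because every engine reading an Iceberg table sees only committed snapshots, multiple simultaneous query engines never observe a half-applied write.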
To improve performance, a global persistent cache is implemented: every query engine can access the same cache, whether deployed across hybrid clouds or on-premises.
Supported storage includes object storage accessed via the S3 protocol, IBM Cloud, and Google Cloud Storage.
Lakehouses have a big future, and IBM has pushed forward with an announcement that delivers key capabilities for customers. IBM has recognized that on-premises deployment matters as much as public cloud, given customer concerns about data privacy and storage costs. It also understands how complex a container environment can be, and the packaged IBM Storage Fusion HCI offers a rapid deployment path. IBM is a major player in data for analytics, AI, and ML, and the IBM Lakehouse will enhance the completeness of its offerings.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
AI-powered Bing Now in Open Preview as Platform Continues to Grow and Evolve
AI is on Fire—New Broadcom AI Fabric Aims to Stoke the Flames
Author Information
Randy draws from over 35 years of experience in helping storage companies design and develop products. As a partner at Evaluator Group and now The Futurum Group, he spends much of his time advising IT end-user clients on architectures and acquisitions.
Previously, Randy was Vice President of Storage and Planning at Sun Microsystems. He also developed disk and tape systems for mainframe attachment at IBM, StorageTek, and two startup companies, and designed disk systems at Fujitsu and Tandem Computers.
Prior to joining The Futurum Group, Randy served as the CTO for ProStor, where he brought products to market addressing a long-term archive for Information Technology and the Healthcare and Media/Entertainment markets.
He has also written numerous industry articles and papers as an educator and presenter, and he is the author of two books: Planning a Storage Strategy and Information Archiving – Economics and Compliance. The latter is the first book of its kind to explore information archiving in depth. Randy regularly teaches classes on Information Management technologies in the U.S. and Europe.