IBM Lakehouse as Part of watsonx.data for AI and Analytics
by Randy Kerns | May 10, 2023

The News: IBM announced a new Lakehouse as part of watsonx.data at its Think conference.

Analyst Take: IBM has announced the IBM Lakehouse as an evolution beyond the first-generation lakehouses in use today.

For those not familiar, a lakehouse is a mashup of a data lake and a data warehouse and is used as an analytics repository. More than a database and more than a data lake of unstructured file data, a lakehouse brings a schema to data for fast, selective access to potentially massive amounts of data. Lakehouses are primarily used by query engines such as Presto and Apache Spark. A lakehouse serves a huge and growing market for analytics used for Artificial Intelligence (AI) and Machine Learning (ML). Adoption of AI and ML is driving the need for more and more data with faster access.

The key specifics of the IBM Lakehouse announcement include:

  • Can be deployed in less than 10 minutes
  • Will work in public cloud or on-premises
  • Also offered as an on-premises integrated appliance, the IBM Storage Fusion HCI, which includes:
      • OpenShift on bare metal
      • Standard x86 servers
      • Nvidia GPUs
      • IBM Storage

The IBM Lakehouse supports the Apache Iceberg format for query engines. This enables:

  • SQL access for large data sets
  • Multiple simultaneous query engine access
  • ACID transaction support (Atomicity, Consistency, Isolation, Durability)
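
To make the multi-engine and ACID points above more concrete, here is a minimal sketch in Python of optimistic-concurrency commits, the general approach that snapshot-based table formats such as Apache Iceberg rely on. It is a toy in-memory model, not the Iceberg API or IBM's implementation: the names `ToyTable`, `Snapshot`, `CommitConflict`, and `append_with_retry` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class CommitConflict(Exception):
    """Raised when another writer committed first (hypothetical name)."""


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: an id plus the data files it contains."""
    snapshot_id: int
    files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Toy stand-in for a table whose state is a chain of snapshots."""
    snapshots: List[Snapshot] = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, base: Snapshot, new_files: Tuple[str, ...]) -> Snapshot:
        # Compare-and-swap: the commit succeeds only if the table has not
        # moved on since the writer read its base snapshot.
        if base.snapshot_id != self.current().snapshot_id:
            raise CommitConflict("table changed since snapshot %d" % base.snapshot_id)
        snap = Snapshot(base.snapshot_id + 1, base.files + new_files)
        self.snapshots.append(snap)
        return snap


def append_with_retry(table: ToyTable, new_files: Tuple[str, ...],
                      max_attempts: int = 3) -> Snapshot:
    """Re-read the current snapshot and retry when a commit conflicts."""
    for _ in range(max_attempts):
        base = table.current()
        try:
            return table.commit(base, new_files)
        except CommitConflict:
            continue
    raise CommitConflict("gave up after %d attempts" % max_attempts)
```

In this sketch, two engines that start from the same snapshot cannot both commit against it: the second one gets a `CommitConflict` and has to retry on top of the first writer's result. A reader that holds an older `Snapshot` keeps seeing a consistent file list, because snapshots are never modified in place.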

To improve performance, a global persistent cache is implemented: every query engine can access the same cache across hybrid clouds or on-premises.

Supported storage includes object storage with the S3 protocol, IBM Cloud, and Google Cloud Storage.
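
The shared-cache idea mentioned above can be sketched in a few lines of Python. This is only an illustration of why one cache serving several query engines reduces trips to object storage. It is an in-memory toy, so it is not persistent and says nothing about how IBM implements its cache. `SharedCache`, `QueryEngine`, and the `fetch` callback are hypothetical names.

```python
from typing import Callable, Dict


class SharedCache:
    """One cache instance shared by every engine (toy, in-memory only)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch            # stand-in for a read from object storage
        self._data: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._data:
            self.hits += 1
            return self._data[key]
        # Cache miss: go to the backing store once, then keep the result.
        self.misses += 1
        value = self._fetch(key)
        self._data[key] = value
        return value


class QueryEngine:
    """Hypothetical engine that reads data files through the shared cache."""

    def __init__(self, name: str, cache: SharedCache) -> None:
        self.name = name
        self.cache = cache

    def scan(self, key: str) -> int:
        # Returns the number of bytes read, just to have something observable.
        return len(self.cache.get(key))
```

If two engines built on the same `SharedCache` scan the same object, only the first scan reaches the backing store. The second is served from the cache, which is the effect the announcement describes at a much larger scale.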

Lakehouses have a big future, and IBM has pushed forward with an announcement that brings key capabilities to customers. IBM has recognized that on-premises deployment is important alongside public cloud, given customer concerns about data privacy and storage costs. It also understands how complex a container environment can be, and the packaged IBM Storage Fusion HCI will be a rapid deployment solution. IBM is a major player in data for analytics, AI, and ML, and the IBM Lakehouse will enhance the completeness of its offerings.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

About the Author

Randy Kerns

Randy Kerns is a key strategist at Futurum Research, formerly Evaluator Group, and his focus is on identifying and exploring major trends and shifts that occur within the IT Data Center and information technology market space. With over 35 years of experience in helping storage companies design and develop products, Randy spends much of his time advising IT end-user clients on architectures and acquisitions.