State of the Ecosystem – Data Architecture Clarity and Selection Are Challenging Today
Organizations across the data ecosystem are struggling to identify the combination of data architectures best suited to their intricate data demands. Today's data teams face the immense challenge of delivering and administering all of their organization's data and workloads across on-premises and cloud environments while ensuring minimal to no latency. In essence, they are working toward the core business objective of becoming a data-driven organization by delivering everything, everywhere, all at once across an evolving data architecture.
As a result, data decision makers are evaluating the data fabric, data lakehouse, and data mesh trends to keep up with organization-wide data demands. We believe that defining these architectures can clarify the available options and explain why decision makers are considering them in pursuit of data architecture optimization.
Data Mesh: An approach that helps a company scale its data footprint in a manageable way by decentralizing data and workloads. Data mesh is a set of practices spanning people, process, and technology choices that allow companies to scale their data systems elastically. Key data mesh design principles include self-serve data discovery, full data security, data lineage, data auditing, and data cataloging. We find that large organizations with a domain-tailored architecture benefit most from adoption, since data meshes keep data and its ownership in the domain where it originated, thereby avoiding IT chokepoints and enabling domain-based scaling.
Data Fabric: A collection of technologies used to ingest, store, process, and govern data anywhere, at any time. Data fabric can be viewed as the technology part of data mesh: concepts in data mesh map to real-world artifacts in data fabric implementations, and one way to implement a data mesh is to make technology choices within the framework of a data fabric. For instance, only when data is properly understood through a fabric can a mesh sensibly divide into domains and know what data is at its disposal. We see data fabric adoption picking up across organizations that seek to accelerate integration between their data silos, make data readily available to business users regardless of location, and advance their data compliance and security goals.
Data Lakehouse: Data lakehouses unify the capabilities of data warehouses and data lakes with the goal of supporting artificial intelligence (AI), machine learning (ML), business intelligence, and data engineering on a single platform. Specifically, open data lakehouses help organizations run rapid analytics on all data, both structured and unstructured, at massive scale. Today we see organizations swiftly embracing open data lakehouses to attain interoperability across different analytic engines and vendors, leverage community-driven innovation to avoid vendor lock-in, and solve real-world business problems pragmatically with best-of-breed capabilities.
For additional clarity, we view hybrid architectures as the technology decisions made to ingest, store, process, govern, and visualize data across different form factors, spanning on-premises and multiple cloud environments, with data replicated as needed. As such, hybrid architectures can be viewed as an implementation of a data fabric that spans multiple form factors.
We find wide variance in perspectives on what constitutes a hybrid architecture. Establishing a single, official industry-wide definition is unlikely, and it is simply less important than meeting enterprise demand for hybrid architectures that avoid architectural lock-in and the potential constraints imposed by specific technologies or by where data is produced and consumed. Regardless of the hybrid architecture used, we see enterprises giving top priority to hybrid architecture flexibility and choice, especially for improving their business outcomes.
We see data decision makers grappling with a great deal of marketing noise advocating the superiority of one of these trends, which makes the decision to adopt a single trend or a combination of trends more vexing. Overall, we do not believe data trend selection is an either/or choice; rather, data decision makers can optimize and modernize their data architecture by using an open-source data platform with built-in versatility and flexibility.
Data Architecture Trends: What to Expect
We see key data trends emerging that are shaping and driving the data architecture optimization process. For instance, data contracts are emerging as a new approach within data mesh because they can provide transparency over data usage and dependencies. In the near term, we anticipate that decision makers will proceed cautiously, initially focusing on standardization support and technical stability. In this nascent stage, data governance is integral, although avoiding excessive overhead merits extra scrutiny. As confidence in data contracts grows, we expect organizations to automate more of their data mesh processes, including data mesh contracting.
Key to the enduring success of data meshes is ensuring that metadata, both dynamic and static, is consistent across all data products. This requires the data model of the metadata to remain consistent regardless of the underlying technologies used. This data model functions as the contract structure defined between the producers and consumers of the data. As a result, consumers gain more flexibility to subscribe to the data products generated by data producers.
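To make the idea of a metadata data model acting as a contract between producers and consumers more concrete, here is a minimal Python sketch. The `DataContract` class, its fields, and the `can_subscribe` check are hypothetical illustrations, not part of any Cloudera product or published data-contract standard; a real contract could also cover semantics, freshness guarantees, and ownership rules.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DataContract:
    """Hypothetical metadata model shared by a data product's producer and consumers."""
    product: str             # data product name
    owner_domain: str        # domain where the data originates and is owned
    version: str             # contract version agreed by both sides
    schema: dict[str, type]  # column name -> expected Python type

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Producer side: list every way one record breaks the contract (empty = compliant)."""
        problems = []
        for column, expected in self.schema.items():
            if column not in record:
                problems.append(f"missing column '{column}'")
            elif not isinstance(record[column], expected):
                problems.append(
                    f"column '{column}': expected {expected.__name__}, "
                    f"got {type(record[column]).__name__}"
                )
        # Columns the contract does not declare are also treated as violations.
        for column in sorted(set(record) - set(self.schema)):
            problems.append(f"unexpected column '{column}'")
        return problems


def can_subscribe(contract: DataContract, required: dict[str, type]) -> bool:
    """Consumer side: subscribe only if the contract provides every needed column with the same type."""
    return all(contract.schema.get(col) is typ for col, typ in required.items())


# Illustrative usage with a hypothetical "orders" data product owned by a sales domain.
orders = DataContract(
    product="orders",
    owner_domain="sales",
    version="1.0",
    schema={"order_id": str, "amount": float},
)
print(orders.violations({"order_id": "A-1", "amount": 19.99}))  # []
print(orders.violations({"order_id": "A-2", "amount": "19.99", "note": "x"}))
# ["column 'amount': expected float, got str", "unexpected column 'note'"]
print(can_subscribe(orders, {"amount": float}))  # True
```

Because the contract depends only on the metadata model and not on the storage or processing engine behind it, the same check could, in principle, apply to data products built on different underlying technologies.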
From our viewpoint, data decision makers are also investigating combining the data mesh with emerging data exchanges such as the Snowflake Data Exchange, AWS Data Exchange, and others. This trend could further broaden how data meshes are defined and understood. However, the future of this approach is unsettled, as data exchanges are designed primarily as marketplaces for producers and consumers and usually do not have an analytics workload associated with them.
Cloudera: Meeting the Challenges and Easing the Selection of the Best Data Architecture
We believe that Cloudera's portfolio is well suited to meet the demands of today's rapidly evolving data architectures. In particular, we see Cloudera as a trusted partner for data decision makers selecting the data trends, very likely in combination, best suited to optimizing their data architecture journey.
The Cloudera Data Platform (CDP) enables modern data architectures with data anywhere and anytime, according to each customer's scale requirements. By supporting all the major data architectures in play today (data mesh, data fabric, and data lakehouse), Cloudera ensures customers can avoid lock-in to a single trend and have the flexibility vital to optimizing their data architecture through selective adoption of these trends.
For example, the integrated security and governance capabilities of Cloudera's Shared Data Experience (SDX) already have a proven track record in delivering successful data meshes in tightly regulated industries such as financial services. Additionally, the versatility of Cloudera's Data-in-Motion offering and the broader integration of CDP enable intricate use cases that extend beyond the data mesh, such as ingesting and processing IoT data for customer analytics and real-time cybersecurity analytics. This gives customers the overall flexibility key to optimizing their combination of data architectures.
We are also encouraged by Cloudera's extensive support for open data lakehouse use cases over the last several years. Through its open-source support, Cloudera gives customers the confidence to advance their data trend selections knowing that any choice they make maintains architectural flexibility and avoids lock-in. These deployments use open-source engines on open data and table formats, making it easy to apply data engineering, data science, data warehousing, and machine learning in the data architecture optimization process.
From our perspective, Cloudera’s hybrid data platform provides the building blocks key to demystifying and deploying all modern data architectures. While technology in and of itself is insufficient to deploy any architecture, we believe there is tremendous benefit in having a single platform that meets the requirements of all architectures. Organizations can streamline their data trend selection process by minimizing the workforce training required to use, manage, and administer multiple systems. In addition, a single platform eliminates the need to replicate key capabilities such as governance across multiple trends throughout different locations and infrastructures.
Ultimately, we believe that Cloudera can provide the technological component of the solution to support any organization’s data-driven initiative by implementing the data mesh, data fabric, and data lakehouse trends according to customer selection and prioritization.
Disclosure: Futurum Research is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum Research as a whole.