Table of Contents
- Summary
- Data Lake and Lakehouse Primer
- Report Methodology
- Decision Criteria Analysis
- Evaluation Metrics
- Key Criteria: Impact Analysis
- Analyst’s Take
- Methodology
- About Andrew Brust
- About GigaOm
- Copyright
1. Summary
Organizations need to manage large volumes of data stored in different formats (structured, semi-structured, and unstructured) without having to rely on proprietary software, as they would with data warehouses. Data lakes let organizations store and query large amounts of data easily, with very little maintenance or imposed structure.
As a result, many data lakes are compatible with a wide range of file formats, including CSV (comma-separated values) and Parquet, as well as newer open table formats like Delta Lake and Iceberg. Additionally, many data lakes (and the query engines built to analyze the large-scale datasets within them) are built on open source technology, support open file formats, and handle security and governance through integration with additional open source technologies, such as Apache Ranger and Apache Atlas.
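To make the idea of querying mixed-format files with little up-front structure more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular data lake product works: the function names (read_records, scan_lake, total_amount, demo), the file names, and the region and amount fields are all hypothetical, and it uses only the standard library, so it covers CSV and JSON Lines but not Parquet, Delta Lake, or Iceberg.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_records(path: Path):
    """Yield each record in a file as a dict, choosing a parser by extension."""
    if path.suffix == ".csv":
        with path.open(newline="") as f:
            yield from csv.DictReader(f)
    elif path.suffix == ".jsonl":
        with path.open() as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)
    # Any other extension is skipped; formats such as Parquet would need
    # a third-party reader, which this sketch deliberately leaves out.


def scan_lake(root: Path):
    """Walk every file under root and yield its records (schema applied on read)."""
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield from read_records(path)


def total_amount(root: Path, region: str) -> float:
    """Sum the hypothetical 'amount' field for one region across all files."""
    return sum(
        float(record["amount"])
        for record in scan_lake(root)
        if record.get("region") == region
    )


def demo() -> float:
    """Write two small files in different formats, then query them together."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "orders.csv").write_text("region,amount\neu,10.5\nus,4\n")
        (root / "orders.jsonl").write_text(
            '{"region": "eu", "amount": 2}\n{"region": "us", "amount": 1}\n'
        )
        return total_amount(root, "eu")


if __name__ == "__main__":
    print(demo())
```

The files are stored as-is, and a structure is imposed only at query time, when each file is parsed according to its format. Production query engines add much more on top of this pattern, such as columnar formats, partition pruning, and access control.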
The past, present, and future of data lakes are intertwined with those of the data warehouse. Both originated in attempts to find a single optimal solution to enterprise data management. Additionally, over the past year, the term “lakehouse” has moved from a novel, somewhat esoteric moniker into the mainstream. A lakehouse attempts to blend the capabilities of data warehouses and data lakes by implementing query engine features designed to bring the optimizations and performance of a data warehouse to a data lake. Proponents of this architecture describe a lakehouse as an optimal blend of the two approaches.
Today, there is a wide range of opinions, philosophies, and marketing biases within the industry regarding the relationship between data lakes and data warehouses. Some vendors are proponents of a data-warehouse-only approach. Others provide users with the option of either a data lake or a data warehouse within the same product offering. Still others promote their lakehouse offerings as a best-of-both-worlds approach.
Lastly, the concept of data mesh has emerged. It describes a distributed, domain-oriented architecture that moves away from a centralized data solution. Proponents of data mesh claim its main advantage is that it puts control of data management, throughout the data lifecycle, in the hands of the individual teams that own the data.
Regardless of the specific technology label or bias, the most important thing for organizations to focus on when selecting a product is the use case it must address. To that end, this report aims to help organizations make informed investment decisions about the solution that best suits their needs. First, we include a primer that provides background on the technologies involved in data lakes and query engines. Then we walk through the capabilities (table stakes, key criteria, and emerging technology) and evaluation metrics (non-functional purchase drivers) to consider when selecting a data lake or lakehouse solution.
How to Read this Report
This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:
Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.
GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.
Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.