Table of Contents
- Summary
- About the Key Criteria Report
- Data Warehouse Primer
- Key Criteria
- Evaluation Guidance
- About Andrew Brust
- About GigaOm
- Copyright
1. Summary
Data warehouses have long been the enterprise’s trusted technology for large-scale data storage and analytics. The platforms have become even more relevant in the last few years as most traditional vendors have modernized their offerings. This evolution includes advanced scaling capabilities, massive parallelism, improved ease of use, reduced total cost of ownership, and features and architectures that let users take better advantage of the cloud’s native capabilities.
Further enhancements have been implemented widely as the market has moved from a core data warehouse offering to a more integrated platform with warehouse capabilities at its core. These include integrations with formerly separate technologies, such as data lakes, Spark, and Hadoop; autonomous operations; tight integration with business intelligence (BI) tools; integrations with data engineering, data science, and machine learning (ML) workflows; and built-in data governance, data quality, and data prep capabilities.
A data warehouse platform is now a critical component of an organization’s path to managing the full data lifecycle. It should be evaluated holistically, on both its technical and non-technical capabilities.
This GigaOm Key Criteria report identifies the key criteria and evaluation metrics for selecting a data warehouse platform. It gives you an overview of the technologies that are critical to understand as you make informed decisions about the platforms in which you plan to invest.
Key Findings
- Data warehousing is a mature, well-understood technology category, widely used in organizations of all sizes to store and analyze massive data sets.
- The cloud is where most new platform development is now happening. Several new cloud-native platforms have emerged, and traditional vendors have created new offerings to modernize their platforms or enable transitions from on-premises to the cloud. Hybrid cloud capabilities are also receiving investment, allowing data sets spanning on-premises and the cloud to be processed and analyzed, as well as enabling enhanced disaster recovery capabilities.
- Most vendors are pursuing integration of data science, ML, and artificial intelligence (AI) capabilities with the base data warehousing architecture. Approaches and levels of integration vary, with further enhancements on the horizon.
- As scaling technologies mature, even more massive amounts of data can now be stored and analyzed quickly and economically.
- SQL continues to be the dominant query language, often with vendor-specific additions or variations to the standard.
- Integration with data lakes, query federation, and in-place processing of remote data sets are becoming more prevalent, with several vendors implementing performance-enhancing features.