Table of Contents
- Summary
- Definition & Categories
- Key Criteria Definitions
- Vendor Review
- Key Takeaways
- Methodology
- About Andrew Brust
- About GigaOm
- Copyright
1. Summary
Streaming data platforms are a relatively new category in the data world, one that has seen explosive growth in the last decade. The proliferation of data-producing applications, such as web applications, social media, and Internet of Things (IoT) devices, has created massive amounts of continuous data that have overwhelmed traditional storage and processing solutions. A variety of streaming platforms have emerged to address these needs better than traditional database and message queueing systems can. The next generation of streaming data platforms has succeeded in handling real-time data at petabyte scale, with increasing ease of use.
In this report, we explore the major streaming data platforms, outline the key criteria by which a streaming platform should be evaluated, and award marks to each platform based on how well it covers those key criteria and evaluation metrics. Below we present the awards we gave to the data streaming platforms we examined.
Key findings:
- The sector is undergoing rapid development and expansion. There are no clear commercial winners, although some clear trends in the underlying technology have emerged.
- The majority of vendor offerings are based on open-source technology, typically from the Apache Software Foundation. Apache platforms such as Kafka, Beam, and Pulsar underpin most commercial offerings. Kafka in particular surfaces in one way or another in most of the offerings, typically as an underlying technology or as a data source or destination.
- General levels of abstraction of underlying technology and ease-of-use still need to improve for streaming data technology to see wider adoption among mainstream technologists.
- All vendors offer connectivity to on-premises and cloud-based data sources. Enterprise software companies’ and cloud providers’ streaming products and services offer deep integration with their own broader platforms.
- Perhaps due to the open-source pedigree or the cloud-native capabilities of the offerings, most vendor pricing is relatively transparent. Some traditional vendors still rely on heavy-touch sales tactics, but most offerings have clear and sometimes self-service pricing.
- Given the size of the data streaming platform market, we have grouped the offerings into four distinct categories:
  - Apache Kafka-based services
  - General-purpose cloud-native services
  - Start-up platforms
  - Other streaming platforms
- Implementing the vendors’ offerings is typically a complex affair for on-premises solutions. While open-source technology has been a boon for innovation, it also requires significant expertise to set up, manage, and operationalize in order to achieve acceptable performance. On the other hand, cloud-native managed offerings and management tooling have sprung up from many vendors to make data streaming easier to set up and operate.
- Our platform evaluation criteria center on these five axes:
  - Platform maturity
  - Hybrid cloud capabilities
  - Visual data pipeline authoring
  - Edge data processing capabilities
  - Management interface features
- We also outline a set of common evaluation metrics:
  - On-premises availability
  - Cloud-native capabilities
  - Managed services offerings
  - Analytics offerings
  - Connectivity & Development
- Most platforms offer a variety of capabilities. We have found that the major areas on offer include:
  - Real-time data latency
  - Streaming analytics
  - Connectivity to and from on-premises and cloud data sources
  - Cloud-native capabilities
  - Managed services
  - Compatibility with Apache Kafka
  - Visual, low-code or no-code streaming data pipeline construction
  - Auto-scaling, healing, replication, and management tools
  - Edge processing for localized operations on data at or near the device generating events