Confluent Cloud: Fully Managed Kafka Streaming

An Ease-of-Use Comparison

1. Executive Summary

This report focuses on real-time data and how autonomous systems can be fed reliably at scale. To shed light on this challenge, we assess the ease of use of a fully managed Kafka platform—Confluent Cloud—against that of a self-managed, open-source Apache Kafka solution.

The most popular tool for streaming data is Apache Kafka. Created at LinkedIn, Kafka was open sourced and graduated from the Apache Incubator in late 2012. Kafka is a distributed publish-subscribe messaging system that maintains feeds of messages in categories known as topics. Publishers write data to topics, and subscribers read from them. Kafka is a distributed system in which topics are partitioned and replicated across multiple nodes in the cluster.

Within Kafka, messages are key/value pairs that can store objects in any format. Messages with the same key are ordered and stored in the same partition so they can be consumed by the same instance of a subscriber.
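The key-to-partition routing described above can be sketched in a few lines. Kafka's default partitioner actually uses a murmur2 hash of the key modulo the partition count; the MD5-based stand-in below is only an illustration of the hash-modulo idea, not Kafka's real implementation:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition, mimicking Kafka's
    hash(key) % num_partitions strategy. Kafka itself uses
    murmur2; MD5 here is only an illustrative stand-in."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages with the same key always land in the same partition,
# which is what preserves per-key ordering for a single
# subscriber instance.
assert partition_for(b"order-42", 6) == partition_for(b"order-42", 6)
```

Because the mapping is deterministic, every message with key `order-42` is appended to the same partition log, and one consumer instance sees them in order.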

In our test, we worked through all steps of a use case for a distributed event store and stream-processing platform. The components of ease of use included in our calculations spanned four categories: setup, development, operations, and scale.

Using story points, we assessed the comparative ease-of-use value realization between Confluent Cloud and Kafka across setup, development, and operations. We found that the value realization of fully managed Confluent Cloud was about three times that of open-source Kafka in setup, nearly double in development, and more than double in operations.

Scalability is a significant component of why fully managed Confluent Cloud is easier to use than open-source Kafka. It is easy to get started with and can grow to 5 GBps of ingress with the click of a button—something that requires hours or days of manual effort with open-source Kafka.

Our team found that fully managed Confluent Cloud is much easier to use than open-source Kafka. While Confluent Cloud accelerates the setup, development, and operations, the most impressive feature is the seamless scale out for when the application grows.

2. Streaming Data in the Cloud

With companies striving to be data driven and utilize every bit of data possible, it is essential to process an increasing amount of data in real time. There are numerous applications being developed that make autonomous decisions, with data produced, consumed, analyzed, and reacted to in real time. As a result, the technology is making pragmatic, tactical decisions on its own. However, if data is not captured within a specific timeframe, its value is lost, and the needed decision or action never occurs.

There are, fortunately, technologies designed to handle large volumes of time-sensitive streaming data. Known by names like streaming, messaging, live feeds, real-time, and event-driven, this data category needs special attention because delayed processing can negatively affect its value. A sudden price change, a critical threshold met, an anomaly detected, a sensor reading changing rapidly, an outlier in a log file—any of these can be of immense value to a decision-maker or a process, but only if alerted in time to affect the outcome.

There have been explosive developments in this space in the past few years, together with a corresponding growth of commercial vendors that reserve as closed source a few capabilities that are borderline necessary for enterprise applications. These include security features for access control, encryption, and auditing; connectors maintained for a growing number of applications and data systems; and disaster recovery tools.

Some organizations require a commercial vendor behind every piece of software in the shop. Others decide on a case-by-case basis, often leaning toward open-source options that boast low up-front costs and the opportunity to prove out software, albeit without the safety net of a commercial vendor arrangement.

The realities of the pandemic have exposed the importance of a reliable, agile technology infrastructure like Confluent for enabling business continuity, mining intelligence, and scalable operation. Understanding the cost implications of Confluent deployments in this environment is important, especially as enterprises build and run applications and services on managed platforms.

This report outlines the results of an ease-of-use field test to uncover the advantages of a fully managed Kafka platform—Confluent Cloud—over a self-managed open-source Apache Kafka solution. Note that the open-source Kafka deployment assumes a virtualized or cloud-based environment, as opposed to a bare-metal environment that presents even greater complexity. It is also worth noting that this comparison addresses only a subset of the functionality enabled by Confluent Cloud, which is itself a complete data streaming platform.

3. Ease-of-Use Test Setup

The ease of use of a platform is an important factor in time to value during implementation and in day-to-day operations and maintenance. Often, an ease-of-use study uncovers hidden costs and the time and effort it takes to get the solution off the ground and produce value for the organization. It can also uncover hidden technical debt and issues of platform maintenance that come from custom configurations, undocumented workarounds, siloed development, individual contributor knowledge, and so on.

Two of the value propositions of fully managed platforms in the cloud are faster time to value and the mitigation of technical and administrative debt—which we call ease-of-use value realization. This report measures ease of use and quantifies the difference in time-and-effort costs between a self-managed, open-source Kafka solution and the fully managed offering from Confluent Cloud.

We utilized all steps of a use case for a distributed event store and stream-processing platform. The categories, or components of ease of use, that we included in our calculations are as follows:

  • Setup: Tasks required to get the platform up and running.
  • Development: Common tasks involved in building a streaming data solution.
  • Operations: Basic everyday tasks in the daily administration of the platform.
  • Scale: Tasks involved in scaling the platform up and down, including geo-replication.

See the Appendix for a work breakdown of the testing tasks.

Testing Configuration
For our test, we built two different core streaming data stacks—the only difference being the data streaming engine. Table 1 and Figure 1 illustrate the stacks, with Figure 1 depicting the streaming platforms being compared in the middle column.

Table 1. Streaming Data Stacks

Component            Confluent Cloud   Apache Kafka
Source Connector #1  MySQL             MySQL
Source Connector #2  SQL Server        SQL Server
Source Connector #3  MongoDB           MongoDB
Sink Connector       PostgreSQL        PostgreSQL
Stream Processing    ksqlDB            ksqlDB
Source: GigaOm 2022


Figure 1. Streaming Data Stacks

To simulate data movement across these stacks, we used TPC-C-like transactional workloads to generate data in the sources and used Schema Registry and Connect to move the data to the PostgreSQL sink. NOTE: This was neither a TPC-C benchmark nor a performance benchmark; the TPC-C-like workload was used only to generate data for stream processing.

Test Scoring
To score ease of use, we took inspiration from an agile Scrum development project-management approach. As experienced consultants, we have extensively used and advocated for the agile project management methodology when developing and operating information management platforms. The agile method is a much bigger subject than the scope of this paper, but we used the following components of an agile methodology in this ease-of-use field test, many of which will be familiar to readers:

  • Story: A story is a task that must be completed to move to the next stage in development of the project or to complete an operational objective.
  • Backlog: The agile backlog is a list of stories or tasks.
  • Work Breakdown: A story can be broken down into discrete steps for completion.
  • Story Size: A story is sized according to the time and effort required to complete it. Sizing a story appropriately is an art and a science that requires experience.
  • Story Points: The story size is typically expressed as a numeric value to quantify story work and completion regardless of the sizing method used.
  • Sprint: A sprint is usually an interval of time during which a group of stories is chosen and worked on. In this study, however, we omitted the time interval and defined a sprint simply as all the stories required to complete the development of the component or the system/operational change.
  • Burndown: A burndown is a graphical chart that shows the completion of stories over the period of a sprint.
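The burndown series behind such a chart is simply the remaining story points after each completed story. A minimal sketch (the story sizes here are hypothetical):

```python
def burndown(story_points):
    """Return the remaining points after each story completes,
    starting from the sprint total and ending at zero."""
    remaining = sum(story_points)
    series = [remaining]
    for pts in story_points:
        remaining -= pts
        series.append(remaining)
    return series

print(burndown([3, 5, 2]))  # [10, 7, 2, 0]
```

Plotting such a series over the sprint's stories yields the burndown charts shown in the Results section.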

In our study, we collected two different measures:

  • Task Step Count: The result of the work breakdown, i.e., the number of discrete steps needed to complete the task/story.
  • Task Size: The time and effort required to complete the story. In routine agile projects, this is done ahead of time so that tasks can be planned and sequenced before work begins; we instead sized each task after completion, for greater accuracy.
    To quantify size, we used the T-shirt size method mapped to the Fibonacci sequence, ranging from extra-small (XS) to extra-extra-large (XXL). Figure 2 shows each size and the Fibonacci number assigned to it.

Figure 2. T-Shirt Sizes
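A sketch of how such a mapping scores a sprint follows. Note the specific Fibonacci assignments below are an assumption following a common agile convention; the exact values used in Figure 2 may differ:

```python
# Hypothetical mapping of T-shirt sizes to consecutive Fibonacci
# numbers; the exact assignments in Figure 2 may differ.
TSHIRT_POINTS = {"XS": 1, "S": 2, "M": 3, "L": 5, "XL": 8, "XXL": 13}

def sprint_points(sizes):
    """Total story points for a list of T-shirt-sized stories."""
    return sum(TSHIRT_POINTS[s] for s in sizes)

print(sprint_points(["S", "M", "S"]))  # 7
```

Summing the translated sizes this way produces the per-sprint story-point totals reported in the Results section.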

We documented every story/task in the completion of our sprints, the URL we used, the number of discrete steps required, and how much time and effort it required—expressed as a T-shirt size and translated to a Fibonacci number. We divided the work into the following six (6) sprints:

  • Setup Sprints:
    • Set up Kafka Cluster Sprint
    • Set up ksqlDB Sprint
    • Set up Connect Sprint
  • Typical Development Sprint
  • Basic Operational Change Sprint
  • Scale Out Sprint

4. Results

This section details the results of our ease-of-use field test. To present our findings, we begin with a series of burndown charts showing the time and effort required to complete each sprint.

Burndown Charts

Set Up Kafka Cluster Sprint
The burndown chart in Figure 3 compares the work required to set up a Kafka cluster on Apache Kafka and Confluent Cloud. As you can see, considerably less time and effort was required in Confluent Cloud (only 7 story points) compared to Apache Kafka (53 story points), since many of the tasks are handled out of the box by the fully managed platform.


Figure 3. Kafka Cluster Setup Burndown

Set Up ksqlDB Sprint
The chart in Figure 4 compares the work required to set up ksqlDB on Apache Kafka and Confluent Cloud. Again, much less time and effort was required in Confluent Cloud (only 9 story points) compared to Apache Kafka (38 story points).


Figure 4. ksqlDB Setup Work Burndown Chart

Set Up Connect Sprint
Figure 5 compares the work required to set up the four database connectors on Apache Kafka and Confluent Cloud. Keep in mind, this included many tasks for the setup and configuration of each of the four databases (PostgreSQL, MySQL, SQL Server, and MongoDB). Less time and effort was required in Confluent Cloud (29 story points) compared to Apache Kafka (42 story points).


Figure 5. Connect Setup Work Burndown Chart

Note that the connectors used in this sprint are ones that are freely available for both Confluent Cloud and open-source Apache Kafka. However, Confluent offers a slew of additional fully managed connectors that can enable even faster time to value when integrating with external systems.

Typical Development Sprint
The chart in Figure 6 compares a typical development sprint for tasks involving both Apache Kafka and Confluent Cloud. Keep in mind that during a Kafka rollout, these same development tasks will be repeated over and over, multiplying the advantage of saving time and effort by using Confluent Cloud. There was less time and effort required in Confluent Cloud (12 story points) compared to Apache Kafka (22 story points).


Figure 6. Development Work Burndown Chart

Basic Operational Change Sprint
The chart in Figure 7 compares performing basic operational tasks on both Apache Kafka and Confluent Cloud, and includes tasks like modifying a topic, rebalancing a topic leader, removing a topic, and throttling bandwidth. Again, these tasks will be repeated over and over, multiplying the effect over time. There was less time and effort required in Confluent Cloud (3 story points) compared to Apache Kafka (7 story points).


Figure 7. Operations Work Burndown Chart

Scale Out Sprint
Figure 8 shows the work required to perform a scale out (adding brokers) on an Apache Kafka cluster. The major difference here is that Confluent Cloud completely manages scaling operations and therefore requires zero time or interaction to manage the size of the infrastructure. By contrast, Apache Kafka requires considerable time and effort (47 story points).


Figure 8. Scale-Out Work Burndown Chart

Summary

To summarize and measure overall ease of use, we used the results of our agile work burndown to quantify a final result. We arrived at two factors:

  • Effort Factor: This shows the relative time and effort savings an IT organization can expect when deploying a Kafka solution on Confluent Cloud compared to employing self-managed open source (Figure 9).
  • Complexity Factor: This measures how complex individual tasks are on the two platforms (Figure 10). It was derived by taking the ratio of task size to the number of discrete task steps needed to complete the tasks. A higher number means tasks are typically more complicated, require more expertise, and/or are more prone to human error.


Figure 9. Effort Factor (Lower is Better)

In terms of the overall effort, we found Confluent Cloud should take about 71% less time and effort to deploy, develop, and maintain than a self-managed open-source Apache Kafka stack. To get a sense of the scope of tasks assessed in this comparison, the detailed work breakdown is published in the Appendix of this report.
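As a cross-check, the roughly 71% effort savings can be reproduced directly from the per-sprint story-point totals reported in the burndown sections above:

```python
# Story points per sprint, as reported in the burndown sections.
confluent = {"cluster": 7, "ksqldb": 9, "connect": 29,
             "development": 12, "operations": 3, "scale_out": 0}
kafka = {"cluster": 53, "ksqldb": 38, "connect": 42,
         "development": 22, "operations": 7, "scale_out": 47}

# Relative savings: 1 - (Confluent Cloud total / Apache Kafka total)
savings = 1 - sum(confluent.values()) / sum(kafka.values())
print(f"{savings:.0%}")  # 71%
```

The totals come to 60 story points for Confluent Cloud versus 209 for Apache Kafka, yielding the ~71% reduction in time and effort.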


Figure 10. Complexity Factor

We found that overall, working with Confluent Cloud is roughly 31% easier (less complex and less prone to human error) than open-source Apache Kafka.
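The complexity factor itself is straightforward to compute. In this sketch the point and step counts are hypothetical; the real numbers come from the full work breakdown in the Appendix:

```python
def complexity_factor(task_points, task_steps):
    """Ratio of total story points to total discrete steps.
    A higher value means each step carries more effort,
    expertise, and risk of human error."""
    return sum(task_points) / sum(task_steps)

# Hypothetical tasks: same number of steps, different effort.
print(complexity_factor([8, 12], [4, 6]))  # 2.0
print(complexity_factor([4, 5], [4, 6]))   # 0.9
```

Comparing the two factors this way normalizes for the fact that one platform may simply have fewer steps overall.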

5. Conclusion

In our test, we worked through all steps of a use case for a distributed event store and stream-processing platform. The components of ease of use included in our calculations spanned four categories: setup, development, operations, and scale. We broke each category into its component tasks for each platform, and then scored those tasks to determine the relative level of effort and difficulty associated with each category.

Our findings: In setup operations, a fully managed Confluent Cloud offers about three times the ease-of-use value realization compared to open-source Apache Kafka. In terms of development, the advantage of Confluent Cloud over Apache Kafka is almost 2x, while for operations the advantage is greater than 2x.

Scalability is an area of significant advantage. We found a fully managed Confluent Cloud much easier to use than open-source Apache Kafka—a click of a button is all that is needed to grow a deployment up to 5 GBps of ingress. By comparison, open-source Apache Kafka can require hours or even days of manual effort to do the same—and that doesn’t even account for the additional planning and coordination that must take place ahead of a scaling effort.

In terms of overall effort, we found Confluent Cloud should take about 71% less time and effort to deploy, develop, and maintain than a self-managed open-source Apache Kafka stack. We also found that overall, working with Confluent Cloud is roughly 31% easier (less complex and less prone to human error) than open-source Apache Kafka.

Any way you slice it, fully managed Confluent Cloud proved in our testing to be much easier to use than open-source Kafka. Using Confluent Cloud accelerates setup, development, and operations, but the most impressive feature is the seamless scale out for when the application grows (not to mention the ability to scale down to realign spending when workloads shrink). This drives straight to the bottom line through efficiencies, the time to do more, and the ability to realize more business value.

6. Appendix

The following is the detailed work breakdown used for the field tests. The platform to which each task applies is also indicated.

Setup

Kafka

Configure

Change cluster settings: Confluent Cloud
Configure broker initially: Apache Kafka
Create CI/CD pipeline process for prod config updates: Apache Kafka
Reconfigure and deploy broker config: Apache Kafka
Repeat for production environment: Apache Kafka
Test broker config: Apache Kafka

Install

Create a cluster: Confluent Cloud
Download latest release: Apache Kafka
Install Java: Apache Kafka
Launch EC2 instance: Apache Kafka
Repeat for each broker: Apache Kafka
Repeat for production environment: Apache Kafka
Start environment: Apache Kafka

Security

Configure access management: Confluent Cloud
Configure ACLs: Apache Kafka
Configure SASL for brokers: Apache Kafka
Configure Zookeeper authentication: Apache Kafka
Create certificate authority and sign: Apache Kafka
Generate SSL key/cert for each broker: Apache Kafka, Confluent Cloud
Reconfigure brokers for cert: Apache Kafka

ksql

Configure

Configure ksql listeners: Apache Kafka
Configure ksql server parameters: Apache Kafka
Configure topic and connector: Confluent Cloud
Create CI/CD pipeline process for prod config updates: Apache Kafka
Repeat for production environment: Apache Kafka
Test listener config: Apache Kafka

Install

Add public key and repo: Apache Kafka
Create ksql cluster: Confluent Cloud
Install pre-reqs: Apache Kafka
Launch EC2 instance: Apache Kafka
Repeat for additional cluster nodes: Apache Kafka
Repeat for production environment: Apache Kafka
Start environment: Apache Kafka

Security

Set up RBAC for ksql: Apache Kafka, Confluent Cloud

Connect

Configure

Configure connector for PostgreSQL: Confluent Cloud
Configure SQL Server: Apache Kafka
Configure CDC for SQL Server: Apache Kafka
Configure connector for MongoDB: Confluent Cloud
Configure connector for MySQL: Confluent Cloud
Configure connector for SQL Server: Confluent Cloud
Configure data change events for MongoDB: Apache Kafka
Configure data change events for MySQL: Apache Kafka
Configure data change events for PostgreSQL: Apache Kafka
Configure Debezium connector for MongoDB: Apache Kafka
Configure Debezium connector for MySQL: Apache Kafka
Configure Debezium connector for PostgreSQL: Apache Kafka
Configure Debezium connector for SQL Server: Apache Kafka
Configure MongoDB: Apache Kafka, Confluent Cloud
Configure MySQL: Apache Kafka, Confluent Cloud
Configure PostgreSQL: Apache Kafka, Confluent Cloud
Configure SQL Server: Confluent Cloud

Install

Create connector for MongoDB: Confluent Cloud
Create connector for MySQL: Confluent Cloud
Create connector for PostgreSQL: Confluent Cloud
Create connector for SQL Server: Confluent Cloud
Install Debezium connector for MongoDB: Apache Kafka
Install Debezium connector for MySQL: Apache Kafka
Install Debezium connector for PostgreSQL: Apache Kafka
Install Debezium connector for SQL Server: Apache Kafka
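As an illustration of what the Debezium connector tasks above involve, a representative MySQL source connector configuration might look like the following. Property names follow the Debezium 1.x MySQL connector; hostnames, credentials, and topic names are placeholders:

```json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054",
    "database.server.name": "mysql-01",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "broker:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

On self-managed Kafka, this JSON is posted to the Connect REST API; in Confluent Cloud, the equivalent settings are entered through the fully managed connector UI.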

Security

Configure MongoDB security: Apache Kafka, Confluent Cloud
Configure MySQL security: Apache Kafka, Confluent Cloud
Configure PostgreSQL security: Apache Kafka, Confluent Cloud
Configure SQL Server security: Apache Kafka, Confluent Cloud

Schema Registry

Configure

Configure schema registry: Apache Kafka, Confluent Cloud

Install

Install schema registry: Apache Kafka, Confluent Cloud

Operations

Kafka

Basic Ops

Create a topic: Apache Kafka, Confluent Cloud
Modify a topic: Apache Kafka, Confluent Cloud
Rebalance topic leader: Apache Kafka
Remove a topic: Apache Kafka, Confluent Cloud
Throttle rebalance/reassignment bandwidth: Apache Kafka

Scale

Kafka

Add Brokers

Download latest release: Apache Kafka
Install Java: Apache Kafka
Launch EC2 instance: Apache Kafka
Migrate existing topics to new broker: Apache Kafka
Start environment: Apache Kafka

Remove Brokers

Create a topic reassignment plan: Apache Kafka
Gracefully shutdown broker: Apache Kafka
Perform topic reassignment: Apache Kafka
Snapshot instance: Apache Kafka
Terminate instance: Apache Kafka

Replication

Increase replication factor: Apache Kafka
Perform topic reassignment: Apache Kafka

Geo-Replication

Configure replication flows: Apache Kafka
Create MirrorMaker configuration file: Apache Kafka
Create replication flows: Apache Kafka
Secure replication flows settings: Apache Kafka
Start geo-replication: Apache Kafka
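To illustrate the geo-replication tasks above, a minimal MirrorMaker 2 configuration file might look like this. Cluster aliases and bootstrap servers are placeholders; property names follow Apache Kafka's MirrorMaker 2 documentation:

```properties
# Define the two clusters and how to reach them
clusters = primary, backup
primary.bootstrap.servers = primary-broker1:9092
backup.bootstrap.servers = backup-broker1:9092

# Enable the primary -> backup replication flow for all topics
primary->backup.enabled = true
primary->backup.topics = .*

# Replication factors for mirrored and internal topics
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
```

Each of the "Configure/create replication flows" tasks in the list maps to editing and testing entries like these, followed by launching MirrorMaker against the file.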

Development

Kafka

Schema Registry

Configure consumer: Apache Kafka, Confluent Cloud
Configure producer: Apache Kafka, Confluent Cloud
Define schema: Apache Kafka, Confluent Cloud
Modify schema with compatibility check: Confluent Cloud
Modify schema with manual compatibility check: Apache Kafka

ksql

Event Stream

Create a stream: Apache Kafka, Confluent Cloud

Materialized View

Create a materialized view: Apache Kafka, Confluent Cloud

Transformation

Create a transformation: Apache Kafka, Confluent Cloud

7. Disclaimer

Ease of use is important, but it is only one criterion for platform selection. This test is a point-in-time check of specific time-and-effort tasks. There are numerous other factors to consider, including performance, administration, features and functionality, workload management, user interface, scalability, vendor support, and reliability. It is also our experience that documentation and practices change over time and differ across platforms.

GigaOm runs all of its tests to strict ethical standards. The results of the report are the objective results of the application of tests to the simulations described in the report. The report clearly defines the selected criteria and process used to establish the field test. The report also clearly states the tools and workloads used. The reader is left to determine for themselves how to qualify the information for their individual needs. The report does not make any claim regarding third-party certification and presents the objective results received from the application of the process to the criteria as described in the report. The report strictly measures ease of use and does not purport to evaluate other factors that potential customers may find relevant when making a purchase decision.

This is a sponsored report. Confluent chose the competitors. GigaOm designed the test and scoring rubric. Choosing compatible configurations is subject to judgment. We have attempted to describe our decisions in this paper.

 

8. About William McKnight

William McKnight is a former Fortune 50 technology executive and database engineer. An Ernst & Young Entrepreneur of the Year finalist and frequent best practices judge, he helps enterprise clients with action plans, architectures, strategies, and technology tools to manage information.

Currently, William is an analyst for GigaOm Research who takes corporate information and turns it into a bottom-line-enhancing asset. He has worked with Dong Energy, France Telecom, Pfizer, Samba Bank, ScotiaBank, Teva Pharmaceuticals, and Verizon, among many others. William focuses on delivering business value and solving business problems utilizing proven approaches in information management.

9. About Jake Dolezal

Jake Dolezal is a contributing analyst at GigaOm. He has two decades of experience in the information management field, with expertise in analytics, data warehousing, master data management, data governance, business intelligence, statistics, data modeling and integration, and visualization. Jake has solved technical problems across a broad range of industries, including healthcare, education, government, manufacturing, engineering, hospitality, and restaurants. He has a doctorate in information management from Syracuse University.

10. About GigaOm

GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.

GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.

GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.

11. Copyright

© Knowingly, Inc. 2022 "Confluent Cloud: Fully Managed Kafka Streaming" is a trademark of Knowingly, Inc. For permission to reproduce this report, please contact sales@gigaom.com.