Jake Dolezal, Author at Gigaom

High-Volume Data Replication https://gigaom.com/report/high-volume-data-replication/ Mon, 04 Mar 2024 16:00:18 +0000

This report was commissioned by Fivetran.

Whether for operational or analytical purposes, databases are the backbone of how many businesses run, from capturing consumer behavior on your website to processing IoT data across your supply chain and much more. Accessing and replicating massive volumes of database content is key to business success, and the responsibility for managing this crucial element of your infrastructure falls to data leaders and their teams.

Ensuring your solution for database replication can keep up with your business is a pressing need for every data leader across every industry and company size. In this report, we investigate two major vendors in database replication and put them to the test in terms of speed and cost.

Behind the Scenes: How it Works
Data replication, or change data capture (CDC), is the process of identifying and recording modifications to data in a database and immediately sending those updates to a downstream system or process.

Data is extracted from a source, optionally transformed, and then loaded into a target repository, such as a data lake or data warehouse. Ensuring that all transactions in a source database are recorded and immediately transferred to a target keeps the systems synchronized and enables dependable movement of data between on-premises sources and the cloud with minimal to no downtime.

CDC, an incredibly effective method for moving data across technologies, is essential to modern cloud architectures. Its real-time data transfer accelerates analytics and data science use cases, and enterprise data architectures use CDC to efficiently power continuous data transport between systems. Log-based CDC uses the database's transaction log to capture changes and replicate them downstream.
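
To make the mechanism concrete, here is a minimal Python sketch of log-based CDC: change events are read from a transaction log in commit order and applied to a target, with a checkpoint tracking replication progress. The event format and the in-memory log are illustrative assumptions, not the actual Fivetran HVR or Qlik Replicate interfaces.

```python
# Minimal sketch of log-based change data capture (CDC): change events are read from a
# transaction log and applied to a target in commit order. The event format and the
# in-memory "log" below are hypothetical placeholders, not the actual Fivetran HVR or
# Qlik Replicate interfaces.
from dataclasses import dataclass, field

@dataclass
class ChangeEvent:
    scn: int            # system change number (log position)
    op: str             # "insert", "update", or "delete"
    table: str
    key: int
    row: dict = field(default_factory=dict)

def apply_changes(log: list[ChangeEvent], target: dict, last_scn: int) -> int:
    """Apply all events past last_scn to the target and return the new checkpoint."""
    for event in sorted(log, key=lambda e: e.scn):
        if event.scn <= last_scn:
            continue                      # already replicated
        table = target.setdefault(event.table, {})
        if event.op in ("insert", "update"):
            table[event.key] = event.row
        elif event.op == "delete":
            table.pop(event.key, None)
        last_scn = event.scn              # advance the replication checkpoint
    return last_scn

# Usage: replay a small simulated redo log into an empty target.
log = [
    ChangeEvent(1, "insert", "orders", 101, {"amount": 40}),
    ChangeEvent(2, "update", "orders", 101, {"amount": 55}),
    ChangeEvent(3, "delete", "orders", 101),
]
target: dict = {}
checkpoint = apply_changes(log, target, last_scn=0)
print(checkpoint, target)   # -> 3 {'orders': {}}
```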

Using competing technologies Fivetran HVR and Qlik Replicate, our scenario assessed the total cost of ownership (TCO) of syncing 50 GB to 200 GB per hour of change data between a source Oracle database and a target Snowflake data warehouse using log-based CDC on the source. Notably, we assessed TCO based on configurations that reflect the performance requirements of enterprise customers: data replication latency needed to stay below five minutes, regardless of redo log change rate. These tests simulate scenarios commonly encountered by large enterprises when using log-based CDC technologies.

At 200 GB/hour, Fivetran HVR proved 25% less costly than Qlik Replicate.

In this study, we sought to compare the total cost of ownership between Fivetran HVR and Qlik Replicate, based on similar levels of operational latency.

  • In our performance testing, as the volume of redo log change data increased, Fivetran HVR produced a flat linear trend in replication latency while Qlik Replicate latency steadily increased. At 50 GB/hour, tested latencies for both platforms were safely below five minutes; but at 100 and 200 GB/hour of change data, the single Qlik instance produced unacceptably high latencies (as much as 27 times greater than those produced by Fivetran HVR).
  • To produce a valid TCO comparison, we factored in the cost to scale the Qlik buildout with additional instances. As redo log change data doubled from 50 to 100 GB/hour, a second Qlik instance was accounted for, and at 200 GB/hour the instance count was doubled again to a total of four.
  • Based on these findings, TCO calculations reveal that Fivetran HVR is 7% less expensive than Qlik Replicate as redo log change data rates increase to 100 GB/hour, and 25% less expensive at 200 GB/hour. At the base 50 GB/hour data volume, Qlik was 5% less expensive to operate than Fivetran.

Data Fabric Field Test: SAP vs. DIY https://gigaom.com/report/data-fabric-field-study/ Fri, 01 Mar 2024 16:00:48 +0000

Interest in building a data fabric is very high, and for good reason. By enabling real-time, trustworthy data across multiple clouds, you can expedite the migration of analytics to the cloud, guarantee security and governance, and quickly generate business value. A data fabric provides a semantic layer that enables a map of your metadata, unifying data based on its meaning, rather than just its location, and preserving the business context and logic of that data across the fabric.

Building a data fabric has traditionally meant a messy do-it-yourself (DIY) blend of a data warehouse, a data lake, data integration tools, and data governance tools. Advancements like the semantic layer change that: acting as a translator for each data source, it connects your data ecosystem for efficient exploration and analysis, eliminates brittle point-to-point integrations, and provides a unified understanding across the entire environment. The result is a more streamlined, integrated solution that simplifies the build.

In short, a business data fabric replaces siloed data sources with a unified ecosystem. It transforms data from a cost center to a strategic asset that fuels accessibility, efficiency, and innovation for informed decision-making and competitive advantage.

Key to this capability is the implementation of a metadata map, which captures the location and context of data across the data fabric, providing insights into the characteristics, relationships, and attributes of the data. A comprehensive metadata map empowers you to unlock the true potential of your data, enabling deep insight and strategic advantage.
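
As an illustration of what such a map might hold, the following Python sketch models catalog entries that record a dataset's location, attribute meanings, and relationships, and looks datasets up by meaning rather than location. The field names and example entries are assumptions for illustration, not SAP Datasphere's actual metadata model.

```python
# Minimal sketch of a metadata map: a catalog of where data lives and what it means,
# so the fabric can unify sources by meaning rather than location. Field names and the
# example entries are illustrative assumptions, not SAP Datasphere's actual model.
from dataclasses import dataclass, field

@dataclass
class MetadataEntry:
    name: str                      # business name of the dataset
    location: str                  # physical location (system, schema, table)
    attributes: dict[str, str]     # column -> business meaning
    relationships: list[str] = field(default_factory=list)   # related datasets

catalog: dict[str, MetadataEntry] = {}

def register(entry: MetadataEntry) -> None:
    catalog[entry.name] = entry

def find_by_meaning(term: str) -> list[MetadataEntry]:
    """Locate datasets by business meaning, regardless of where they are stored."""
    return [e for e in catalog.values()
            if term.lower() in " ".join(e.attributes.values()).lower()]

# Usage: two differently named order tables unified by shared business meaning.
register(MetadataEntry("sales_orders", "erp.SALES.ORDERS",
                       {"NETWR": "net order value", "KUNNR": "customer id"}))
register(MetadataEntry("web_orders", "lake.ecom.orders",
                       {"total": "net order value", "cust_id": "customer id"},
                       relationships=["sales_orders"]))
print([e.location for e in find_by_meaning("net order value")])
```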

In this GigaOm Benchmark report, we introduce SAP Datasphere and compare the task of creating, integrating, distributing, and managing a data fabric with a common DIY toolset versus SAP Datasphere. Looking across all organization sizes, we found that a DIY data fabric deployment cost 2.4x more than SAP Datasphere over three years. These sharp cost advantages played out across all aspects of adoption: data fabric infrastructure, initial migration/build, CI/CD, and administration.

Figure 1 breaks out SAP Datasphere's cost advantage across these aspects. It shows that organizations can slash three-year TCO spending with a business data fabric powered by SAP Datasphere, compared with a DIY approach that costs up to 138% more.

Figure 1. Relative Cost of SAP Datasphere vs. DIY Implementation Across Operational Concerns (Lower Is Better)

Vector Databases Compared https://gigaom.com/report/benchmarked-vector-search-databases/ Fri, 15 Dec 2023 14:00:12 +0000

This report was commissioned by DataStax.

Determining the optimal vector solution from the myriad of vector storage and search alternatives that have surfaced is a critical, high-leverage decision for an organization. Vectors and AI will be used to build the next generation of intelligent applications for the enterprise and the software industry, and the most effective option is often the one that delivers the highest level of performance.

The benchmark aims to demonstrate the performance of DataStax Astra DB Serverless (Vector) compared to the Pinecone vector database within the burgeoning vector search/database sector. This report contains comprehensive detail on our benchmark and an analysis of the results.

We tested the critical dimensions of vector database performance: throughput, latency, F1 recall/relevance, and TCO. Among the findings for Astra DB versus Pinecone:

  • 55% to 80% lower TCO
  • Up to 6x faster indexing of data
  • Up to 9x faster ingestion and indexing of data

We tested throughput, which involved generating vectors and labels, inserting them into databases, and executing queries to measure performance. The queries were of various types, such as nearest neighbor, range, KNN classification, KNN regression, and vector clustering. We also performed latency testing, which measured the response time for each query. Finally, F1 recall/relevance testing measured the database’s performance in returning relevant results for a given query.
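
The sketch below shows, in Python, one way such measurements can be taken: each query is timed, and its approximate top-k results are compared against exact brute-force neighbors to compute recall. The search() callable is a hypothetical stand-in for a vector database client, not the Astra DB or Pinecone API, and the data is randomly generated.

```python
# Minimal sketch of recall and latency measurement for a vector search system: compare
# each query's approximate top-k results against exact (brute-force) neighbors and time
# each query. The `search` callable is a hypothetical stand-in for a database client.
import time
import numpy as np

def exact_top_k(query: np.ndarray, vectors: np.ndarray, k: int) -> set[int]:
    """Ground-truth nearest neighbors by brute-force cosine similarity."""
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return set(np.argsort(-sims)[:k])

def recall_at_k(approx_ids: set[int], exact_ids: set[int]) -> float:
    return len(approx_ids & exact_ids) / len(exact_ids)

def benchmark(search, queries: np.ndarray, vectors: np.ndarray, k: int = 10):
    recalls, latencies = [], []
    for q in queries:
        start = time.perf_counter()
        approx = search(q, k)                     # hypothetical DB call under test
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(set(approx), exact_top_k(q, vectors, k)))
    return float(np.mean(recalls)), float(np.percentile(latencies, 99))

# Usage with a stand-in "database" that simply runs the exact search itself.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 128)).astype(np.float32)
queries = rng.normal(size=(20, 128)).astype(np.float32)
fake_search = lambda q, k: list(exact_top_k(q, vectors, k))
print(benchmark(fake_search, queries, vectors))   # -> (1.0, p99 latency in seconds)
```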

The study found that Astra DB significantly outperformed Pinecone p2x8 in ingesting and indexing data, performing six times faster and with a low relative variance, making the ingest workflow more predictable. Astra DB also showed faster response times during active indexing and produced recall ranging from 90% to 94% versus Pinecone’s recall in the range of 75% to 87%. For larger datasets, Astra DB showed better recall and low variance, demonstrating accuracy and consistency. Finally, our testing and configuration pricing revealed that Pinecone’s TCO was 2.2 to 4.9 times greater than Astra DB, making it significantly more expensive to operate.

The results of these tests indicate that DataStax Astra DB Serverless (Vector) is a great choice for storing and searching vectors efficiently.

Digital Analytics and Measurement Tools Evaluation https://gigaom.com/report/digital-analytics-and-measurement-tools-evaluation/ Mon, 21 Aug 2023 12:00:03 +0000

Digital analytics covers a growing set of use cases as more and more of our lives are mediated by digital platforms. This includes:

  • Marketing Analytics: Measuring the return on marketing spend, especially from digital channels.
  • Product Analytics: Helping product teams understand the impact of their product developments on revenue, customer lifetime value, conversion rates, retention rates, and churn rates.
  • Merchandising Analytics: Relevant for retailers that want to optimize their online offering by understanding the performance of different stock keeping units (SKUs).

As digital analytics has become more sophisticated, there has been a move to performing more analytics in the warehouse. Google has supported and driven this trend with the native BigQuery integration for GA4. Snowplow has done something similar: it is a warehouse-first analytics tool, delivering all the data into the data warehouse (e.g., BigQuery, Snowflake, Databricks, Redshift) in near real time.

With these differences in mind, we performed a field test to assess how Snowplow compares to GA4 for a retail organization pursuing a digital analytics mandate. To make the test as fair as possible, we used both Google Analytics and Snowplow in vanilla e-commerce implementations, meaning that for both solutions we used the out-of-the-box e-commerce events.

Both tools support the definition of custom events, but to keep the comparison like-for-like, we stuck to the implementation a retailer is most likely to deploy and utilized the Snowplow out-of-the-box e-commerce accelerator. Snowplow accelerators are recipes/templates that enable Snowplow users to execute specific use cases rapidly; the e-commerce accelerator lets online retailers get started with Snowplow quickly, delivering data to power a wide range of e-commerce analytics out of the box. It provides a standard way to set up e-commerce tracking (including product views, add-to-basket actions, and transactions) and data models that optimize delivery of the data for analytics and AI.
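
For illustration, the Python sketch below shows the kind of standardized e-commerce events such an implementation tracks: product views, add-to-basket actions, and transactions. The event and field names are generic assumptions, not Snowplow's or Google Analytics' actual schemas.

```python
# Illustrative sketch of standardized e-commerce events (product views, add-to-basket
# actions, transactions). Event and field names are generic assumptions, not the actual
# Snowplow or GA4 schemas.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class EcommerceEvent:
    event_type: str          # "product_view", "add_to_basket", or "transaction"
    user_id: str
    sku: str
    quantity: int = 1
    price: float = 0.0
    timestamp: str = ""

def track(event: EcommerceEvent) -> str:
    """Serialize an event the way a tracker might before sending it to a collector."""
    event.timestamp = event.timestamp or datetime.now(timezone.utc).isoformat()
    return json.dumps(asdict(event))

# Usage: the three standard interaction types mentioned above.
print(track(EcommerceEvent("product_view", "u123", "SKU-42")))
print(track(EcommerceEvent("add_to_basket", "u123", "SKU-42", quantity=2, price=19.99)))
print(track(EcommerceEvent("transaction", "u123", "SKU-42", quantity=2, price=39.98)))
```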

Google Analytics has standard out-of-the-box e-commerce events (schemas) comparable to those in the Snowplow accelerator. The Snowplow accelerator also includes dbt models that process the data in the warehouse to make it AI- and BI-ready, for which Google Analytics has no equivalent. However, those models are only half of what the accelerator provides; the other half is the event schemas, which Google does have.

Cloud Data Warehouse vs. Cloud Data Lakehouse https://gigaom.com/report/cloud-data-warehouse-vs-cloud-data-lakehouse/ Wed, 09 Aug 2023 18:32:49 +0000

Recently, several architectural patterns have emerged that decentralize most components of the enterprise analytics architecture. Data lakes are a large part of that advancement.

A GigaOm field test was devised to determine the differences between two popular enterprise data architectural patterns: a modern cloud data warehouse based on a Snowflake architecture and a modern data lakehouse with a Starburst-based architecture. The test was created with multiple components to determine the differences in performance and capability, as well as the amount of time and effort required to migrate to these systems from a legacy environment.

In terms of price-performance, the four scenarios are:

  • Snowflake: Complete migration to Snowflake
  • Starburst option 1: Lake adoption and on-premises federation
  • Starburst option 2: On-premises migration and cloud federation
  • Starburst option 3: Lakehouse adoption in its entirety

In our field tests, Starburst options required between 47% and 67% less migration effort than Snowflake, slashing time-to-insight and enabling analytical insights that drive business decision-making and yield significant financial impact.

For calculating post-migration effort and three-year total cost of ownership (TCO), we further divided Snowflake into two options: "migrate semi-structured," covering the initial process and cost of transferring and organizing semi-structured data types into the Snowflake platform, and "additional compute for semi-structured," covering the extra computational capacity needed for the more resource-intensive processing of semi-structured data. Breaking down Snowflake usage this way gives the organization a clearer picture of TCO over three years. See Table 1.

Table 1. Costs by Migration Type

Migration Type                                               | Post-Migration Effort Cost | 3-Year TCO
Snowflake Migrate Semi-Structured                            | $1,898,354                 | $3,366,800
Snowflake Additional Compute for Semi-Structured             | $1,290,385                 | $3,278,344
Starburst Option 1: Lake Adoption & On-Premises Federation   | $645,192                   | $1,597,748
Starburst Option 2: On-Premises Migration & Cloud Federation | $542,548                   | $1,522,620
Starburst Option 3: Full Lakehouse Adoption                  | $762,500                   | $1,853,549
Source: GigaOm 2023

The results of this study show that the Starburst options are the most economical, while the two Snowflake options are the costliest. Using Starburst in a data lakehouse approach is the most economical in terms of both the transition to the architecture and the associated long-term cost.

Our online analytical processing (OLAP) and online transaction processing (OLTP) source data were derived from the TPC-DS benchmark, with the 24 tables of the TPC-DS schema divided between the two source databases.

We believe our legacy source systems effectively represent the current state of many businesses, the majority of which are considering migrating to a more modern architecture. This field test is intended to provide a glimpse into the available options.

In addition, we compared the efficacy of Snowflake-loaded raw JSON in a VARIANT column versus Starburst’s federated query, representing common workloads such as customer analytics, log analytics, clickstream analytics, and security analytics.

Our use cases entail migrations via lift-and-shift. We determined beforehand what constitutes an acceptable level of performance: the ability to complete the TPC-DS 99-query set in less than 15 minutes, with a geometric mean of less than five seconds per query, for a single user. According to our experience with businesses conducting efficient migrations, this level of performance meets user requirements. Using these criteria and a pattern observed in enterprise evaluations, we determined the lowest-cost compute infrastructure required by each platform to meet these performance thresholds.
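
The acceptance criteria can be expressed in a few lines of Python; the sketch below checks a single-user run against the 15-minute total and five-second geometric mean thresholds, using hypothetical query timings.

```python
# Sketch of the acceptance criteria described above: the full TPC-DS 99-query set must
# finish in under 15 minutes for a single user, with a geometric mean query time under
# five seconds. The query timings below are hypothetical placeholders.
import math
import random

def geometric_mean(times_s: list[float]) -> float:
    """nth root of the product of n query times, computed via logs for stability."""
    return math.exp(sum(math.log(t) for t in times_s) / len(times_s))

def meets_thresholds(times_s: list[float]) -> bool:
    total_ok = sum(times_s) < 15 * 60          # full 99-query run under 15 minutes
    geomean_ok = geometric_mean(times_s) < 5   # geometric mean under 5 seconds
    return total_ok and geomean_ok

# Hypothetical single-user run: 99 query durations in seconds.
random.seed(0)
run = [random.uniform(0.5, 10.0) for _ in range(99)]
print(geometric_mean(run), meets_thresholds(run))
```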

 

High-Performance Cloud Data Warehouse Testing https://gigaom.com/report/high-performance-cloud-data-warehouse-testing/ Tue, 18 Jul 2023 19:15:45 +0000

Data-driven organizations rely on analytic databases to load, store, and analyze volumes of data at high speed to derive timely insights. At the same time, the skyrocketing volume of data in modern organizations' information ecosystems places significant performance demands on legacy architectures. To fully harness their data to gain competitive advantage, businesses need modern, scalable architectures and high levels of performance and reliability. In addition, many companies are attracted to fully managed cloud services and their as-a-service deployment models that let companies leverage powerful data platforms without the burden of hiring staff to manage the resources and architecture in-house. With these models, users pay as they play and can stand up a fully functional analytic platform in the cloud with just a few clicks.

This report outlines the results from a GigaOm Analytic Field Test derived from the industry standard TPC Benchmark H (TPC-H) to compare the Actian Platform, Google BigQuery, and Snowflake. The tests we ran revealed important performance characteristics of the three platforms (see Figure 1). On a 30TB TPC-H data set, Actian’s query response times were better than the competition in 20 of the 22 queries. In a test of five concurrent users, Actian was overall three times faster than Snowflake and nine times faster than BigQuery.

In terms of price performance, the Actian Data Platform produced even greater advantages when running the five concurrent user TPC-H queries. Actian proved roughly four times less expensive to operate than Snowflake, based on cost per query per hour, and 16 times less costly than BigQuery.
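
One way to compute this kind of price-performance metric is sketched below in Python: the platform's hourly cost divided by the queries it completes per hour under the concurrent workload. The cost and runtime inputs are hypothetical placeholders, not the figures from our testing.

```python
# Sketch of a cost-per-query-per-hour metric for a concurrent workload: hourly platform
# cost divided by queries completed per hour. All inputs are hypothetical placeholders.
def queries_per_hour(queries_per_run: int, run_seconds: float, concurrent_users: int) -> float:
    """Throughput when each of N concurrent users runs the full query set repeatedly."""
    runs_per_hour = 3600.0 / run_seconds
    return queries_per_run * concurrent_users * runs_per_hour

def cost_per_query_hour(hourly_cost: float, qph: float) -> float:
    """Lower is better: dollars spent per query completed in an hour."""
    return hourly_cost / qph

# Hypothetical example: 22 TPC-H queries, 5 concurrent users, 30-minute run, $64/hour.
qph = queries_per_hour(queries_per_run=22, run_seconds=1800, concurrent_users=5)
print(qph, cost_per_query_hour(hourly_cost=64.0, qph=qph))
```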

Figure 1. Overall Query Response Times (in seconds) Across 22 TPC-H Benchmark-Based Queries (lower is better)

The results of these tests indicate that the Actian Data Platform is a strong choice for organizations with large, complex analytic data sets that need to be accessed quickly and economically.

SQL Transaction Processing and Analytic Performance Price-Performance Testing https://gigaom.com/report/sql-transaction-processing-and-analytic-performance-price-performance-testing-3/ Fri, 19 May 2023 15:09:32 +0000

The fundamental underpinning of any organization is its transactions. They must be done well, with integrity and performance. Not only has transaction volume soared recently, but the level of granularity in the details has reached new heights. Fast transactions significantly improve the efficiency of a high-volume business. Performance is vital.

There are a variety of databases available for transactional applications. Ideally, any database would have the required capabilities; however, depending on the application’s scale and the chosen cloud, some database solutions can be prone to delays. Recent information management trends see organizations shifting their focus to cloud-based solutions. In the past, the only clear choice for most organizations was on-premises data using on-premises hardware. However, the costs of scale have chipped away at the notion that this is the best approach for some, if not all, companies’ transactional needs. The factors driving operational and analytical data projects to the cloud are many. Still, advantages like data protection, high availability, and scale are realized with infrastructure as a service (IaaS) deployment. In many cases, a hybrid approach is an interim step for organizations migrating to a modern, capable cloud architecture.

This report outlines the results from two GigaOm Field Tests (one transactional and the other analytic) derived from the industry-standard TPC Benchmark™ E (TPC-E) and TPC Benchmark™ H (TPC-H). The tests compare two IaaS cloud database offerings running Red Hat Enterprise Linux (RHEL), configured as follows:

  1. RHEL 8.6 with Microsoft SQL Server 2022 Enterprise on r6idn.8xlarge Amazon Web Services (AWS) Elastic Cloud Compute (EC2) instances with gp3 volumes.
  2. RHEL 8.6 with Microsoft SQL Server 2022 Enterprise on an E32bdsv5 Azure Virtual Machine (VM) with Premium SSD v2 disks.

Both are installations of Microsoft SQL Server 2022, and we tested RHEL 8.6 using preconfigured machine images. This report is based on Linux; the Windows-based performance study was conducted earlier this year.

Testing hardware and software across cloud vendors is challenging. Configurations can favor one cloud vendor over another in feature availability, virtual machine processor generations, memory amounts, storage configurations for optimal input/output, network latencies, software and operating system versions, and the benchmarking workload. Our testing demonstrates a slice of potential configurations and workloads.

As the report sponsor, Microsoft selected the particular Azure configuration it wanted to test. GigaOm selected the closest AWS instance configurations for CPU, memory, and disk configuration.

We leave the issue of fairness for the reader to determine. We strongly encourage you to look past marketing messages and discern what is of value. We hope this report is informative and helpful in uncovering some of the challenges and nuances of platform selection. Price-performance is intended to be a normalizer of performance results across different configurations.

The parameters to replicate this test are provided. We used the BenchCraft tool, audited by a TPC-approved auditor who reviewed all updates to BenchCraft. All the information required to reproduce the TPC-E results is documented in the TPC-E specification. BenchCraft implements the requirements documented in Clauses 3, 4, 5, and 6 of the benchmark specification. There is nothing in BenchCraft that alters the performance of TPC-E or this TPC-E-derived workload.

The scale factor in TPC-E is defined as the number of required customer rows per single tpsE. We changed the number of initial trading days (ITD). The default value is 300, which is the number of eight-hour business days to populate the initial database. Instead of an ITD of 300 days, we employed an ITD of 30 days for these tests, resulting in a smaller initial database population in the bigger tables. The overall workload behaves identically with ITD of 300 or 30 as far as the transaction profiles are concerned. Since the ITD was reduced to 30, any results would not comply with the TPC-E specification and, therefore, not be comparable to published results. This is the basis for the standard disclaimer that this is a workload derived from TPC-E.

However, BenchCraft is just one way to run TPC-E. All the information necessary to recreate the benchmark is available at TPC.org (this test used the latest version, 1.14.0). Just change the ITD, as mentioned above.

Azure E32bds_v5 outperforms AWS r6idn in performance and cost-effectiveness for mission-critical workloads. In the TPC-E-derived benchmark test, Azure delivered 1,435 tpsE compared to 1,179 tpsE for AWS, a 21.7% performance improvement for SQL Server 2022 Enterprise on RHEL, along with 22% better price-performance and a more cost-effective three-year commitment.

The TPC-H results for SQL Server 2022 Enterprise on RHEL show that Azure E32bds_v5 achieved higher best queries per hour (QPH) than AWS r6idn, with 367 QPH compared to 305 QPH, a roughly 20% performance improvement for these types of queries.
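
These percentages follow directly from the reported throughput figures, as the short Python check below shows.

```python
# Quick check of the relative improvements quoted above, computed from the reported
# throughput figures (tpsE for the TPC-E-derived workload, QPH for TPC-H).
def improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline

print(f"TPC-E derived: {improvement(1435, 1179):.1%}")   # ~21.7%
print(f"TPC-H QPH:     {improvement(367, 305):.1%}")     # ~20.3%
```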

The results also show that AWS r6idn produced a 28% higher price per performance than Azure E32bds_v5 for the same TPC-H workload on SQL Server 2022 Enterprise running on RHEL with license mobility. This suggests Azure provides better value for money than AWS for this particular setup. Additionally, both solutions offer significant cost savings over traditional on-premises infrastructure.

We have provided enough information in the report for anyone to reproduce these tests. You are encouraged to compile your own representative queries, data sets, data sizes, and test compatible configurations applicable to your requirements.

Transaction Processing & Price-Performance Testing https://gigaom.com/report/transaction-processing-price-performance-testing/ Tue, 18 Apr 2023 19:57:16 +0000

This GigaOm benchmark report was commissioned by Microsoft.

Transactional applications supported by databases are used by organizations to run their daily operations. By enabling the recording and tracking of transactions, these databases ensure that all business is carried out effectively and efficiently.

Database engines built and tuned for high-throughput transactions handle these transactional workloads. The creation of transactional data is accelerating quickly due to the 24/7 nature and variety of sources of transactions. The data is produced in high volumes and at a high rate, regardless of whether it is used to process financial transactions, analyze video game activities, or track health conditions.

As a result, transaction throughput is essential for read- and write-intensive cloud applications and must be sustained despite database latency, network throughput, API/microservice calls, and other processes.

The evolution of PostgreSQL from conventional, single-node database deployments to distributed implementations of PostgreSQL has elevated its capabilities in the areas of scalability, fault tolerance, and performance. Yet self-managed implementations can be difficult to maintain. Many enterprises therefore turn to fully managed, cloud-distributed implementations of PostgreSQL to leverage the advantages of a distributed database without the headaches of management and maintenance.

In our experience, a fully-managed distributed PostgreSQL implementation should be capable of:

  • Global distribution: able to replicate your data across multiple regions to provide low-latency access for users across the world
  • High availability and durability: with at least 99.9% uptime SLA and automatic failover capabilities
  • Scalability without disruption: scale out your database as needed without downtime to your application
  • Integration with other cloud services: such as analytics, streaming data, and machine learning
  • Developer productivity: comes with a variety of tools and APIs, such as SDKs for popular languages, a REST API, and support for protocols like JDBC and ODBC
  • Cost-effective: offering a range of pricing tiers and pay-for-what-you-use billing to best fit your budget

To evaluate distributed databases, we conducted a GigaOm Transactional Field Test derived from the industry-standard TPC Benchmark™ C (TPC-C). We compared fully-managed as-a-service offerings of cloud-distributed databases:

  • Azure Cosmos DB for PostgreSQL
  • CockroachDB Dedicated
  • YugabyteDB Managed

Our monthly costs for similar CPU and memory configurations were as follows: Cosmos DB for PostgreSQL, $25,484; Managed YugabyteDB (YugabyteDB), $42,067; and Managed CockroachDB (CockroachDB), $45,384. In terms of new orders per minute (NOPM) performance, analogous to tpmC, the results measured for the TPC-C-derived workload were 1.05 million for Cosmos DB for PostgreSQL, 178,000 for CockroachDB, and 136,000 for YugabyteDB.

We also showed that the CockroachDB and YugabyteDB average CPU utilization rose substantially as the number of TPC-C warehouses increased—from less than 20% at 1,000 warehouses to more than 50% at 10,000 warehouses to 80% or more at 20,000 warehouses. This indicates that NOPM performance is unlikely to improve significantly, even with higher warehouse counts, as available CPU becomes a bottleneck.

Finally, the three-year total system price-performance measurements (three-year total cost divided by transactions per minute of throughput) showed Cosmos DB for PostgreSQL to be the most cost-effective option at $0.87, while CockroachDB came in at $9.19 and YugabyteDB at $11.13. The lower the price-performance metric, the better the throughput per dollar the database system provides.
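
These price-performance figures can be approximated from the numbers reported above, as the Python sketch below shows: roughly 36 months of the quoted monthly cost divided by the measured NOPM throughput. Treating the three-year total as 36 times the monthly cost is a simplifying assumption for illustration.

```python
# The price-performance figures above can be approximated from the numbers reported
# earlier: roughly 36 months of the quoted monthly cost divided by NOPM throughput.
monthly_cost = {          # from the report, USD per month
    "Cosmos DB for PostgreSQL": 25_484,
    "CockroachDB": 45_384,
    "YugabyteDB": 42_067,
}
nopm = {                  # new orders per minute measured for the TPC-C-derived workload
    "Cosmos DB for PostgreSQL": 1_050_000,
    "CockroachDB": 178_000,
    "YugabyteDB": 136_000,
}
for db in monthly_cost:
    price_perf = (monthly_cost[db] * 36) / nopm[db]   # ~3-year cost / throughput
    print(f"{db}: ${price_perf:.2f} per NOPM")
# -> approximately $0.87, $9.18, and $11.14, in line with the reported values
```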

Testing hardware and software across cloud vendors is very challenging. Configurations can favor one cloud vendor over another in feature availability, virtual machine processor generations, memory amounts, storage configurations for optimal input/output, network latencies, software and operating system versions, and the benchmarking workload itself. Our testing demonstrates a slice of potential configurations and workloads. As the sponsor of the report, Microsoft selected the specific Azure configuration it wanted to test. GigaOm then selected the configurations for CockroachDB and YugabyteDB closest to the Azure configuration in terms of CPU and memory configuration.

We leave the benchmark’s fairness for the reader to determine. We strongly encourage you to look past marketing messages and discern what is of value. We hope this report is informative and helpful in uncovering some of the challenges and nuances in platform selection.

The parameters to replicate this benchmark are provided in this document. We used HammerDB v4.6, a well-known and widely available TPC-C workload test kit for Cosmos DB for PostgreSQL and YugabyteDB. We used CockroachDB’s own TPC-C implementation for their tests.

The reason CockroachDB was tested using its own tooling, rather than the HammerDB test kit, is that CockroachDB does not support HammerDB's stored procedures. The CockroachDB TPC-C implementation we employed offered more flexibility in testing, allowing us to create custom workloads and employ parallelized tests for improved performance.

You are encouraged to compile your own representative queries, data sets, data sizes, and test compatible configurations applicable to your requirements.

SQL Transaction Processing and Analytic Performance Price-Performance Testing https://gigaom.com/report/sql-transaction-processing-and-analytic-performance-price-performance-testing-2/ Wed, 25 Jan 2023 13:00:36 +0000

The fundamental underpinning of any organization is its transactions. They must be done well, with integrity and performance. Not only has transaction volume soared recently, but the level of granularity in the details has reached new heights. Fast transactions greatly improve the efficiency of a high-volume business. Performance is incredibly important.

There are a variety of databases available for transactional applications. Ideally, any database would have the required capabilities; however, depending on the application's scale and the chosen cloud, some database solutions can be prone to delays. Recent information management trends see organizations shifting their focus to cloud-based solutions. In the past, the only clear choice for most organizations was on-premises data using on-premises hardware. However, the costs of scale have chipped away at the notion that this is the best approach for some, if not all, of a company's transactional needs. The factors driving operational and analytical data projects to the cloud are many. Still, advantages like data protection, high availability, and scale are realized with infrastructure as a service (IaaS) deployment. In many cases, a hybrid approach serves as an interim step for organizations migrating to a modern, capable cloud architecture.

This report outlines the results from two GigaOm Field Tests (one transactional and the other analytic) derived from the industry-standard TPC Benchmark™ E (TPC-E) and TPC Benchmark™ H (TPC-H) to compare two IaaS cloud database offerings:

  1. Microsoft SQL Server 2019 Enterprise on Windows Server 2022 on Amazon Web Services (AWS) Elastic Cloud Compute (EC2) instances with gp3 volumes
  2. Microsoft SQL Server 2019 Enterprise on Windows Server 2022 on Azure Virtual Machines (VM) with the new Premium SSD v2 disks

Both are installations of Microsoft SQL Server 2019, and we tested the Windows Server OS using the most recent versions available as a preconfigured machine image.

Data-driven organizations also rely on analytic databases to load, store, and analyze volumes of data at high speed to derive timely insights. Data volumes within modern organizations’ information ecosystems are rapidly expanding, placing significant performance demands on legacy architectures. Today, to fully harness their data to gain a competitive advantage, businesses need modern, scalable architectures and high levels of performance and reliability to provide timely analytical insights. In addition, many companies like fully managed cloud services. With fully managed as-a-service deployment models, companies can leverage powerful data platforms without the technical debt and burden of finding talent to manage the resources and architecture in-house. With these models, users only pay as they play and can stand up a fully functional analytical platform in the cloud with just a few clicks.

The results of the GigaOm Transactional Field Test are valuable to all operational functions of an organization, such as human resource management, production planning, material management, financial supply chain management, sales and distribution, financial accounting and controlling, plant maintenance, and quality management. The Analytic Field Test results are insightful for many of these same departments today using SQL Server, which is frequently the source for interactive business intelligence (BI) and data analysis.

Testing hardware and software across cloud vendors is challenging. Configurations can favor one cloud vendor over another in feature availability, virtual machine processor generations, memory amounts, storage configurations for optimal input/output, network latencies, software and operating system versions, and the benchmarking workload. Our testing demonstrates a narrow slice of potential configurations and workloads.

During our Transactional Field Test, SQL Server 2019 Enterprise on Windows Server 2022 Azure Virtual Machines with Premium SSD v2 disks had 57% higher transactions per second (tps) than AWS SQL Server 2019 Enterprise on Windows Server 2022 with gp3 volumes. Azure’s price-performance is 34% less expensive than the price-performance of AWS SQL Server 2019 on Windows Server 2022 without AWS license mobility/Azure Hybrid Benefit. With AWS license mobility and Azure Hybrid Benefit pricing, SQL Server 2019 on Windows Server 2022 on Azure Virtual Machines provided price-performance that was 47% less expensive than AWS SQL Server 2019 on Windows Server 2022. Azure with Hybrid Benefit price-performance is 54% less expensive than the price-performance of AWS with license mobility and a three-year commitment.

During our Analytic Field Test, SQL Server 2019 Enterprise on Windows Server 2022 Azure Virtual Machines with Premium SSD v2 disks had the best queries per hour (QPH), with 41% higher QPH than AWS SQL Server 2019 Enterprise on Windows Server 2022 with gp3 volumes. The price-performance of SQL Server 2019 on Windows Server 2022 on Azure Virtual Machines without AWS license mobility/Azure Hybrid Benefit proved to be 26% less expensive than AWS SQL Server 2019 on Windows Server 2022 deployments. With license mobility in place, the price-performance advantage for Azure widened to 41%. And for SQL Server 2019 Enterprise on Windows Server 2022 Azure Virtual Machines with license mobility and a three-year commitment, price-performance was 49% less expensive than AWS SQL Server 2019 Enterprise on Windows Server 2022 deployments.

As the report sponsor, Microsoft selected the particular Azure configuration it wanted to test. GigaOm selected the closest AWS instance configuration for CPU, memory, and disk configuration.

We leave the issue of fairness for the reader to determine. We strongly encourage you to look past marketing messages and discern what is of value. We hope this report is informative and helpful in uncovering some of the challenges and nuances of platform selection.

In the same spirit as the TPC, price-performance is intended to be a normalizer of performance results across different configurations. Of course, this has its shortcomings, but at least one can determine "what you pay for and configure is what you get."

The parameters to replicate this test are provided. We used the BenchCraft tool, audited by a TPC-approved auditor who reviewed all updates to BenchCraft. All the information required to reproduce the results is documented in the TPC-E specification. BenchCraft implements the requirements documented in Clauses 3, 4, 5, and 6 of the benchmark specification. There is nothing in BenchCraft that alters the performance of TPC-E or this TPC-E-derived workload.

The scale factor in TPC-E is defined as the number of required customer rows per single tpsE. We changed the number of initial trading days (ITD). The default value is 300, which is the number of eight-hour business days to populate the initial database. For these tests, we used an ITD of 30 days rather than 300. This reduces the size of the initial database population in the larger tables. The overall workload behaves identically with ITD of 300 or 30 as far as the transaction profiles are concerned. Since the ITD was reduced to 30, any results would not be compliant with the TPC-E specification and, therefore, not comparable to published results. This is the basis for the standard disclaimer that this is a workload derived from TPC-E.

However, BenchCraft is just one way to run TPC-E. All the information necessary to recreate the benchmark is available at TPC.org (this test used the latest version, 1.14.0). Just change the ITD, as mentioned above.

We have provided enough information in the report for anyone to reproduce these tests. You are encouraged to compile your own representative queries, data sets, data sizes, and test compatible configurations applicable to your requirements.

Advantages of DataStax Astra Streaming for JMS Applications https://gigaom.com/report/advantages-of-datastax-astra-streaming-for-jms-applications/ Thu, 22 Dec 2022 20:18:59 +0000

Competitive markets demand rapid, well-informed decision-making to succeed. In response, enterprises are building fast and scalable data infrastructures to fuel time-sensitive decisions, provide rich customer experiences, enable better business efficiencies, and gain a competitive edge.

There are numerous applications being developed that make autonomous decisions about how data is produced, consumed, analyzed, and reacted to in real time. However, if data is not captured within a specific timeframe, its value is lost, and the decision or action that needs to take place never occurs or happens too late.

Fortunately, there are technologies designed to handle large volumes of time-sensitive streaming data. Known by names like streaming, messaging, live feeds, real-time, and event-driven, this category of data needs special attention because delayed processing and decision-making can negatively affect its value. A sudden price change, a critical threshold met, an anomaly detected, a sensor reading changing rapidly, an outlier in a log file—any of these can be of immense value to a decision-maker or a process but only if alerted in time to affect the outcome.

This report’s focus is on real-time data and how autonomous systems can be fed this data at scale while producing reliable performance. To shed light on this challenge, we assess and benchmark two leading streaming data technologies—DataStax Astra Streaming and Apache ActiveMQ Artemis. Both solutions process massive amounts of streaming data from social media, logging systems, clickstreams, Internet-of-Things devices, and more. However, they differ in important ways from throughput and overall scalability to operational ease of use and cost, as we reveal in our hands-on testing.

Astra Streaming is a fully managed, cloud-native, streaming-as-a-service solution built on Apache Pulsar. As a managed solution, Astra Streaming eliminates the overhead of installing, operating, and scaling Pulsar. Astra Streaming also offers out-of-the-box support and interoperability between Java Message Service (JMS), RabbitMQ, and Kafka in a single platform. This means that if your existing applications rely on these platforms, you can immediately convert them into streaming apps with little to no code changes.

Apache ActiveMQ is an open source, multiprotocol, Java-based message broker. It supports industry standard protocols across a broad range of languages and platforms. There are currently two “flavors” of ActiveMQ available—the well-known classic broker and the next-generation broker code-named Artemis, both compatible with JMS.

In our comparative study, we used the Starlight for JMS feature included in DataStax Astra Streaming along with self-managed open-source Apache ActiveMQ Artemis JMS instances. We found several notable differences and benefits for modernizing a JMS-based data streaming stack.

The Astra Streaming with Starlight for JMS architecture is consolidated and simplified, and as a fully managed platform, it provides a number of benefits, including platform management, administration, and recovery functions.

The performance and resiliency of Astra Streaming can easily match ActiveMQ Artemis without the burden of scaling out infrastructure (or scaling down when demand is light). You simply pay for what you use, and DataStax manages the operational back end for you.

We found that in situations where message-per-second throughput varies rapidly and frequently (bursting), Astra Streaming cost half as much in infrastructure and as little as one-quarter as much in total cost of ownership.
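
The Python sketch below illustrates why bursty workloads favor a consumption-billed service: a self-managed cluster must be provisioned for its peak message rate, while a pay-per-use service charges only for messages actually processed. All rates and prices are hypothetical placeholders, not figures from our testing.

```python
# Illustrative model of bursty throughput economics: a self-managed cluster is sized for
# its peak message rate, while a fully managed service bills for messages actually
# processed. All rates and prices below are hypothetical placeholders.
PEAK_RATE = 50_000                      # peak messages/second the cluster must absorb
AVG_RATE = 8_000                        # average messages/second across the month
SECONDS_PER_MONTH = 30 * 24 * 3600

CAPACITY_COST = 0.05                    # $/month per message/second of provisioned capacity
PER_MILLION_MSG_COST = 0.06             # $ per million messages on the managed service

self_managed_monthly = PEAK_RATE * CAPACITY_COST
managed_monthly = AVG_RATE * SECONDS_PER_MONTH / 1_000_000 * PER_MILLION_MSG_COST

print(f"self-managed (peak-provisioned): ${self_managed_monthly:,.0f}/month")
print(f"managed (pay-per-use):           ${managed_monthly:,.0f}/month")
print(f"ratio: {self_managed_monthly / managed_monthly:.1f}x")
```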

Modernizing JMS-based applications to fully managed Astra Streaming would bring many benefits and capability enhancements, including real-time data integration, analytics, and AI/ML applications.
