Table of Contents
- Summary
- Data Pipeline Platforms
- Field Test Setup
- Field Test Results
- Conclusion
- Disclaimer
- About Fivetran
- About William McKnight
- About Jake Dolezal
- About GigaOm
- Copyright
1. Summary
Data is the currency of digital transformation. Having available data that is understood, organized, and believable strengthens all major corporate initiatives. However, maintaining this basic resource is a growing challenge for most organizations because sources and volumes of interesting data are expanding rapidly. The cloud and the proliferation of SaaS companies have contributed to the data explosion. While the cloud and its many applications can quickly grow the capabilities of an organization, the data sprawl they create can lead to problems: decentralized data that produces inaccurate findings, or time wasted rebuilding pipelines instead of driving results.
Without robust automation, an organization’s data movement needs can quickly outpace the ability of its data engineering staff to meet them.
Given growing workloads and a shortage of data engineering resources, automation and ease of use are fundamentally important. Data pipelines are one aspect of the modern data stack that can be automated to address this growing challenge.
In this report, we compare three major data pipeline platforms (Matillion, Stitch, and Fivetran) and run them through a series of selected tests that highlight their degree of automation, ease of setup, and documentation. We evaluated the time and effort required to set up a source-destination connection, the degree of automation throughout the process, and the quality of the documentation supporting the effort. These areas address the three major “humps of work” we have encountered in our field work with data pipelines.
Of the three offerings, Fivetran had the shortest and easiest setup. Matillion Data Loader had the longest setup, with the most steps, some of which were poorly documented. Stitch ranked between Fivetran and Matillion Data Loader in our assessment, but it had the longest-running individual task (selecting which Salesforce entities to sync).
Fivetran handled the data source changes with full automation, while Matillion Data Loader presented the biggest automation challenge. Not only did the new data and altered columns fail to appear automatically in Matillion, but the pipeline had to be rebuilt. Stitch likewise required manual intervention to pick up the new data and altered columns.
Fivetran had the most thorough documentation across all the items we measured. Stitch also had good documentation, with only a few items either omitted or covered too briefly. Matillion Data Loader’s documentation of the data exposed by its Salesforce source connector was almost entirely missing.
Another observation: We found the level of loading and updating activity in Snowflake caused by the Matillion solution to be excessive compared to Stitch and Fivetran. Data pipelines like these are well worth exploring for any enterprise data integration effort, especially where your source and target are both supported.