Table of Contents
- Summary
- Why scaling distributed DBMS is challenging
- Database as a Service
- Big and fast data
- Elasticity
- Real-time operational analytics
- Priced to be an embedded service
- Business use case study: iOffer
- Key takeaways
- Appendix: The modern SQL DBMS: Saving lives with faster diagnostics
- Medical information needed to disseminate new knowledge faster
- MedExpert builds its medical knowledge management system using a SQL DBMS
- The transition from SQL Server to the Clustrix scale-out SQL DBMS
- Conclusions
- About George Gilbert
- About GigaOm
- Copyright
1. Summary
In the past couple of years, we’ve seen more innovation in the SQL database management system (DBMS) category than in the 30-plus years since commercial products became available. The past decade’s web 2.0 sites have mostly driven this innovation, which looks to bridge some of the gap between NoSQL DBMS and MySQL.
MySQL was the most common early foundation for these web apps, but it required partitioning storage across many servers once there was more than a modest amount of data. Here was the core problem these early websites faced: They could distribute MySQL across many servers for scalability, or they could run it on one server and let the DBMS take care of data integrity and full SQL queries, the two foundations of 30 years of data-management best practices. But once customers traded away those two foundations in return for scalability, they began embracing alternatives. NoSQL DBMS offered not only scalability but also more flexibility in handling new data types and new ways of manipulating that data. These products included not only Hadoop but also Cassandra, Couchbase, and MongoDB, among others. More recently, new SQL DBMS vendors such as Clustrix, MemSQL, NuoDB, and VoltDB have combined the elastic scalability of NoSQL products with increasingly comprehensive support for data integrity and SQL queries in a distributed environment.
The Software-as-a-Service (SaaS) applications of the future, whether next-generation consumer web and mobile services or traditional enterprises connecting with their customers, will likely build on the largely complementary foundations of the emerging distributed SQL and NoSQL DBMS.
Unlike SaaS versions of traditional enterprise applications such as CRM, financials, and HR, these new applications are about more than administrative efficiency. Rather, they power new kinds of services. Some of the most common are:
- Online advertising
- Game-session management
- Network-intrusion detection
- Fraud detection
- Risk management
- Ecommerce
These applications typically share many features. We will use that feature set as the framework for examining this category of DBMS:
- Database as a Service
- Big and fast data
- Elastic capacity
- Real-time operational analytics
- Priced to be an embedded service
This report will help IT business decision makers navigate the emerging requirements of distributed SQL DBMS supporting these new SaaS applications. Although application developers have assumed more influence with the growing importance of line-of-business applications, IT business decision makers still need to understand the requirements. They will ultimately have to support these DBMS as part of the services their companies deliver to their end customers. In addition, over time central IT will have to help manage the proliferation of DBMS by providing guidance to groups with common requirements. This report will take readers through the common requirements for a DBMS to support these new applications. We will explain what each requirement means, why it’s important, and more precisely what to look for in a product.
The emerging distributed SQL DBMS focus on the lower right portion of the figure below. On the x-axis, data capture speed is high. At the same time, decision latency is low because analytics run in real time. The DBMS on which this report focuses typically must interact with DBMS elsewhere on this spectrum of activities: analysis of historical data for exploratory or production reporting might take place in a data warehouse, and predictive modeling on larger data sets might take place offline with Hadoop as the foundation.
Source: IBM Global Technology Outlook