Vector Databases Comparedv1.0

Evaluating DataStax Astra DB Serverless (Vector) and Pinecone Vector Database

Table of Contents

  1. Executive Summary
  2. Platform Summary
  3. Test Setup
  4. Performance Test Results
  5. Total Cost of Ownership
  6. Conclusion
  7. Disclaimer
  8. About DataStax
  9. About William McKnight
  10. About Jake Dolezal

1. Executive Summary

This report was commissioned by DataStax.

Determining the optimal vector solution from the myriad of vector storage and search alternatives that have surfaced is a critical decision with high leverage for an organization. Vectors and AI will be used to build the next generation of intelligent applications for the enterprise and software industry but the most effective option will often also exhibit the highest level of performance.

The benchmark aims to demonstrate the performance of DataStax Astra DB Serverless (Vector) compared to the Pinecone vector database within the burgeoning vector search/database sector. This report contains comprehensive detail on our benchmark and an analysis of the results.

We tested the critical dimensions of vector database performance—throughput, latency, F1 recall/relevance, and TCO. Among the findings that Astra DB produced versus Pinecone:

  • 55% to 80% lower TCO
  • Up to 6x faster indexing of data
  • Up to 9x faster ingestion and indexing of data

We tested throughput, which involved generating vectors and labels, inserting them into databases, and executing queries to measure performance. The queries were of various types, such as nearest neighbor, range, KNN classification, KNN regression, and vector clustering. We also performed latency testing, which measured the response time for each query. Finally, F1 recall/relevance testing measured the database’s performance in returning relevant results for a given query.

The study found that Astra DB significantly outperformed Pinecone p2x8 in ingesting and indexing data, performing six times faster and with a low relative variance, making the ingest workflow more predictable. Astra DB also showed faster response times during active indexing and produced recall ranging from 90% to 94% versus Pinecone’s recall in the range of 75% to 87%. For larger datasets, Astra DB showed better recall and low variance, demonstrating accuracy and consistency. Finally, our testing and configuration pricing revealed that Pinecone’s TCO was 2.2 to 4.9 times greater than Astra DB, making it significantly more expensive to operate.

The results of these tests indicate that DataStax Astra DB Serverless (Vector) is a great choice for storing and searching vectors efficiently.

Full content available to GigaOm Subscribers.

Sign Up For Free