Introduction

The goal of this report is to measure the performance of DataStax Enterprise v6 (Cassandra) in terms of latency and throughput. Cassandra was deployed in three cluster configurations of 4, 10, and 20 nodes, and each configuration was evaluated under two workload types.

The first workload is the “update heavy” workload. It issues 50% read requests and 50% write requests, with a Zipfian request distribution.

The second workload is the “short ranges” workload. It issues 95% scans and 5% updates, with the same Zipfian request distribution.
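The two operation mixes described above can be written down as YCSB CoreWorkload properties. The following is a minimal sketch, not the configuration actually used in this benchmark: the property keys are the standard CoreWorkload ones, while the dictionary names and the helpers `render_properties` and `check_mix` are hypothetical and exist only for illustration. The 5% update share in the second workload follows the description in this report.

```python
# Hypothetical sketch: the two workload mixes from this report expressed as
# YCSB CoreWorkload properties. Settings the report does not state are omitted.

UPDATE_HEAVY = {
    "readproportion": 0.5,
    "updateproportion": 0.5,
    "scanproportion": 0.0,
    "insertproportion": 0.0,
    "requestdistribution": "zipfian",
}

SHORT_RANGES = {
    "readproportion": 0.0,
    "updateproportion": 0.05,  # the report states 5% updates
    "scanproportion": 0.95,
    "insertproportion": 0.0,
    "requestdistribution": "zipfian",
}


def render_properties(props):
    """Render a dict as the key=value lines of a Java properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


def check_mix(props):
    """Return True if the operation proportions sum to 1."""
    total = sum(v for k, v in props.items() if k.endswith("proportion"))
    return abs(total - 1.0) < 1e-9


if __name__ == "__main__":
    for name, props in (("update heavy", UPDATE_HEAVY), ("short ranges", SHORT_RANGES)):
        assert check_mix(props), f"{name}: proportions must sum to 1"
        print(f"# {name}")
        print(render_properties(props))
```

Checking that the proportions sum to 1 before writing a workload file is a cheap guard against a mistyped mix.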

The benchmark was run with the Yahoo! Cloud Serving Benchmark (YCSB), a framework for evaluating database performance under different workloads.
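A YCSB benchmark typically has two phases: a load phase that inserts the records and a run phase that executes the workload. The sketch below builds the command line for either phase. It rests on several assumptions that this report does not confirm: the `cassandra-cql` binding, the `bin/ycsb` launcher path, and the host addresses and thread count in the usage example are placeholders. The helper name `ycsb_command` is hypothetical.

```python
import shlex


def ycsb_command(phase, workload_file, hosts, threads,
                 ycsb_bin="bin/ycsb", binding="cassandra-cql"):
    """Build the argument list for one YCSB phase ("load" or "run").

    The binding name and launcher path are assumptions, not values
    taken from the report.
    """
    if phase not in ("load", "run"):
        raise ValueError(f"unknown phase: {phase!r}")
    return [
        ycsb_bin, phase, binding,
        "-P", workload_file,                 # workload properties file
        "-p", f"hosts={','.join(hosts)}",    # Cassandra contact points
        "-threads", str(threads),            # number of client threads
        "-s",                                # print periodic status
    ]


if __name__ == "__main__":
    # Placeholder addresses and thread count, for illustration only.
    cmd = ycsb_command("load", "workloads/update_heavy",
                       ["10.0.0.1", "10.0.0.2"], threads=64)
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command be passed directly to `subprocess.run` without shell quoting issues.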

Each record is 1 KB in size (10 fields of 100 bytes each, plus the key). The number of records was chosen according to the cluster size: 50 million records on the 4-node cluster, 100 million on the 10-node cluster, and 250 million on the 20-node cluster.
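To get a feel for what these record counts mean in terms of data volume, the arithmetic can be sketched as follows. This is a rough estimate under stated assumptions: it counts only the 10 x 100-byte field payload and ignores the key, the replication factor (not given in this section), and any storage overhead. The function names are hypothetical.

```python
# Rough raw-payload estimate; key size, replication and on-disk overhead
# are deliberately excluded because the report does not quantify them here.

RECORD_BYTES = 10 * 100  # 10 fields x 100 bytes each

# Cluster size (nodes) -> number of records, as stated in the report.
CLUSTERS = {4: 50_000_000, 10: 100_000_000, 20: 250_000_000}


def raw_volume_gb(records, record_bytes=RECORD_BYTES):
    """Raw payload volume in gigabytes (1 GB = 10**9 bytes)."""
    return records * record_bytes / 1e9


def summarize(clusters=CLUSTERS):
    """Return (nodes, records, total_gb, gb_per_node) rows, sorted by nodes."""
    rows = []
    for nodes, records in sorted(clusters.items()):
        total = raw_volume_gb(records)
        rows.append((nodes, records, total, total / nodes))
    return rows


if __name__ == "__main__":
    for nodes, records, total, per_node in summarize():
        print(f"{nodes:>2} nodes: {records:>11,} records, "
              f"{total:6.1f} GB raw, {per_node:5.1f} GB per node")
```

Under these assumptions the raw payload works out to 50 GB, 100 GB, and 250 GB, or roughly 10 to 12.5 GB per node before replication.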


Environment

The following EC2 instance type on AWS was chosen for deploying the Cassandra cluster:

[Image: table of the EC2 instance type used for the Cassandra nodes]

The YCSB client was deployed on compute-optimized AWS instances:

[Image: table of the compute-optimized instance type used for the YCSB client]

Cluster configuration

DataStax Enterprise (Cassandra) is a wide-column store NoSQL database management system, designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

The table below shows the changes that were applied to the Cassandra configuration:

[Image: table of Cassandra configuration changes]


Performance benchmark of DataStax Enterprise v6 (Cassandra)