Why Consistency Issues, or the C in the CAP Theorem

As many of you probably know, Cassandra is an AP big data store: when a network partition happens, Cassandra remains available and relaxes the Consistency property. It is usually described as eventually consistent, meaning that its replicas will become consistent at some point in the future.
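To make the AP trade-off concrete, here is a toy model in Python (not Cassandra code): three replicas of one row, each storing a value together with its write timestamp. During a simulated partition the write is still accepted by the two reachable replicas, so the cluster stays available, but the isolated replica keeps serving the old value. The names `Replica`, `write` and `read_from` are hypothetical, and the model deliberately leaves out consistency levels, hinted handoff and any other background mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    # key -> (value, write_timestamp)
    data: dict = field(default_factory=dict)


def write(replicas, key, value, ts):
    """Apply a write to every reachable replica; unreachable ones miss it.
    Returns how many replicas acknowledged the write."""
    acks = 0
    for r in replicas:
        if r.up:
            r.data[key] = (value, ts)
            acks += 1
    return acks


def read_from(replica, key):
    """Return the value a single replica would serve for `key`."""
    value, _ts = replica.data.get(key, (None, 0))
    return value


replicas = [Replica("n1"), Replica("n2"), Replica("n3")]
write(replicas, "user:42", "alice@old.example", ts=1)

replicas[2].up = False          # a partition isolates n3 ...
acks = write(replicas, "user:42", "alice@new.example", ts=2)
replicas[2].up = True           # ... and later the partition heals

print(acks)                                         # 2
print([read_from(r, "user:42") for r in replicas])
# ['alice@new.example', 'alice@new.example', 'alice@old.example']
```

The write "succeeded" from the client's point of view, yet a read served by n3 still returns the old address.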

The important things to know, which are not really obvious, are:

  • The cluster **does** become inconsistent pretty often. Sure, many things influence the stability of a cluster, such as proper configuration, dedicated resources, production load and the professionalism of the ops team, but the fact is that the probability of nodes going down from time to time, and therefore of the data becoming inconsistent, is really high.
  • The cluster does NOT become consistent again **automatically**. This goes against our intuition about modern, mature distributed systems. Unless you run DataStax Enterprise (DSE) and enable one of the latest features of DSE 6 (NodeSync), you have to fix the inconsistency issues manually.
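A quick back-of-the-envelope calculation shows why the odds add up. Assuming (hypothetically) that node failures are independent and that each node has probability p of at least one outage in a given period, the chance that at least one of N nodes has an outage in that period is 1 - (1 - p)^N. With p = 1%, that is about 26% for 30 nodes and about 63% for 100 nodes. The function name `prob_any_node_down` is illustrative only.

```python
def prob_any_node_down(nodes: int, p_down: float) -> float:
    """Probability that at least one of `nodes` nodes has an outage in a period,
    assuming independent failures and per-node outage probability `p_down`."""
    return 1.0 - (1.0 - p_down) ** nodes


# Hypothetical per-node outage probability of 1% per period.
for n in (3, 12, 30, 100):
    print(f"{n:>3} nodes: {prob_any_node_down(n, 0.01):.1%}")
```

Real failures are rarely independent (a rack or a network switch can take out several nodes at once), so treat this as an illustration of the trend rather than a model of any particular cluster.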

Ways to Fix Inconsistency

Fortunately, there are ways to fix inconsistency issues. There are a couple of options:

  • The nodetool repair tool. This is probably the main, default method: run the command on the node that was down, either for all tables or for specific ones. One caveat, though: all the nodes should be UP while the command is running.
  • Cassandra's read repair feature. This is an important feature: during read requests the cluster repairs itself, or more precisely, it repairs the replicas of the data being read. If the replicas involved in a read request are inconsistent, the most recent version (by write timestamp) is written back to the stale replicas, so they become aligned again.
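Both mechanisms can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: each replica is a plain dict mapping a key to a `(value, write_timestamp)` pair, conflicts are resolved last-write-wins by timestamp, and the helper names (`read_repair`, `full_repair`, `range_of`, `range_digests`) are hypothetical. `read_repair` reconciles one key across the replicas involved in a read; `full_repair` mimics the idea behind nodetool repair by hashing each key range on two replicas and reconciling only the ranges whose digests differ (the flat list of range hashes stands in for the leaves of the Merkle trees Cassandra compares during repair).

```python
import hashlib

# Toy model: each replica is a dict mapping key -> (value, write_timestamp).


def read_repair(replicas, key):
    """Read `key` from the replicas involved in the read, pick the newest
    version (last write wins) and write it back to any stale replica."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    winner = max(versions, key=lambda v: v[1])  # highest timestamp wins
    for r in replicas:
        if r.get(key) != winner:
            r[key] = winner  # repair a stale or missing copy
    return winner[0]


def range_of(key, num_ranges):
    """Hypothetical stand-in for a partitioner: map a key to a range index."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_ranges


def range_digests(replica, num_ranges):
    """One digest per key range (a flat stand-in for Merkle tree leaves)."""
    buckets = [[] for _ in range(num_ranges)]
    for key, version in sorted(replica.items()):
        buckets[range_of(key, num_ranges)].append(repr((key, version)))
    return [hashlib.sha256("|".join(b).encode()).hexdigest() for b in buckets]


def full_repair(a, b, num_ranges=8):
    """Compare two replicas range by range and reconcile only the ranges whose
    digests differ. Returns the number of mismatched ranges."""
    da, db = range_digests(a, num_ranges), range_digests(b, num_ranges)
    mismatched = [i for i in range(num_ranges) if da[i] != db[i]]
    for i in mismatched:
        for key in {k for k in (*a, *b) if range_of(k, num_ranges) == i}:
            read_repair([a, b], key)
    return len(mismatched)


n1 = {"user:42": ("new", 2), "user:7": ("bob", 1)}
n2 = {"user:42": ("new", 2), "user:7": ("bob", 1)}
n3 = {"user:42": ("old", 1)}  # was down: one stale value, one missing row

print(read_repair([n1, n2, n3], "user:42"))  # new
print(full_repair(n1, n3))                   # 1 (the range holding "user:7")
print(n1 == n3)                              # True
```

Note that read repair alone fixed only the key that was actually read; the missing `user:7` row on n3 was restored only by the range-by-range repair. Hashing ranges first means that replicas which already agree exchange only digests, and data is reconciled just for the ranges that differ.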


How to Fix Cassandra Consistency Issues using Read Repair