Causal Order, What is it good for? Absolutely nothing!

Hmm.., It is becoming obvious that I cannot figure out different topic lines anymore.

We have a list of features, that we are planning to add to Galera replication before publishing first release. Causal order, was one such requirement in our list and we had lengthy discussions of how to deal with this.

The Causal order semantics may seem odd, if your application does not have such requirements. But, we have worked with customers where application logic has a strict requirement for preserving causal order, so we know that this is a "real world" problem. And, if not anything else, causal order is an transparency issue for cluster implementation, and transparency is our ultimate goal. But first some theory.

Causal Order

Causal order property gives the application a guarantee, that any requests it sends to the cluster are processed in the same (well, causal) order. There is causal dependency between the requests, in the application space. The application knows the order of the requests and expects the cluster to process them in the same order. With single DBMS server, this order is (or should be) always guaranteed. If a transaction has committed, subsequent transactions will inevitably see the effects. But, with DBMS cluster, this is not necessarily always true.

Why? Give me an example

Suppose, the application has two connections to the cluster, and these connections happen to go to separate DBMS nodes. Now, if the application sends an UPDATE transaction through connection 1, and when the UPDATE request has returned, the application sends a SELECT query, through connection 2. There is a causal dependency between these two requests, the application knows that UPDATE happened before the SELECT and is expecting to see the effects of UPDATE in the result set.
However, the cluster cannot guarantee that the requests are processed in the causal order. The UPDATE transaction is transferred through group communication without delay to the whole cluster, but it is nevertheless possible, that the SELECT hits the second node before the replication has applied the UPDATE's write set. And if replication is slower than the new transaction establishing in the second node, the SELECT will return old values to the application.

How To Provide Causal Order Semantics?

We had a brain storming session and identified three alternatives for providing Causal order feature.

There is a simple way to provide Causal Order by forcing the reads to go through replication. The read requests don't need to be processed in every node, but they need to be sequenced in the replication stream. This guarantees that the SELECT hitting node 2 will have higher total order sequence number than the earlier UPDATE, and SELECT will not be started before the UPDATE. We have tried this approach many years ago, and it works as expected. However, replicating read requests means a lot of overhead for the replication and performance will become a problem if read rate grows too high.

Alternatively, we can make read request wait until all write sets in the slave queue have been applied. The UPDATE transaction's write set is guaranteed to be in every nodes' slave queue before the transaction has committed. So, when SELECT arrives in node 2, the UPDATE is either in slave queue,
or has applied already. Waiting for slave queue to flush, guarantees that SELECT does not process too early and Causal order is preserved.

As a benefit, this approach does not overload replication with read requests, but on the other hand, it makes reads to wait unnecessarily long. We cannot pin point the exact write set in the slave queue, which we have causal dependency with, and therefore need to wait for the full queue to flush.

Both these alternatives add complexity to the implementation and have serious impact on performance. Looking at the pros and cons made it difficult to choose the correct path. Eventually we came up with decision to leave the problem to load balancing layer to deal with. Load balancer can offer causal order service for the application and just direct all causally dependent connections to same node. So, causal order will not be provided by (at least in the first release) of Galera.