Synchronous Replication Loves You Again
Submitted by alex on Sat, 04/09/2011 - 23:51
So, the other day I posted the performance benchmarks for the multi-master MariaDB/Galera cluster. Spectacular performance. But some of you may justifiably say:
"Well, we were born into a master/slave world. We, like, adapted to it. We have invested so much brain power in making our applications work in a master/slave environment. What do we do now with all these read/write splitting voodoo Lua scripts and slave lag battling techniques? And master failover... There's a whole industry there. Thousands of jobs!"
"But of course!" I say. "Galera likes read/write splitting as much as the next guy. If you want a node as a slave, just don't use it as a master, simple as that. (Slave lag and master failover will have to go, though. Sorry about that.)"
Also, about a year ago I was so fed up with the expert opinions about how synchronous replication is "slow" (Why "slow"? The word "synchronous" does not even have an 'l' in it!) and does not work over WAN that I ran a quick ad-hoc benchmark with 0.7pre (collector's edition) to see about it. Well, I saw it pretty well, but promised to get back to it later, with a more scientific approach and a more configurable Galera.
So in this installment I'll benchmark synchronous master/slave Galera 0.8pre performance, both in a LAN and between Ireland, EU, and Virginia, US. In a scientific way. Yes, with standard deviations and stuff. Amazon EC2, despite its dismal IO performance, will help us with that.
The configuration is the same as in the previous article. To make Galera more WAN-ready I added the following options (this must be put on a single line in my.cnf):
wsrep_provider_options="gcs.fc_factor=0.95; gcs.fc_limit=1024; evs.send_window=512; evs.user_send_window=256; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT60S; evs.consensus_timeout=PT90S"
Among other things, this allows up to 256 transactions to be in replication at a time. But it in no way compromises the synchronous guarantee. For reference, see the Galera wiki.
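Since the whole option string has to sit on one line, it is easy to lose a semicolon or a quote when editing it by hand. Here is a minimal Python sketch that keeps the options from above in a dict and joins them into the single-line form. The helper name `to_my_cnf_line` is made up for illustration and is not part of Galera or MariaDB.

```python
# The option values are the ones used in this article.
# The helper name is hypothetical; it is not a Galera or MariaDB tool.
provider_options = {
    "gcs.fc_factor": "0.95",
    "gcs.fc_limit": "1024",
    "evs.send_window": "512",
    "evs.user_send_window": "256",
    "evs.suspect_timeout": "PT30S",
    "evs.inactive_timeout": "PT60S",
    "evs.consensus_timeout": "PT90S",
}


def to_my_cnf_line(options):
    """Join the options into one line, separated by '; ' and wrapped in quotes."""
    joined = "; ".join(f"{key}={value}" for key, value in options.items())
    return f'wsrep_provider_options="{joined}"'


line = to_my_cnf_line(provider_options)
print(line)
```

The printed line can be pasted into my.cnf as is; changing a timeout then means editing one dict entry rather than a long string.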
What do we have here, in order of appearance:
- Stock MariaDB with innodb_flush_log_at_trx_commit=1 as an alternative to synchronous replication.
- Single MariaDB/Galera node, which normally should not be run alone, but it serves as a reference point for master/slave configuration performance. I.e. "how much slower do we get?".
- 2-node master/slave cluster in the same eu-west availability zone.
- 2-node master/slave cluster between eu-west (Ireland) and us-east (Virginia) zones.
- 3-node master/slave cluster with 1 slave in eu-west and another in us-east.
- 2-node multi-master cluster in the eu-west zone for reference.
- 2 eu-west master nodes + 1 us-east slave — how does adding a transcontinental slave affect multi-master performance?
- eu-west client connects to standalone us-east server directly. Just to see if we need this WAN replication at all.
A blowup of the most interesting part:
And since we're especially concerned with transaction latencies here, here's the latency profile at 32 threads. Just to see a tad more clearly how bad it could be:
Well, what to say here? Synchronous replication in a LAN does not really make a difference compared to a standalone server. Move along, nothing to see here. Except that it is so much faster than flushing logs after each commit. As for the WAN, it does add latency to transactions. Guilty as charged. Up to 90 milliseconds!
Of course we have an alternative to this slow synchronous replication: a direct connection to a central server across the world. The proponents of this approach are in for the exquisite fun of latencies of several seconds, even on idle servers.
We can also notice an interesting property of the replication latency contribution: it is only noticeable until transaction execution latency takes over, that is, when the master becomes saturated. However, the same is not true for IO latency because, unlike data replication, data flushing tends to monopolize the resource. That's where group commit should come in.
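To make the "until execution latency takes over" point concrete, here is a toy model in Python. It is only a sketch under loud assumptions: commit latency is taken as execution time plus one fixed replication round trip, and execution time grows linearly once the thread count exceeds the server's capacity. The 90 ms round trip is the WAN figure quoted above; `exec_ms` and `capacity` are made-up illustrative values, not measurements from this benchmark.

```python
def commit_latency_ms(threads, exec_ms=5.0, capacity=8, replication_rtt_ms=0.0):
    """Toy per-transaction latency (all parameters are illustrative assumptions).

    Up to `capacity` concurrent threads a transaction takes `exec_ms`.
    Beyond that the master is saturated and transactions queue, so
    execution latency grows in proportion to threads / capacity.
    Replication adds one fixed round trip per commit.
    """
    queueing = max(1.0, threads / capacity)
    return exec_ms * queueing + replication_rtt_ms


def replication_share(threads, replication_rtt_ms):
    """Fraction of the total commit latency that is due to replication."""
    total = commit_latency_ms(threads, replication_rtt_ms=replication_rtt_ms)
    return replication_rtt_ms / total


for threads in (1, 8, 32, 128, 512):
    share = replication_share(threads, replication_rtt_ms=90.0)
    print(f"{threads:4d} threads: replication is {share:.0%} of commit latency")
```

In this model the replication round trip dominates at low thread counts and shrinks to a minor share once the master is saturated. It says nothing about IO flushing, which behaves differently because flushes contend for the same device instead of overlapping.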
...with special thanks to Google docs for visual effects.