MySQL/Galera Configuration Tips
Parallel Applying (wsrep_slave_threads)
There is no rigorous rule about how many slave threads one should configure, and having parallel threads won't guarantee better performance: a lot depends on the application and use case. However, parallel applying won't hurt regular operation performance and most likely will drastically speed up syncing new nodes with the cluster.
We suggest starting with 4 slave threads per core, the logic being that in a balanced system 4 slave threads can usually saturate the core. However, depending on IO performance this figure can be increased several times (on the old single core ThinkPad R51 with a 4200RPM drive 32 slave threads make plenty of sense). The upper limit on the total number of slave threads can be obtained from the wsrep_cert_deps_distance status variable: it essentially determines how many writesets on average can be applied in parallel, so it is not practical to go higher than that.
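The rule of thumb above can be sketched as a small helper. This is only an illustration: the function name suggest_slave_threads and its parameters are hypothetical, and the result is a starting point to tune from, not a definitive value.

```python
import math


def suggest_slave_threads(cores, cert_deps_distance, threads_per_core=4):
    """Starting value for wsrep_slave_threads.

    cores              -- number of CPU cores on the node
    cert_deps_distance -- observed value of the wsrep_cert_deps_distance status variable
    threads_per_core   -- 4 by default; may be raised on IO-bound systems
    """
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be at least 1")
    start = cores * threads_per_core
    # Going above wsrep_cert_deps_distance is not practical; never suggest fewer than 1.
    cap = max(1, math.floor(cert_deps_distance))
    return min(start, cap)
```

For example, suggest_slave_threads(4, 23.9) gives 16 (4 threads on each of 4 cores), while suggest_slave_threads(8, 12.4) gives 12, because the observed wsrep_cert_deps_distance is the tighter limit.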
Parallel applying requires the following settings:
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
WAN Replication
Transient network connectivity failures are not rare in WAN configurations, so one might want to increase keepalive timeouts to avoid partitioning. The following my.cnf line tolerates 30-second connectivity outages (evs.suspect_timeout):
wsrep_provider_options = "evs.keepalive_period = PT3S; evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M"
Try to set evs.suspect_timeout as high as possible to avoid partitions (partitions cause state transfers, which are very heavy). evs.inactive_timeout must be no less than evs.suspect_timeout, and evs.install_timeout must be no less than evs.inactive_timeout.
A WAN link can have exceptionally high latencies. Take RTT measurements (ping RTT is a fair estimate) between your cluster nodes and make sure that all temporal Galera settings (periods and timeouts, e.g. evs.join_retrans_period) exceed the biggest RTT in your cluster.
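The two rules above (the ordering of the timeouts and the RTT lower bound) can be checked mechanically before a configuration is deployed. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, only time-only durations of the form used above (PT3S, PT1M) are parsed, and only evs.* options are inspected.

```python
import re

# Matches the ISO 8601 time-only durations used in the line above (PT3S, PT1M, ...).
_DURATION = re.compile(
    r"^PT(?:(\d+(?:\.\d+)?)H)?(?:(\d+(?:\.\d+)?)M)?(?:(\d+(?:\.\d+)?)S)?$"
)


def parse_duration(text):
    """Convert a duration such as 'PT30S' or 'PT1M' to seconds."""
    match = _DURATION.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def parse_provider_options(value):
    """Split a wsrep_provider_options string into a {name: value} dict."""
    options = {}
    for item in value.split(";"):
        if item.strip():
            name, _, setting = item.partition("=")
            options[name.strip()] = setting.strip()
    return options


def check_evs_timeouts(value, max_rtt_seconds):
    """Return a list of problems; an empty list means neither rule is violated."""
    evs = {
        name: parse_duration(setting)
        for name, setting in parse_provider_options(value).items()
        if name.startswith("evs.") and setting.startswith("PT")
    }
    problems = []
    # Ordering rule: suspect_timeout <= inactive_timeout <= install_timeout.
    chain = ["evs.suspect_timeout", "evs.inactive_timeout", "evs.install_timeout"]
    for lower, upper in zip(chain, chain[1:]):
        if lower in evs and upper in evs and evs[upper] < evs[lower]:
            problems.append(f"{upper} is less than {lower}")
    # RTT rule: every period and timeout should exceed the largest measured RTT.
    for name, seconds in sorted(evs.items()):
        if seconds <= max_rtt_seconds:
            problems.append(f"{name} does not exceed the largest RTT")
    return problems
```

Passing the wsrep_provider_options value shown above together with the largest measured RTT in seconds returns an empty list when both rules hold, and a list of messages naming the offending options otherwise.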
Multi-Master
The more masters (nodes which simultaneously process writes from clients) there are in the cluster, the higher the probability of certification conflicts. This may cause undesirable rollbacks and performance degradation. In such a case the number of masters should be reduced.
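One way to judge whether conflicts are frequent enough to justify reducing the number of masters is to compare two samples of a node's status counters. The sketch below assumes the samples were taken from the wsrep_local_cert_failures and wsrep_local_commits status variables and passed in as plain dicts; the function name is hypothetical, and the ratio is a rough indicator only (it does not count transactions aborted by other means).

```python
def cert_failure_ratio(before, after):
    """Fraction of local transactions that failed certification between two samples.

    before, after -- dicts holding the integer counters
                     'wsrep_local_cert_failures' and 'wsrep_local_commits'
    """
    failures = after["wsrep_local_cert_failures"] - before["wsrep_local_cert_failures"]
    commits = after["wsrep_local_commits"] - before["wsrep_local_commits"]
    if failures < 0 or commits < 0:
        # Counters went backwards, e.g. the node restarted between the samples.
        raise ValueError("status counters decreased between samples")
    attempts = failures + commits
    if attempts == 0:
        return 0.0  # no local write activity in the interval
    return failures / attempts
```

A ratio that grows as more nodes accept writes is a hint that the workload does not spread well across masters.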
Master-Slave
When only one node at a time is supposed to be used as a master, certain requirements may be relaxed. For example, the slave queue size is not that critical, so flow control may be relaxed:
wsrep_provider_options = "gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_master_slave = yes"
This may improve replication performance somewhat by reducing the rate of flow control events. (This setting is also safe, if suboptimal, in a multi-master setup.)