There is no rigorous rule for how many slave threads to configure, and having parallel threads won't guarantee better performance – a lot depends on the application and use case. However, parallel applying won't hurt regular operation performance and will most likely drastically speed up syncing new nodes with the cluster.
We suggest starting with 4 slave threads per core, the logic being that in a balanced system 4 slave threads can usually saturate a core. However, depending on IO performance this figure can be increased several times (on an old single-core ThinkPad R51 with a 4200 RPM drive, 32 slave threads make plenty of sense). The upper limit on the total number of slave threads can be obtained from the wsrep_cert_deps_distance status variable – it essentially shows how many writesets on average can be applied in parallel, so it is not practical to go higher than that.
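The current value can be checked on a running node with an ordinary status query, for example:

SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';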
Parallel applying requires the following settings:
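A minimal my.cnf sketch, assuming the usual prerequisites – the thread count of 4 is purely illustrative, and innodb_autoinc_lock_mode = 2 selects interleaved auto-increment locking, which parallel applying relies on:

wsrep_slave_threads = 4          # number of parallel applier threads (illustrative)
innodb_autoinc_lock_mode = 2     # interleaved auto-increment locking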
Transient network connectivity failures are not rare in WAN configurations, so one might want to increase keepalive timeouts to avoid partitioning. The following my.cnf line tolerates 30-second connectivity outages:
wsrep_provider_options = "evs.keepalive_period = PT3S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M"
Try to set evs.suspect_timeout as high as possible to avoid partitions (partitions will cause state transfers, which are very heavy). evs.inactive_timeout must be no less than evs.suspect_timeout, and evs.install_timeout must be no less than evs.inactive_timeout.
It is not improbable for a WAN link to have exceptionally high sustained latencies. Take RTT measurements (ping RTT is a fair estimate) between your cluster nodes and make sure that all temporal Galera settings (periods and timeouts, e.g.
evs.join_retrans_period, which is 1 second by default) exceed the biggest RTT in your cluster.
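For instance, if the worst measured RTT in the cluster were around 400 ms, the retransmission period could be raised above it; a sketch that extends the WAN example above (the 0.5-second value is purely illustrative, and note that wsrep_provider_options is a single option string, so all custom provider settings go on the same line):

wsrep_provider_options = "evs.keepalive_period = PT3S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.join_retrans_period = PT0.5S"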
The more masters (nodes that simultaneously process writes from clients) there are in the cluster, the higher the probability of certification conflicts. These may cause undesirable rollbacks and performance degradation, in which case the number of masters should be reduced.
When only one node at a time is supposed to be used as a master, certain requirements may be relaxed. For example, slave queue size is not that critical, and thus flow control may be relaxed:
wsrep_provider_options = "gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_master_slave = yes"
This may improve replication performance somewhat by reducing the rate of flow control events. (This setting is safe, if suboptimal, in multi-master setups as well.)
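Whether flow control is actually limiting throughput can be checked from the standard status counters, for example (this is a rough check, not a precise diagnostic; both values should stay close to zero on a healthy single-master setup):

SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue_avg';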