Galera is a wsrep service provider in the form of a dlopen-able library. It provides synchronous replication guarantees and supports multi-master1) replication. Among other things, Galera is capable of unconstrained parallel applying (a.k.a. “parallel replication”), multicast replication and automatic node provisioning. Galera's primary focus is data consistency: a transaction is either applied on every node or not at all. See the CAP theorem for the consequences.
Galera is not
What it does is keep databases synchronized, provided that they were properly configured and synchronized in the beginning.
A: “Galera” is another (more original) name for a “galley”.2) Galleys were propelled by oars and “scaled up” to hundreds of rowers, who had to act synchronously lest the oars intertwine and get blocked.
A: Galera is a true multi-master synchronous replication system that allows using any or all of the nodes as masters at any time without any extra provisions, so there is no “failover” in the “traditional” MySQL master-slave sense. Galera's foremost goal is data consistency across the nodes, and it simply won't allow any modification to the database that may compromise consistency. E.g. it will block or reject write requests until the joining node is synced with the cluster and is ready to process local requests.
As a result you can safely use your favorite approach to distribute/migrate connections between the nodes without the risk of causing inconsistency. Some discussion on connection distribution can be found here.
A: A cluster upgrade can be done by applying the following steps to each cluster node:
Of course, you should take care of transferring client connections from the node being upgraded to the other nodes for the duration of the upgrade.
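A rough sketch of one node's upgrade round might look like the following; the package and service names are assumptions that vary by distribution and Galera version:
$ sudo service mysql stop
$ sudo apt-get update && sudo apt-get install mysql-wsrep-server    # hypothetical package name
$ sudo service mysql start
mysql> SHOW STATUS LIKE 'wsrep_local_state_comment';    -- wait until it reports 'Synced' before moving to the next node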
A: All levels can be used, because locally, within one node, transaction isolation works as with native InnoDB.
But cluster-wide, between transactions processed on separate nodes, Galera implements a transaction isolation level called “SNAPSHOT ISOLATION”. The SNAPSHOT ISOLATION level falls between the REPEATABLE READ and SERIALIZABLE levels.
The SERIALIZABLE level cannot be guaranteed in the multi-master use case, because Galera replication does not carry the transaction read set. Also, a SERIALIZABLE transaction is vulnerable to multi-master conflicts: it holds read locks, and any replicated write to a read-locked row will cause the transaction to abort. Hence it is recommended not to use it in Galera Cluster.
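For example, the local isolation level is set the same way as on a standalone InnoDB server; the statement below is plain MySQL, nothing Galera-specific:
mysql> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;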
A: This happens when you explicitly specify the wsrep_provider variable, i.e. demand wsrep provider services, but the provider for some reason rejects the request. With Galera it may be because the node is not connected to the cluster's primary component (wsrep_cluster_address unset, networking issues). In such a case the node is considered unsynced with the global state and unable to serve SQL requests except SHOW. If you wish to bypass this check, switch the wsrep service off with:
mysql> SET wsrep_on=0;
This will make mysqld ignore the wsrep_provider setting and behave as a standalone MySQL server. Note that this may lead to data inconsistency with the rest of the cluster (which may be a desirable effect, e.g. when modifying “local” tables).
A: For DDLs and similar queries Galera has a special mode of execution called Total Order Isolation (TOI), where the query is replicated in statement form before being executed on the master, waits for all preceding transactions to commit, and then is executed in isolation on all nodes simultaneously. The isolation can be of the following levels:
There are several particularities about TOI queries:
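Aside from those particularities, no special preparation is needed on the client side. As a hypothetical illustration (the table and column names are made up), a DDL statement executed under the default TOI method is replicated in statement form and runs at the same logical point on every node:
mysql> SHOW VARIABLES LIKE 'wsrep_OSU_method';    -- TOI is the default
mysql> ALTER TABLE accounts ADD COLUMN note VARCHAR(64);    -- executed in total order on all nodes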
A: In a two-node cluster, when one node ungracefully leaves the cluster, the remaining node is unable to tell whether the disappeared node has actually crashed or has just lost connection, without additional information from a 3rd party (like a human operator or another Galera node). Such a situation is called split-brain, because if, in the case of connection loss, the nodes continue to operate without being able to synchronize, their databases will become inconsistent. Thus, when a node loses connection to its only peer, it stops accepting queries to avoid database inconsistency. To bring it back to operational mode it must be instructed that it is the only node in the cluster by setting the wsrep_cluster_address variable accordingly:
mysql> SET GLOBAL wsrep_cluster_address='gcomm://';
If it is desirable to allow the node to operate even in a split-brain situation (for example, the cluster is used in master-slave mode, so it is guaranteed that only one node accepts writes), this can be enabled by setting the pc.ignore_sb Galera option, like:
wsrep_provider_options="pc.ignore_sb = yes"
in the my.cnf file.
Another option is to use Galera Arbitrator as a 3rd “node” to avoid the split-brain situation altogether.
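A minimal garbd invocation might look like the following; the member addresses and group name are examples and must match your cluster's members and wsrep_cluster_name:
$ garbd -a gcomm://192.168.0.2:4567,192.168.0.3:4567 -g my_wsrep_cluster -d
The arbitrator participates in quorum calculations but stores no data, so the surviving data node retains the majority when its peer disappears.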
A: Real-life experience shows that it is usually due to one or more of the following 3 reasons:
While it is beyond the scope of this document to detail how to properly configure any of the above, here are some quick tips to get you going:
To disable SELinux temporarily, use the following command (as root):
# setenforce 0
To disable it permanently, set SELINUX=disabled in /etc/selinux/config.
This seems to be an Ubuntu-only thing. Disable AppArmor for mysqld:
$ cd /etc/apparmor.d/disable/
$ sudo ln -s /etc/apparmor.d/usr.sbin.mysqld
$ sudo service apparmor restart
By default a MySQL/Galera node needs ports 3306 and 4567 to be open for connections from the other nodes. This can be done with something like the following (allows connections from the LAN):
# iptables -A INPUT -i eth0 -p tcp -m tcp --source 192.168.0.1/24 --dport 3306 -j ACCEPT
# iptables -A INPUT -i eth0 -p tcp -m tcp --source 192.168.0.1/24 --dport 4567 -j ACCEPT
Substitute the real values for your node's IP address and netmask.
A: It means that from time to time a slave thread tries to apply a replicated writeset and finds a lock conflict with a local transaction (which may already be in the commit phase). For the slave thread to proceed, the local transaction is aborted. This is a consequence of optimistic (expecting no row conflicts) transaction execution and is expected in a multi-master configuration, so you might see any of the following messages:
110906 17:45:01 [Note] WSREP: BF kill (1, seqno: 16962377), victim: (140588996478720 4) trx: 35525064
110906 17:45:01 [Note] WSREP: Aborting query: commit
110906 17:45:01 [Note] WSREP: kill trx QUERY_COMMITTING for 35525064
110906 17:45:01 [Note] WSREP: commit failed for reason: 3, seqno: -1
To avoid such conflicts there are two options:
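In addition to those options, for autocommit queries the node can be told to simply retry a query that was aborted by a replicated conflict; the wsrep_retry_autocommit variable controls how many times (the value below is just an example):
mysql> SET GLOBAL wsrep_retry_autocommit=4;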
A: “Sort of”, maybe, but not really. GCache is a memory allocator for writesets, and its primary purpose is to minimize the writeset footprint in RAM. It is not a “log” of events; it is a cache. What is the difference?
However, it is possible to construct a binlog out of GCache contents.
gcache.size is the size of the disk buffer that Galera _permanently_ maintains to cache writesets. It is preallocated on disk at startup, like InnoDB preallocates its log files. This size essentially defines how many writesets the donor node can serve via IST (incremental state transfer) instead of copying the whole data directory in SST (state snapshot transfer). Naively you would want it to be about the same size as the data, because beyond that point SST arguably becomes cheaper than IST. However, there may be other considerations (e.g. IST is more lightweight and less taxing on the donor than SST, lack of disk space, etc.) which may swing your decision one way or another.
I would put it at 600GB to allow for some data growth.
gcache.page_size, on the other hand, is just the minimal size of any additional page files created for writesets that cannot be allocated in the main buffer for whatever reason. It does not mean much, it is just a lower boundary: if a writeset is 1GB in size, the page for it will be 1GB. So there is hardly any reason to change it unless you desperately need to control your disk usage granularity.
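Put together, a corresponding my.cnf setting could look like this (the sizes are just the examples discussed above; 128M is the usual default page size):
wsrep_provider_options="gcache.size=600G; gcache.page_size=128M"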
A: If the state transfer donor is not set explicitly in the wsrep_sst_donor variable, the group communication module will select the donor based on its information about node states.
The group communication module monitors node states for the purposes of flow control, state transfer and quorum calculations. E.g. a JOINING node does not count towards flow control and quorum.
A node is considered fit to be a donor if it is in the SYNCED state. Theoretically a JOINED node could do too, but it first has to process its slave queue, after which it becomes SYNCED anyway. So the donor is chosen among the SYNCED nodes (the first one in the index) and it immediately becomes a DONOR, i.e. unavailable for further requests. The full node state machine can be found here. There is no guarantee about the order of nodes in the index.
If there are no free SYNCED nodes at the moment, the joining node will report:
Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)
and will keep retrying the state transfer request.
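If you would rather pin the donor yourself, set it explicitly in the joining node's my.cnf; the node name below is an example and must match the intended donor's wsrep_node_name:
wsrep_sst_donor=node1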
A: Whatever happened, your node(s) no longer consider themselves part of the primary cluster component (i.e. they suspect that there may be another primary cluster component that they have no connection to). This could be the result of a network failure, a crash of half or more of the nodes, split-brain, etc.
If you are certain that no other nodes of your cluster play the primary component role, you can rebootstrap the primary component as follows:
1. Run mysql> SHOW STATUS LIKE 'wsrep_last_committed'; on each node and choose the node with the highest value.
2. Run mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=yes'; on it - the component this node is part of will become the primary component, all nodes in it will synchronize to the most up-to-date one and start accepting SQL requests once again.
A: Use a recent version of sysbench (>= 0.5) and apply the --oltp-auto-inc=off option when running your sysbench benchmarks.
You can find an explanation and more information in the related discussion thread.
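A sketch of such an invocation for sysbench 0.5 (the script path and connection options are assumptions that depend on your installation):
$ sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-user=root --oltp-auto-inc=off run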
A: It seems like you have tried to update the mysql.user table directly to change user details. If you use GRANT, then it will be perfectly replicated.
Currently replication works only with the InnoDB storage engine. Any writes to tables of other types, including system (mysql.*) tables, are not replicated. However, DDL statements are replicated at the statement level, and changes to mysql.* tables will get replicated that way. So you can safely issue CREATE USER… or GRANT…, but issuing INSERT INTO mysql.user… will not be replicated. In general, non-transactional engines cannot be supported in multi-master replication.
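To illustrate (the user and database names are made up):
mysql> CREATE USER 'app'@'%' IDENTIFIED BY 'secret';    -- DDL: replicated at statement level
mysql> GRANT SELECT ON mydb.* TO 'app'@'%';    -- DDL: replicated
mysql> UPDATE mysql.user SET Host='10.0.0.%' WHERE User='app';    -- direct write to a non-InnoDB system table: NOT replicated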
A: Unfortunately this is a side effect of being multi-master and having several appliers: in order to deterministically detect conflicts and schedule parallel appliers, we need to control when a DDL ends relative to other transactions, so effectively it needs to be executed in isolation. Galera has a 65K window of tolerance within which transactions can be applied in parallel, but if the ALTER takes too long, the cluster has to wait.
In general there is not much that can be done at the moment. But if you can guarantee that no other session will try to modify the table AND that there are no other DDLs running, you can set wsrep_OSU_method=RSU, run the required ALTER command, and then reset wsrep_OSU_method=TOI, doing it on each node in turn, as sketched below.
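A sketch of one node's round (the table and index names are hypothetical; note that RSU applies the DDL only on the local node, so the same statements must be repeated on every node):
mysql> SET GLOBAL wsrep_OSU_method='RSU';
mysql> ALTER TABLE accounts ADD INDEX idx_note (note);
mysql> SET GLOBAL wsrep_OSU_method='TOI';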
A: Normally, if IST fails, the joining node aborts. The SST phase precedes IST, so if IST fails, no SST can happen. The only way it can happen is when the node is automatically restarted. Automatic node restarting is an “anti-pattern” and is seriously discouraged. This is different from the situation when IST is not possible and a full SST happens instead. There can be exactly two reasons for that:
wsrep_sst_donor option. At the moment (10.10.2013) the Galera cache is reset on every restart, so any node that was (re)started after the joining node's position will be unable to deliver IST. Also, the joining node's position may be too far in the past, and the required events may have been expired from the cache in favour of newer ones.