Overcoming the odds and the evens
From Seppo's post you already know that writeset replication isn't particularly suited to table-level locking. I would not consider it a show-stopper, though.
As one dude would have surely put it had he been a database developer: "To those who cling to table locking in the age of universally available transactional engines, know that you are on the wrong side of history..."
We develop replication for transactional databases and transactional means transactional. Table locking is a dead end in every regard, including replication, and must rest in peace. It would be a strategic mistake on our part to dedicate serious effort to supporting table locking.
But the problem does not end there. In fact, the Drupal 5.15 distribution contains only two cases of table locking, which would be easy to fix. We can expect that third-party modules contain few locks as well. The real problem starts when you look at autocommit updates, which Drupal has tons of. (By 'update' here I mean any DML query.)
At the InnoDB level, autocommit queries are implicitly converted to single-statement transactions. Single-statement transactions are the load Galera is best suited for - the probability of a certification conflict is minimal. So, what is the problem?
The probability of a certification conflict is minimal, but not zero. For hot-spot tables it can be quite high. In the case of a certification conflict the query is aborted with a deadlock error and simply must be retried, like any ordinary transaction. So I thought: "Big deal! The application must check the error code and retry the query if needed."
Hell, no! Drupal does indeed check error codes, but it does not care to retry in the case of a deadlock. Why? - Because (and this is the sad part of the story) the semantics of an autocommit update simply has no room for a deadlock (and they are all 1-row updates). Nobody expects it. Nobody's gonna handle it. Can't blame those Drupal developers, even though my trust in people is shattered.
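For the record, the kind of application-side retry I had in mind looks roughly like the sketch below. It is only an illustration, not Drupal code: `DatabaseError` and `run_with_retry` are made-up names standing in for whatever the database driver provides, and the retry limit and backoff are arbitrary assumptions. The one real-world detail is the error code, 1213, which is what MySQL reports for a deadlock.

```python
import time

ER_LOCK_DEADLOCK = 1213  # MySQL error code for "Deadlock found when trying to get lock"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error code."""

    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno


def run_with_retry(execute, max_attempts=5, backoff=0.0):
    """Call execute() and retry it only when it fails with a deadlock error.

    Any other error, or a deadlock on the last allowed attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return execute()
        except DatabaseError as exc:
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            time.sleep(backoff * attempt)  # optional pause before retrying


def make_flaky_update(failures):
    """Build a fake autocommit update that deadlocks `failures` times, then succeeds."""
    state = {"calls": 0}

    def update():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise DatabaseError(ER_LOCK_DEADLOCK, "Deadlock found")
        return state["calls"]  # number of the call that finally went through

    return update
```

With this in place, `run_with_retry(make_flaky_update(2))` goes through on the third call, and the caller never sees the two deadlocks. The point of the story is that hardly anybody wraps single-statement updates like this.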
So what does it mean for us and for Galera? Well, not much, except that the number of applications that can work with a Galera cluster out of the box turns out to be much lower than we expected. There is no doubt that very few applications that use autocommits or single-statement transactions are prepared to handle deadlocks on them.
But (and this is the bright part of the story) autocommit or single statement transaction semantics has a nice surprise for us, replicators: in case of certification conflict we can silently retry it without notifying the application. It is even faster that way.
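To make the idea concrete, here is a toy model of that decision. It is not Galera code: `CertificationConflict`, `apply_statement` and the `max_retries` limit are all invented for illustration. It only shows the rule described above: a failed autocommit statement is retried quietly, while a conflict inside a multi-statement transaction still has to surface to the application.

```python
class CertificationConflict(Exception):
    """Hypothetical signal that a writeset failed certification."""


def apply_statement(run, autocommit, max_retries=3):
    """Execute run() and return (result, number_of_silent_retries).

    run() is assumed to execute and certify one statement, raising
    CertificationConflict on failure. Only autocommit statements are
    retried; anything else re-raises so the application sees the error.
    """
    retries = 0
    while True:
        try:
            return run(), retries
        except CertificationConflict:
            if not autocommit or retries >= max_retries:
                raise  # the application gets its deadlock error after all
            retries += 1  # safe to retry: the application has seen nothing yet


def conflicting_once():
    """Build a fake statement that fails certification once, then succeeds."""
    state = {"failed": False}

    def run():
        if not state["failed"]:
            state["failed"] = True
            raise CertificationConflict()
        return "ok"

    return run
```

Here `apply_statement(conflicting_once(), autocommit=True)` succeeds after one hidden retry, while the same call with `autocommit=False` raises. The retry is safe for a single statement because the application has not yet seen any result that the retry could invalidate.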
So we learned something today: there are no unresolvable problems. It is either not a problem but bad practice (table locks) or you have to find the right approach (autocommits).
- alex's blog
