Parallel applying - what is it good for? Absolutely nothing!
That's how my friend, Boss, would summarize our experiences with the new parallel replication applying method, which is our latest enhancement to the Galera applying module.
I have long been monitoring resource usage in Galera clusters, and it became obvious that a node acting in the slave role uses very few CPU cycles. This is easy to see when the cluster is used in pure master-slave mode, by directing all SQL traffic to just one node. Then, even with a 100% write load, the master node runs really hot, but the slave nodes consume just 1-5% of CPU. To balance the resource usage more evenly, it seemed necessary to let the slaves apply incoming transactions with several threads.
So, we went ahead and added a parallel applying mode to the Galera write set applier module. Technically this was quite simple to implement. We now have a job queue and worker thread model, where workers accept jobs (write sets to apply) from the job queue and process them independently.
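To make the job queue and worker thread model concrete, here is a minimal Python sketch of the idea. It is not the Galera code: the function name run_appliers, the apply callback and the None sentinel are all assumptions made up for illustration, and conflict handling and commit ordering are left out on purpose.

```python
import queue
import threading

def run_appliers(write_sets, num_workers, apply):
    """Apply every write set using num_workers threads fed from one job queue.

    Hypothetical helper: 'apply' stands in for the real write set applier.
    """
    jobs = queue.Queue()
    applied = []                 # results collected from all workers
    lock = threading.Lock()      # protects the shared result list

    def worker():
        while True:
            ws = jobs.get()
            if ws is None:       # sentinel: no more work for this worker
                break
            result = apply(ws)
            with lock:
                applied.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for ws in write_sets:
        jobs.put(ws)
    for _ in threads:
        jobs.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    return applied
```

Calling run_appliers(range(10), 4, some_function) hands ten jobs to four workers. The order in which results come back is not deterministic, which is exactly why the restrictions on conflicts and commit order are needed.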
However, parallel applying is possible only for write sets that don't conflict with each other. We needed to use the certification module to judge which write sets can actually be processed in parallel, and the job queue manager takes care of restricting the workers so that applying is done safely.
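As a rough illustration of the conflict rule, the sketch below treats a write set as a sequence number plus the set of row keys it modifies, and lets a worker start only if the candidate shares no key with any write set still being applied. The WriteSet class, the key strings and the can_start function are hypothetical; the real certification module surely keeps richer information than this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WriteSet:
    seqno: int          # position in the replication total order
    keys: frozenset     # assumption: identifiers of the rows this write set modifies

def conflicts(a, b):
    """Two write sets conflict if they touch at least one common key."""
    return not a.keys.isdisjoint(b.keys)

def can_start(candidate, in_flight):
    """A candidate may be applied in parallel only if it conflicts
    with none of the write sets that are currently being applied."""
    return all(not conflicts(candidate, ws) for ws in in_flight)
```

For example, a write set touching row "t1:1" can run next to one touching "t1:2", but a later write set that also touches "t1:1" has to wait until the first one is done.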
Moreover, the appliers can run freely only until they are ready to commit. Then they need to synchronize with the job manager to make sure that committing happens in the replication total order. So in effect, parallelism is allowed only during the actual SQL/RBR applying phase.
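The commit synchronization can be sketched with a condition variable: each applier works on its own, and then waits until its sequence number is the next one allowed to commit. The CommitOrder class and its method names are invented for this example, and it assumes sequence numbers are contiguous integers, which the real job manager may not require.

```python
import threading

class CommitOrder:
    """Lets appliers commit strictly in sequence number order (a sketch)."""

    def __init__(self, first_seqno=1):
        self._next = first_seqno            # seqno that may commit next
        self._cond = threading.Condition()

    def commit(self, seqno, do_commit):
        with self._cond:
            # Block until every earlier write set has committed.
            self._cond.wait_for(lambda: self._next == seqno)
            do_commit()                     # commits are serialized here
            self._next += 1
            self._cond.notify_all()         # wake appliers waiting for their turn
```

Even if the applier for write set 5 finishes its work first, its call to commit(5, ...) returns only after write sets 1 to 4 have committed.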
We also added a new MySQL system variable, galera_slave_threads, which defines how many workers will be launched. With the galera_slave_threads variable, it was pretty simple to test cluster performance with different numbers of applier threads.
And the results of parallel applying? Well, I tried different configurations but could not see any significant difference in performance. With small write sets, the applying workers don't usually even interleave their execution. When I raised transaction lengths up to 200 statements, there seemed to be some (minor) benefit from parallel applying. But in real-life applications, we don't expect to see such jumbo transactions. The parallel applying method begins to look like a wasted effort.
One explanation for the lack of the expected performance boost could be that my network throughput and CPU characteristics (single CPU, hyperthreading) just don't allow faster delivery of write sets to the slaves. It might be that parallel applying becomes effective in a more powerful environment. So, I won't bury this code in my backyard just yet. It perhaps deserves more thorough testing on more advanced hardware.
There was one more heart attack experience in our parallel applying project. Just when our implementation was complete, I happened to notice this software patent: http://www.freepatentsonline.com/US20080163222.html
IBM is apparently paving their software fortress walls with patent papers. Fortunately, however, we seem to be outside this patent's domain. The patent is restricted to asynchronous replication systems only.
Software patents - what are they good for?
- seppo's blog
