Yes! It works with Amazon EC2(TM)... more often than not
This is the first in the series of articles about Galera on Amazon EC2, here's the follow up.
Galera benchmarks on Amazon EC2 have been long overdue. There are at least 3 reasons why EC2 performance is so important to Galera project:
- EC2 is a buzz nowadays. Everyone and his dog is releasing all kinds of neat applications for EC2 users, from Firefox plugins to complete application grids. So it is natural to anticipate a question: "Can I use Galera to create a fault-tolerant and fast database cluster for my application?"
- When you suddenly need to rent more hardware, EC2 is where you go. No need to search through price lists, wait for delivery, install and wire the stuff. It's all at your fingertips. In other words EC2 could provide us with the hardware resources we need for testing.
- EC2 AMI's could become a valuable demo medium for Galera evaluation. Without EC2 one would need to setup a cluster of several similar machines just to evaluate if Galera is suitable for his application. It is a rare case when such hardware resources are lying around readily available. With EC2 it is just a matter of starting several AMI instances. And no need to clean up after yourself!
Amazon EC2
This time I used 'small' EC2 instances. The target was to see if EC2 is usable at all, develop a benchmarking process for EC2 environment, and get first figures. Small instance is artificially restrained to single CPU core and has 1.7Gb of RAM. It also has only one network interface, so it has to be used for both replication traffic and incoming client connections. Even if one starts several instances at a time, assigned IP addresses come from different networks (and even have different netmasks). This rules out using multicast or broadcast for replication, leaving TCP as the only option. However EC2 network is very good: pings within the same availability zone average to about 0.3ms and between the zones to 1.2ms (figures would fluctuate depending on the total EC2 load). Since EC2 is shared between thousands of users, it also can't be considered a controlled environment, so it is not particularly suitable for benchmarking. However, I took measurements during two days and results were consistent within few percent. Besides, its a real operating environment for many users and thus the results obtained are more true to life.
DBT2 Benchmark
DBT2 is an open-source fair use implementation of TPC-C benchmark spec. which simulates OLTP load. OLTP is one of the types of applications that should benefit from efficient synchronous replication, so it is a very important benchmark for Galera. It issues 5 different types of transactions and can make use of stored procedures. For each transaction type it measures transaction rate, transaction duration, rollback rate and several other metrics. The main metric is NOTPM (New Order Transactions Per Minute). I also took note of average duration and rollback rate of New Order transactions. It has a bunch of configuration options which can be used to tweak its performance. Sounds strange for a benchmark, but is a very useful feature to stress Galera the most.
Original benchmark was optimized according to recommendations at DBA dojo. In all other respects it is unmodified and is not tuned for cluster performance in any way. We expect Galera cluster to be drop-in substitution for single database server.
Benchmarking
Comparing Galera cluster to plain MySQL is not so straightforward. For example, normally production site would enable binary log, which is unnecessary for Galera cluster due to high availability property. On the other hand Galera may use a bit more memory to itself. In the following, binlog was disabled.
For benchmarking we used public CentOS 5.0 images generously provided by RightScale. First, plain 5.1.26rc MySQL (installed from official rpms) was benchmarked to determine the most advantageous configuration. I found that on the "small" instance (with 1.7Gb of RAM) the highest NOTPM score (3590) is achieved with 15 warehouses (DBT2 parameter), innodb_buffer_pool_size=1200M, innodb_log_file_size=300M (MySQL parameters). Apparently in this setup the database contents just barely fits in the RAM. So these parameters were also used when benchmarking Galera MySQL cluster.
Another important DBT2 parameter is the number of concurrent connections. While plain MySQL is saturated with 4 concurrent connections, Galera cluster needs 6 to 8 concurrent connections per node to achieve best results (probably due to additional latency incurred by serialization). I took it to be a tunable parameter. Tests were performed with different number of concurrent connections and the best result was selected.
It must be noted that performance of DBT2 noticeably degrades with time, so that each subsequent benchmark run yields lower scores. I could find no plateau in this performance degradation, so in order to get consistent results I had to reload database data (1.2Gb) before each benchmark. I took notes of database loading time as well.
Results
The first result was that we're still having some nasty bugs which restrained me to use statement-level replication and forced me to rerun some benchmarks several times.
And now let's get to something good. Pictures and data!
NOTPM Rollbacks(%) TRX duration(sec) Dump load(min)
-----------------------------------------------------------------------
Plain 5.1.26: ~3590 1 1.15 10.5
1 node : ~2900 1 1.40 14.8
2 nodes : ~4330 6 0.90 15.2
3 nodes : ~5050 9 0.75 15.6
4 nodes : ~5450 11 0.67 16.3
The following figure shows results in units of plain 5.1.26 performance (except for rollbacks percentage):

It turns out we still have considerable overhead and it shows in a single node cluster, as well as in the database dump load times. However 2 node cluster already performs significantly better than plain MySQL and it keeps on scaling all the way up to 4 nodes, not only improving the throughput, but also reducing individual transaction duration. And despite TCP-based group communication backend, database dump load times don't depend significantly on the number of nodes in the cluster.
Rollbacks issue. While the single node cluster shows the same amount of rollbacks as plain MySQL, we can see that the rollback rate jumps quite high with 2 and more nodes, making more than 4 nodes probably impractical. Experience shows that increasing warehouses parameter improves that figure considerably. We will see it when benchmarking "large" EC2 instances.
However there is more to it than meets the eye. The data displayed is only for the New Order transactions. Other transactions don't show any rollbacks with plain MySQL or single node Galera cluster. However with 2 or more nodes those transactions tend to display rollback rate even higher than New Order transactions. For example Delivery transaction could have up to 18% rollbacks. The cause for these rollbacks is a certification algorithm. I wonder, from a production perspective, if high rollback rate matters, given the otherwise good throughput? Is it merely nuisance or a showstopper?
In any case, these results were obtained on a Galera cluster of "small" EC2 instances, with small number of warehouses. "Large" EC2 instances should show a different picture. Stay tuned.
Conclusion
Amazon EC2 is not particularly well geared towards synchronous replication (single network card, nodes in different network segments, broadcast or multicast is impossible). However Galera can be successfully used on it to achieve high availability and moderate increase in performance. It is also a very useful testbed, especially for a start-up company which cannot afford buying racks of servers.
- alex's blog
- Login or register to post comments
