WordPress is the most popular CMS on the planet, powering over 40% of all websites and still growing. Despite what you may have been told, WordPress isn’t dying, and it’s not the terrible mess some dramatic PHP-hating developers make it out to be.
I was tasked with building a WordPress platform capable of sustaining millions of visitors per minute (a project I can’t reveal the details of just yet). The brief was a platform that could handle upwards of 100,000 visitors per second. Keep in mind this is across hundreds of WordPress Multisite domains, not one single site.
With some simple configuration, you can create a WordPress site (or Multisite) that is capable of handling millions of visitors (think Black Friday sales as a huge traffic spike).
In Beanstalk, I have a PHP-based load-balanced application that operates a minimum of 10 t4g.large EC2 instances and a dedicated instance that ingests content from third-party APIs and writes it to the database.
For my auto-scaling triggers, I am using the CPUUtilization metric with the Average statistic. When the health of my instances drops, I have set it to add up to three instances at a time (also t4g.large). I might change this to trigger on network traffic, but I find CPUUtilization works best, as my php-fpm processes consume a lot of CPU under load.
Using AWS Elastic Beanstalk and some auto-scaling will get you part of the way, but you will encounter database bottlenecks at this scale; this is where MySQL replication is your friend. Amazon Aurora on Amazon RDS lets you create a database cluster with one writer instance and multiple reader instances, which you can then split the read traffic across.
Fortunately, Automattic (the company behind WordPress.com) offers a free plugin called HyperDB, which allows you to split your read and write operations across multiple database instances. For years, the plugin appeared to be neglected, but then again, what would need updating? It works well and was recently updated.
The downside of HyperDB is that the documentation sucks.
In my scenario, the primary database handles write operations: authored content, pages, comments, and other write-oriented data all land here. Because many sites share the one WordPress Multisite installation, the primary must be reserved for writes only.
In my db-config.php file, this is my primary database configuration:
$wpdb->add_database(array(
    'host'     => DB_HOST,
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 1,
    'read'     => is_admin() ? 1 : 0,
    'dataset'  => 'global',
));
One nifty little trick I picked up from somewhere (I think it was a GitHub Gist) is making the primary database write-only on the front end while still allowing reads from it inside the administration panel, using the is_admin() function.
I currently have two reader instances, both of which are replicas of the primary database.
$wpdb->add_database(array(
    'host'     => DB_HOST,
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 1,
    'read'     => is_admin() ? 1 : 0,
    'dataset'  => 'global',
));

$wpdb->add_database(array(
    'host'     => 'reader-1.rds.amazonaws.com',
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 0,
    'read'     => 1,
    'dataset'  => 'global',
));

$wpdb->add_database(array(
    'host'     => 'reader-2.rds.amazonaws.com', // If port is other than 3306, use host:port.
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 0,
    'read'     => 1,
    'dataset'  => 'global',
));
HyperDB will then spread reads across these reader instances. The primary instance is reserved for write operations (which are more expensive). This gives us an admin panel that is highly available and somewhat insulated from the front end, because the admin can still read from the primary even if the two reader instances go down.
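With the configuration above, front-end reads have nowhere to go if both readers are unavailable, because the primary has 'read' => 0 outside the admin. One possible variation, sketched below, relies on my understanding that HyperDB treats the 'read' value as a priority (lower numbers are tried first, 0 disables reads) and that add_database() accepts a 'timeout' key, as in the plugin's sample config. This is not my production config, and you should check both assumptions against your HyperDB version before using it.

```php
<?php
// db-config.php fragment (sketch only). Assumes HyperDB has already created $wpdb
// and that DB_HOST, DB_USER, DB_PASSWORD and DB_NAME come from wp-config.php.

$wpdb->add_database(array(
    'host'     => DB_HOST,
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 1,
    // Assumption: 'read' is a priority. In the admin the primary is a first-choice
    // reader (1); on the front end it is only tried after the priority-1 readers (2).
    'read'     => is_admin() ? 1 : 2,
    'dataset'  => 'global',
    // Assumption: connection timeout in seconds, as shown in HyperDB's sample config.
    'timeout'  => 0.2,
));
```

The trade-off is that a reader outage would push front-end read traffic onto the primary, so the admin panel is less insulated than with 'read' => 0.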
It is still on my to-do list, but the plan is to eventually shard parts of the database as it grows. As more sites are added to the platform, database size will become a concern and sharding will be needed, which HyperDB also supports.
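As a rough idea of what that sharding could look like, here is a minimal sketch modelled on the partitioning example in HyperDB's sample db-config.php. It sends the per-site Multisite tables to a separate 'blog' dataset and leaves everything else on 'global'. The host name blog-shard.example.com, the dataset name 'blog' and the callback name my_db_callback are placeholders for illustration, not something I am running yet.

```php
<?php
// db-config.php fragment (sketch only). Assumes HyperDB has already created $wpdb.

// A second dataset for per-site tables. The host is a hypothetical placeholder.
$wpdb->add_database(array(
    'host'     => 'blog-shard.example.com',
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 1,
    'read'     => 1,
    'dataset'  => 'blog',
));

// Ask HyperDB to consult this callback when choosing a dataset for each query.
$wpdb->add_callback('my_db_callback');

function my_db_callback($query, $wpdb) {
    // Multisite per-site tables are named "{$base_prefix}{$blog_id}_*", e.g. wp_12_posts.
    $prefix = preg_quote($wpdb->base_prefix, '/');
    if (preg_match("/^{$prefix}\d+_/i", $wpdb->table)) {
        return 'blog';
    }
    // Returning nothing leaves the query on the default 'global' dataset.
}
```

A real setup would likely split sites across several datasets (for example by blog ID range) rather than a single 'blog' dataset, but the callback mechanism would be the same.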
Hi Dwayne, thanks for this post. I’m in the process of doing the same, though at a bit of a smaller scale. Would you be able to provide some more details on this part of your post:
“In Beanstalk, I have a PHP-based load-balanced application that operates a minimum of 10 t4g.large EC2 instances and a dedicated instance that ingests content from third-party API’s and writes it to the database.”
If you have links to reference material to accomplish this part, that’ll be helpful. I’m looking at how to create the .ebextensions scripts for instance replication.
But I am also curious how you’re using the dedicated instance as well: is that your primary instance where all code/content is replicated from?