Eqiad Migration Planning/Steps
Day 1: Tue Jan 22
Preparation (before maintenance window)
Check LVS pools apaches, api and rendering for down/depooled machines. A few machines may be broken (and should be removed from the config for the time being), but all others should be up and passing health checks.
# ipvsadm -l
# less /var/log/pybal.log
Check whether the Nagios checks for these LVS pools exist and are passing.
Check whether all pooled application servers have the right LVS service IPs bound to loopback.
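One way to check this across the pool (a sketch only: the host names and service IP below are placeholders, not the real pool members):

```shell
#!/bin/sh
# Sketch: verify the LVS service IP is bound to loopback on each pooled
# appserver. SERVICE_IP and HOSTS are assumptions; substitute real values.
SERVICE_IP="10.2.1.1"            # hypothetical LVS service IP
HOSTS="mw1001 mw1002"            # hypothetical appserver names
for host in $HOSTS; do
  if ssh -o ConnectTimeout=5 "$host" ip addr show dev lo 2>/dev/null \
       | grep -qw "$SERVICE_IP"; then
    echo "$host: OK"
  else
    echo "$host: $SERVICE_IP not bound to lo"
  fi
done
```

A host that reports the IP missing will fail LVS direct-routing traffic even if its Apache is healthy, so fix it before pooling.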
Check deployed MediaWiki revision / git status on all application servers
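A quick way to compare revisions across the cluster (hedged sketch: the host list and deployment path are assumptions; in practice this would run over the dsh/pssh node groups for the pools above):

```shell
#!/bin/sh
# Sketch: print the deployed MediaWiki revision and working-tree status
# on each appserver. HOSTS and DEPLOY_DIR are placeholder assumptions.
HOSTS="mw1001 mw1002"                    # placeholder host names
DEPLOY_DIR="/usr/local/apache/common"    # assumed deployment path
for host in $HOSTS; do
  echo "== $host =="
  ssh -o ConnectTimeout=5 "$host" \
    "cd $DEPLOY_DIR && git rev-parse HEAD && git status --short" \
    2>/dev/null || echo "(unreachable)"
done
```

All hosts should report the same HEAD commit and an empty `git status --short`.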
MySQL warm-up?
Ensure media writes to the NetApp are disabled
Migrate bits apaches to eqiad
Check whether the 4 bits apaches are healthy according to a bits Varnish server:
# varnishlog -i Backend_health -O
Test a few top bits URLs manually from the new bits app servers to see if valid content is being returned. To retrieve the most requested URLs, on a bits Varnish server:
# varnishtop -i RxURL
To test such a URL, use curl, or:
fenari: $ /home/mark/firstbyte.py apache_host_name 80 bits.wikimedia.org URI
Run varnishtop for a histogram of HTTP status codes, and compare before/after migration:
# varnishtop -i TxStatus
Deploy Gerrit patch set 44251 and run Puppet for node group XXX. This changes the Apache backends for the eqiad Varnish servers only, giving us a chance to fall back to the pmtpa bits Varnish servers quickly if needed.
Check if the distribution of HTTP status codes changes drastically, esp. HTTP 2xx vs. 4xx/5xx.
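To make the before/after comparison concrete, varnishtop's one-shot mode can capture the status histogram to files for diffing (a sketch: file paths are arbitrary, and it must run on a bits Varnish server where varnishtop is available):

```shell
#!/bin/sh
# Sketch: capture one-shot HTTP status histograms before and after the
# backend switch, then diff them. Paths are arbitrary choices.
BEFORE=/tmp/txstatus.before
AFTER=/tmp/txstatus.after
if command -v varnishtop >/dev/null 2>&1; then
  # before deploying the backend change:
  varnishtop -1 -i TxStatus > "$BEFORE"
  # ... deploy, wait for traffic to shift, then:
  varnishtop -1 -i TxStatus > "$AFTER"
  # a large shift from 2xx toward 4xx/5xx indicates a problem:
  diff "$BEFORE" "$AFTER"
else
  echo "varnishtop not available; run this on a bits Varnish server"
fi
```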
Once bits@eqiad is confirmed to be working correctly, deploy Gerrit patch set 44252 and run Puppet for node group XXX. This switches the pmtpa bits Varnish servers to use the eqiad bits appservers as well.
Set all database shards to read-only
Core databases
External storage
Is setting MySQL read-only part of the master switch procedure?
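If it is not covered by the master switch tooling, the per-shard step can be sketched as below (the shard list and hostname pattern are placeholder assumptions; verify against the real master switch scripts before use):

```shell
#!/bin/sh
# Sketch: set each shard master read-only via the mysql CLI.
# SHARDS and the hostname pattern are assumptions, not real infrastructure.
SHARDS="s1 s2 s3 es1"                    # placeholder shard names
for shard in $SHARDS; do
  master="${shard}-master.example.org"   # hypothetical master hostname
  echo "setting $master read-only"
  mysql -h "$master" -e 'SET GLOBAL read_only = 1' 2>/dev/null \
    || echo "$master: could not set read-only"
done
```

Note that `read_only` does not block writes by users with the SUPER privilege, so replication and admin scripts can still write during the switch.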