Eqiad Migration Planning/Steps

Revision as of 16:55, 16 January 2013

Day 1: Tue Jan 22

Preparation (before maintenance window)

Check the LVS pools apaches, api and rendering for down or depooled machines. A few machines may be broken (and should be removed from the config for the time being), but all others should be up and passing their health checks.

# ipvsadm -l
# less /var/log/pybal.log
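
To make an unexpectedly small pool easier to spot, the ipvsadm listing can be summarised per virtual service. The following is a minimal sketch, not part of the original procedure: the helper name count_realservers is hypothetical, and it assumes the usual numeric listing format (ipvsadm -l -n) in which real servers appear on lines starting with "->".

```shell
# Count real servers per LVS virtual service.
# Reads `ipvsadm -l -n` output on stdin (assumed format, see lead-in).
count_realservers() {
    awk '
        # A virtual service line starts with the protocol name.
        $1 == "TCP" || $1 == "UDP" {
            svc = $1 " " $2
            order[++n] = svc
            count[svc] = 0
            next
        }
        # Real server lines start with "->"; skip the column header line.
        $1 == "->" && $2 != "RemoteAddress:Port" && svc != "" { count[svc]++ }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}

# Usage (as root on an LVS balancer):
#   ipvsadm -l -n | count_realservers
```

A service that reports far fewer real servers than are configured for the pool is a candidate for a closer look in /var/log/pybal.log.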

Check whether the Nagios check for these LVS pools exists and is up.

Check whether all pooled application servers have the right LVS service IPs bound to loopback.
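
One possible way to script this check on a single host is sketched below. The helper name has_loopback_ip and the address 10.2.1.1 are assumptions for illustration only; the real service IPs come from the LVS configuration.

```shell
# Succeed if the address given as $1 is bound to the loopback interface.
# Reads `ip -o -4 addr show dev lo` output on stdin, where the fourth
# field of each line is "address/prefixlen".
has_loopback_ip() {
    awk -v ip="$1" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == ip) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage (10.2.1.1 is a placeholder, not a real service IP):
#   ip -o -4 addr show dev lo | has_loopback_ip 10.2.1.1 || echo "service IP missing"
```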

Check deployed MediaWiki revision / git status on all application servers
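
A rough sketch of how the revisions could be compared once collected. The helper name mismatched_hosts, the file hosts.txt and the checkout path are all hypothetical; how the revision is actually read on each application server is not specified here.

```shell
# Print the hosts whose revision differs from the expected one ($1).
# Input on stdin: one "hostname revision" pair per line.
mismatched_hosts() {
    awk -v want="$1" '$2 != want { print $1 }'
}

# Usage (hypothetical collection loop; hosts.txt and the path are assumptions):
#   while read -r h; do
#       printf '%s %s\n' "$h" "$(ssh -n "$h" 'cd /path/to/checkout && git rev-parse HEAD')"
#   done < hosts.txt | mismatched_hosts "$(git rev-parse HEAD)"
```

An empty result means every listed host reported the expected revision.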

MySQL warm up?

Ensure media writes to the NetApp are disabled

Migrate bits apaches to eqiad

Check whether the 4 bits apaches are healthy according to a bits Varnish server:

# varnishlog -i Backend_health -O

Test a few top bits URLs manually from the new bits app servers to see if valid content is being returned. To retrieve the most requested URLs, on a bits Varnish server:

# varnishtop -i RxURL

To test such a URL, use curl, or:

fenari: $ /home/mark/firstbyte.py apache_host_name 80 bits.wikimedia.org URI
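
For the curl variant, a minimal sketch follows. The helper name bits_status is hypothetical; it sends the request directly to one application server on port 80 with the bits.wikimedia.org Host header and prints only the HTTP status code.

```shell
# Print the HTTP status code one app server returns for a bits URL.
# $1: app server host name, $2: URL path beginning with "/"
bits_status() {
    curl -s -o /dev/null -w '%{http_code}\n' \
        -H 'Host: bits.wikimedia.org' "http://$1$2"
}

# Usage (both arguments are placeholders):
#   bits_status apache_host_name /URI
```

Drop -o /dev/null to inspect the returned content itself rather than just the status code.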

Run varnishtop for a histogram of HTTP status codes, and compare before/after migration:

# varnishtop -i TxStatus

Deploy Gerrit patch set 44251 and run Puppet for node group XXX. This will change the apache backends for the eqiad Varnish servers only, giving us a chance to fall back on pmtpa bits Varnish servers quickly if needed.

Check whether the distribution of HTTP status codes changes drastically, especially HTTP 2xx vs. 4xx/5xx.
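
To put numbers on the before/after comparison, a sample of status codes can be reduced to a share per status class. This is a sketch under assumptions: the helper name status_class_share is hypothetical, and it assumes each input line ends with the status code, as varnishlog TxStatus lines are expected to.

```shell
# Summarise HTTP status classes as percentages.
# Reads lines whose last field is a three-digit status code.
status_class_share() {
    awk '
        $NF ~ /^[1-5][0-9][0-9]$/ {
            count[substr($NF, 1, 1) "xx"]++
            total++
        }
        END {
            if (total == 0) exit 1      # no usable input
            for (c = 1; c <= 5; c++) {
                cls = c "xx"
                if (cls in count)
                    printf "%s %.1f%%\n", cls, 100 * count[cls] / total
            }
        }
    '
}

# Usage (sample size is arbitrary):
#   varnishlog -i TxStatus | head -n 10000 | status_class_share
```

Running it once before and once after the backend change gives two short tables that can be compared side by side.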

If bits@eqiad is confirmed to work correctly, after a while deploy Gerrit patchset 44252 and run Puppet for node group XXX. This will switch the pmtpa bits Varnish servers to use the eqiad bits appservers as well.

Set all database shards to read-only

Core databases

External storage

Parser Cache

Parser cache configuration currently lives in wmf-config/CommonSettings.php (near line 350).

Gerrit patchset XXX

Redis

Redis configuration currently lives in wmf-config/CommonSettings.php (near line 360).

Gerrit patchset XXX

Redis master switch

Memcached

Memcached configuration currently lives in wmf-config/mc.php.

Note that because memcached cache content is not replicated between the data center sites, Tampa's memcached servers will need to be cleared before switching back.

Master switch on all database shards

General master switch instructions are here: Switch_master.

Possibly mha will be used?

Text Squids backend changes

This is the actual switch of directing clients to eqiad Apaches.

The Squid configuration resides in /home/w/conf/squid on fenari and is now backed by a git repository. Mark has prepared three commits in branch eqiad-switchover that migrate the image scalers, the API application servers and the regular application servers to eqiad.

For each of these commits, use the following sequence:

Merge the commit onto master:

$ git merge XXX

As root, run make to generate the new configuration files. Make sure there are no permission errors.

# make

Now, run a diff of all new configurations against the configurations currently deployed. Make sure the differences reflect the backend changes you expect.

# diff -ru deployed/ generated/ | less

Finally, deploy the configurations to all Squids. Make sure you have ssh agent forwarding enabled for this step. The configurations will be deployed directly and become active immediately, but will also be pushed to Puppet's volatile file module.

# ./deploy cache

(You can deploy to just pmtpa.text and eqiad.text if you prefer, as long as you do both.)

First migrate the image scalers. They run a limited subset of MediaWiki, and any problems are unlikely to cause harm.

Next, the API application servers.

Finally, normal clients: the regular application servers.

Mobile Varnish backend changes

Deploy Gerrit patch set 44257, and run Puppet on hosts cp1041 .. cp1044.

Database shards read-write in eqiad
