Eqiad Migration Planning
From Wikitech
Coordination
We now have an incomplete tracking ticket in RT that depends on more specific tickets.
Needed Server Builds
- App and API Apaches
- Image scalers?
- Swift
- servers online; needs cluster replication enabled
- Memcached servers
- half still need to be racked
- 10Gb cards need to be reconfigured to enable PXE booting
- Databases - done
- one more slave is needed per shard
- Poolcounter
- Not allocated? Should run on 2-3 misc servers
- Netapp
- /home
- Deployment server (the deployment-support part of fenari's infrastructure: misc::deployment, etc.)
- new server
- Hume equivalent (misc::maintenance)
- Application logging server - for MediaWiki wmerrors + Apache syslog
- eqiad version of the udp2log instance on nfs1 that writes to /home/w/logs
- server 'fluorine' for Apache logs
Software / Config Requirements
- Varnish software to handle media streaming efficiently
- awaiting patch from Varnish Software (target Sept?)
- alternatively, patch MediaWiki to use a different upload hostname for large files, so that Squid or a specialized media-streaming proxy could serve them
- MediaWiki deployment support for per-colo configuration differences [Bugzilla 39082]
- generating eqiad and pmtpa dsh groups
- new MediaWiki conf files for eqiad
- replicating the git checkouts, etc. to the new /home
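Generating the eqiad and pmtpa dsh groups could be scripted along the lines of this minimal Python sketch. It assumes a hypothetical inventory mapping each host to a datacenter and role; the host names, roles, and the `<role>-<dc>` group naming are placeholders, not the cluster's actual conventions. The only format fact relied on is that a dsh group file is plain text with one hostname per line.

```python
from pathlib import Path

# Hypothetical inventory: hostname -> (datacenter, role). Placeholder data only.
INVENTORY = {
    "mw1001": ("eqiad", "apache"),
    "mw1002": ("eqiad", "apache"),
    "mw1101": ("eqiad", "api"),
    "srv190": ("pmtpa", "apache"),
    "srv191": ("pmtpa", "apache"),
}


def dsh_groups(inventory: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Build per-colo groups named '<role>-<dc>' plus a combined '<role>' group."""
    groups: dict[str, list[str]] = {}
    # Sorting by hostname keeps the generated files stable between runs.
    for host, (dc, role) in sorted(inventory.items()):
        groups.setdefault(f"{role}-{dc}", []).append(host)
        groups.setdefault(role, []).append(host)
    return groups


def write_groups(groups: dict[str, list[str]], directory: Path) -> None:
    """Write one file per group, one hostname per line (the format dsh reads)."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, hosts in groups.items():
        (directory / name).write_text("\n".join(hosts) + "\n")
```

Usage would be something like `write_groups(dsh_groups(INVENTORY), Path("/etc/dsh/group"))`, pointing at wherever dsh reads group files on the deployment host. Keeping per-colo groups alongside a combined group lets a deploy target one datacenter or both.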
Actually Failing Over
- deploy db.php with all shards set to read-only in both pmtpa and eqiad
- deploy Squid, mobile and bits Varnish configs pointing to the eqiad Apaches
- master-swap every core DB and writable es shard to eqiad
- deploy db.php in eqiad with the read-only flag removed; leave it read-only in pmtpa
- the master-swap and db.php deploys above can be done shard by shard to limit how long any given project is read-only
- DNS changes - our current steady state is to point wikipedia-lb.wikimedia.org to eqiad for US clients, but future scenarios may include external DNS switches
- Swift replication reversal - from eqiad to Tampa
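The per-shard database part of the sequence above can be modelled as a small state machine, which makes the safety property explicit: a datacenter's db.php should allow writes only while that datacenter hosts the shard's master. This is a minimal Python sketch under that assumption. It only tracks state in memory; it does not edit db.php or talk to MySQL, and the shard names are hypothetical.

```python
from dataclasses import dataclass, field

DCS = ("pmtpa", "eqiad")


@dataclass
class Shard:
    """In-memory model of one DB shard; not a real db.php or MySQL interface."""
    name: str
    master_dc: str = "pmtpa"
    # Read-only flag as it would appear in each datacenter's db.php.
    # Assumed starting point: pmtpa takes writes, eqiad does not.
    read_only: dict[str, bool] = field(
        default_factory=lambda: {"pmtpa": False, "eqiad": True})

    def check(self, step: str) -> None:
        # Invariant: a DC may accept writes only if it currently hosts the master.
        for dc, ro in self.read_only.items():
            if not ro and dc != self.master_dc:
                raise RuntimeError(f"{self.name}: {dc} writable after '{step}'")


def fail_over_shard(shard: Shard, target: str = "eqiad") -> list[str]:
    """Read-only everywhere, swap the master, then re-enable writes in target."""
    done: list[str] = []

    def step(label: str) -> None:
        shard.check(label)  # verify the invariant after every step
        done.append(label)

    for dc in DCS:
        shard.read_only[dc] = True
    step("db.php read-only in all DCs")
    shard.master_dc = target
    step(f"master swapped to {target}")
    shard.read_only[target] = False
    step(f"db.php writable in {target}")
    return done


def fail_over(shards: list[Shard], target: str = "eqiad") -> None:
    # One shard at a time, so each project is read-only only while its shard moves.
    for shard in shards:
        fail_over_shard(shard, target)
```

Swapping the master while pmtpa's db.php is still writable makes `check` raise, which is the split-write situation the read-only step exists to prevent.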
Improving Failover
- pre-generate squid + varnish configs for different primary datacenter roles
- implement MHA to better automate MySQL master failovers
- migrate session storage to redis, with redundant replicas across colos
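Pre-generating configs for each primary-datacenter role could look roughly like this Python sketch, which renders one VCL backend declaration per candidate primary. The endpoint names (under `.example`), the backend name `appservers`, and the single-backend layout are assumptions for illustration; real Squid and Varnish configs carry much more than this.

```python
# Hypothetical app-server endpoints per datacenter; placeholders, not real addresses.
APPSERVERS = {
    "pmtpa": "appservers.svc.pmtpa.example",
    "eqiad": "appservers.svc.eqiad.example",
}

# Doubled braces are literal braces in str.format().
BACKEND_TEMPLATE = 'backend appservers {{\n    .host = "{host}";\n    .port = "80";\n}}\n'


def render_backend(primary_dc: str) -> str:
    """Render a VCL backend declaration pointing at the primary DC's app servers."""
    return BACKEND_TEMPLATE.format(host=APPSERVERS[primary_dc])


def pregenerate(roles: tuple[str, ...] = tuple(APPSERVERS)) -> dict[str, str]:
    """Produce one config per possible primary datacenter, keyed by DC name."""
    return {dc: render_backend(dc) for dc in roles}
```

With every variant rendered and reviewed ahead of time, switching the primary becomes a matter of deploying an existing file rather than editing configs during the failover window.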
See more
- Records and original tracking doc - http://etherpad.wikimedia.org/EQIAD-rollout-sequence
- Category:Eqiad cluster
Parking Lot Issues
- Identify and plan around the deployment/migration date
- Migration needs to happen before Fundraising season starts in Nov.
- Vacation freeze: all hands on deck the week before and the week after deployment