User:Bhartshorne/swift tasks 2012-08-13
From Wikitech
Revision as of 18:30, 14 August 2012
to complete before 8/28
- move mediawiki reading originals to swift (aaron)
- updated squid and swift/rewrite.py to allow reads for originals (http://upload... but not thumbnails)
  - squid change is ACL work similar to how thumbnails got moved
  - rewrite may or may not need changes to accept non-thumbnails and get to the right bucket
- finish building eqiad cluster
  - ms-be1003 and ms-be1005 need re-installs; ms-be1004 is waiting on a replacement SSD (ETA Friday 8/17)
- upgrade to 1.5.0 (with ganglia statsd stuff disabled)
  - test in labs (lucid)
    - done. tested fetching existent and nonexistent thumbs. tested with mismatched proxies and storage servers.
  - test on eqiad (precise)
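
For the rewrite item above (accepting non-thumbnails and getting to the right bucket), a rough sketch of the kind of path routing involved. This is not the actual swift/rewrite.py code: the URL layout, the container naming scheme (`<project>-<lang>-local-public` / `-thumb`) and the function name `route` are all assumptions made for illustration.

```python
import re

# ASSUMED upload path layout (hypothetical, for illustration only):
#   originals:  /<project>/<lang>/<h>/<hh>/<file>
#   thumbnails: /<project>/<lang>/thumb/<h>/<hh>/<file>/<thumbname>
_UPLOAD_RE = re.compile(
    r"^/(?P<proj>[^/]+)/(?P<lang>[^/]+)/"
    r"(?P<thumb>thumb/)?"
    r"(?P<shard>[0-9a-f]/[0-9a-f]{2})/"
    r"(?P<rest>.+)$"
)


def route(path):
    """Map an upload URL path to (container, object), or None if unrecognized."""
    m = _UPLOAD_RE.match(path)
    if m is None:
        return None
    # Thumbnails and originals go to different containers (assumed naming).
    zone = "thumb" if m.group("thumb") else "public"
    container = "%s-%s-local-%s" % (m.group("proj"), m.group("lang"), zone)
    obj = "%s/%s" % (m.group("shard"), m.group("rest"))
    return container, obj
```

If the real rewrite only recognizes the thumb form today, the change would amount to making the `thumb/` segment optional and picking the container accordingly.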
to start before 8/28
- sync content
  - test between eqiad-prod cluster and ??? (eqiad-test? labs?)
- redo zones in pmtpa
- audit and replace disks across all backends
to do in sept
- improve reaction-based documentation (instead of feature-based documentation)
  - what to do when a host fails; what to do when a nagios alert triggers (for each nagios alert); etc.
- improve dead disk detection methods, automate alerting and replacing
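
One possible starting point for the dead disk detection item: compare the devices a backend is expected to have against what is actually mounted. A minimal sketch only. The function name `missing_mounts` and the device names are hypothetical, and `/srv/node` is assumed as the mount root (Swift's default devices directory); a real check would also want to look at I/O errors, not just mounts.

```python
def missing_mounts(expected_devices, proc_mounts_text, root="/srv/node"):
    """Return expected devices that have no mount point under root.

    proc_mounts_text is the contents of /proc/mounts, passed in as a
    string so the check can be exercised without touching a real host.
    """
    mounted = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: <device> <mountpoint> <fstype> <options> ...
        if len(fields) >= 2 and fields[1].startswith(root + "/"):
            mounted.add(fields[1][len(root) + 1:])
    return sorted(d for d in expected_devices if d not in mounted)
```

On a host this would be fed `open("/proc/mounts").read()`, and a non-empty result could drive a nagios alert.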
to do Sometime(tm)
- enable 1.5 statsd ganglia stuff
  - disable ganglia-logtailer
  - disable local logging?
  - update ganglia view for new metrics
- document how to switch from pmtpa to eqiad
  - container synchronization is eventually consistent; how to synchronize the change?
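
Since container sync is eventually consistent, one way to approach the switch question is to measure how far behind the destination is before cutting over. A sketch under assumptions: the two listings are plain dicts of object name to etag (how they are fetched from each cluster is left out), and `sync_lag` is a hypothetical helper, not part of Swift.

```python
def sync_lag(source, dest):
    """Compare two container listings given as {object_name: etag} dicts.

    Returns (missing, stale): names absent from dest, and names present
    in both whose etags differ.
    """
    missing = sorted(n for n in source if n not in dest)
    stale = sorted(n for n in source if n in dest and source[n] != dest[n])
    return missing, stale
```

A switch procedure could stop writes on the old cluster, then poll until both lists are empty before pointing traffic at the new one.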