User:Bhartshorne/swift tasks 2012-08-13

From Wikitech

Revision as of 00:21, 15 August 2012

to complete before 8/28

  • move mediawiki reading originals to swift (aaron)
  • updated squid and swift/rewrite.py to allow reads for originals (http://upload... but not thumbnails)
    • squid change is acl work similar to how thumbnails got moved
    • rewrite may or may not need changes to accept non-thumbnails and get to the right bucket
  • finish building eqiad cluster
    • ms-be1004 is waiting on a replacement SSD eta friday 8/17
    • ms-be1005 doesn't see any of its spinning disks. RobH to investigate
    • it's ok to continue building the cluster without those two hosts.
  • upgrade to 1.5.0 (with ganglia statsd stuff disabled)
    • test in labs (lucid)
      • done. tested fetching existent and nonexistent thumbs. tested with mismatched proxies and storage servers.
    • test on eqiad (precise)
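
The open question about whether rewrite needs changes "to accept non-thumbnails and get to the right bucket" can be pictured with a small sketch. This is not the code in swift/rewrite.py; it is only an illustration, and the URL path layout, the `map_path` helper, and the container naming scheme (`<project>-<lang>-local-public` for originals, `<project>-<lang>-local-thumb` for thumbnails) are all assumptions made for the example.

```python
import re

# Hypothetical path layout (an assumption, not taken from rewrite.py):
#   /<project>/<lang>/<x>/<xy>/<name>                     -> original
#   /<project>/<lang>/thumb/<x>/<xy>/<name>/<thumbname>   -> thumbnail
_PATH = re.compile(
    r"^/(?P<project>[^/]+)/(?P<lang>[^/]+)/"
    r"(?P<thumb>thumb/)?"
    r"(?P<shard>[0-9a-f]/[0-9a-f]{2})/(?P<rest>.+)$"
)


def map_path(path):
    """Map a request path to a (container, object) pair, or None if it
    does not look like a media path. Container names are hypothetical."""
    m = _PATH.match(path)
    if m is None:
        return None
    # Originals and thumbnails go to different containers ("buckets").
    zone = "thumb" if m.group("thumb") else "public"
    container = "%s-%s-local-%s" % (m.group("project"), m.group("lang"), zone)
    obj = "%s/%s" % (m.group("shard"), m.group("rest"))
    return container, obj
```

With these assumptions, an original and a thumbnail of the same file differ only in the container they resolve to, which is the part of rewrite that would need checking before squid starts sending reads for originals.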

to start before 8/28

  • sync content
    • test between eqiad-prod cluster and ??? (eqiad-test? labs?)
  • redo zones in pmtpa
  • audit and replace disks across all backends
    • rt-3282 and rt-3432

to do in sept

  • improve reaction-based documentation (instead of feature-based documentation)
    • what to do when a host fails; what to do when a nagios alert triggers (for each nagios alert); etc.
  • improve dead disk detection methods, automate alerting and replacing
    • installed and configured swift-drive-audit to find them.
    • how to hook into nagios?
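
One possible answer to "how to hook into nagios?" is a small check plugin that compares the drives a backend is expected to have against the ones currently mounted, since a drive that has been taken out of service would show up as missing. The sketch below is only a starting point under stated assumptions: the `/srv/node` mount root, the device names, the thresholds, and the function names are hypothetical. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_drives(expected, mounted, warn=1, crit=2):
    """Return (exit_code, status_line) based on how many expected
    drives are not mounted. Thresholds are illustrative defaults."""
    missing = sorted(set(expected) - set(mounted))
    if len(missing) >= crit:
        return CRITICAL, "SWIFT DRIVES CRITICAL - missing: " + ", ".join(missing)
    if len(missing) >= warn:
        return WARNING, "SWIFT DRIVES WARNING - missing: " + ", ".join(missing)
    return OK, "SWIFT DRIVES OK - %d drives mounted" % len(set(expected))


def mounted_under(root, mounts_file="/proc/mounts"):
    """List the directory names of filesystems mounted directly under root."""
    names = []
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and os.path.dirname(fields[1]) == root:
                names.append(os.path.basename(fields[1]))
    return names


def main(argv, root="/srv/node"):
    """argv holds the expected device names (hypothetical interface).
    Prints one status line and returns the Nagios exit code."""
    try:
        code, message = check_drives(argv, mounted_under(root))
    except OSError as exc:
        code, message = UNKNOWN, "SWIFT DRIVES UNKNOWN - %s" % exc
    print(message)
    return code
```

A wrapper script would call `sys.exit(main(sys.argv[1:]))` and be run through NRPE or a similar agent on each backend, for example with arguments like `sda1 sdb1 sdc1`.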

to do Sometime(tm)

  • enable 1.5 statsd ganglia stuff
    • disable ganglia-logtailer
    • disable local logging?
    • update ganglia view for new metrics
  • document how to switch from pmtpa to eqiad
    • container synchronization is an eventually consistent thing; how to synchronize the change?