User:Bhartshorne/swift tasks 2012-08-13

From Wikitech

to complete before 8/28

  • move mediawiki reading originals to swift (aaron)
    • the deploy to test wiki on monday worked.
  • update squid and swift/rewrite.py to allow reads for originals (http://upload... URLs that are not thumbnails)
    • the squid change is ACL work, similar to how thumbnails were moved
    • rewrite.py may or may not need changes to accept non-thumbnail requests and route them to the right bucket
  • finish building eqiad cluster
    • ms-be1004 is waiting on a replacement SSD (ETA Friday 8/17)
    • ms-be1005 doesn't see any of its spinning disks. RobH to investigate
    • it's ok to continue building the cluster without those two hosts.
  • upgrade to 1.5.0 (with ganglia statsd stuff disabled)
    • test in labs (lucid)
      • done. tested fetching existent and nonexistent thumbs. tested with mismatched proxies and storage servers.
    • test on eqiad (precise)
      • tested mixed cluster upgraded by hand. tested container creation, thumb creation, thumb fetching, lost object recovery.
      • need to test puppet rules (scheduled monday)
    • test mediawiki auth - Jan claims MW fails to auth against 1.4.4+. replicate his test, find and fix the problem (if replicable) (aaron)
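
The rewrite question above (routing non-thumbnail requests to the right bucket) can be illustrated with a minimal sketch. This is not the actual rewrite.py logic: the URL layout, the `<project>-<lang>-local-public` / `-local-thumb` container naming, and the `route` helper are all assumptions made for illustration.

```python
import re

# ASSUMPTION: upload paths look like
#   /<project>/<lang>/<h>/<hh>/<file>                  (original)
#   /<project>/<lang>/thumb/<h>/<hh>/<file>/<thumb>    (thumbnail)
_PATH_RE = re.compile(
    r"^/(?P<project>[^/]+)/(?P<lang>[^/]+)/"
    r"(?P<thumb>thumb/)?"
    r"(?P<shard>[0-9a-f]/[0-9a-f]{2})/"
    r"(?P<rest>.+)$"
)


def route(path):
    """Return (container, object_name) for a path, or None if unrecognized.

    The container naming scheme is hypothetical, not taken from rewrite.py.
    """
    m = _PATH_RE.match(path)
    if m is None:
        return None
    # Thumbnails and originals go to separate containers.
    zone = "thumb" if m.group("thumb") else "public"
    container = "%s-%s-local-%s" % (m.group("project"), m.group("lang"), zone)
    obj = "%s/%s" % (m.group("shard"), m.group("rest"))
    return container, obj
```

Under these assumptions, `route("/wikipedia/commons/a/ab/Foo.jpg")` selects the originals container, while the same path with a `thumb/` segment selects the thumbnail container; anything that does not match the layout returns None so the caller can fall through to existing behavior.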

to start before 8/28

  • sync content
    • test between eqiad-prod cluster and ??? (eqiad-test? labs?)
  • redo zones in pmtpa
  • audit and replace disks across all backends
    • rt-3282 and rt-3432

to do in sept

  • improve reaction-based documentation (instead of feature-based documentation)
    • what to do when a host fails; what to do when a nagios alert triggers (for each nagios alert); etc.
  • improve dead disk detection methods, automate alerting and replacing
    • installed and configured swift-drive-audit to find them.
    • how to hook into nagios?
  • set up swift-recon
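
One possible answer to the nagios question above: since swift-drive-audit unmounts drives it considers bad, a check could compare the drives a host is expected to have against what is actually mounted. The sketch below assumes the storage devices are mounted under /srv/node and that the expected device names are supplied by the caller; the function names and the critical threshold are hypothetical. The return values follow the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def mounted_under(mounts_text, root):
    """Return names of mount points directly under root, parsed from
    text in /proc/mounts format (device, mount point, fstype, ...)."""
    prefix = root.rstrip("/") + "/"
    found = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        mount_point = fields[1]
        if mount_point.startswith(prefix):
            name = mount_point[len(prefix):]
            if name and "/" not in name:
                found.add(name)
    return found


def check_drives(expected, mounts_text, root="/srv/node", crit_threshold=2):
    """Return (status, message). ASSUMPTION: one missing drive is a
    warning, crit_threshold or more missing drives is critical."""
    wanted = set(expected)
    missing = sorted(wanted - mounted_under(mounts_text, root))
    if not missing:
        return OK, "OK: all %d drives mounted" % len(wanted)
    status = CRITICAL if len(missing) >= crit_threshold else WARNING
    label = "CRITICAL" if status == CRITICAL else "WARNING"
    return status, "%s: %d drive(s) not mounted: %s" % (
        label, len(missing), ", ".join(missing))


def main(argv, mounts_path="/proc/mounts"):
    """argv is the list of expected device names, e.g. ["sdb1", "sdc1"]."""
    try:
        with open(mounts_path) as f:
            text = f.read()
    except IOError as e:
        print("UNKNOWN: %s" % e)
        return UNKNOWN
    status, message = check_drives(argv, text)
    print(message)
    return status
```

A wrapper script would pass the expected device names on the command line and exit with the value returned by `main`, so nagios sees the status code and the one-line message.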

to do Sometime(tm)

  • enable 1.5 statsd ganglia stuff
    • disable ganglia-logtailer
    • disable local logging?
    • update ganglia view for new metrics
  • document how to switch from pmtpa to eqiad
    • container synchronization is an eventually consistent thing; how to synchronize the change?
  • have 2 users that interact with containers - one that can create / destroy containers and the other that can't
    • talk to aaron for more detail
  • upgrade pmtpa cluster from lucid to precise
  • add SSDs into ms-be1-5 to get hardware parity with the rest of the ms-be servers (and get the OS and local logs onto SSD instead of sharing with the object store)