Swift/Tasks
From Wikitech
This page contains links to RT and more detail on the tasks, and their ordering, needed to get from the current setup to serving thumbnails from swift. Note that this page only holds the Ops tasks; whatever is going on in Engineering around FileRepo and other MediaWiki work is not included. Talk to AaronSchulz for that.
Overall Schedule
Estimating usage
There's a hard deadline for this project: ms5 is on its way to running out of disk space. Based on the ganglia graphs, we can run some numbers.
- as of 2011-12-05, we have 2.4TB plus an additional 490GB in LVM, for a total of about 2.9TB of available space.
- our rate of consumption is about 1.4TB per month, as measured over the 2 weeks from Nov 17th through Dec 1st. During October, there was a period during which we were consuming about 2.8TB per month, and before that we were consuming about 0.4TB per month. 1.4TB/month seems like a reasonable estimate going forward, but it is not the highest rate we have seen in recent history.
- By deleting unused (and less frequently used) thumbnails, we have reclaimed about 1 week's worth of space consumption. We expect we can reclaim about 2 more weeks' worth before the technique loses effectiveness.
- Update: we reclaimed massive amounts of space, and as of 2012-01-23 we're back at about the same point we were at on 2011-10-10 or so.
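The arithmetic behind these numbers can be sketched as follows. This is only a rough illustration using the figures quoted above; the 30-day month and the function names are assumptions made for the example, and it ignores the space reclaimed by deleting thumbnails.

```python
from datetime import date, timedelta

# Figures quoted above, as of 2011-12-05.
AVAILABLE_TB = 2.4 + 0.49  # free space plus the additional space in LVM
AS_OF = date(2011, 12, 5)

# Consumption rates seen in recent history, in TB per month.
RATES_TB_PER_MONTH = {"low": 0.4, "expected": 1.4, "high": 2.8}

DAYS_PER_MONTH = 30  # assumption: a month is treated as 30 days


def days_until_full(available_tb, rate_tb_per_month):
    """Days of runway left at a constant consumption rate."""
    return available_tb / rate_tb_per_month * DAYS_PER_MONTH


def projected_full_date(as_of, available_tb, rate_tb_per_month):
    """Date on which the disk would fill, rounded down to whole days."""
    whole_days = int(days_until_full(available_tb, rate_tb_per_month))
    return as_of + timedelta(days=whole_days)


for label, rate in RATES_TB_PER_MONTH.items():
    days = days_until_full(AVAILABLE_TB, rate)
    full_on = projected_full_date(AS_OF, AVAILABLE_TB, rate)
    print(f"{label:>8}: {rate} TB/month -> about {days:.0f} days, full around {full_on}")
```

At the expected 1.4TB/month rate this works out to roughly two months of runway from 2011-12-05, and about half that if consumption returns to the October rate.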
Schedule going forward
Keeping the deadline in mind, there is not much wiggle room. Here is the expected schedule for switching to swift for thumbnails. I will update this schedule every week, indicating slips or things accomplished ahead of schedule.
- Week of Dec 5
- debugging swift-proxy middleware to get upload URLs to pass through correctly [DONE]
- setting up performance testing cluster in pmtpa (owa1-3 for proxy, ms1-3 for storage) [DONE]
- Week of Dec 12
- modifying puppet to be able to maintain multiple clusters
- speccing storage hardware (ES RAID card won't work)
- Week of Dec 19
- performance testing swift using owa1-3 and ms1-3
- Week of Dec 26
- Winter holiday; nothing gets done
- Week of Jan 2
- Week of Jan 9
- Week of Jan 16
- work on code to improve the performance when containers get full - hash containers for commons and enwiki
- Week of Jan 23 (page last updated to here)
- production test hardware arrives, set up networking etc.
- test that it can expose the disks as we need - raw to the OS
- verify everything else about the new Dell C-series is acceptable (ssh to the management interface, etc.)
- order production hardware
- Week of Jan 30
- work on monitoring - ganglia and nagios
- test performance impact of varying the number of proxy hosts
- Week of Feb 6
- install and configure production installation
- Week of Feb 13
- final testing of production hardware
- final testing of mediawiki thumbnail purging code
- recreate list of all files on ms5 in preparation for populating swift
- Week of Feb 20
- move 5% of production traffic to go through swift with ms5 as a backend
- deploy mediawiki code to handle purging images from swift
- start copying all content into swift (will this take only 1 week for all thumbs and images?)
- Week of Feb 27
- move 100% of production traffic to go through swift
- Later
- configure swift to call out to the image scaling cluster directly instead of going via ms5
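The Week of Jan 16 item about hashing containers for commons and enwiki refers to spreading objects over several containers so that no single container grows too large. A minimal sketch of one way to do that is below. The use of MD5, the two-hex-digit suffix, and the container base name are all assumptions made for illustration, not the scheme actually deployed.

```python
import hashlib


def sharded_container(base, object_name, hex_digits=2):
    """Pick a container for an object by hashing its name.

    Assumption: the shard is the first `hex_digits` hex characters of the
    MD5 of the object name, appended to the base container name.
    """
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    return f"{base}.{digest[:hex_digits]}"


def all_shards(base, hex_digits=2):
    """List every container name that sharded_container() can return."""
    return [f"{base}.{i:0{hex_digits}x}" for i in range(16 ** hex_digits)]


# Hypothetical base container name and object path, for illustration only.
BASE = "commons-thumb"
name = "a/ab/Example.jpg/120px-Example.jpg"
print(sharded_container(BASE, name))
print(len(all_shards(BASE)), "containers to create up front")
```

Because the shard depends only on the object name, the proxy can compute the container for any request without a lookup, and the full set of containers can be created ahead of time.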
Done
- Get an initial test cluster set up, installed by puppet
- Hand this cluster off to Aaron for mediawiki work
- puppet class for pmtpa cluster configs - class swift::proxy::testpmtpaclusterconf
- get stats on qps for ms5 from ms5 (ben)
- acquire 2 apache servers in pmtpa to use as proxy hosts (ben) RT-2064 (owa1-3)
- rebuild ms1-4 as linux to be used as storage nodes for a performance test cluster in pmtpa (mark)
- create swift rings and get pmtpa cluster operational (ben)
- write scripts to automate performance tests on swift (ben) (ab and geturls.py)
- performance analysis of ms5 (current data rates, etc.) detail: Swift/Preliminary_test_plan#Background_information
- puppet work to parameterize config class (mark)
- templatize rsync configs to reuse existing rsync puppet configs
Not Yet Done
- puppetize netfilter settings
- puppet work to generalize swift vs. swift host (mark, later)
- moving mediawiki configs into a .deb (ben)
- write test scripts to automate performance and functional tests of swift functionality
- use test scripts to evaluate performance of first iteration swift hardware
- update proposed production hardware based on performance evaluation
- order production hardware
- more puppet work
- more documentation work
- review main Swift page, move bits to more appropriate pages (overall system architecture, deploy plan, test plan, maintenance tasks, etc.).
- clearly detail the procedures to
- add a new node
- fail out a disk
- update the process for creating a new wiki to also create a location for thumbnails
Parked for the Future
- details around logging, metrics, etc.
- include the ability to export a stream of all new content for folks that want to mirror commons etc.