Swift/How To
Revision as of 00:29, 21 August 2012
General Prep
Nearly all of these commands are best executed from a swift proxy host (eg ms-fe1.pmtpa.wmnet) and require either the master password or an account password. Both the master password (super_admin_key) and the specific users' passwords we have at Wikimedia are accessible in the swift proxy config file /etc/swift/proxy-server.conf or in the private puppet repository.
All of these tasks are explained in more detail and with far more context in the official swift documentation. This page is intended as a greatly restricted version of that information directed specifically at tasks we'll need to do at WMF. For this reason I leave out many options and caveats and assume details such as the authentication type (swauth) in order to restrict it to what's correct for our installation. It may or may not be useful for a wider audience.
Set up an entire swift cluster
This is documented elsewhere: Swift/Setup_New_Swift_Cluster
Individual Commands - Interacting with Swift
Create a user / account
This assumes swauth is prepped (swauth-prep)
- generate a password: pass=$(pwgen -s 12 1)
- add the user: swauth-add-user -A http://127.0.0.1/auth/ -K $super_admin_key -a myaccount newuser password
- (swift has multiple accounts, each account has users, each user has a password)
- note - swift users' passwords are visible (plaintext) to anybody with the $super_admin_key
- test it and retrieve the account id:
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key myaccount
- you're looking for newuser's "account_id": "AUTH_205b4c23-6716-4a3b-91b2-5da36ce1d120"
Remove a user / account
Show account information
This assumes swauth is prepped (swauth-prep)
The same command will show all accounts, all users within an account, or information specific to an individual user within an account, depending on the arguments passed
- show all accounts
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key
- show all users for an account
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key account
- show a specific user
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key account user
Troubleshooting - if you leave off the -A you will likely get socket.error: [Errno 111] ECONNREFUSED
Get the AUTH account string
- using the swauth tool and the swauth super admin key:
- where $user is your username and $proxy is a proxy node and $key is the swauth super_admin_key:
- swauth-list -A http://$proxy/auth -K $key $user
- eg
swauth-list -A http://127.0.0.1/auth -K abcdefghijkl test
- the account AUTH string is labeled account_id and looks like AUTH_01234567-89ab-cdef-0123-456789abcdef (AUTH_8-4-4-4-12)
- using curl and an account / password pair
- where $account:$user is an account:user pair and $key is the user password and $proxy is a proxy server:
- curl -k -v -H 'X-Auth-User: $account:$user' -H 'X-Auth-Key: $key' http://$proxy/auth/v1.0
- eg
curl -k -v -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://127.0.0.1:8080/auth/v1.0
- the account AUTH string is the last part of the X-Storage-URL header
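The AUTH_8-4-4-4-12 shape can be checked mechanically; a minimal sketch using the account_id format described above (the string here is an example of the right shape, not a live account):

```shell
# Validate the AUTH_8-4-4-4-12 account string shape with a regex.
auth="AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8"
if echo "$auth" | grep -qE '^AUTH_[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
    echo "valid account AUTH string"
fi
```

This is handy for sanity-checking a value pasted out of swauth-list before using it in a storage URL.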
Get an authenticated session token
The session token is temporary (a few hours?) and should be refetched if you are using one and get a 401 permission denied. The token is returned in the header of a GET request sent with appropriate authentication headers. The token is returned in two headers, the X-Storage-Token and X-Auth-Token. I think the X-Storage-Token is deprecated.
root@copper:/etc/swift# curl -k -v -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://127.0.0.1:8080/auth/v1.0
* About to connect() to 127.0.0.1 port 8080 (#0)
* Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /auth/v1.0 HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:8080
> Accept: */*
> X-Auth-User: test:tester
> X-Auth-Key: testing
>
< HTTP/1.1 200 OK
< X-Storage-Url: http://msfe-test.wikimedia.org:8080/v1/AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8
< X-Storage-Token: AUTH_tk371f407774ef4a6580cb1c684308fb53
< X-Auth-Token: AUTH_tk371f407774ef4a6580cb1c684308fb53
< Content-Length: 126
< Date: Fri, 23 Mar 2012 23:23:54 GMT
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
{"storage": {"default": "local", "local": "http://msfe-test.wikimedia.org:8080/v1/AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8"}}root@copper:/etc/swift#
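To script against this, the token and storage URL can be pulled out of saved response headers. A sketch, using a canned copy of the headers from the transcript above; a live capture would write the same file with curl -s -D headers.txt -o /dev/null plus the same auth headers:

```shell
# Canned copy of the auth response headers from the transcript above.
cat > headers.txt <<'EOF'
HTTP/1.1 200 OK
X-Storage-Url: http://msfe-test.wikimedia.org:8080/v1/AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8
X-Auth-Token: AUTH_tk371f407774ef4a6580cb1c684308fb53
EOF

# Pull out the token and storage URL (case-insensitive header match, strip CRs).
token=$(awk -F': ' 'tolower($1) == "x-auth-token" {print $2}' headers.txt | tr -d '\r')
url=$(awk -F': ' 'tolower($1) == "x-storage-url" {print $2}' headers.txt | tr -d '\r')
echo "$token"

# A later request would then authenticate with the token alone, eg:
#   curl -H "X-Auth-Token: $token" "$url"
```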
Create a container
You create a container by POSTing to it. You modify a container by POSTing to an existing container. Only users with admin rights (aka users in the .admin group) are allowed to create or modify containers.
Run the following commands on any host with the swift binaries installed (any host in the swift cluster or iron)
- create a container with default permissions (r/w by owner and nobody else)
- swift -A http://ms-fe.pmtpa.wmnet/auth/v1.0 -U mw:thumbnail -K $pass post container-name;
- create a container with global read permissions
- swift -A http://ms-fe.pmtpa.wmnet/auth/v1.0 -U mw:thumbnail -K $pass post -r '.r:*' ${cont}
List containers and contents
It's easiest to do all listing from a frontend host on the cluster you wish to list. You will need the account password to do any listing.
- log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
- read the key (account password) from config:
pass=$(grep "^key" /etc/swift/proxy-server.conf | cut -f 3 -d\ )
list all containers
- ask for a listing of the container:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumbnail -K $pass list
list the contents of one container
- ask for a listing of the container:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumbnail -K $pass list wikipedia-commons-local-thumb.a2
list specific objects within a container
example: look for all thumbnails for the file Little_kitten_.jpg
- start from a URL for a thumbnail (if you are at the original File: page, 'view image' on the existing thumbnail)
- Pull out the project, "language", thumb, and shard to form the correct container and add -local into the middle
- eg wikipedia-commons-local-thumb.a2
- Note - only some containers are sharded:
grep shard /etc/swift/proxy-server.conf to find out if your container should be sharded - unsharded containers leave off the shard, eg wikipedia-commons-local-thumb
- ask swift for a listing of the correct container with the --prefix option (it must come before the container name):
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumbnail -K $pass list --prefix a/a2/Little_kit wikipedia-commons-local-thumb.a2
- note that --prefix is a substring anchored to the beginning of the name; it doesn't have to be a complete name.
Note that if you have a pile of things you need to look at, you can use [1] a little script that reads image filenames (spaces converted to underscores) on stdin and writes out full pathnames (/export/thumbs/...) to the location of the file on ms5. You can grab the two-level hash out of there if you like.
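The hash prefix (the a/a2 part) can also be computed directly, assuming MediaWiki's usual hashed-directory scheme of taking the first one and two hex digits of the md5 of the underscored filename; treat this as a sketch of the convention, not a guarantee:

```shell
# Compute the hash prefix for a file, assuming it is built from the first one
# and two hex digits of the md5 of the filename (spaces already underscores).
name="Little_kitten_.jpg"
hash2=$(printf '%s' "$name" | md5sum | cut -c1-2)
hash1=$(printf '%s' "$hash2" | cut -c1)
echo "$hash1/$hash2"   # the a/a2-style prefix for this file
```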
Show specific info about a container or object
Note - these instructions will only show containers or objects the account has permission to see.
- log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
- read the key (account password) from config:
pass=$(grep "^key" /etc/swift/proxy-server.conf | cut -f 3 -d\ )
- ask for statistics about all containers:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat
- ask for statistics about the container:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-commons-local-thumb.a2
- ask for statistics about an object in a container:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-commons-local-thumb.a2 a/a2/Little_kitten_.jpg/300px-Little_kitten_.jpg
Delete a container or object
Note - THIS IS DANGEROUS; it's easy to delete a container instead of an object by hitting return at the wrong time!
Deleting uses the same syntax as 'stat'. I recommend running stat on an object to get the command right, then using bash history substitution (^stat^delete^) to run the delete.
- log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
- read the key (account password) from config:
pass=$(grep "^key" /etc/swift/proxy-server.conf | cut -f 3 -d\ )
- run swift stat on the object you want to delete
ben@ms-fe1:~$ swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg
- swap stat for delete in the same command.
ben@ms-fe1:~$ ^stat^delete^
When you call delete for a container it will first delete all objects within the container and then delete the container itself.
Individual Commands - Managing Swift
Show current swift ring layout
There are three rings in Swift: account, object, and container. The swift-ring-builder command with a builder file will list the current state of the ring.
- swift-ring-builder /etc/swift/account.builder
- swift-ring-builder /etc/swift/container.builder
- swift-ring-builder /etc/swift/object.builder
Rebalance the rings
You only have to rebalance the rings after you have made a change to them. If there are no changes pending, the attempt to rebalance will fail with the error message "Cowardly refusing to save rebalance as it did not change at least 1%."
To rebalance the rings you run the actual rebalance on a copy of the ring files then distribute the rings to the rest of the cluster (via puppet).
- copy and rebalance (I usually do this on ms-fe1, though any swift host will work.)
cp -a /etc/swift ~
cd ~/swift/
for i in account container object ; do swift-ring-builder $i.builder rebalance ; done
- push ~/swift/{account,container,object}.{builder,ring.gz} into puppet in the appropriate cluster
- The ring files live in the volatile section of puppet, currently stafford:/var/lib/puppet/volatile/swift/${cluster-name}/
- production is currently in puppet:///volatile/swift/pmtpa-prod/*
- run puppet on all hosts in the cluster
Add a proxy node to the cluster
- Update site.pp in puppet to make the new proxy match existing proxies in that cluster
- likely you'll include role::swift::xxx-yyy::proxy
- maybe some ganglia-related stuff
- Update the xxx-yyy config section in role/swift.pp
- add the new server to the list of memcached_servers
- Run puppet on the host twice, reboot, and run puppet again
- Test the host
- curl for a file that exists in swift from both a working host and the new host
- eg:
curl -o /tmp/foo -v -H "Host: upload.wikimedia.org" http://ms-fe2.pmtpa.wmnet/wikipedia/commons/thumb/a/a2/Little_kitten_.jpg/46px-Little_kitten_.jpg
- Add the new proxy to the load balancer (full details) if it's a load balanced cluster
Remove a failed proxy node from the cluster
- Take the failed node out of the load balancer if necessary
- Update the puppet configuration for the cluster
- remove the failed node from the memcached list in the role/swift.pp in the cluster config
Add a storage node to the cluster
Start by doing the normal setup paying attention to the desired swift server layout.
Puppet will take care of all disks that have only one partition used for data - you should pass it all non-OS disks. You may have to create partitions on the OS disks for swift storage. The following is what I ran on ms-be1 (where the BIOS partitions are sda1 and sdb1, the OS partition is RAIDed across 120GB partitions on sda2 and sdb2, and sda3 and sdb3 are swap):
# parted
) help
) print free
) mkpart swift-sda4 121GB 2000GB
) select /dev/sdb
) print free
) mkpart swift-sdb4 121GB 2000GB
) quit
# mkfs -t xfs -i 512 -L swift-sda4 /dev/sda4
# mkfs -t xfs -i 512 -L swift-sdb4 /dev/sdb4
# mkdir /srv/swift-storage/sd{a,b}4
# chown -R swift:swift /srv/swift-storage/sd{a,b}4
# vi /etc/fstab # <-- add in a line for sda4 and sdb4 with the same xfs options as the rest
# mount -a
# reboot # just for good measure
After Puppet has finished setting up Swift and all device partitions are mounted successfully, add them to the rings. (Since the two partitions on sda and sdb are slightly smaller than the rest, they should get an appropriately smaller weight, eg 95 instead of 100.)
Add a device (drive) to a ring
Select the following values:
- zone : each rack is its own zone; all servers within a rack and all drives within a server should be the same zone
- list all the drives to see what zones are in use with swift-ring-builder /etc/swift/account.builder (see above)
- ip - ip of the storage node
- dev - the short name of the partition - eg 'sdc1'
- weight - for a 2TB drive, 100. Adjust for drives larger or smaller than the rest of the cluster (eg a 500GB drive would get 25, a 4TB drive 200)
cp -a /etc/swift ~; cd ~/swift/;
swift-ring-builder account.builder add z${zone}-${ip}:6002/${dev} $weight
swift-ring-builder container.builder add z${zone}-${ip}:6001/${dev} $weight
swift-ring-builder object.builder add z${zone}-${ip}:6000/${dev} $weight
Example, to add device /dev/sda4 on ms-be5:
swift-ring-builder account.builder add z5-10.0.6.204:6002/sda4 100
swift-ring-builder container.builder add z5-10.0.6.204:6001/sda4 100
swift-ring-builder object.builder add z5-10.0.6.204:6000/sda4 100
After you're done, you must rebalance the three rings and push them out to the rest of the cluster.
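The weight convention above (100 for a 2TB drive, scaled linearly) works out to 50 weight per TB. A throwaway helper makes the arithmetic explicit; the drive sizes here are illustrative, not read from hardware:

```shell
# weight = size_in_TB * 50, matching the convention of 100 for a 2TB drive.
weight_for_tb() {
    awk -v tb="$1" 'BEGIN { printf "%d\n", tb * 50 }'
}
weight_for_tb 2     # -> 100
weight_for_tb 0.5   # -> 25
weight_for_tb 4     # -> 200
```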
Remove a failed storage node from the cluster
Remove each of the devices on the failed node from the rings, rebalance, and distribute the new ring files.
Remove (fail out) a drive from a ring
There are two conditions in which you will want to remove a device from service
- when the device is dead or the host is down and unreachable
- when it's still working but you want to decommission it or pull it out for service
For the former, you just remove the device; for the latter, you can nicely pull data off the device before shutting it off by changing the device weight first.
remove failed devices
The command to remove a device is swift-ring-builder /etc/swift/<ring>.builder remove d###. Here's the sequence:
- find the IDs of the devices you want to remove. You're looking for the 'id' using the IP address and name as your keys. You should verify that the ID is the same across all three rings; I'm only showing one ring here for the example.
root@ms-fe2:~# swift-ring-builder /etc/swift/account.builder
/etc/swift/account.builder, build version 192
65536 partitions, 3 replicas, 5 zones, 161 devices, 0.10 balance
The minimum number of hours before a partition can be reassigned is 3
Devices: id zone ip address port name weight partitions balance meta
0 1 10.0.0.250 6002 sda1 25.00 844 0.02
1 1 10.0.0.250 6002 sdaa1 25.00 844 0.02
2 1 10.0.0.250 6002 sdab1 25.00 844 0.02
3 1 10.0.0.250 6002 sdad1 25.00 844 0.02
4 1 10.0.0.250 6002 sdae1 25.00 844 0.02
5 1 10.0.0.250 6002 sdaf1 25.00 844 0.02
etc. etc. etc.
- remove them (in this example I'm removing an entire host; you can remove only a single drive if necessary.) Note that in our environment, account and container device IDs often (but not always) match and object device IDs are different. You should check each ring individually.
cp -a /etc/swift ~; cd ~/swift;
for i in {150..161}; do
swift-ring-builder account.builder remove d$i
done
- rebalance the rings and distribute them.
remove working devices for maintenance
To remove a device for maintenance, you set the weight on the device to 0, rebalance, wait a while (a day or two), then do your maintenance. The examples here assume you're removing all the devices on a node. Note that I'm only checking one of the three rings but taking action on all three. To be completely sure we should check all three rings but by policy we keep all three rings the same.
- find the IDs for the devices you want to remove (in this example, I'm pulling out ms-be5)
root@ms-fe1:/etc/swift# swift-ring-builder /etc/swift/account.builder search 10.0.6.204
Devices: id zone ip address port name weight partitions balance meta
186 8 10.0.6.204 6002 sda4 95.00 1993 -12.24
187 8 10.0.6.204 6002 sdb4 95.00 1993 -12.24
188 8 10.0.6.204 6002 sdc1 100.00 2098 -12.23
189 8 10.0.6.204 6002 sdd1 100.00 2097 -12.27
190 8 10.0.6.204 6002 sde1 100.00 2097 -12.27
191 8 10.0.6.204 6002 sdf1 100.00 2097 -12.27
192 8 10.0.6.204 6002 sdg1 100.00 2097 -12.27
193 8 10.0.6.204 6002 sdh1 100.00 2097 -12.27
194 8 10.0.6.204 6002 sdi1 100.00 2097 -12.27
195 8 10.0.6.204 6002 sdj1 100.00 2097 -12.27
196 8 10.0.6.204 6002 sdk1 100.00 2097 -12.27
197 8 10.0.6.204 6002 sdl1 100.00 2097 -12.27
- set their weight to 0
cp -a /etc/swift ~; cd ~/swift/
for id in {186..197}; do
for ring in account object container ; do
swift-ring-builder ${ring}.builder set_weight d${id} 0
done
done
- check what you've done
root@ms-fe1:~/swift# swift-ring-builder account.builder search 10.0.6.204
Devices: id zone ip address port name weight partitions balance meta
186 8 10.0.6.204 6002 sda4 0.00 1993 999.99
187 8 10.0.6.204 6002 sdb4 0.00 1993 999.99
188 8 10.0.6.204 6002 sdc1 0.00 2098 999.99
189 8 10.0.6.204 6002 sdd1 0.00 2097 999.99
190 8 10.0.6.204 6002 sde1 0.00 2097 999.99
191 8 10.0.6.204 6002 sdf1 0.00 2097 999.99
192 8 10.0.6.204 6002 sdg1 0.00 2097 999.99
193 8 10.0.6.204 6002 sdh1 0.00 2097 999.99
194 8 10.0.6.204 6002 sdi1 0.00 2097 999.99
195 8 10.0.6.204 6002 sdj1 0.00 2097 999.99
196 8 10.0.6.204 6002 sdk1 0.00 2097 999.99
197 8 10.0.6.204 6002 sdl1 0.00 2097 999.99
- rebalance the rings and distribute them to the rest of the cluster
Nuke a swift cluster
only do this on test clusters - it is unrecoverable and destroys all the data in the cluster
- on all servers:
- stop all services: swift-init all stop
- remove all ring data: rm /etc/swift/*.{builder,ring.gz}
- on the storage nodes:
- remove all storage content: for i in /srv/swift-storage/sd*; do rm -r $i/* & done (or just reformat the drives - faster)
The swift cluster is now destroyed. To rebuild, follow the instructions in Swift/Setup_New_Swift_Cluster
Change the origin server
- in puppet
- switch the thumbhost = ms5.pmtpa.wmnet line in /etc/swift/*.conf to point at the new server
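A sketch of the edit itself with sed, demonstrated on a throwaway copy rather than the live config; new-thumb-server.pmtpa.wmnet is a placeholder hostname, and in practice the change should be made through puppet as noted above:

```shell
# Demonstrate the thumbhost switch on a scratch copy of the config.
# new-thumb-server.pmtpa.wmnet is a placeholder, not a real host.
printf 'thumbhost = ms5.pmtpa.wmnet\n' > /tmp/proxy-server.conf.test
sed -i 's/^thumbhost = .*/thumbhost = new-thumb-server.pmtpa.wmnet/' /tmp/proxy-server.conf.test
cat /tmp/proxy-server.conf.test   # -> thumbhost = new-thumb-server.pmtpa.wmnet
```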