Sartoris

From Wikitech

Revision as of 06:39, 5 December 2012


Deployment location

  • MediaWiki:
    /srv/deployment/mediawiki/common
    /srv/deployment/mediawiki/slot0
    /srv/deployment/mediawiki/slot1

Deploying

  1. git deploy start
    • At this point, do git pull, checkout, cherry-pick, commit, or whatever other repo changes you need to make.
  2. git deploy sync
    • Alternatively, if you wish to abort a deploy: git deploy abort.

Design

Basic design

Git repositories sit on the deployment system behind a web server. A user initiates a deployment with git-deploy (git deploy start), which writes out a lock file so that only a single deploy can run at a time, and adds a tag to the repo as a rollback point in case the deploy is aborted.

At this point the user updates the repo as necessary, or aborts the deploy. Aborting rolls the repo back to the start tag and removes the lock file.

Once the deploy is ready, the user completes it (git deploy sync), which causes git-deploy to write out a sync tag and then trigger a sync script. It also adds a .deploy file to the repo root, which describes the currently deployed code. The sync script updates the repo and submodules so that the application servers can fetch properly, then calls a salt run for fetch, followed by a salt run for checkout to the deploy tag, and reports success or failure. After the sync script runs, git-deploy removes the lock file.
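The lock/tag lifecycle above can be sketched in a few lines. This is a hypothetical illustration of the start/abort/sync states, not git-deploy's actual code; the class, method, and tag names are all invented for clarity.

```python
# Hypothetical sketch of the git-deploy start/abort/sync lifecycle described
# above; names and tag formats are illustrative, not git-deploy's actual API.
import time


class DeploySession:
    """Tracks the lock and rollback/sync tags for a single deploy."""

    def __init__(self, repo):
        self.repo = repo
        self.locked = False
        self.tags = []

    def start(self):
        # "git deploy start": take the lock and record a rollback point.
        if self.locked:
            raise RuntimeError("another deploy is in progress")
        self.locked = True
        self.tags.append("%s-start-%d" % (self.repo, int(time.time())))
        return self.tags[-1]

    def abort(self):
        # "git deploy abort": roll back to the start tag, drop the lock.
        start_tag = self.tags[-1]
        self.locked = False
        return "git reset --hard %s" % start_tag

    def sync(self):
        # "git deploy sync": write a sync tag, run the sync hook, unlock.
        sync_tag = self.tags[-1].replace("-start-", "-sync-")
        self.tags.append(sync_tag)
        self.locked = False
        return sync_tag
```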

Sync hook

  • Location (on the deployment host): /var/lib/git-deploy/sync/shared.py
  • Managed in the puppet deployment module
  1. Get the repo and submodules ready for fetching:
    1. Update the repo: git update-server-info
    2. Tag all submodules with the same tag as parent repo: git submodule foreach "git tag <tag>"
    3. Update all submodules: <for each extension in <repo>/.git/modules/extension> git update-server-info
  2. Make the application servers do a fetch (via a salt runner)
  3. Make the application servers do a checkout (via a salt runner)
    1. Switch core to the tag
    2. Update the submodules
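The steps above can be sketched as the ordered list of commands the hook would issue. This is a minimal illustration assembled from the steps listed here, not the contents of shared.py; the submodule path layout is an assumption.

```python
# Illustrative sketch of the commands the sync hook runs, in the order given
# above. This is not shared.py's actual code; paths are assumptions.
def sync_hook_commands(tag, submodules):
    """Return the command lines for one sync, in order."""
    # Step 1.1: update the parent repo so dumb-HTTP fetches work.
    cmds = ["git update-server-info"]
    # Step 1.2: tag every submodule with the same tag as the parent repo.
    cmds.append('git submodule foreach "git tag %s"' % tag)
    # Step 1.3: update server info for each submodule as well.
    for sub in submodules:
        cmds.append("git --git-dir=.git/modules/%s update-server-info" % sub)
    # Steps 2 and 3: fan out fetch, then checkout, via the salt runner.
    cmds.append("sudo salt-call publish.runner deploy.fetch")
    cmds.append("sudo salt-call publish.runner deploy.checkout")
    return cmds
```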

Salt deploy runner

  • Location (on the salt master): /srv/runners/deploy.py
  • Managed in the puppet deployment module

A salt runner is a script that runs on the salt master and can combine many salt calls into a single function.

The salt deploy runner can be called from the deployment server via sudo salt-call publish.runner deploy.<function>. It is called by the git-deploy sync hook. It has two functions:

deploy.fetch(repo)
calls fetch (via a salt module) on all application servers for the specified repo
deploy.checkout(repo)
calls checkout (via a salt module) on all application servers for the specified repo

Each function returns a report in json on which minions returned successfully, failed, or didn't return.
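The shape of that report can be illustrated with a small classifier. The input format (minion name mapped to a boolean result, with unreached minions simply absent) is an assumption for the sketch, not the runner's actual return format.

```python
# A minimal sketch of the per-minion JSON report the runner produces; the
# input shape (minion -> True/False, missing = no return) is an assumption.
import json


def deploy_report(expected_minions, returns):
    """Classify minions as success, failed, or no_return; emit JSON."""
    report = {"success": [], "failed": [], "no_return": []}
    for minion in sorted(expected_minions):
        if minion not in returns:
            report["no_return"].append(minion)
        elif returns[minion]:
            report["success"].append(minion)
        else:
            report["failed"].append(minion)
    return json.dumps(report)
```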

Salt deploy module

  • Location (on the salt master): /srv/salt/_modules/deploy.py
  • Managed in the deployment puppet module

A salt module lives on every salt minion and can be called from the salt master or from any peer which is allowed access.

The salt deploy module is called via salt <matching-criteria> deploy.<function>. It has the following functions:

deploy.sync_all
sync all configured repositories. This will also fully clone repositories if they are missing.
deploy.fetch(repo)
do a git fetch based on the repo location (repo_locations) and url (repo_urls) defined via salt pillars.
deploy.checkout(repo,reset=False)
do a checkout of a repo based on the repo location (repo_locations) and url (repo_urls) from salt pillars, and the .deploy file defined on the deployment host. Checkout will also modify the .gitmodules file based on sed configuration defined in salt pillars (repo_regex).
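The .gitmodules rewrite can be sketched as applying the pillar-defined substitutions to the file's text. The pillar key (repo_regex) comes from the description above, but the pattern/replacement pair in the example, and the deployment-server URL it rewrites to, are hypothetical.

```python
# Hedged sketch of the .gitmodules rewrite checkout performs: repo_regex is
# the pillar key named above, but the example patterns are assumptions.
import re


def rewrite_gitmodules(gitmodules_text, repo_regex):
    """Apply each (pattern, replacement) pair from pillars to .gitmodules."""
    for pattern, replacement in repo_regex:
        gitmodules_text = re.sub(pattern, replacement, gitmodules_text)
    return gitmodules_text
```

A typical use would be pointing submodule URLs at the deployment server instead of the canonical origin, so application servers fetch locally.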

Salt deployment pillars

  • Location (on the salt master): /srv/pillars
  • Managed in the puppet repo: role::deployment::salt_masters::production

Salt pillars are a set of configuration data available on every salt minion (via salt-call pillar.data). Pillars are managed on the master and are distributed to all minions on update.
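Given the pillar keys named above (repo_locations, repo_urls), a minion-side lookup might look like the following. The exact pillar layout is an assumption; only the key names come from this document.

```python
# Sketch of how a minion could resolve a repo from pillar data; the pillar
# layout (dicts keyed by repo name) is an assumption based on the key names
# repo_locations and repo_urls described above.
def resolve_repo(pillar, repo):
    """Return the (checkout path, fetch url) for a repo from pillar data."""
    location = pillar["repo_locations"][repo]
    url = pillar["repo_urls"][repo]
    return location, url
```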

Naming

   slot0 <- current
   slot1 <- next
   slot2 <- next + 1
   ...

On the deployment system, we should symlink version numbers to the slots, so that it's easy to tell which version we are on, for instance:

   /srv/deployment/mediawiki/common/php-1.20wmf1 -> /srv/deployment/mediawiki/slot0
   /srv/deployment/mediawiki/common/php-1.20wmf2 -> /srv/deployment/mediawiki/slot1
   ...

This may be a good place for something like Perl's 'Storable', which allows you to serialize/deserialize complex data structures for writing to disk or transfer. Depending on what we use slots for, it's an efficient way to store more data, e.g. metadata about deployment versions.

Python's equivalent is pickle, and in PHP we're already using cdb for version info (hetdeploy). The slots scheme would need to work with our hetdeploy stuff, which I think assumes versions. Either we'd need to sync the symlinks to the versions, or do a lot of work on hetdeploy.
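For reference, a pickle round-trip of slot metadata is one-liner territory. The metadata fields here are purely illustrative; nothing in the current system stores this structure.

```python
# Example of Python's pickle (the 'Storable' equivalent mentioned above)
# round-tripping slot metadata; the metadata fields are purely illustrative.
import pickle

slot_meta = {"slot0": {"version": "php-1.20wmf1", "deployed": "2012-12-05"},
             "slot1": {"version": "php-1.20wmf2", "deployed": None}}

blob = pickle.dumps(slot_meta)    # serialize to bytes (write to disk)
restored = pickle.loads(blob)     # deserialize back to an equal dict
```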

Timeline for slots

   slot0=wmf1, slot1=wmf2
   move all wmf1 wikis to wmf2 over time
   replace wmf1 with wmf3
   slot0=wmf3 slot1=wmf2
   start deploying wmf3
   move all wmf2 wikis to wmf3 over time
   replace wmf2 with wmf4
   slot0=wmf3 slot1=wmf4
   etc etc

Or:

   slot0=wmf1, slot1=wmf2
   move all wmf1 wikis to wmf2 over time
   once all are moved, switch slot0 to wmf2, move wikis to slot0
   rinse/repeat for next cycle

Examples

Example deploy of a core change

cd /srv/deployment/mediawiki/common/php-1.20wmf1
git deploy start
git pull
git deploy sync

In the above scenario, 1.20wmf1 is the current version of MediaWiki we are running. /srv/deployment/mediawiki/common/php-1.20wmf1 is a symlink to /srv/deployment/mediawiki/slot0. When it syncs to the application servers, it makes them git fetch and switch to a tag at /srv/deployment/mediawiki/slot0. After switching to the tag, it also updates all submodules to the versions recorded at the tag point.

Example of changing versions of mediawiki

cd /srv/deployment/mediawiki/common
ln -s /srv/deployment/mediawiki/slot1 php-1.20wmf2
cd php-1.20wmf2
git deploy start
git branch --track wmf/1.20wmf2 origin/wmf/1.20wmf2
git checkout wmf/1.20wmf2
git submodule update --init
git deploy sync

This example does the same thing as the previous one, but it updates /srv/deployment/mediawiki/slot1 rather than /srv/deployment/mediawiki/slot0.

Example of an emergency live hack

cd /srv/deployment/mediawiki/common/php-1.20wmf1
git deploy start
<make changes>
git commit
git deploy sync

Trying it

tin.eqiad.wmnet is the eqiad deployment host. There are a few eqiad mw hosts configured and ready to be tested. Simply go into /srv/deployment/mediawiki/<repo> and try it out. It is not necessary to forward your ssh agent to this host.

Issues

  • When calling the salt-runner from the deployment host, the runner is being called by a peer. When a runner is called directly on the salt-master, it displays progress as it occurs. When called from a peer it only displays the end-result. This can take a while and doesn't indicate if the deployment is working or hung.
  • When any application server salt minion is down, the salt-run calls will take the entirety of their timeout value. fetch is currently set at 2 minutes and checkout at 1 minute, so all deploys will likely take 3 minutes with virtually no feedback (due to the above issue).
  • The runner currently isn't properly returning results about minions that didn't report back.
  • i18n isn't being deployed in the new system

TODOs

Required

  • Create a sudo policy for wikidev users to be able to call the salt runners
    • When 0.10.3 is released, also add an ACL so that we can run this without a sudo policy
    • There's a sudo policy temporarily in place on tin. This needs to be puppetized
  • Add a finish script to git-deploy to write out to IRC
  • Add puppet exec to initialize repo, for new hosts
    • There's a function in the salt deploy module for this, but it needs to be puppetized: salt-call deploy.sync_all
  • Add puppet exec to bring repos up to date before apache starts
    • The above sync_all needs to replace the scap call

Nice to have

  • Rewrite git-deploy in python
    • The perl git-deploy we are using does more than we need, calls the sync scripts awkwardly, and is written in perl.