Dumps/Snapshot hosts

From Wikitech
Latest revision as of 06:00, 2 July 2012

Snapshot (XML dumps generation) cluster information

Hardware

These hosts generate the XML dumps. For information about the hosts that serve them, see Dumps/Dump servers.

We have two mini snapshot clusters.

In Tampa:

  • snapshot1: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
  • snapshot2: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
  • snapshot3: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
  • snapshot4: operational, PowerEdge R815, Ubuntu 10.04, 8GB RAM, 4 8-core Opterons, 2 80GB HDs

In D.C.:

  • snapshot1001: base install done, PowerEdge R815, Ubuntu 10.04, 64GB RAM, 4 8-core Opterons, 2 80GB HDs
  • snapshot1002: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
  • snapshot1003: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
  • snapshot1004: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD

Ordinarily only one cluster runs dump jobs at a time; the other is on standby in case of various failures. The two beefier servers (with four 8-core CPUs each) are dedicated to the en wikipedia dumps; as with the other hosts, one of them is in operation and the other is on standby.

Currently running

Monitors:

  • snapshot1 -- current monitor node, running out of /backups/dumps/production, via
    /bin/bash ./monitor --configfile confs/wikidump.conf.monitor --basedir /backups/dumps/production

Worker nodes:

  • snapshot1 -- currently rerunning the remaining jobs for plwiki and ruwiki out of /backups/dumps/production, via
    python ./worker.py --configfile confs/wikidump.conf.bigwikis --restart --job abstractsdump --date 20120618 plwiki
    python ./worker.py --configfile confs/wikidump.conf.bigwikis --date 20120617 --restartfrom --job metahistorybz2dump ruwiki
  • snapshot2 -- running 4 processes for small wikis out of /backups/dumps/production, via
    /bin/bash ./worker --log --configfile confs/wikidump.conf --basedir /backups/dumps/production
  • snapshot3 -- runs adds/changes dumps from cron as user backup
  • snapshot4 -- running en wiki dumps out of /backups/dumps/production, via
    ./worker --configfile confs/wikidump.conf.enwiki --basedir /backups/dumps/production --wiki enwiki
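The worker invocations above share a common shape: a config file, optionally a base directory, and a restart point given as a date plus a job name (`--restart`/`--restartfrom` with `--job`). As a minimal sketch of that pattern, the helper below assembles such a command line; the `build_worker_cmd` function is illustrative only and is not part of the actual dump scripts.

```shell
#!/bin/sh
# Illustrative helper (not real tooling): build a worker.py command line
# for rerunning a single wiki from a given job onward, matching the
# invocation pattern shown above for plwiki/ruwiki.
build_worker_cmd() {
    wiki="$1"; conf="$2"; date="$3"; job="$4"
    printf 'python ./worker.py --configfile %s --date %s --restartfrom --job %s %s\n' \
        "$conf" "$date" "$job" "$wiki"
}

# Example: reproduce the ruwiki rerun command from the list above.
build_worker_cmd ruwiki confs/wikidump.conf.bigwikis 20120617 metahistorybz2dump
```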

Other tasks

  • snapshot1 -- as user backup from cron, /backups/cronjobs/dumpcentralauth.sh every two weeks to dump the central auth tables
  • snapshot1 -- as user backup from cron, /backups/cronjobs/create-rsync-list.sh to generate a list of XML dump files once a day to be mirrored by other organizations
  • snapshot1 -- as user datasets from cron, /usr/local/bin/daily-pagestats-copy.sh to copy over pagecount data from locke to a publicly accessible web directory once an hour
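For reference, the cron entries for these tasks might look roughly like the crontab fragment below. The exact minute/hour fields are assumptions inferred from the stated schedules (every two weeks, once a day, once an hour), not copied from the hosts.

```shell
# Hypothetical crontab fragment for snapshot1 (user backup / datasets);
# schedules are assumptions based on the descriptions above.
# m h dom mon dow  command
0 3 1,15 * *  /backups/cronjobs/dumpcentralauth.sh     # roughly every two weeks
0 4 *    * *  /backups/cronjobs/create-rsync-list.sh   # once a day
5 * *    * *  /usr/local/bin/daily-pagestats-copy.sh   # once an hour
```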