Dumps/Snapshot hosts
From Wikitech
Latest revision as of 06:00, 2 July 2012
Snapshot (XML dumps generation) cluster information
Hardware
These hosts generate the XML dumps. For information about the hosts that serve them, see Dumps/Dump servers.
We have two mini snapshot clusters.
In Tampa:
- snapshot1: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
- snapshot2: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
- snapshot3: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
- snapshot4: operational, PowerEdge R815, Ubuntu 10.04, 8GB RAM, 4 8-core Opterons, 2 80GB HDs
In D.C.:
- snapshot1001: base install done, PowerEdge R815, Ubuntu 10.04, 64GB RAM, 4 8-core Opterons, 2 80GB HDs
- snapshot1002: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
- snapshot1003: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
- snapshot1004: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
Ordinarily only one cluster runs dump jobs at a time; the other is on standby in case of failures. The two beefier servers (with four 8-core Opterons) are dedicated to the English Wikipedia dumps; as with the other hosts, one of them is in operation and the other on standby.
Currently running
Monitors:
- snapshot1 -- current monitor node, running out of /backups/dumps/production, via
/bin/bash ./monitor --configfile confs/wikidump.conf.monitor --basedir /backups/dumps/production
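The monitor node presumably tracks the state of each wiki's current run and notices when one stalls. As a hypothetical illustration only (the function name, the `base/<wiki>/<YYYYMMDD>/status.html` layout, and the six-hour threshold are all assumptions, not taken from the actual monitor script), a staleness check of this kind might look like:

```python
import os
import time

# Assumed threshold: flag a run as stalled after six hours of no updates.
STALE_AFTER = 6 * 3600

def stale_runs(base, now=None):
    """Return wikis whose newest run's status file has not changed recently.

    Expects base/<wiki>/<YYYYMMDD>/status.html -- a layout assumed here
    purely for illustration.
    """
    now = time.time() if now is None else now
    stalled = []
    for wiki in sorted(os.listdir(base)):
        wikidir = os.path.join(base, wiki)
        if not os.path.isdir(wikidir):
            continue
        runs = sorted(os.listdir(wikidir))
        if not runs:
            continue
        status = os.path.join(wikidir, runs[-1], "status.html")
        if os.path.exists(status) and now - os.path.getmtime(status) > STALE_AFTER:
            stalled.append(wiki)
    return stalled
```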
Worker nodes:
- snapshot1 -- currently rerunning the remaining jobs for plwiki and ruwiki out of /backups/dumps/production, via
python ./worker.py --configfile confs/wikidump.conf.bigwikis --restart --job abstractsdump --date 20120618 plwiki
python ./worker.py --configfile confs/wikidump.conf.bigwikis --date 20120617 --restartfrom --job metahistorybz2dump ruwiki
- snapshot2 -- running 4 processes for small wikis out of /backups/dumps/production, via
/bin/bash ./worker --log --configfile confs/wikidump.conf --basedir /backups/dumps/production
- snapshot3 -- runs adds/changes dumps from cron as user backup
- snapshot4 -- running en wiki dumps out of /backups/dumps/production, via
./worker --configfile confs/wikidump.conf.enwiki --basedir /backups/dumps/production --wiki enwiki
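The worker invocations above vary only in config file, logging, and target wiki. A small helper like the following (hypothetical; `worker_cmd` is not part of the dump scripts, and only the flags already shown above are used) could keep the command lines consistent across hosts:

```python
# Hypothetical helper to assemble worker command lines like those listed
# above; the flags mirror the invocations on this page.
BASEDIR = "/backups/dumps/production"

def worker_cmd(conf, wiki=None, log=False):
    """Build a worker command line for the given config file."""
    cmd = ["/bin/bash", "./worker", "--configfile", conf, "--basedir", BASEDIR]
    if log:
        cmd.insert(2, "--log")
    if wiki is not None:
        cmd += ["--wiki", wiki]
    return cmd

# e.g. the snapshot4 invocation for the English Wikipedia dumps could be
# launched with subprocess.Popen(worker_cmd("confs/wikidump.conf.enwiki",
# wiki="enwiki"), cwd=BASEDIR)
```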
Other tasks
- snapshot1 -- as user backup from cron, /backups/cronjobs/dumpcentralauth.sh every two weeks to dump the central auth tables
- snapshot1 -- as user backup from cron, /backups/cronjobs/create-rsync-list.sh to generate a list of XML dump files once a day, to be mirrored by other organizations
- snapshot1 -- as user datasets from cron, /usr/local/bin/daily-pagestats-copy.sh to copy pagecount data from locke to a publicly accessible web directory once an hour
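The three cron-driven tasks on snapshot1 might be wired up roughly as below. This is a sketch, not the actual crontab entries: the times of day are invented, and since cron cannot express "every two weeks" directly, running on the 1st and 15th is used as an approximation.

```shell
# Hypothetical crontab sketch for snapshot1 (schedules approximated from
# the descriptions above; not copied from the real crontabs).

# user backup: dump the CentralAuth tables roughly every two weeks
0 3 1,15 * *  /backups/cronjobs/dumpcentralauth.sh

# user backup: regenerate the rsync file list for mirrors once a day
30 4 * * *    /backups/cronjobs/create-rsync-list.sh

# user datasets: copy pagecount data from locke once an hour
15 * * * *    /usr/local/bin/daily-pagestats-copy.sh
```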