Dumps/Snapshot hosts
Latest revision as of 06:00, 2 July 2012
Snapshot (XML dumps generation) cluster information
Hardware
These hosts generate the XML dumps. For information about the hosts that serve them, see Dumps/Dump servers.
We have two mini snapshot clusters.
In Tampa:
- snapshot1: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
- snapshot2: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
- snapshot3: operational, PowerEdge 1950, Ubuntu 10.04, 8GB RAM, 2 quad-core Xeons, 80GB HD
- snapshot4: operational, PowerEdge R815, Ubuntu 10.04, 8GB RAM, 4 8-core Opterons, 2 80GB HDs
In D.C.:
- snapshot1001: base install done, PowerEdge R815, Ubuntu 10.04, 64GB RAM, 4 8-core Opterons, 2 80GB HDs
- snapshot1002: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
- snapshot1003: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
- snapshot1004: base install done, PowerEdge R410, Ubuntu 10.04, 16GB RAM, 2 6-core Xeons, 500GB HD
Ordinarily only one cluster runs dump jobs at a time; the other is on standby in case of various failures. The two beefier servers (with four 8-core CPUs each) are dedicated to the en wikipedia dumps; as with the other hosts, one of them is in operation while the other is on standby.
Currently running
Monitors:
- snapshot1 -- current monitor node, running out of /backups/dumps/production, via
/bin/bash ./monitor --configfile confs/wikidump.conf.monitor --basedir /backups/dumps/production
Worker nodes:
- snapshot1 -- currently rerunning the remaining jobs for plwiki and ruwiki out of /backups/dumps/production, via
python ./worker.py --configfile confs/wikidump.conf.bigwikis --restart --job abstractsdump --date 20120618 plwiki
python ./worker.py --configfile confs/wikidump.conf.bigwikis --date 20120617 --restartfrom --job metahistorybz2dump ruwiki
- snapshot2 -- running 4 processes for small wikis out of /backups/dumps/production, via
/bin/bash ./worker --log --configfile confs/wikidump.conf --basedir /backups/dumps/production
- snapshot3 -- runs adds/changes dumps from cron as user backup
- snapshot4 -- running en wiki dumps out of /backups/dumps/production, via
./worker --configfile confs/wikidump.conf.enwiki --basedir /backups/dumps/production --wiki enwiki
Other tasks
- snapshot1 -- as user backup from cron, /backups/cronjobs/dumpcentralauth.sh every two weeks to dump the central auth tables
- snapshot1 -- as user backup from cron, /backups/cronjobs/create-rsync-list.sh to generate a list of XML dump files once a day to be mirrored by other organizations
- snapshot1 -- as user datasets from cron, /usr/local/bin/daily-pagestats-copy.sh to copy over pagecount data from locke to a publicly accessible web dir once an hour
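The snapshot1 cron jobs above might be wired up roughly as follows. This is a sketch only: the script paths and frequencies (fortnightly, daily, hourly) come from this page, but the specific minute/hour values and the 1st/15th approximation of "every two weeks" are assumptions, not the actual crontab contents.

```
# Hypothetical crontab for the backup user on snapshot1.
# Dump the CentralAuth tables every two weeks (1st and 15th as an
# approximation, since cron has no native fortnightly schedule):
0 3 1,15 * * /backups/cronjobs/dumpcentralauth.sh
# Regenerate the rsync list of XML dump files once a day for mirrors:
30 4 * * * /backups/cronjobs/create-rsync-list.sh

# Hypothetical crontab for the datasets user on snapshot1.
# Copy pagecount data from locke to the public web dir once an hour:
0 * * * * /usr/local/bin/daily-pagestats-copy.sh
```

Note that the first two entries run as the backup user and the last as the datasets user, so they would live in two separate crontabs (e.g. installed via crontab -e as each user).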