Dumps/Dump servers

From Wikitech

XML Dump servers

Hardware

We have two hosts:

  • Dataset2 in Tampa, production:
    Hardware/OS: PowerEdge R410, Ubuntu 10.04, 2 MD1000 arrays, 16GB RAM, 4 six-core Xeon X5650 CPUs
    Disks: 144GB on the internal HDs in RAID 1; 48T on the arrays, set up as two RAID 6 partitions combined into one LVM volume
    Note that this host also serves other public datasets such as some POTY files, the pagecount stats, etc.
  • Dataset1001 in D.C., rsync/mirrors:
    Hardware/OS: PowerEdge R510, Ubuntu 10.04, 1 MD-something array, 16GB RAM, 1(?) quad-core Xeon E5640 CPU
    Disks: 24 2TB disks in two 12-disk RAID 6 volumes; 120GB partition for the OS, 1GB for swap, the rest combined into one 38T LVM volume
    Currently doing initial rsync of data from dataset2
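
As a rough sanity check on the Dataset1001 disk layout above, the usable space of a RAID 6 array is the capacity of all but two of its disks, since two disks' worth of space holds parity. The sketch below is only back-of-the-envelope arithmetic; the function name raid6_usable_tb is made up for illustration, and it ignores filesystem overhead and the difference between decimal and binary units.

```shell
#!/bin/sh
# Usable capacity of one RAID 6 array, in TB:
# two disks' worth of space is consumed by parity.
# raid6_usable_tb is a hypothetical helper, not an existing tool.
raid6_usable_tb() {
    disks="$1"; size_tb="$2"
    echo $(( (disks - 2) * size_tb ))
}

per_array=$(raid6_usable_tb 12 2)   # one 12-disk array of 2TB disks
total=$(( per_array * 2 ))          # two such arrays
echo "per array: ${per_array}TB, total: ${total}TB"
```

This gives 20TB per array and 40TB in total before the OS and swap partitions are carved out, which is in the same range as the 38T LVM volume listed above; the exact figure depends on units and overhead.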

Services

The production host serves dump files and other datasets to the public.

It relies on lighttpd, which sometimes dies for no apparent reason. To restart it:

/etc/init.d/lighttpd restart
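
If you would rather check whether the service is actually down before restarting it, something along these lines might do. This is a minimal sketch, not something deployed on the hosts: the function name restart_if_dead and the PGREP and INITDIR override variables are made up for illustration, and it assumes the process is named exactly lighttpd and that pgrep is installed.

```shell
#!/bin/sh
# Restart a service only if no process with that exact name is running.
# PGREP and INITDIR are hypothetical overrides so the logic can be
# exercised without touching the real service.
restart_if_dead() {
    svc="$1"
    if ${PGREP:-pgrep} -x "$svc" >/dev/null 2>&1; then
        echo "$svc is running"
    else
        echo "$svc is not running, restarting"
        "${INITDIR:-/etc/init.d}/$svc" restart
    fi
}
```

Running restart_if_dead lighttpd as root would then fall through to the same init script shown above only when no lighttpd process is found.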

Deploying a new host

You'll need to set up the RAID arrays by hand. We typically have two arrays, so set up two RAID 6 arrays, combine them with LVM into one giant volume, and format it as XFS.
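
The LVM and XFS part of that might look roughly like the following. This is a dry-run sketch that only prints the commands rather than running them, and it assumes each RAID 6 array is already exposed to the OS as a single block device. The function name plan_volume, the device names /dev/sdb and /dev/sdc, and the volume group and logical volume names data and dumps are all placeholders, not the names used on the production hosts.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands that would combine the
# given block devices into one LVM volume formatted as XFS.
# All device and volume names here are hypothetical placeholders.
plan_volume() {
    vg="$1"; lv="$2"; shift 2
    echo "pvcreate $*"                       # mark each array as an LVM physical volume
    echo "vgcreate $vg $*"                   # one volume group spanning both arrays
    echo "lvcreate -l 100%FREE -n $lv $vg"   # one logical volume using all free space
    echo "mkfs.xfs /dev/$vg/$lv"             # XFS filesystem on the new volume
}

plan_volume data dumps /dev/sdb /dev/sdc
```

Review the printed commands against the real device names on the host before running any of them, since pvcreate and mkfs.xfs are destructive.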

Install in the usual way: add the host to puppet by copying the stanza of an existing production dataset host, set everything up for PXE boot, and go. You may or may not want to include the download mirror classes from puppet for the new host. If you are replacing the host that is the current download mirror, make sure you update the cron job that generates the mirror file list; see Dumps/Snapshot hosts#Other_tasks for that and other jobs you might need to check.

Space issues

If we run low on space, we can keep fewer rounds of XML dumps; see Dumps#Space for how to do that.
