Dumps/Mirror status
We are interested in having the dumps mirrored; please add information here if you can host a mirror or know of an organization that can.
Current mirrors
- Brazilian mirror at C3SL (last 5 dumps)
- HTTP: http://wikipedia.c3sl.ufpr.br/, FTP: ftp://wikipedia.c3sl.ufpr.br/wikipedia/, rsync: rsync://wikipedia.c3sl.ufpr.br/wikipedia/.
- Masaryk University (last 5 dumps)
- HTTP: http://ftp.fi.muni.cz/pub/wikimedia/ FTP: ftp://ftp.fi.muni.cz/pub/wikimedia/ rsync: rsync://ftp.fi.muni.cz/pub/wikimedia/
- Your.org (all public data)
- Dumps:
- HTTP: http://dumps.wikimedia.your.org/ FTP: ftp://ftpmirror.your.org/pub/wikimedia/dumps/ rsync: rsync://ftpmirror.your.org/wikimedia-dumps/
- Media:
- HTTP: http://ftpmirror.your.org/pub/wikimedia/images/ FTP: ftp://ftpmirror.your.org/pub/wikimedia/images/ rsync: rsync://ftpmirror.your.org/wikimedia-images/
- We are currently talking with a couple of other sites
- See also: http://dumps.wikimedia.org/mirrors.html
In progress
Active
- Host being set up at wansecurity.com; waiting on network debugging between WMF and he.net (in progress).
- Historical mirror for media at Archive.org (See collection [1])
- Will be ready ~June 2012.
Semi-active
- Code for uploading of historical dumps to archive.org.
- Pinged someone at dattobackup.com
- Checking contacts at amazon re: Amazon Public Data Sets which has been defunct for some time
- Checked with Nemo_bis about GARR, need to work with them about their legal concerns
- SJ looking into contacts at MIT
Mirror requirements
Our mirroring setup expects the mirror site to use rsync. Once a day, a script generates a list of the files belonging to the last 5 successfully completed dumps for each project; this list is made available to the mirror sites for use with rsync. Script here.
Please note that the dumps occupy a large amount of space (around 6 TB as of January 2012).
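As a rough illustration of the approach described above (not the actual script), the following sketch picks the most recent completed runs per project and builds an rsync invocation that reads a file list. The function names, the run-status mapping, the list file path and the source URL are all hypothetical; only the rsync options `-a`, `-r` and `--files-from` are real.

```python
from typing import Dict, List


def latest_completed(runs: Dict[str, bool], keep: int = 5) -> List[str]:
    """Return the `keep` most recent run dates (YYYYMMDD) marked as completed."""
    done = sorted((day for day, ok in runs.items() if ok), reverse=True)
    return done[:keep]


def build_file_list(projects: Dict[str, Dict[str, bool]], keep: int = 5) -> List[str]:
    """Build relative paths of the form project/date/ for rsync to fetch."""
    paths = []
    for project in sorted(projects):
        for day in latest_completed(projects[project], keep):
            paths.append(f"{project}/{day}/")
    return paths


def rsync_command(list_path: str, source: str, dest: str) -> List[str]:
    """Assemble an rsync call that transfers only the listed paths.

    With --files-from, -a no longer implies recursion, so -r is given explicitly.
    """
    return ["rsync", "-ar", f"--files-from={list_path}", source, dest]


if __name__ == "__main__":
    # Hypothetical run status per project: date -> completed successfully?
    status = {
        "enwiki": {"20120101": True, "20120115": True, "20120201": False},
        "dewiki": {"20120105": True},
    }
    for path in build_file_list(status, keep=5):
        print(path)
    # Hypothetical source module and destination directory.
    print(" ".join(rsync_command("rsync-list.txt", "rsync://source.example/dumps/", "/srv/mirror/")))
```

A mirror site would run something like the final command from cron once the daily list has been fetched; failed runs (such as 20120201 in the sample data) never appear in the list and so are never transferred.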
Some other notes
We have copied one complete run of our public XML files (about 1.3T?) off to Google storage, which they have kindly donated to us. We are in the process of moving things around to comply with a better naming scheme (one that cannot be usurped by other Google storage users). We'd like to run a copy once every two weeks, keep the last five copies, and retain one copy permanently every six months. Script here.
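The retention plan above can be sketched as follows. This is only one possible reading: it assumes that "one copy permanently every six months" means the first copy made in each calendar half-year, and the function name `copies_to_keep` is hypothetical.

```python
from datetime import date
from typing import Iterable, Set


def copies_to_keep(copy_dates: Iterable[date], recent: int = 5) -> Set[date]:
    """Return the copies to retain: the newest `recent` plus one per half-year."""
    ordered = sorted(copy_dates)
    keep = set(ordered[-recent:]) if recent > 0 else set()
    seen_halves = set()
    for day in ordered:
        # Assumption: the first copy in each half-year (Jan-Jun, Jul-Dec) is permanent.
        half = (day.year, 1 if day.month <= 6 else 2)
        if half not in seen_halves:
            seen_halves.add(half)
            keep.add(day)
    return keep


if __name__ == "__main__":
    # Twenty fortnightly copies starting 1 January 2012 (sample data).
    start = date(2012, 1, 1).toordinal()
    copies = [date.fromordinal(start + 14 * i) for i in range(20)]
    for day in sorted(copies_to_keep(copies)):
        print(day.isoformat())
```

Any copy not in the returned set would be a candidate for deletion from storage.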
Earlier mirror efforts are documented on the Offsite_Backups page. We need to see if any of these are still viable; email has been sent to Kul and Milos to ask.
There are some dumps on the Wikimedia Toolserver, located at /mnt/user-store/dumps, for use by people who have Toolserver access.