Dumps/Mirror status
ArielGlenn
(Reorganising stuff)
Revision as of 10:28, 12 March 2012
We are interested in mirroring of the dumps; please add information there if you can host or know of an organization that can.
Current mirrors
- Brazilian mirror at C3SL
- HTTP: http://wikipedia.c3sl.ufpr.br/, FTP: ftp://wikipedia.c3sl.ufpr.br/wikipedia/, rsync: rsync://wikipedia.c3sl.ufpr.br/wikipedia/.
- We are currently talking with a couple of other sites
In progress
- host being set up at wansecurity.com -- waiting for network debugging between wmf and he.net, in progress
- host set up at your.org -- waiting for network debugging between wmf and he.net, in progress
- code for upload of historical dumps to archive.org
- pinged someone at dattobackup.com
- checking contacts at Amazon re: Amazon Public Data Sets, which has been defunct for some time
- checked with Nemo_bis about GARR, need to work with them about their legal concerns
- SJ looking into contacts at MIT
Mirror requirements
Our mirroring setup expects the other end to use rsync. A script generates, on a daily basis, the list of files from the last 5 successfully completed dumps for each project, and this list is available to the mirror sites for use by rsync. The script is at http://svn.wikimedia.org/viewvc/mediawiki/branches/ariel/xmldumps-backup/create-rsync-list.sh?view=markup.
Please also note that the dumps occupy a large amount of space (around 6 TB as of January 2012).
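As a rough illustration of how a mirror might consume the daily file list, the sketch below builds an rsync command that pulls only the listed paths. The source URL, list filename and destination directory are hypothetical placeholders, not the actual locations used by Wikimedia; `--files-from` is a standard rsync option.

```python
import shlex


def build_rsync_command(source, file_list, dest):
    """Return the argv for an rsync pull limited to the paths named in file_list."""
    return [
        "rsync",
        "-a",  # archive mode; with --files-from, -a does not imply recursion
        "--files-from=" + file_list,  # transfer only the paths named in the list
        source,
        dest,
    ]


# Hypothetical values, for illustration only.
cmd = build_rsync_command(
    "rsync://dumps.example.org/dumps/",  # assumed rsync module on the source side
    "rsync-filelist-last-5.txt",         # assumed local copy of the daily list
    "/srv/mirror/dumps/",                # assumed destination on the mirror
)
print(shlex.join(cmd))
# To actually run the transfer: subprocess.run(cmd, check=True)
```

Since the list is regenerated daily, a mirror would presumably fetch a fresh copy of it before each run.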
Some other notes
We have copied one complete run of our public XML files (about 1.3 TB?) to Google storage, which Google has kindly donated to us. We are in the process of moving things around to comply with a better naming scheme, one that other Google storage users cannot usurp. We'd like to run a copy once every two weeks, keeping the last five copies plus one permanent copy every six months. The script is at http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/googlestorage/.
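The retention idea described above (last five copies, plus one permanent copy every six months) could be sketched roughly as follows. This is not the linked script; it assumes that "one copy every six months" means keeping the first copy made in each calendar half-year, which is only one possible reading.

```python
from datetime import date, timedelta


def copies_to_keep(copy_dates, recent=5):
    """Pick which copies to retain: the most recent ones plus one per half-year."""
    ordered = sorted(copy_dates)
    keep = set(ordered[-recent:])  # the last `recent` copies
    seen_half_years = set()
    for d in ordered:
        half = (d.year, 0 if d.month <= 6 else 1)  # Jan-Jun or Jul-Dec
        if half not in seen_half_years:
            # Assumption: the first copy of each half-year is the permanent one.
            seen_half_years.add(half)
            keep.add(d)
    return sorted(keep)


# Hypothetical schedule: one copy every two weeks, starting 1 January 2012.
schedule = [date(2012, 1, 1) + timedelta(days=14 * i) for i in range(20)]
for kept in copies_to_keep(schedule):
    print(kept.isoformat())
```

Everything not returned by `copies_to_keep` would be a candidate for deletion under this reading of the policy.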
Earlier mirror efforts are documented on the Offsite_Backups page. We need to see whether any of them are still viable; email has been sent to Kul and Milos to ask.
There are some dumps on the Wikimedia Toolserver, located at /mnt/user-store/dumps, for use by people who have Toolserver access.