Dumps/Mirror status
We are interested in mirroring the dumps; please add information there if you can host a mirror or know of an organization that can.
We currently have one mirror (http://wikipedia.c3sl.ufpr.br/) and are talking with a couple of other sites.
We have copied one complete run of our public XML files (about 1.3T?) to Google storage, which Google has kindly donated to us. We are in the process of moving things around to comply with a better naming scheme (one that other Google storage users cannot usurp). We'd like to run a copy once every two weeks, keep the last five copies, and additionally keep one copy permanently every six months. Script here.
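The retention plan above (last five copies, plus one permanent copy every six months) could be sketched roughly as follows. This is only an illustration, not the script referred to above: the function name `copies_to_keep` is hypothetical, and it assumes "one copy every six months" means the earliest copy in each calendar half-year.

```python
from datetime import date, timedelta


def copies_to_keep(copy_dates, recent=5):
    """Return the set of copy dates to retain.

    Assumption (not from the original page): the permanent copy for each
    six-month period is the earliest copy made in that calendar half-year.
    """
    ordered = sorted(copy_dates)
    keep = set(ordered[-recent:])  # the most recent `recent` copies
    seen_periods = set()
    for d in ordered:
        period = (d.year, 1 if d.month <= 6 else 2)
        if period not in seen_periods:
            seen_periods.add(period)
            keep.add(d)  # first copy seen in this half-year is kept permanently
    return keep


# Example: twenty copies made two weeks apart, starting 2012-01-01.
dates = [date(2012, 1, 1) + timedelta(days=14 * i) for i in range(20)]
for d in sorted(copies_to_keep(dates)):
    print(d.isoformat())
```

Anything not in the returned set would be a candidate for deletion; a different choice of which copy becomes the permanent one (for example, the latest in each period) would only change the loop.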
Our mirroring setup expects the other end to use rsync. A script runs daily to generate the list of files from the last 5 successfully completed dumps for each project, and this list is available to the mirror sites for use by rsync. Script here.
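As a rough sketch of how such a file list might be built (this is not the script referred to above), the function below takes per-project dump runs and emits the paths for the newest five completed ones. The `project/date/filename` path layout, the record fields, and the name `build_rsync_file_list` are all assumptions made for illustration.

```python
def build_rsync_file_list(dump_runs, keep=5):
    """Return relative file paths for the newest `keep` completed runs per project.

    dump_runs maps a project name to a list of run records, each a dict with:
      "date"      - run date as a YYYYMMDD string (sorts correctly as text)
      "completed" - True if the run finished successfully
      "files"     - list of file names produced by the run
    The record format and path layout are hypothetical.
    """
    lines = []
    for project in sorted(dump_runs):
        # Ignore runs that failed or are still in progress.
        completed = [run for run in dump_runs[project] if run["completed"]]
        completed.sort(key=lambda run: run["date"], reverse=True)  # newest first
        for run in completed[:keep]:
            for name in sorted(run["files"]):
                lines.append(f"{project}/{run['date']}/{name}")
    return lines


# Example with made-up data: seven runs, the newest of which did not complete.
runs = {
    "enwiki": [
        {"date": f"201201{day:02d}", "completed": day != 7, "files": ["pages.xml.bz2"]}
        for day in range(1, 8)
    ]
}
print("\n".join(build_rsync_file_list(runs)))
```

A list in this form, written one path per line, is the kind of input rsync accepts through its `--files-from` option.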
Earlier mirror efforts are documented on the Offsite_Backups page. We have emailed Kul and Milos to find out whether any of these possibilities are still viable.
In progress:
- host being set up at wansecurity.com -- waiting for network debugging between WMF and he.net, in progress
- code for upload of historical dumps to archive.org
- pinged someone at dattobackup.com
- checking contacts at Amazon re: Amazon Public Data Sets, which has been defunct for some time
- checked with Nemo_bis about unimi; need to work with them on their legal concerns
- host being set up at your.org -- waiting for network debugging between WMF and he.net, in progress
- SJ looking into contacts at MIT