Dumps/Archive.org
Archive.org refers to the Internet Archive, a digital library that consists mainly of scanned books but can host almost any free content.
We are working on moving the public datasets to the Archive for preservation, although for now the work is mainly handled by volunteers (specifically Hydriz and Nemo).
Archiving from Labs
A Wikimedia Labs project called "Dumps" is dedicated to running the archiving processes, which are maintained by volunteers. The datasets currently being archived are:
- Adds/Changes dumps (source) - Runs daily via an indirect cron (see cron.py).
- Incremental media tarballs (source) - Runs 7 days after the tarballs first appear here (any other delay works, as long as zuwiktionary is complete).
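The timing rule for the incremental media tarballs can be illustrated with a short sketch. This is not the code the Dumps project actually runs: the function name `ready_to_archive`, its parameters, and the idea of passing in a set of completed wikis are all hypothetical, chosen only to show the "wait some days, and require zuwiktionary to be complete" check.

```python
from datetime import date, timedelta

# Assumption: zuwiktionary is used as the wiki whose completion is required
# before a day's incremental media tarballs are archived.
SENTINEL_WIKI = "zuwiktionary"


def ready_to_archive(first_seen, today, completed_wikis, delay_days=7):
    """Return True if a day's tarballs can be archived (hypothetical helper).

    first_seen      -- date the tarballs first appeared
    today           -- date the check is made
    completed_wikis -- set of wiki names whose tarballs are complete
    delay_days      -- days to wait after first appearance (7 by default)
    """
    waited_long_enough = (today - first_seen) >= timedelta(days=delay_days)
    return waited_long_enough and SENTINEL_WIKI in completed_wikis
```

For example, `ready_to_archive(date(2013, 1, 1), date(2013, 1, 8), {"enwiki", "zuwiktionary"})` returns `True`, while the same call with a set that lacks `"zuwiktionary"` returns `False` no matter how many days have passed.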
Not currently running, but planned:
- Main database dumps
- Full media tarballs (at least for those <10GB)