Dumps/Archive.org

From Wikitech

Archive.org refers to the Internet Archive, a digital library that consists mainly of scanned books but can hold almost any free-content material.

We are currently working on moving the public datasets to the Archive for preservation, although right now it is mainly being handled by volunteers (specifically Hydriz and Nemo).

Archiving from Labs

There is a project on Wikimedia Labs called "Dumps" in which volunteers run the archiving processes. The datasets currently being archived are:

  1. Adds/Changes dumps (source) - Runs daily, triggered indirectly by cron (see cron.py).
  2. Incremental media tarballs (source) - Runs 7 days after the tarballs first appear here (any other number of days works, as long as zuwiktionary is complete).
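
The timing rule for the incremental media tarballs could be sketched roughly as follows. This is only an illustration, not the code the Dumps project actually runs: the function name ready_to_archive, its parameters, and the idea of passing zuwiktionary's completion state as a boolean are all assumptions made for this example.

```python
from datetime import date, timedelta


def ready_to_archive(first_seen, today, last_wiki_complete, delay_days=7):
    """Decide whether a tarball run may be archived (hypothetical helper).

    first_seen         -- date the run first appeared on the download server
    today              -- date of the check
    last_wiki_complete -- True once the zuwiktionary tarball is complete
    delay_days         -- waiting period; 7 is the default mentioned above
    """
    # The run must be at least delay_days old ...
    old_enough = today - first_seen >= timedelta(days=delay_days)
    # ... and zuwiktionary must be complete, whatever delay is chosen.
    return old_enough and last_wiki_complete


if __name__ == "__main__":
    # Example dates are made up for illustration.
    print(ready_to_archive(date(2013, 1, 1), date(2013, 1, 8), True))
```

A daily job could call such a check for each pending run and start the upload only when it returns True, so that an incomplete run is never archived.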

Not currently running, but planned:

  1. Main database dumps
  2. Full media tarballs (at least for those under 10 GB)