Upload.wikimedia.org
Uploaded image and media files for Wikimedia's wikis appear on the separate subdomain upload.wikimedia.org.
This provides separate physical hosting for efficiency, a separate domain name to aid in browser connection splitting, and a separate domain name to aid in JavaScript security rules.
DNS
upload.wikimedia.org is handled by the PowerDNS geobackend, which CNAMEs it to upload.pmtpa.wikimedia.org, upload.knams.wikimedia.org or upload.yaseo.wikimedia.org, depending on the location of the querying resolver.
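The resolver-dependent CNAME selection can be sketched roughly as follows. This is not the PowerDNS geobackend itself, only an illustration of the idea: the network ranges are documentation prefixes standing in for real geographic data, and the fallback to pmtpa when nothing matches is an assumption.

```python
import ipaddress

# Illustrative only: documentation address ranges standing in for real geo data.
SITE_BY_NETWORK = [
    (ipaddress.ip_network("198.51.100.0/24"), "knams"),
    (ipaddress.ip_network("203.0.113.0/24"), "yaseo"),
]
DEFAULT_SITE = "pmtpa"  # assumption: site used when no network matches


def cname_for_resolver(resolver_ip: str) -> str:
    """Return the CNAME target for upload.wikimedia.org for this resolver."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, site in SITE_BY_NETWORK:
        if addr in network:
            return f"upload.{site}.wikimedia.org"
    return f"upload.{DEFAULT_SITE}.wikimedia.org"
```

For example, `cname_for_resolver("198.51.100.7")` selects the knams name in this sketch, while an address outside both ranges falls back to the default site.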
IPv6
IPv6 is partially supported: AAAA records are currently sent only to a list of participating resolvers. Enabling it universally would cause too much breakage for users with broken IPv6 connectivity, but by restricting it to opt-in participants who are interested in working with us and solving problems where necessary, there should be less pain.
This works using the PowerDNS pipebackend, running selective_answer.py. This looks up the source IP address of the resolver in a participants list, and sends an extra AAAA record or not, depending on the result. A corresponding TXT record also informs the user of the status.
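The lookup described above might look something like the sketch below. It is not the actual selective_answer.py and does not speak the pipebackend protocol; the participants list, the AAAA address and the TXT wording are all made-up placeholders (documentation prefixes), chosen only to show the decision.

```python
import ipaddress

# Hypothetical participants list (documentation prefixes, not real resolvers).
PARTICIPANTS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Placeholder AAAA content; the real address is not given here.
EXAMPLE_AAAA = "2001:db8::1"


def is_participant(resolver_ip: str) -> bool:
    """Check whether the resolver's source address is on the opt-in list."""
    addr = ipaddress.ip_address(resolver_ip)
    return any(addr in network for network in PARTICIPANTS)


def extra_records(resolver_ip: str):
    """Return (type, content) pairs to add to the answer for this resolver."""
    if is_participant(resolver_ip):
        return [("AAAA", EXAMPLE_AAAA), ("TXT", "IPv6 enabled for this resolver")]
    return [("TXT", "IPv6 not enabled for this resolver")]
```

Keeping the membership test separate from the record building makes it easy to swap the static list for a file or database lookup.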
Currently this is Europe (knams) region only, due to lack of sufficient IPv6 connectivity elsewhere!
The IPv6 address is served by an IPv6-to-IPv4 proxy (HAProxy), which simply proxies the HTTP request to the IPv4 LVS cluster, adding an X-Forwarded-For header. Unfortunately it does not support doing HTTP processing along with persistent connections.
Caching layer
Uploads have a separate set of squid proxy caches from the text squids; this avoids contention between the two data sets, which have different characteristics for object size, update rate, etc.
Backend: storage
| server | path | purpose |
|---|---|---|
| ms1 | upload | uploaded images, thumbs and texvc-rendered images |
Some horrible snippet from upload-settings.php:
'apaches' => array( 'pmtpa' => array( 'ms1.wikimedia.org', ), ),
Thumbs are being moved off to ms4; some directories have already been replaced with symlinks to the old locations on ms1, so that neither code nor config files need to know about the change. When the remaining files are copied we'll switch the whole repository to ms4. (July 27 2009)
Backend: scaling
Wikimedia's wikis are configured to defer most image scaling/rasterization operations ($wgGenerateThumbnailOnParse off), just putting "optimistic" URLs in <img> tags and letting a 404 handler on the upload servers deal with making sure the images are in place.
This does a couple of nice things:
- Less interaction between primary servers and NFS
- Encapsulation of thumbnailing/scaling problems on the "image scaling cluster" so the entire grid doesn't go down due to runaway 'convert' processes etc.
The actual image server (ms1) farms out the scaling to the "image scaling cluster" machines, a sub-cluster of apaches running MediaWiki just for thumb.php.
These backend scalers produce the rendered images and save them back to NFS, and they also get served back out to the requesting user-agent none the wiser.
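The deferred-scaling flow can be sketched as below. This is a simplified illustration, not the real 404 handler: `render_thumbnail` is a hypothetical stand-in for the request to thumb.php on the scaling cluster, a local directory stands in for NFS, and the thumbnail path layout is only an example.

```python
import os


def render_thumbnail(thumb_path: str) -> bytes:
    """Hypothetical stand-in for the scaler; real rendering is done by thumb.php."""
    return b"scaled image bytes for " + os.path.basename(thumb_path).encode()


def handle_missing_thumb(storage_root: str, thumb_path: str,
                         render=render_thumbnail) -> bytes:
    """Serve a thumbnail, rendering and storing it first if it does not exist."""
    full_path = os.path.join(storage_root, thumb_path)
    if not os.path.exists(full_path):
        # The "404" case: have the scaler render the image, then save it
        # back to shared storage so later requests are served directly.
        data = render(thumb_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "wb") as fh:
            fh.write(data)
    with open(full_path, "rb") as fh:
        return fh.read()
```

The first request for a given path pays the rendering cost; subsequent requests find the file in place and never reach the scaler.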
Note that thumb.php can be accessed directly from the web as well on the regular "text" Apaches. Ideally we shouldn't do that? :)
Private wikis
Our private wikis do not serve their files to the public through this interface; they're served through img_auth.php on the local domain, which enforces authentication.
Compatibility links
Most of our wikis have a rewrite rule to redirect requests from /upload and /math on the primary wiki domain to the appropriate subdirectory on upload.wikimedia.org. This provides compatibility for old direct image links from the days before the separate image hosting.
Special:Filepath can also send redirects here.
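The effect of the compatibility rewrite rule can be illustrated with a small mapping function. The actual rule lives in the web server configuration; the subdirectory names in `EXAMPLE_TARGETS` and the URL scheme are assumptions chosen for illustration, since the per-wiki subdirectories are not spelled out here.

```python
from typing import Dict, Optional

UPLOAD_BASE = "http://upload.wikimedia.org/"  # scheme is an assumption

# Hypothetical prefix-to-subdirectory mapping for a single wiki.
EXAMPLE_TARGETS = {"/upload/": "wikipedia/en/", "/math/": "math/"}


def compat_redirect(path: str,
                    targets: Dict[str, str] = EXAMPLE_TARGETS) -> Optional[str]:
    """Return the redirect URL for a legacy path, or None if no rule applies."""
    for prefix, subdir in targets.items():
        if path.startswith(prefix):
            return UPLOAD_BASE + subdir + path[len(prefix):]
    return None
```

Paths that do not start with one of the legacy prefixes return `None`, meaning the request is handled by the wiki as usual.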