Upload.wikimedia.org

From Wikitech

Latest revision as of 21:26, 27 March 2011

Uploaded image and media files for Wikimedia's wikis appear on the separate subdomain upload.wikimedia.org.

This provides separate physical hosting for efficiency, and a separate domain name that aids both browser connection splitting and JavaScript security rules.


DNS

upload.wikimedia.org is handled by the PowerDNS geobackend, which CNAMEs it to upload.pmtpa.wikimedia.org, upload.knams.wikimedia.org or upload.yaseo.wikimedia.org, depending on the location of the querying resolver.
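A minimal sketch of the selection step, with PHP used only for illustration (the geobackend itself is a PowerDNS backend): the location of the querying resolver is reduced to a region, and the region picks the CNAME target. The continent codes, the continent-to-cluster map and the function name are illustrative assumptions, not the production geobackend map.

```php
<?php
// Illustrative region -> CNAME map; NOT the production geobackend map.
const UPLOAD_TARGETS = [
    'NA' => 'upload.pmtpa.wikimedia.org', // assumed: Americas -> pmtpa
    'SA' => 'upload.pmtpa.wikimedia.org',
    'EU' => 'upload.knams.wikimedia.org', // assumed: Europe/Africa -> knams
    'AF' => 'upload.knams.wikimedia.org',
    'AS' => 'upload.yaseo.wikimedia.org', // assumed: Asia -> yaseo
];

// Fallback for regions the map does not cover.
const DEFAULT_TARGET = 'upload.pmtpa.wikimedia.org';

// Return the CNAME target for upload.wikimedia.org, given the continent code
// that a (hypothetical) IP-to-location lookup assigned to the resolver.
// Note that it is the resolver's location that counts, not the end user's.
function uploadCnameFor(string $resolverContinent): string
{
    return UPLOAD_TARGETS[strtoupper($resolverContinent)] ?? DEFAULT_TARGET;
}

echo uploadCnameFor('eu'), "\n"; // prints upload.knams.wikimedia.org
```

Because the lookup keys on the resolver rather than the client, a user behind a distant resolver may be sent to a non-local cluster.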

Caching layer

Uploads have a separate set of squid proxy caches from the text squids; this avoids contention between the two data sets, which have different characteristics for object size, update rate, etc.
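A sketch of the split, assuming requests are routed by their Host header: anything for upload.wikimedia.org goes to the upload pool, everything else to the text pool. The pool member names are hypothetical, and hashing the URL with crc32() to pick a member is a stand-in chosen for illustration, not necessarily what the squids did.

```php
<?php
// Hypothetical cache pools. Keeping them separate means large, rarely
// changing media objects do not compete for space with small, frequently
// updated page objects.
const CACHE_POOLS = [
    'upload' => ['upload-squid-1', 'upload-squid-2'],
    'text'   => ['text-squid-1', 'text-squid-2', 'text-squid-3'],
];

// Requests for the upload domain use the upload pool; all others use text.
function cachePoolFor(string $host): string
{
    return strtolower($host) === 'upload.wikimedia.org' ? 'upload' : 'text';
}

// Pick a pool member by hashing host + path, so repeated requests for the
// same object land on the same cache. crc32() is non-negative on 64-bit PHP.
function cacheHostFor(string $host, string $path): string
{
    $members = CACHE_POOLS[cachePoolFor($host)];
    return $members[crc32($host . $path) % count($members)];
}
```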

Backend: storage

Server  Path          Purpose
ms1     /mnt/upload5  uploaded images and texvc-rendered images
ms4     /mnt/thumbs   thumbnails

Some not-too-horrible stanza snipped from upload-settings.php:

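'apaches' => array(
	'pmtpa' => array(
		// default host: uploaded originals live on ms1
		'ms1.wikimedia.org',
		// '=ms4_thumbs' entry: thumbnails are handled by ms4
		'=ms4_thumbs' => 'ms4.wikimedia.org',
	),
),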

Thumbnails are served by ms4.
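A minimal sketch of the resulting split between the two storage hosts in the table above: thumbnail paths go to ms4, everything else (originals, texvc output) to ms1. Treating any path with a "thumb" segment as a thumbnail is an assumption based on MediaWiki's usual thumbnail URL layout; the function name is hypothetical.

```php
<?php
// Hypothetical helper: which storage host holds a given upload path?
// Assumes thumbnails are exactly the paths containing a "thumb" segment,
// e.g. /wikipedia/commons/thumb/a/ab/Foo.jpg/120px-Foo.jpg
function storageHostFor(string $path): string
{
    $segments = explode('/', trim($path, '/'));
    return in_array('thumb', $segments, true)
        ? 'ms4.wikimedia.org'  // thumbnails (/mnt/thumbs)
        : 'ms1.wikimedia.org'; // originals and texvc output (/mnt/upload5)
}
```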

Backend: scaling

Wikimedia's wikis are configured to defer most image scaling/rasterization operations ($wgGenerateThumbnailOnParse set to false): they just put "optimistic" thumbnail URLs in <img> tags and let a 404 handler on the thumbnail server make sure the images are in place.
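A sketch of the first thing such a 404 handler has to do: recover the source file name and requested width from the missing thumbnail URL, and build a query for thumb.php. The URL layout is MediaWiki's usual hashed thumbnail layout; the 'f' and 'width' parameter names follow thumb.php but should be checked against the deployed version, and the function name is hypothetical.

```php
<?php
// Hypothetical first step of a thumbnail 404 handler.
// Assumed URL layout: /<project>/<wiki>/thumb/<x>/<xy>/<File>/<N>px-<File>
function parseThumbUrl(string $path): ?array
{
    $pattern = '!^/[^/]+/[^/]+/thumb/[0-9a-f]/[0-9a-f]{2}/([^/]+)/(\d+)px-([^/]+)$!';
    if (!preg_match($pattern, $path, $m)) {
        return null; // not a thumbnail URL: a genuine 404
    }
    [, $file, $width, $thumbName] = $m;
    $file = rawurldecode($file);
    // The thumbnail name should start with the source name (an SVG, for
    // example, is rendered as "<N>px-<File>.png").
    if (substr(rawurldecode($thumbName), 0, strlen($file)) !== $file) {
        return null;
    }
    $width = (int) $width;
    return [
        'file'  => $file,
        'width' => $width,
        // 'f' and 'width' are thumb.php's parameters; verify for your version.
        'query' => http_build_query(['f' => $file, 'width' => $width]),
    ];
}
```

For example, /wikipedia/commons/thumb/a/ab/Foo.jpg/120px-Foo.jpg yields the file Foo.jpg, width 120 and the query f=Foo.jpg&width=120.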

This does a couple of nice things:

  • Less interaction between primary servers and NFS
  • Encapsulation of thumbnailing/scaling problems on the "image scaling cluster" so the entire grid doesn't go down due to runaway 'convert' processes etc.

The thumbnail server (ms4) farms out the scaling to the "image scaling cluster" machines, a sub-cluster of Apaches running MediaWiki just for thumb.php.

These backend scalers produce the rendered images and save them back to NFS; the images are also served back out to the requesting user agent, which is none the wiser.
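The overall flow can be sketched as follows: if the thumbnail already exists on NFS it is served; otherwise a scaler is asked to render it, the scaler writes it to NFS, and the freshly written file is served. The scaler host names, the hash-based choice of scaler and the $render callable (standing in for the HTTP request to thumb.php) are all hypothetical.

```php
<?php
// Hypothetical pool of image scaling cluster hosts.
const SCALERS = ['scaler-a', 'scaler-b', 'scaler-c'];

// Spread work across the scalers by hashing the thumbnail path (assumed
// scheme; crc32() is non-negative on 64-bit PHP builds).
function pickScaler(string $thumbPath): string
{
    return SCALERS[crc32($thumbPath) % count(SCALERS)];
}

// Returns the local (NFS) file to serve, or null if rendering failed.
// $render(string $scaler, string $thumbPath): bool stands in for the request
// to thumb.php; the scaler itself writes the result to NFS.
function handleMissingThumb(string $thumbRoot, string $thumbPath, callable $render): ?string
{
    $local = rtrim($thumbRoot, '/') . '/' . ltrim($thumbPath, '/');
    if (is_file($local)) {
        return $local; // already rendered, e.g. by an earlier request
    }
    $ok = $render(pickScaler($thumbPath), $thumbPath);
    clearstatcache(true, $local); // the file was written by another host
    if (!$ok || !is_file($local)) {
        return null; // caller would answer with an error status
    }
    return $local; // the client receives the image, none the wiser
}
```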


Note that thumb.php can be accessed directly from the web as well on the regular "text" Apaches. Ideally we shouldn't do that? :)

Private wikis

Our private wikis do not serve their files to the public through this interface; they're served through img_auth.php on the wiki's own domain, which enforces authentication.
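A minimal sketch of the kind of gate img_auth.php puts in front of private files: refuse users without read access, and refuse paths that resolve outside the upload directory. The $user array shape, the function name and the traversal check are illustrative assumptions; the real script relies on MediaWiki's own permission checks.

```php
<?php
// Hypothetical gate: returns the absolute path to stream, or null when the
// request should be refused (the caller would answer 403 or 404).
// $user is an assumed shape such as ['canRead' => true], or null if anonymous.
function authorizeFileRequest(?array $user, string $uploadRoot, string $relPath): ?string
{
    if ($user === null || empty($user['canRead'])) {
        return null; // not authenticated or not allowed to read
    }
    $root = realpath($uploadRoot);
    $full = realpath($uploadRoot . '/' . $relPath);
    // Missing file, or a path (e.g. "../../etc/passwd") escaping the root.
    if ($root === false || $full === false || strpos($full, $root . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $full;
}
```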

Compatibility links

Most of our wikis have a rewrite rule to redirect requests from /upload and /math on the primary wiki domain to the appropriate subdirectory on upload.wikimedia.org. This provides compatibility for old direct image links from the days before the separate image hosting.
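The mapping those rewrite rules perform can be sketched in PHP (the production rules live in the web server configuration, not in PHP). The <project>/<language> directory layout for /upload and the shared /math directory are assumptions modelled on the usual upload.wikimedia.org layout; special wikis such as Commons, which live elsewhere in that layout, are not handled, and the function name is hypothetical.

```php
<?php
// Hypothetical redirect mapping for old direct links, e.g.
// en.wikipedia.org/upload/a/ab/Foo.jpg -> upload.wikimedia.org/wikipedia/en/a/ab/Foo.jpg
function compatRedirect(string $host, string $path): ?string
{
    // Split "<lang>.<project>.org" (assumed host pattern).
    if (!preg_match('!^([a-z0-9-]+)\.([a-z]+)\.org$!', strtolower($host), $h)) {
        return null;
    }
    [, $lang, $project] = $h;
    if (preg_match('!^/upload/(.+)$!', $path, $m)) {
        return "http://upload.wikimedia.org/$project/$lang/{$m[1]}";
    }
    if (preg_match('!^/math/(.+)$!', $path, $m)) {
        return "http://upload.wikimedia.org/math/{$m[1]}"; // assumed shared dir
    }
    return null; // not a legacy upload or math link
}
```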

Special:Filepath can also send redirects here.
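A sketch of how such a redirect target can be computed: MediaWiki's hashed upload directories use the first one and two hex digits of the MD5 of the file name (spaces replaced by underscores). The default base URL is an assumed example for Commons, the title is assumed to already be in canonical form, and the function name is hypothetical.

```php
<?php
// Hypothetical helper: the URL a Special:Filepath-style redirect would point
// at, assuming hashed upload directories and an already-canonical title.
function filePathUrl(string $title, string $baseUrl = 'http://upload.wikimedia.org/wikipedia/commons'): string
{
    $name = str_replace(' ', '_', $title); // file names use underscores
    $hash = md5($name);
    // <base>/<first hex digit>/<first two hex digits>/<name>
    return sprintf('%s/%s/%s/%s', rtrim($baseUrl, '/'), $hash[0], substr($hash, 0, 2), rawurlencode($name));
}
```

The wiki would then answer with an HTTP redirect (a Location header) to this URL.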
