Image scalers
ArielGlenn
Revision as of 20:18, 11 September 2009
Image scalers handle generation of thumbnails.
How thumbs are served and generated
- The client's initial request for a thumbnail (example: http://upload.wikimedia.org/wikipedia/en/b/bc/Wiki.png) goes to upload.wikimedia.org (an LVS server).
- The LVS server farms out the request to one of a number of front-end squids.
- The front-end squid requests the data from a back-end squid.
- The back-end squid looks in its cache; if there's no hit, it requests the image from the thumbnail server (currently ms4).
- The thumbnail server looks for the file; if the file is present, the server delivers it. If not, it invokes thumbnail-handler.pl.
- The Perl script, after some basic filename sanity checks, sends a request to the appropriate Apache (e.g. en.wikipedia.org) for thumb.pl with the filename and other optional parameters.
- The request to en.wikipedia.org (or other server) goes to a front-end squid.
- The front-end squid asks a back-end squid.
- The back-end squid checks its cache; if there's nothing there, it sends the request on to rendering.pmtpa (an LVS server).
- The LVS server sends the request on to one of the scalers.
- The scaler creates the thumbnail, saves it directly (the thumbnail server's filesystem is NFS-mounted on the scalers), and returns the data.
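The fall-through behaviour of the steps above can be modelled in a few lines. This is only an illustrative sketch, not the real squid or script logic: the names cached, build_chain, shared_fs and render_log are invented for the example, the LVS and front-end squid hops are left out because they only forward requests, and a plain dict stands in for the NFS-mounted thumbnail filesystem.

```python
from typing import Callable, Dict

Fetch = Callable[[str], bytes]


def cached(origin: Fetch) -> Fetch:
    """Model a back-end squid: answer from cache, else ask the origin once."""
    cache: Dict[str, bytes] = {}

    def fetch(name: str) -> bytes:
        if name not in cache:
            cache[name] = origin(name)
        return cache[name]

    return fetch


def build_chain(shared_fs: Dict[str, bytes], render_log: list) -> Fetch:
    """Wire up the hops; shared_fs models the NFS-mounted thumbnail store."""

    def scaler(name: str) -> bytes:
        # The scaler renders, saves directly to the shared filesystem,
        # and returns the data. The "rendering" here is a placeholder.
        data = b"thumb:" + name.encode()
        shared_fs[name] = data
        render_log.append(name)
        return data

    # Second squid layer, sitting in front of the scalers.
    render_squid = cached(scaler)

    def thumbnail_server(name: str) -> bytes:
        # Serve the file if present; otherwise hand the request off
        # (this is the step where thumbnail-handler.pl is invoked).
        if name in shared_fs:
            return shared_fs[name]
        return render_squid(name)

    # First squid layer, sitting in front of the thumbnail server.
    return cached(thumbnail_server)


fs: Dict[str, bytes] = {}
renders: list = []
fetch = build_chain(fs, renders)
first = fetch("Wiki.png")   # misses everywhere, so a scaler renders it
second = fetch("Wiki.png")  # answered by the first squid layer's cache
```

In this model the scaler runs once per thumbnail: the second request is absorbed by the cache, and a chain started later over the same shared_fs finds the file on disk and never reaches the scaler.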
Temp files
The scalers need the directory /a/magick-tmp (owned and writable by apache). This matches the $wgImageMagickTempDir setting in CommonSettings.php; if the setting is changed, the directory on these servers must be changed as well. ImageMagick temp files have names of the form magick-XXnnnnnn, and they are left behind when a conversion dies for some reason, such as overrunning the resources allocated by ulimit.
Ghostscript (gs) is called by ImageMagick when it converts PDFs to thumbnails. It leaves its temp files in whatever directory $wgTmpDirectory points to (currently /tmp). Its files have names like gs_nnnnnn.
The current ulimits kill gs when it creates a scratch file bigger than 100 MB. This happens more often than you might think; a PDF of about 10 MB can easily cause gs to write more than that. And if a search on Commons returns one of these files, every view of that result list triggers another attempt to create the thumb. A few of these add up quickly, and we only have 4 GB free on the root partition of these boxes; at up to 100 MB apiece, about 40 leftover scratch files would fill it.
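To see why a limit leaves a large file behind rather than cleaning up, here is a small sketch. It assumes the limit in question is a file-size limit (RLIMIT_FSIZE, what ulimit -f sets); the text does not say which ulimit is involved. It uses 1 MiB in place of 100 MB, a child Python process in place of gs, and an illustrative file name. It needs a Unix-like system.

```python
import os
import subprocess
import sys
import tempfile

# 1 MiB stand-in for the 100 MB limit (assumed to be a file-size ulimit).
LIMIT = 1024 * 1024

# Child script: cap its own file size, then try to write twice that much.
CHILD = """
import resource, sys
limit = int(sys.argv[2])
resource.setrlimit(resource.RLIMIT_FSIZE, (limit, limit))
with open(sys.argv[1], "wb") as scratch:
    scratch.write(b"x" * (2 * limit))
"""


def run_capped_writer(path, limit=LIMIT):
    """Run the child and report (exit status, bytes left behind on disk)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, path, str(limit)],
        capture_output=True,
    )
    return result.returncode, os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    # "gs_000001" only imitates the gs_nnnnnn naming; it is not a real gs file.
    status, leftover = run_capped_writer(os.path.join(tmp, "gs_000001"))
```

The child fails partway through the write, so status is non-zero, and the partial file stays on disk at no more than the limit. Nothing in the failing process removes it, which is why something else has to.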
To deal with this, a cron job runs every 5 minutes and clears out any of the above files that are more than an hour old. As long as we don't accumulate too many of these huge gs_* files in an hour, we will be OK. The job is set up in puppet (see /etc/puppet/manifests/imagescaler.pp on sockpuppet). If you change these directories in the above config files, you need to change them in the cron jobs as well.
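The cleanup logic amounts to roughly the following. This is a sketch of the idea, not the contents of the puppet manifest: the function name sweep, the glob patterns magick-* and gs_*, and the use of modification time to judge age are all assumptions made for the example.

```python
import fnmatch
import os
import time

# (directory, pattern) pairs based on the text; the glob patterns are an
# assumption about how the real cron job matches file names.
TARGETS = [("/a/magick-tmp", "magick-*"), ("/tmp", "gs_*")]
MAX_AGE_SECONDS = 60 * 60  # "over an hour old"


def sweep(directory, pattern, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files in directory that match pattern and are older
    than max_age seconds (by mtime). Returns the sorted names removed."""
    now = time.time() if now is None else now
    removed = []
    try:
        entries = list(os.scandir(directory))
    except FileNotFoundError:
        # Directory is missing on this host; nothing to clean.
        return removed
    for entry in entries:
        if not fnmatch.fnmatch(entry.name, pattern):
            continue
        if not entry.is_file(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

A scheduled run would call sweep(directory, pattern) for each pair in TARGETS. Keeping the directories in one list mirrors the warning above: if the config settings change, this list has to change with them.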