Swift/Open Issues Aug - Sept 2012/Cruft on ms7
ArielGlenn (add snoop for HTTP)
Revision as of 15:47, 28 August 2012
Here's the todo list of things left on ms7 so we can knock 'em off one at a time.
You can see what things the web server is still referencing by:
- ssh over to ms7 as root
- cd into the /opt/local/share/ directory
- run the command
dtrace -qs ./access_log.d
You can see what files are being accessed via NFS, but it's more work and there's a lot more traffic to sift through:
- ssh over to ms7 as root
- run the command (for maybe 20 seconds to half a minute)
snoop -o /root/snoop-out.txt ms7
- check the contents for nfs creations via
snoop -i /root/snoop-out.txt | grep NFS | grep CREATE | grep ' -> ms7' | more
- check for nfs lookups via
snoop -i /root/snoop-out.txt | grep NFS | grep LOOKUP | grep ' -> ms7' | more
You can also see the HTTP requests that hit the server (much easier):
- ssh over to ms7 as root
snoop -c 1000 -d aggr1 port 80 |grep GET
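To get a quick picture of which top-level directories are still being hit, the GET lines can be tallied rather than read by eye. The following is a minimal sketch, not a tested tool: it assumes each snoop summary line contains the request as `GET /path ...`, and the function name `top_level_counts` is made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: the request shows up in the snoop summary line as "GET /path ...".
GET_RE = re.compile(r"\bGET\s+(/\S*)")

def top_level_counts(lines):
    """Count GET requests by first path component, e.g. '/math' or '/skins'."""
    counts = Counter()
    for line in lines:
        m = GET_RE.search(line)
        if not m:
            continue  # not a request line (e.g. a response)
        path = m.group(1).split("?", 1)[0]          # drop any query string
        first = path.lstrip("/").split("/", 1)[0]   # first path component
        counts["/" + first] += 1
    return counts

if __name__ == "__main__":
    # Reads captured lines on stdin and prints the busiest directories first.
    for directory, n in top_level_counts(sys.stdin).most_common():
        print(f"{n:6d} {directory}")
```

Saved under a hypothetical name such as tally_gets.py, it could be fed the output of the snoop command above, on any host where Python is available.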
ms7 checklist
ext-dist
Used by ExtensionDistributor. Where should source tarballs live anyway? MW tarballs live on dataset2 and extension tarballs on ms7, and there's talk about nightly MW tarballs. Where do we want them all?
favicon.ico
I imagine this is served by bits these days. We only need this around for things like 404 errors served by ms7. After everything else is off of here this can go.
index.html
I wonder how anyone would ever see this now. Anyway, when everything is moved off, this file can go.
jars
This has old (2009) copies of cortado, used by Extension:OggHandler. The extension has bundled its own copy for some time now, and that copy is of course much more recent too. I *think* we can just move these out of the way.
And no. I see in CommonSettings.php that we reference it:
$wgCortadoJarFile = "$urlprotocol//upload.wikimedia.org/jars/cortado.jar";
*sigh*
lost-image-thumb-backup
We don't use this for anything; it looks like a copy of some files that was made when trying to restore media after fixing some bug in 2008. Tim would know whether it's worth preserving a copy of this directory someplace on a backup or offline, in case we need to dig through it. We're talking about 13500 files for a total of around 320 MB.
math
Used by Extension:Math for rendering of mathematical formulas in articles. Aaron has committed code to move these files into Swift; see [1]. Like thumbs, they can be regenerated at any time from the original content. Do we want to consider periodic cleanups of them?
NOTE ALSO that the subdirectories /wikipedia/(langcode)/math are still in use; I see GETs to these URLs and files being returned.
math.tmp
Scratch area for Extension:Math, still used. We don't want this cruft in Swift; where can it go? Can't temporary files live on the apaches that generate them, until they are cleaned out by cron or a reboot?
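If the scratch files did live on local disk, a cron-driven cleanup could be quite small. The following is only a sketch under assumptions: the directory and the age threshold are placeholders to be decided, the function names are made up, and it defaults to a dry run so nothing is deleted until someone opts in.

```python
import os
import time

def stale_files(root, max_age_seconds, now=None):
    """Yield non-directory entries under root whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished while we were walking
            if now - mtime > max_age_seconds:
                yield path

def clean(root, max_age_seconds, dry_run=True):
    """Return the stale paths; actually remove them only when dry_run is False."""
    removed = []
    for path in stale_files(root, max_age_seconds):
        if not dry_run:
            os.remove(path)
        removed.append(path)
    return removed
```

For example, `clean("/tmp/math-scratch", 7 * 86400)` would list files untouched for a week; the path here is hypothetical.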
mime.php
Can't find any references to this anywhere, neither on ms7 nor in the MW configs. It just contains a single (old) copy of the wfMimeType function. I hope this means we can move it out of the way.
portal
This used to be used by the fundraiser with settings like these in CommonSettings.php:
- $wgFundraiserPortalDirectory = "/mnt/upload6/portal";
- $wgFundraiserPortalPath = "$urlprotocol//upload.wikimedia.org/portal";
We should check to see if it's still needed. Jeff?
I see references:
GET /portal/wikipedia/en/fundraiserportal.js 200 (http://test.prototype.wikimedia.org/wiki/Main_Page)
Yet another reason to get rid of prototype :-P
private
We need to verify that deleted/oversighted images and images on private wikis are stored and served from Swift and handled correctly. When that's done most things under here can go.
There are also subdirectories captcha and captcha2. In CommonSettings.php:
$wgCaptchaDirectory = '/mnt/upload6/private/captcha'
That's used by ConfirmEdit's FancyCaptcha module. I don't see references to captcha2 anywhere. I hope we can find someone who knows about it.
The subdirectory ExtensionDistributor is also in use:
$wgExtDistWorkingCopy = '/mnt/upload6/private/ExtensionDistributor/mw-snapshot'
pybaltestfile.txt
Once everything's off of here, this can go. Note from Ben or Faidon on the etherpad:
Swift does have this at monitoring/pybal.txt; see lvs.pp for the full URL, but rewrite.py will probably have to be modified to serve this file.
robots.txt
Once everything's off here, this can go. BTW we do encourage bots to crawl media; how does that work with Swift anyways?
scripts
Symlink to sync-from-home.
skins
These used to be served from ms7 (see old CommonSettings.php):
$wgStyleSheetPath = 'http://upload.wikimedia.org/skins'
but for quite some time now they are served from the bits cluster:
$wgStyleSheetPath = "$urlprotocol//bits.wikimedia.org/static-$wmfVersionNumber/skins"
so it should be ok to move this out of the way.
And yet.... I see some requests like
GET /skins/monobook/main.css
GET /skins/common/commonPrint.css
Not very many, and they all get 301s. I guess we oughta think about those. I have referers for those too.
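To see who is still linking to the old skins URLs, the redirected requests can be grouped by referer. This is a rough sketch that assumes log lines in the `GET path status (referer)` shape of the sample lines quoted on this page; the function name and the "-" placeholder for a missing referer are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape: "GET <path> <status> (<referer>)", referer optional.
LINE_RE = re.compile(
    r"^GET\s+(?P<path>\S+)\s+(?P<status>\d{3})(?:\s+\((?P<referer>.*)\))?\s*$"
)

def referers_by_path(lines, status="301"):
    """Map each requested path with the given status to the set of its referers."""
    result = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("status") == status:
            result[m.group("path")].add(m.group("referer") or "-")
    return dict(result)
```

Calling it with `status="200"` instead would list pages that still pull content straight from ms7.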
sync-from-home
Some pretty old scripts live here. One copies things to ms4 from the days it was a thumbs server; another is a very old copy of thumb-handler.php. Most are no longer used since ms7 doesn't serve thumbs. However, /export/upload/scripts/404.php, which handles 404 errors for everything else, is still in use (see /opt/webserver7/https-ms7/config/obj.conf). When all ms7 service is turned off, this can go away.
x1
A one-line shell script of JeLuf's which appears to test timezone and date formatting. I expect it can go.
Anything Else
wikipedia/commons/scan
The ScanSet extension appears to use a subdirectory under wikipedia/commons. From CommonSettings.php:
$wgScanSetSettings = array(
    'baseDirectory' => '/mnt/upload6/wikipedia/commons/scans',
    'basePath' => "$urlprotocol//upload.wikimedia.org/wikipedia/commons/scans",
);
But everything in there is from 2005 with Tim's name on it. We should ask him.
wikipedia/(langcode)/timeline
This is still being referenced by Extension:EasyTimeline. I don't know where its images should live.
originals??
A few originals are still being served by ms7. How is this possible? Maybe this is the "extra dot in the URL" trick mentioned in an email. This needs to be fixed.
I see one image with a referer from an article. Here's that info:
GET /wikipedia/commons/d/d5/Apollo_11_Lunar_Module_Eagle_in_landing_configuration_in_lunar_orbit_from_the_Command_and_Service_Module_Columbia.jpg 200 (http://en.wikipedia.org/wiki/Apollo_11)
How did it get to ms7?
broken urls
We occasionally get things like
GET /wikipedia/en/0/
which should be rejected (by the squids I guess) so they never make it here.
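As an illustration of what "well-formed" could mean here, the sketch below checks only the shape of a path for an original: /project/lang/x/xy/filename, where x is one hex digit and xy is two hex digits starting with x. The regular expression is a simplifying assumption (for instance about which characters a project or language code may contain), not the rule the squids actually apply, and the function name is hypothetical.

```python
import re

# Assumed layout of an original: /<project>/<lang>/<x>/<xy>/<filename>
ORIGINAL_RE = re.compile(
    r"^/(?P<project>[a-z]+)/(?P<lang>[a-z-]+)"
    r"/(?P<h1>[0-9a-f])/(?P<h2>[0-9a-f]{2})/(?P<name>[^/]+)$"
)

def looks_like_original(path):
    """True if path has the hashed-directory shape and the two hash levels agree."""
    m = ORIGINAL_RE.match(path)
    return bool(m) and m.group("h2").startswith(m.group("h1"))
```

Under this check the truncated request above is rejected because it has no second hash directory and no filename, while the Apollo 11 URL quoted earlier passes.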