Server admin log/Archive 5
From Wikitech
Revision as of 18:10, 21 April 2005
21 April
- 18:00 midom: started backup run on benet
20 April
- 11:25 brion: tidy extension installed on apaches, now active. To go back to external, set $wgTidyInternal = false; or remove extension=tidy.so from php.ini and restart apaches
- 10:50 brion: added node groups fc3, freebsd, debian
- 10:06 brion: removed isidore and vincent from fc2-i386 node group, as they're running FreeBSD and Debian
- 10:00 brion: working on installing tidy extension for php...
- 03:00 brion: re-enabled search
19 April
- 16:50 Tim: Pope-related flash crowd, peaking at 2100/s. Apaches were hard hit by searches (about 50% of profile time) so I disabled them temporarily.
- 16:00 Tim: we were getting reports of gzuncompress errors in memcached-client.php, on every page view on en. I put in an error suppression operator and instead logged all such errors to /home/wikipedia/logs/memcached_errors, to determine which server was the problem. It turned out to be not a server but a key, enwiki:messages to be precise. Deleting it and letting it reload fixed the problem.
- 07:30 midom: sad notice, smellie down, memory or other hardware troubles, lots of segmentation faults and other signals before reboot, didn't come up after.
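Note on the 16:00 entry: a minimal sketch of the same debugging approach, assuming a cache client that hands back zlib-compressed values. The original fix was made in memcached-client.php; the names `safe_decompress` and `tally` here are hypothetical and only illustrate the idea of swallowing the error, recording server and key, and then seeing which of the two the failures cluster on.

```python
import zlib
from collections import Counter


def safe_decompress(server, key, blob, error_log):
    """Decompress a cached value; on failure record (server, key, error) and return None."""
    try:
        return zlib.decompress(blob)
    except zlib.error as exc:
        # Suppress the error for the caller, but keep enough detail to diagnose it.
        error_log.append((server, key, str(exc)))
        return None


def tally(error_log):
    """Count recorded failures per server and per key."""
    by_server = Counter(server for server, _key, _msg in error_log)
    by_key = Counter(key for _server, key, _msg in error_log)
    return by_server, by_key


if __name__ == "__main__":
    log = []
    safe_decompress("cache-a", "enwiki:messages", b"not zlib data", log)
    safe_decompress("cache-b", "enwiki:messages", b"not zlib data", log)
    servers, keys = tally(log)
    print(servers.most_common(), keys.most_common())
```

If the failures spread evenly across servers but pile up on one key, the key is the likely culprit, which matches what the log entry found for enwiki:messages.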
17 April
- 09:00 midom: fixed broken webster replication, caused by table bugs at database bugs
- 06:45 brion: fixed symlinked php.ini on srv2, srv3
- 00:00 midom: reformatted suda data area from xfs to ext2, brought into MySQL service for enwiki only
14 April
- 03:20 brion: eowiki lucene search live! others building...
- 02:45 brion: started lucene index builds for eowiki, ruwiki, dewiki
- 02:15 brion: lucene search live for meta
- 01:45 brion: restarted meta search build, as it was pulling from wrong db. whoops!
13 April
- 23:51 brion: noticed some spam coming in on bugzilla. hacked rel="nofollow" into comment processing, removed the comment, and disabled the account used to post it.
- 22:40 brion: starting lucene index builds for metawiki and some other wikipedias
- 00:08 brion: removed Apache-Midnight-Job from avicenna crontab
12 April
- 23:50 brion: vincent and avicenna are sharing LuceneSearch burden.
- 20:00 brion: Chad fixed vincent, which is now running lucene. Isidore lucene stopped, it's going to be squid soon. Will take over an apache for additional search capacity.
- 13:30 brion: lucene search turned on for en with slightly old index file, daemon running on isidore
- 10:30 brion: gcj on isidore seems horked; index rebuild is much too slow (eta 18 hours) so stopped it. uploading an index from home, and building mono for further testing.
- 10:00 midom: holbach restored.
- 08:55 holbach seems to be deadish
- 08:50 brion: started lucene index build on isidore
- 05:50 brion: vincent doesn't seem to be coming up again, will need to be kicked.
- 05:20 brion: upgrading vincent to 2.6 kernel hoping to resolve threading/memory issues w/ MWDaemon
- 02:10 brion: rebooting srv6 due to zombie squid eating port 80
11 April
- 23:05 kate: experimenting with making an en.wp image dump using trickle (cvs: /tools/trickle/)
- 08:00 midom: broken replication (by Chinese scammer) on bacon, fixed by "use otrs; repair table article" - myisam tables are evil, aren't they?
10 April
- ~23:00: kate: upgraded squid to STABLE9+patches (see squid builds) + restarted all squids.
- mark: All squids are running with too few FDs (1024), and if no one replaces all daemons with the new one Kate just built, we may have a problem tomorrow during peak hours...
- 19:15 midom: srv7 is now in squid service
- 19:07 brion: MWDaemon's memory usage got high enough it started swapping. Hung connections ate up apaches and hung the site until it was restarted.
- 5:30 brion: lucene search server active for en.wikipedia.org, running on vincent.
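Note on the FD warning above: a minimal sketch of checking whether a process's open-file limit is high enough, assuming a Unix host. The function name `fd_limit_ok` and the threshold of 4096 are hypothetical, chosen only for illustration; only the 1024 figure comes from the log. It inspects the OS-level limit of the calling process and says nothing about any limit built into a squid binary, which the log suggests is why a new build was needed.

```python
import resource


def fd_limit_ok(required, limits=None):
    """Return True if the soft open-file limit is at least `required`.

    `limits` is an optional (soft, hard) pair, mainly for testing; when it is
    omitted the limits of the current process are read from the OS.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft, _hard = limits
    # RLIM_INFINITY means the limit is unbounded, which always suffices.
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= required


if __name__ == "__main__":
    # 4096 is an arbitrary illustrative threshold, not a value from the log.
    print("enough file descriptors:", fd_limit_ok(4096))
```

With the limit reported in the log, `fd_limit_ok(4096, limits=(1024, 1024))` evaluates to `False`.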
9 April
- 15:45 midom: dropped thttpd (as it was using 32bit mmaps) on dumps in favor of lighttpd. It has superb performance, serves 3500 hits/s under ab and served 70MB/s from benet in small reqs... Strongly recommend using lighttpd for image uploads.
- 10:15 brion: running lucene search indexer on vincent (pulling enwiki from benet).
- 05:25 brion: added additional is rcbots to #is.wikipedia for tionary/books/quote
8 April
- 16:00 midom: redirected http://download.wikimedia.org/ to benet, misses tomeraider and uploads...
- 13:00 Tim: switched to Mark's squid binary on the French squids
7 April
- Mark, Tim: implemented Multicast HTCP purging on all FL apaches/squids. French Squids still need a binary replacement.
6 April
- 21:44 mark: Put port gi0/26 on csw1-pmtpa into trunking mode: vlans 1-2 only, with vlan 2 being the native vlan, no LACP negotiation
- 11:30 midom: benet put into dump operation
- 10:55 brion: reinstalled PHP on zwinger and apaches, compiled with memory limit and mbstring options enabled. This was left out when upgrading to 4.3.11.
- 2:40 brion: added NetCabo proxies to trusted proxy list (inconveniently shared by Jorge and a Nazi vandal on pt.wikipedia.org)
4 April
- 15:30 jeluf: disabled logging of upload.wikimedia.org
- 15:15 midom: yet another image server overload. rotated 30G upload.wikimedia logfile, could be fragmentation overhead.
- 12:00 midom: moved log_bin.0[0123]? (40G worth of binlogs) from ariel to khaldun/avicenna backup/arielbinlog, reclaimed some master disk space.
- Do we need those binlogs for anything?
- 07:48 Tim: Started memcached on browne, it was in the list but not running. Fixed startup scripts. Noticed that browne can't contact albert on 10/8, modified yum.conf accordingly.
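Note on the 12:00 binlog move: a small sketch of which file names the pattern `log_bin.0[0123]?` selects, assuming the binlogs carry three-digit numeric suffixes (the sample names below are hypothetical). Python's `fnmatch` follows the same rules as a shell glob for `?` and `[...]`, so this can serve as a dry run before moving anything.

```python
from fnmatch import fnmatchcase

# Pattern quoted from the log entry; "?" matches exactly one character and
# "[0123]" matches one of those four digits.
PATTERN = "log_bin.0[0123]?"


def matching(names, pattern=PATTERN):
    """Return the names that the glob pattern would select, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Hypothetical file names, assuming three-digit suffixes.
    sample = ["log_bin.%03d" % i for i in range(0, 60, 7)] + ["log_bin.index"]
    for name in matching(sample):
        print(name)
```

Under that naming assumption the pattern covers suffixes 000 through 039 and leaves 040 onwards, plus any index file, in place.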
3 April
- 18:25 midom: extended public IP address range (now: 12 addresses)
- 17:50 midom: srv5 joined service as squid.
1 April
- 22:30 midom: Enabled recentchanges-based watchlist hack. Servers go faaaast.
- 23:15 brion: set default block expiry to 1h on dewiki by request of various admins
Archives
- Server admin log/Archive 1
- Server admin log/Archive 2 (2004 Oct - 2004 Nov)
- Server admin log/Archive 3 (2004 Dec - 2005 Mar)