Server admin log/Archive 5

From Wikitech

21 April

  • 18:00 midom: started backup run on benet

20 April

  • 11:25 brion: tidy extension installed on apaches, now active. To go back to external, set $wgTidyInternal = false; or remove extension=tidy.so from php.ini and restart apaches
  • 10:50 brion: added node groups fc3, freebsd, debian
  • 10:06 brion: removed isidore and vincent from fc2-i386 node group, as they're running FreeBSD and Debian
  • 10:00 brion: working on installing tidy extension for php...
  • 03:00 brion: re-enabled search

19 April

  • 16:50 Tim: Pope-related flash crowd, peaking at 2100/s. Apaches were hard hit by searches (about 50% of profile time) so I disabled them temporarily.
  • 16:00 Tim: we were getting reports of gzuncompress errors in memcached-client.php, on every page view on en. I put in an error suppression operator and instead logged all such errors to /home/wikipedia/logs/memcached_errors, to determine which server was the problem. It turned out to be not a server but a key, enwiki:messages to be precise. Deleting it and letting it reload fixed the problem.
  • 07:30 midom: sad notice, smellie down, memory or other hardware troubles, lots of segmentation faults and other signals before reboot, didn't come up after.
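
The 16:00 entry describes suppressing the decompression error and logging it instead, so that the faulty server or key could be identified. A minimal Python sketch of that idea follows. It is not the code from memcached-client.php: the function names, the server names `mc1` and `mc2`, and the record layout are all hypothetical, and `zlib.decompress` stands in for PHP's gzuncompress.

```python
import logging
import zlib
from collections import Counter

# Hypothetical logger name, echoing the log file mentioned in the entry.
log = logging.getLogger("memcached_errors")


def safe_decompress(server, key, payload):
    """Return the decompressed bytes, or None if the payload is corrupt.

    The error is logged with its server and key instead of being raised,
    so one bad value does not break every page view.
    """
    try:
        return zlib.decompress(payload)
    except zlib.error as exc:
        log.warning("decompress failed server=%s key=%s: %s", server, key, exc)
        return None


def tally_failures(records):
    """Count decompression failures per server and per key.

    `records` is an iterable of (server, key, payload) tuples.
    """
    by_server, by_key = Counter(), Counter()
    for server, key, payload in records:
        if safe_decompress(server, key, payload) is None:
            by_server[server] += 1
            by_key[key] += 1
    return by_server, by_key


if __name__ == "__main__":
    good = zlib.compress(b"hello")
    bad = b"not zlib data"
    records = [
        ("mc1", "enwiki:messages", bad),
        ("mc2", "enwiki:messages", bad),
        ("mc1", "enwiki:other", good),
    ]
    by_server, by_key = tally_failures(records)
    print(by_server, by_key)
```

If the failures are spread evenly over servers but concentrated on one key, the key is the likelier culprit, which matches what the entry reports for enwiki:messages.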

17 April

  • 09:00 midom: fixed broken webster replication, caused by table bugs at database bugs
  • 06:45 brion: fixed symlinked php.ini on srv2, srv3
  • 00:00 midom: reformatted suda data area from xfs to ext2, brought into MySQL service for enwiki only

14 April

  • 03:20 brion: eowiki lucene search live! others building...
  • 02:45 brion: started lucene index builds for eowiki, ruwiki, dewiki
  • 02:15 brion: lucene search live for meta
  • 01:45 brion: restarted meta search build, as it was pulling from wrong db. whoops!

13 April

  • 23:51 brion: noticed some spam coming in on bugzilla. hacked rel="nofollow" into comment processing, removed the comment, and disabled the account used to post it.
  • 22:40 brion: starting lucene index builds for metawiki and some other wikipedias
  • 00:08 brion: removed Apache-Midnight-Job from avicenna crontab

12 April

  • 23:50 brion: vincent and avicenna are sharing LuceneSearch burden.
  • 20:00 brion: Chad fixed vincent, which is now running lucene. Isidore lucene stopped, it's going to be squid soon. Will take over an apache for additional search capacity.
  • 13:30 brion: lucene search turned on for en with slightly old index file, daemon running on isidore
  • 10:30 brion: gcj on isidore seems horked; index rebuild is much too slow (eta 18 hours) so stopped it. uploading an index from home, and building mono for further testing.
  • 10:00 midom: holbach restored.
  • 08:55 holbach seems to be deadish
  • 08:50 brion: started lucene index build on isidore
  • 05:50 brion: vincent doesn't seem to be coming up again, will need to be kicked.
  • 05:20 brion: upgrading vincent to 2.6 kernel hoping to resolve threading/memory issues w/ MWDaemon
  • 02:10 brion: rebooting srv6 due to zombie squid eating port 80

11 April

  • 23:05 kate: experimenting with making an en.wp image dump using trickle (cvs: /tools/trickle/)
  • 08:00 midom: broken replication (by Chinese scammer) on bacon, fixed by "use otrs; repair table article" - myisam tables are evil, aren't they?

10 April

  • ~23:00 kate: upgraded squid to STABLE9+patches (see squid builds) + restarted all squids.
  • mark: All squids are running with too few FDs (1024), and if no one replaces all daemons by the new one Kate just built, we may have a problem tomorrow during peak hours...
  • 19:15 midom: srv7 is now in squid service
  • 19:07 brion: MWDaemon's memory usage got high enough it started swapping. Hung connections ate up apaches and hung the site until it was restarted.
  • 5:30 brion: lucene search server active for en.wikipedia.org, running on vincent.
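
The file descriptor concern in mark's entry can be checked from inside a process. The sketch below is a generic Python illustration, not anything taken from the squid builds. The function names and the 20% reserve threshold are assumptions chosen for the example.

```python
import resource


def current_soft_limit():
    """Return this process's soft limit on open file descriptors."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def fd_headroom(open_fds, soft_limit, reserve=0.2):
    """Return True if open_fds leaves at least `reserve` of the limit free.

    `reserve` is an arbitrary safety margin picked for this sketch.
    """
    return open_fds <= soft_limit * (1 - reserve)


if __name__ == "__main__":
    limit = current_soft_limit()
    print("soft RLIMIT_NOFILE:", limit)
    # With the 1024 limit from the log, 1000 open descriptors is too close.
    print(fd_headroom(1000, 1024))
```

A proxy holds at least one descriptor per client connection plus more for origin connections and cache files, so a limit of 1024 can be exhausted at peak load.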

9 April

  • 15:45 midom: dropped thttpd (as it was using 32bit mmaps) on dumps in favor of lighttpd. It has superb performance, serves 3500 hits/s under ab and served 70MB/s from benet in small reqs... Extreme recommendations for using lighttpd for image uploads.
  • 10:15 brion: running lucene search indexer on vincent (pulling enwiki from benet).
  • 05:25 brion: added additional is rcbots to #is.wikipedia for tionary/books/quote

8 April

7 April

  • Mark, Tim: implemented Multicast HTCP purging on all FL apaches/squids. French Squids still need a binary replacement.

6 April

  • 21:44 mark: Put port gi0/26 on csw1-pmtpa into trunking mode: vlans 1-2 only, with vlan 2 being the native vlan, no LACP negotiation
  • 11:30 midom: benet put into dump operation
  • 10:55 brion: reinstalled PHP on zwinger and apaches, compiled with memory limit and mbstring options enabled. This was left out when upgrading to 4.3.11.
  • 2:40 brion: added NetCabo proxies to trusted proxy list (inconveniently shared by Jorge and a Nazi vandal on pt.wikipedia.org)

4 April

  • 15:30 jeluf: disabled logging of upload.wikimedia.org
  • 15:15 midom: yet another image server overload. rotated 30G upload.wikimedia logfile, could be fragmentation overhead.
  • 12:00 midom: moved log_bin.0[0123]? (40G worth of binlogs) from ariel to khaldun/avicenna backup/arielbinlog, reclaimed some master disk space.
    • Do we need those binlogs for anything?
  • 07:48 Tim: Started memcached on browne, it was in the list but not running. Fixed startup scripts. Noticed that browne can't contact albert on 10/8, modified yum.conf accordingly.
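
The 12:00 entry moves binlogs selected by the glob `log_bin.0[0123]?`. As a quick illustration of what that pattern selects, here is a small Python sketch using `fnmatch`. The file listing is made up for the example, and only the pattern itself comes from the log.

```python
from fnmatch import fnmatchcase

# Pattern copied from the log entry: "0", then one of 0-3, then any one character.
PATTERN = "log_bin.0[0123]?"


def matching_binlogs(names):
    """Return the names that the glob pattern selects, sorted."""
    return sorted(n for n in names if fnmatchcase(n, PATTERN))


if __name__ == "__main__":
    # Hypothetical directory listing: log_bin.000 to log_bin.059 plus an index file.
    names = [f"log_bin.{i:03d}" for i in range(60)] + ["log_bin.index"]
    selected = matching_binlogs(names)
    print(len(selected), selected[0], selected[-1])
```

With three-digit numeric suffixes the pattern covers log_bin.000 through log_bin.039 and leaves later files and the index file in place.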

3 April

  • 18:25 midom: extended public IP address range (now: 12 addresses)
  • 17:50 midom: srv5 joined service as squid.

1 April

  • 22:30 midom: Enabled recentchanges-based watchlist hack. Servers go faaaast.
  • 23:15 brion: set default block expiry to 1h on dewiki by request of various admins
