Server admin log/Archive 20

Revision as of 22:43, 3 February 2012

February 3

  • 22:43 logmsgbot: asher synchronized wmf-config/db.php 'returning db33, 39, 46 to prod'
  • 22:39 binasher: db35 had an iblogfile size inconsistent with other s5 hosts. streaming a hotbackup of db1034 to db35
  • 22:23 binasher: rebooted db35, db39
  • 21:55 logmsgbot: asher synchronized wmf-config/db.php 'pulling db35, 39, 46 for upgrades'
  • 20:52 K4-713: updated production civicrm to r1295
  • 20:45 logmsgbot: asher synchronized wmf-config/db.php 'adding back dbs 13,18,25'
  • 20:32 binasher: upgraded mysql on dbs 13,18,25,33
  • 20:17 logmsgbot: reedy synchronized php-1.18/extensions/SpamBlacklist/SpamBlacklist_body.php 'r110682'
  • 19:23 logmsgbot: asher synchronized wmf-config/db.php 'pulling dbs 13,18,25,26 for upgrades'
  • 19:11 RobH: manutius installed and ready for use
  • 17:26 RobH: updated dns for manutius.mgmt
  • 17:15 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'touch'
  • 17:11 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'touch'
  • 16:08 RobH: db41 being reinstalled, appears down but logging to be safe
  • 15:20 mark: Around 14:50 UTC, removed the 3 remaining esams upload squids in the knsq8-15 range from the config. This made ms5 unhappy.
  • 15:13 logmsgbot: reedy synchronized wmf-config/db.php 'Add comment that db40 is parsercache'
  • 13:53 mutante: resetting stats on new wikis per bz 34184: updateArticleCount.php vepwiki --update; updateArticleCount.php pnbwiktionary --update
  • 13:42 mark: Disabled knsq1-15 in PyBal, preparing for decommissioning
  • 03:53 maplebed: moved all the individual puppet files out of place, stopped nagios, and re-ran puppet (at now minus 1.5hrs)
  • 02:24 logmsgbot: LocalisationUpdate completed (1.18) at Fri Feb 3 02:24:54 UTC 2012
  • 00:57 K4-713: re-enabled the donations queue consumer via Jenkins
  • 00:42 K4-713: updated production civicrm to r1293
  • 00:23 logmsgbot: asher synchronized wmf-config/db.php 'moving watchlist/recentchanges back to db12, returning db24 to s2'
  • 00:09 K4-713: Disabled donations queue consumption on aluminium

February 2

  • 23:51 K4-713: updated production civicrm to r1291
  • 23:44 binasher: db12 back up with lucid + current mysql
  • 23:32 binasher: rebooting db12
  • 23:08 logmsgbot: asher synchronized wmf-config/db.php 'pulling db12 from enwiki, temporarily moving watchlist/recentchanges to db54'
  • 23:02 pgehres: K4-713 synchronized production CiviCRM to r1288 on Aluminium
  • 22:59 binasher: db24 upgraded to lucid and current mysql build
  • 22:52 binasher: rebooted db24
  • 22:44 logmsgbot: reedy synchronized wmf-config/ 'Disable VariablePage completely'
  • 22:26 binasher: pulled db24 from s2, preparing to upgrade to lucid
  • 22:19 logmsgbot: asher synchronized wmf-config/db.php 'pulling db24 from s2 for upgrade'
  • 21:37 apergos: started rsync from dataset2 to dataset1001 in screen session as root on dataset1001
  • 21:07 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Drop FundraiserPortal config'
  • 21:07 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Drop FundraiserPortal config'
  • 21:06 RobH: dataset1001 is alive, mostly
  • 19:15 logmsgbot: asher synchronized wmf-config/db.php 'raising db55 weight'
  • 19:08 logmsgbot: asher synchronized wmf-config/db.php 'add db55 - new s5 slave'
  • 18:06 notpeter: doing initial run of puppet on cp1001-1020
  • 17:33 notpeter: reimaging cp1002 and imaging cp1001 and cp1003-1020
  • 16:06 cmjohnson1: disk 15 swap complete on db11
  • 16:05 cmjohnson1: replacing disk 15 on db11
  • 15:55 mark: Running apt-get update && apt-get dist-upgrade && reboot on lvs1
  • 15:40 mark: Running apt-get update && apt-get dist-upgrade && reboot on lvs2
  • 15:10 logmsgbot: reedy synchronized php-1.18/extensions/CodeReview/api/ 'r110574'
  • 14:21 logmsgbot: hashar synchronized php-1.18/includes/UserMailer.php 'work around bug 34158'
  • 14:19 logmsgbot: catrope synchronized php-1.18/extensions/LocalisationUpdate/LocalisationUpdate.class.php 'r110570'
  • 14:10 RoanKattouw: Finally fixed ownership of cache/l10n on scalers, sync-l10nupdate only throws the expected errors, no more perms errors on the scalers
  • 14:09 RoanKattouw: Scalers now have disk space available because php-1.17-test is gone
  • 13:59 logmsgbot: catrope synchronizing Wikimedia installation... : Deleted php-1.17-test on fenari, running scap to delete it on the Apaches as well
  • 13:49 RoanKattouw: Deleting /home/wikipedia/common/php-1.17-test , has been unused for a long time
  • 13:45 RoanKattouw: Deleting /tmp/mw-cache-1.17 on srv219 and srv223
  • 13:44 RoanKattouw: srv219-224 have a full disk according to rsync
  • 13:38 RoanKattouw: Fixing ownership of /usr/local/apache/common-local/php-1.18/cache/l10n on srv191, srv199, srv219-224
  • 13:35 RoanKattouw: Running sync-l10nupdate again to investigate rsync errors
  • 13:34 logmsgbot: LocalisationUpdate completed (1.18) at Thu Feb 2 13:34:53 UTC 2012
  • 13:12 RoanKattouw: Running l10nupdate by hand to hopefully fix bug 33768
  • 13:11 logmsgbot: catrope synchronized php-1.18/extensions/LocalisationUpdate/LocalisationUpdate.class.php 'Deploy live-hacked version that will hopefully fix bug 33768'
  • 10:00 Tim: reinserted the deleted site_stats row for plwiki
  • 09:35 Tim: killing statistics queries on all s2 slave servers
  • 09:34 logmsgbot: tstarling synchronized php-1.18/includes/SiteStats.php 'disabling even more'
  • 09:32 Tim: on db54, killed all SiteStats queries
  • 09:28 Tim: restarting all apaches
  • 09:26 Tim: disabling SiteStatsInit::articles
  • 09:26 logmsgbot: tstarling synchronized php-1.18/includes/SiteStats.php
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Thu Feb 2 02:06:48 UTC 2012
  • 01:22 AaronSchulz: running "mwscriptwikiset purgeDeletedFiles.php all.dblist --starttime=20120126000000" in a screen on fenari

February 1

  • 23:20 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'Enabled swift thumbnail purge code'
  • 23:12 logmsgbot: aaron synchronized wmf-config/swift.php 'actually register the hook handler'
  • 22:43 logmsgbot: aaron synchronized php-1.18/includes/filerepo/LocalFile.php
  • 22:26 binasher: streaming a hotbackup of db35 to db55 (new s5 slave)
  • 22:21 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'fix timezone typo for mr'
  • 22:19 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Configure rights assignments for AFTv5'
  • 22:03 AaronSchulz: Enabled SwiftCloudFiles extension on all wikis, doesn't do anything yet
  • 21:51 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Asia/Kolata isn't valid'
  • 21:48 Jeff_Green: disabling deprecated apache, lighttpd, haproxy, squid, mysql services on loudon
  • 21:38 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Add additional articles category to $wgArticleFeedbackv5DashboardCategory'
  • 21:36 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'timezone config for new sties'
  • 21:30 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 34120 - Enable translate extension on Wikimania 2012 wiki'
  • 21:29 Reedy: Created Translate tables on wikimania2012wiki
  • 21:28 Jeff_Green: dist-upgrading and rebooting loudon
  • 21:20 logmsgbot: reedy ran sync-common-all
  • 21:09 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'fix timezone'
  • 21:07 logmsgbot: reedy ran sync-common-all
  • 20:55 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'vepwiki config'
  • 20:53 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Undo temp eot change'
  • 20:53 logmsgbot: reedy ran sync-common-all
  • 20:48 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'wrong file'
  • 20:46 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Temporarily enable eot uploads on amwiki'
  • 20:35 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bewikimedia site config'
  • 20:31 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Set $wgArticleFeedbackv5SelectedCTA = 1'
  • 20:29 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'set bewikimedia to en'
  • 20:21 logmsgbot: catrope synchronizing Wikimedia installation... : Deploying ArticleFeedbackv5 updates
  • 20:15 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
  • 19:58 RobH: labs switch ports connected per rt 1882
  • 19:57 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
  • 19:52 RobH: strontium.mgmt repaired per rt2352
  • 19:50 logmsgbot: catrope synchronized php-1.18/cache/interwiki.cdb 'rebuilt interwiki cache'
  • 19:47 logmsgbot: catrope synchronized php-1.18/cache/interwiki-pr.cdb 'rebuilt interwiki cache'
  • 19:47 logmsgbot: catrope synchronized php-1.18/cache/interwiki.cdb 'rebuilt interwiki cache'
  • 19:43 RobH: lab-ex4200-1 back in rack
  • 19:29 logmsgbot: reedy ran sync-common-all
  • 19:20 RobH: pushing apache changes for reedy
  • 18:27 RobH: ganglia1002 back online ready for install
  • 18:26 RobH: ganglia1002 mgmt offline per rt 2247, system was unplugged... no idea why
  • 18:24 cmjohnson1: pulled drive 2 db47
  • 18:15 RobH: cp1019 memory error repaired, now it is ready for OS install
  • 18:14 RobH: cp1017 memory error repaired
  • 17:54 RobH: updated dns for payments boxen renames in eqiad
  • 17:37 RobH: cp1014 memory was improperly installed (from factory?), installed in supported configuration and system is now ready for OS install per RT2351
  • 17:08 RobH: investigating errors on cp1014
  • 17:04 RobH: cp1019 console redirection fixed per rt2353, ready for OS install
  • 16:57 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
  • 16:54 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
  • 16:48 RobH: dataset1001 controller replaced
  • 16:41 cmjohnson1: reseating drive2 in db47
  • 16:37 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'wikilove default on fawikis'
  • 16:22 RobH: dataset1001 down for controller replacement
  • 16:07 mark: Removed now obsolete package wikimedia-task-squid from the karmic-wikimedia and lucid-wikimedia APT repositories, and deleted in svn.wikimedia.org
  • 15:45 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Try a quietened dumpInterwiki script'
  • 14:45 mutante: running authdns-update to remove oldusability
  • 14:30 mutante: shutting down "oldusability" linode instance
  • 13:44 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Point wgInterwikiCache at interwiki.cdb'
  • 13:41 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Updating interwiki cache copying protocol relative over interwiki.cdb'
  • 13:36 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Kill wmgHTTPSExperiment'
  • 13:35 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Tidy up cache epoch code'
  • 13:30 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Remove some of the wmgHTTPSExperiment related conditionals'
  • 13:27 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Only use the protocol relative interwiki cdb'
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Wed Feb 1 02:06:20 UTC 2012
  • 00:51 binasher: applied articlefeedback v5 schema changes to enwiki, testwiki, en_labswikimedia
  • 00:33 logmsgbot: reedy synchronized php/cache/interwiki-pr.cdb 'Updating interwiki cache'
  • 00:07 logmsgbot: reedy synchronized php/cache/interwiki-pr.cdb 'Updating interwiki cache'
  • 00:05 logmsgbot: reedy synchronized php/cache/interwiki-pr.cdb 'Updating interwiki cache'

January 31

  • 21:44 logmsgbot: awjrichards synchronized php/extensions/DonationInterface/gateway_common/us-states.i18n.php 'r110433'
  • 21:44 logmsgbot: awjrichards synchronized php/extensions/DonationInterface/gateway_common/interface.i18n.php 'r110433'
  • 21:43 logmsgbot: awjrichards synchronized php/extensions/DonationInterface/gateway_common/countries.i18n.php 'r110433'
  • 21:41 logmsgbot: awjrichards synchronized php/extensions/DonationInterface/globalcollect_gateway/globalcollect_gateway.i18n.php 'r110433'
  • 21:41 logmsgbot: awjrichards synchronized php/extensions/DonationInterface/globalcollect_gateway/globalcollect_gateway.alias.php 'r110433'
  • 21:40 logmsgbot: awjrichards synchronized php/extensions/DonationInterface/payflowpro_gateway/payflowpro_gateway.i18n.php 'r110433'
  • 21:39 logmsgbot: awjrichards synchronized php/extensions/DonationInterface/payflowpro_gateway/payflowpro_gateway.alias.php
  • 21:23 notpeter: restarting puppet on emery
  • 20:57 notpeter: temp stopping puppet on emery for testing
  • 19:58 notpeter: on stafford, that is
  • 19:57 notpeter: restarting puppetmaster proc as it's serving up 500s to all clients (well, 3 randomly selected ones...)
  • 18:55 Ryan_Lane: restarted squid and lighttpd on brewster
  • 18:36 RoanKattouw: IRC breakage postmortem: MediaWiki was configured to send UDP packets to .179 (ekrem-old) instead of .178 (ekrem)
  • 18:32 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Change wgRC2UDPAddress to the new ekrem IP'
  • 16:56 mutante: restarted ircd on ekrem once again because we still can't join channels .. problem remains
  • 16:35 mutante: restarted IRC bot on ekrem (needs dependency to start after ircd)
  • 16:30 mutante: ekrem - gets Error 500 on SERVER when running puppet
  • 16:30 logmsgbot: reedy synchronized php-1.18/extensions/SpamBlacklist/SpamBlacklist_body.php 'r110401'
  • 16:28 mutante: ekrem - su -c /usr/local/ircd-ratbox/bin/ircd irc
  • 16:21 mutante: powercycling ekrem - mgmt just showed "Stopping web" and was frozen completely
  • 16:17 RoanKattouw: ekrem suddenly died around 16:03 UTC, breaking the RC IRC feed
  • 15:06 mutante: changed nameservers for wikimedia.pl per RT:2277/bugzilla:33509
  • 09:28 logmsgbot: catrope synchronized php-1.18/includes/Wiki.php 'r110368'
  • 09:27 logmsgbot: catrope synchronized php-1.18/includes/Exception.php 'r110368'
  • 06:34 Tim: added myself to the gerrit "administrators" group
  • 05:23 Tim: the segfaults didn't stop, so I'm disabling wmerrors entirely for now
  • 05:13 Tim: since puppet is broken, disabled wmerrors backtrace logging by adding a separate configuration file in /etc/php5/conf.d and reloading apache
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Tue Jan 31 02:06:09 UTC 2012

January 30

  • 23:56 awjr: synchronized i18n files for DonationInterface on payments cluster to r110342
  • 23:46 Ryan_Lane: moving instances from virt2 to virt1 to rebalance compute cluster
  • 19:48 logmsgbot: awjrichards synchronizing Wikimedia installation... : Syncing CentralNotice to r110026 of trunk, includes important fix for 1.19 compatibility
  • 19:28 logmsgbot: asher synchronized wmf-config/db.php 'raising db54 weight'
  • 18:57 logmsgbot: asher synchronized wmf-config/db.php 'adding db54 to s2'
  • 18:39 mutante: running authdns-update to activate be.wikimedia.org
  • 18:19 logmsgbot: nikerabbit synchronized php-1.18/extensions/Narayam/resources/ext.narayam.rules.as.js 'I18ndeploy r110311 - bug 33924'
  • 18:17 logmsgbot: nikerabbit synchronized php-1.18/extensions/WebFonts/resources/ext.webfonts.fontlist.js 'I18ndeploy r110311 - bug 33599'
  • 18:16 logmsgbot: nikerabbit synchronized php-1.18/extensions/Translate/ 'I18ndeploy r110310 - Translate help links'
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Mon Jan 30 02:06:42 UTC 2012

January 29

  • 18:55 logmsgbot: reedy synchronized php-1.18/extensions/SiteMatrix/SiteMatrix_body.php
  • 08:42 mutante: restarted lsearchd on search6
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Sun Jan 29 02:06:09 UTC 2012

January 28

  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Sat Jan 28 02:06:56 UTC 2012
  • 00:00 binasher: db54 is now replicating from db30

January 27

  • 21:57 pgehres: updates complete, re-enabling queue consumption on jenkins on aluminium
  • 21:42 pgehres: pausing payments queue consumption in jenkins to backup and then run some db updates
  • 21:34 LeslieCarr: applying loopback filter on cr1-eqiad
  • 21:28 RobH: dns update
  • 20:53 ^demon: gallium: clearing /tmp yet again. Aaron claims he's fixing it now
  • 19:37 RobH: reinstalling sq31
  • 18:32 RobH: dns update for fluorine host
  • 17:15 binasher: s2 dbs are a sad lot. streaming hotbackup of db1034 to db54 to build a new slave
  • 14:37 Jeff_Green: dist-upgrading storage3
  • 13:36 ^demon: gallium: cleaning up /tmp again, tests really need to clean up after themselves.
  • 11:34 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'th wikilogos'
  • 11:29 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'th wikilogos'
  • 11:18 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33862 - Request for logo change in Tamil Wikiquote'
  • 11:11 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33960 - Import sources for etwiki, etwikisource and etwiktionary'
  • 11:07 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33864 - Flood flag on sr.wiki'
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Fri Jan 27 02:05:24 UTC 2012
  • 00:17 logmsgbot: py synchronized wmf-config/CommonSettings.php 'changing eqiad cp1001-cp1020 IPs to their new, private IPs'

January 26

  • 23:58 logmsgbot: catrope synchronized php-1.18/extensions/MoodBar/ 'Update MoodBar'
  • 23:56 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Adding $wgMoodbarConfig["feedbackDashboardUrl"]'
  • 23:54 logmsgbot: reedy synchronized php-1.18/includes/specials/SpecialBlockList.php 'r110095'
  • 23:28 mark: Deployed squid configs to all squids
  • 23:26 mark: Deploying modified squid configs of modified squid config generator to text.knams
  • 22:14 RobH: poking at puppet change breaking things on sockpuppet puppet runs
  • 21:33 ^demon: gallium: cleared a bunch of junk from /tmp
  • 21:12 Jeff_Green: upgraded storage3 mysqld from 5.1.47 to mysql-at-facebook-r3753
  • 20:17 logmsgbot: asher synchronized wmf-config/db.php 'db37 back in s7'
  • 20:10 Reedy: Created "spoofuser" AntiSpoof table in the central auth database
  • 19:34 logmsgbot: asher synchronized wmf-config/db.php 'pulling db37 from s7 for upgrades'
  • 19:24 RobH: disregard any flapping by mw1001, it's my script testbed
  • 18:04 RobH: forcing puppet run on srv199
  • 17:45 RobH: shutting down srv199 for bios tinkering by chris
  • 02:37 RoanKattouw: Started the udp2log process for the AFT logger manually on emery
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Thu Jan 26 02:06:22 UTC 2012
  • 01:01 logmsgbot: asher synchronized wmf-config/db.php 'adding db32 to s1 at low weight, new enwiki snapshot host'
  • 00:33 logmsgbot: asher synchronized wmf-config/db.php 'returning db18, now replicating heartbeat db'
  • 00:28 logmsgbot: asher synchronized wmf-config/db.php 'temporarily pulling db18'
  • 00:11 pgehres: re-enabling recurring donation module and processing in CiviCRM
  • 00:04 logmsgbot: asher synchronized wmf-config/db.php 'adding db26 to s7'
  • 00:03 RobH: shutting down srv151-srv186 per RT 2318 (confirmed not in pybal pools for apache or api)

January 25

  • 23:59 binasher: removed old external store apaches from pybal config
  • 23:50 logmsgbot: asher synchronized wmf-config/db.php 'pulling db26'
  • 23:49 logmsgbot: asher synchronized wmf-config/db.php 'adding db26 to s7'
  • 23:35 pgehres: re-enabled queue consumption for payments through Jenkins
  • 23:35 pgehres: awjr synchronized CiviCRM on aluminium to r1211
  • 23:05 logmsgbot: awjrichards synchronized wmf-config/CommonSettings.php 'Checking if wmgDisplayFeedsInSidebar === false rather than true, since it defaults to true in the install file'
  • 22:50 logmsgbot: awjrichards synchronizing Wikimedia installation... : Enabling FeaturedFeeds everywhere
  • 22:41 RobH: updating dns for bellin/blondel db9/10 replacements
  • 22:32 logmsgbot: awjrichards synchronizing Wikimedia installation... : Dark-deploying FeaturedFeeds
  • 22:07 logmsgbot: awjrichards synchronized wmf-config/CommonSettings.php 'fixed spelling mistake fore FeaturedFeeds configuration'
  • 22:05 logmsgbot: awjrichards synchronized wmf-config/CommonSettings.php 'Setting up FeaturedFeeds config; disabled by default'
  • 22:02 logmsgbot: awjrichards synchronized wmf-config/InitialiseSettings.php 'Setting up FeaturedFeeds config; disabled by default'
  • 21:54 LeslieCarr: deactivated selected-paths policy-statement on cr1-eqiad and cr2-eqiad
  • 21:23 Tim: on srv197: compiled and installed a local version of wmerrors for segfault investigation
  • 21:22 binasher: bits caches: running varnish param.set thread_pool_min, thread_pool_max, where min = 15000 / cores / 4 and max = 15000 / cores
  • 20:57 binasher: running "varnishadm param.set thread_pool_max 1875" on mobile varnish servers
  • 20:05 Tim: on srv197: temporarily disabled puppet and enabled core dumps in apache2.conf for segfault flood investigation
  • 19:57 Jeff_Green: running dist-upgrade on payments* and silicon
  • 19:44 Tim: updating TrustedXFF host list using fenari
  • 19:29 RobH: dns update go!
  • 18:50 LeslieCarr: restarted varnish on niobium
  • 18:49 RoanKattouw: Restarted morebots
  • 13:33 logmsgbot: demon synchronized wmf-config/CommonSettings.php 'Change IP address for bnwiki account creation throttle per bug 33900'
  • 11:43 apergos: formey oom (I guess), unresponsive from mgmt console, powercycling.
  • 06:07 pgehres: disabled queue consumption of payments in jenkins until stuck message can be removed from queue
  • 05:49 pgehres: Disabling the processing of recurring payments in CiviCRM until we can add the appropriate payment_method to the queue msgs
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Wed Jan 25 02:06:17 UTC 2012
  • 00:57 binasher: streaming hotbackup of db53 to db32
  • 00:52 binasher: shutting down mysql on db32, going to reconfigure with lvm and reslave
  • 00:45 logmsgbot: asher synchronized wmf-config/db.php 'pulling db32 - this will be the new enwiki pmtpa snapshot host'
  • 00:30 binasher: streaming hotbackup of db37 to db26, preparing to reprovision db26 in s7
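
The 21:22 entry above gives the thread pool sizing rule for the bits caches as min = 15000 / cores / 4 and max = 15000 / cores. A minimal sketch of that arithmetic follows. The function name `thread_pool_limits` is hypothetical, and rounding down to an integer is an assumption, since the log does not say how fractional results were handled.

```python
def thread_pool_limits(cores, total_threads=15000):
    """Return (thread_pool_min, thread_pool_max) for a given core count.

    Follows the rule quoted in the log entry:
        max = total_threads / cores
        min = total_threads / cores / 4
    Floor division is an assumption; the log does not state the rounding.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    pool_max = total_threads // cores
    pool_min = total_threads // cores // 4
    return pool_min, pool_max


if __name__ == "__main__":
    # Core counts here are illustrative, not taken from the log.
    for cores in (4, 8, 16):
        pool_min, pool_max = thread_pool_limits(cores)
        print(f"{cores} cores: thread_pool_min={pool_min} thread_pool_max={pool_max}")
```

With 8 cores the maximum comes out to 1875, the same value set by hand in the 20:57 entry, although the log does not state the core count of those servers.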

January 24

  • 23:30 pgehres: testing
  • 23:28 awjr: Syncing prod CiviCRM on aluminium to r1209
  • 22:39 mark: Disabled quota support on sanger's IMAP server to make Dovecot work again
  • 22:23 mark: Sanger is upgraded to lucid
  • 21:49 RobH: ms-be1 is online! MAN WE ARE AWESOME
  • 21:20 mark: Starting dist-upgrade of sanger
  • 20:50 binasher: pulled db26, rebooting and re-imaging with lucid
  • 20:48 logmsgbot: asher synchronized wmf-config/db.php 'pulling db26 from s1 to reimage'
  • 20:31 cmjohnson1: restarting ms4 for memory testing
  • 20:19 notpeter: spinning up db54-58 for asher
  • 20:09 logmsgbot: asher synchronized wmf-config/db.php 're-weighting s6 dbs'
  • 20:06 logmsgbot: asher synchronized wmf-config/db.php 'adding db43 back to s6 at a low weight'
  • 20:01 logmsgbot: asher synchronized wmf-config/db.php 'raising db53 weight to 400'
  • 19:37 logmsgbot: asher synchronized wmf-config/db.php 'raising db53 weight to 200'
  • 19:33 logmsgbot: asher synchronized wmf-config/db.php 'adding db53 as an enwiki slave at 1/4 normal weight'
  • 19:18 logmsgbot: tstarling synchronized wmf-config/InitialiseSettings.php 'switched to Preprocessor_Hash on ocwiki only'
  • 18:52 LeslieCarr: moved cp1001-1040 to private vlan
  • 18:52 LeslieCarr: restarted morebots
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Tue Jan 24 02:05:48 UTC 2012
  • 01:55 Ryan_Lane: fixed reverse dns for labs instances
  • 01:32 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Add account creation throttle increase for bug 33900'
  • 01:07 LeslieCarr: restarting dhcp3-server on brewster
  • 00:54 logmsgbot: tstarling synchronized wmf-config/InitialiseSettings.php 'new rsvg command line option'
  • 00:54 logmsgbot: tstarling synchronized wmf-config/CommonSettings.php
  • 00:50 Tim: upgraded rsvg on all mediawiki-installation servers, for some reason it is installed on all of them
  • 00:23 binasher: streaming a hotbackup of db1006 to db43
  • 00:19 Tim: running apt-get upgrade on image scalers
  • 00:18 Tim: uploaded new rsvg to apt.wikimedia.org, deploying to image scalers

January 23

  • 23:31 binasher: started slaving db53 from db36 (enwiki)
  • 23:16 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'update to Mobile Frontend for custom logo support'
  • 23:14 RobH: dns update for a bunch of things
  • 23:12 preilly: push config change for custom logos
  • 23:11 logmsgbot: preilly synchronized wmf-config/InitialiseSettings.php 'add MobileFrontend custom logo support'
  • 23:10 logmsgbot: preilly synchronized wmf-config/CommonSettings.php 'add MobileFrontend custom logo support'
  • 23:10 RobH: srv187, srv188, srv189 set to false in pybal for api lvs, old servers that will be decommed soon.
  • 21:26 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Wrapping more long lines'
  • 21:18 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33864 - Flood flag on sr.wiki'
  • 21:15 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33864 - Flood flag on sr.wiki'
  • 20:55 binasher: rebooting db1029 with proprietary binary only huawei kernel module installed, for short term ssd evaluation
  • 20:44 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33899 - Request for Narayam in outreach.wikimedia.org'
  • 20:39 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33841 - Re-point $wgLogo to on zhwiki'
  • 20:34 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33166 - Creation of a new namespace for Malagasy Wiktionary'
  • 19:59 RobH: added dns info for ms-be1 but not pushing change until leslie pushes her
  • 19:19 RobH: locke seems ok
  • 19:07 RobH: locke down
  • 19:06 RobH: going to shutdown locke now for the move
  • 18:05 logmsgbot: nikerabbit synchronized php-1.18/extensions/WebFonts/ 'i18ndeploy r109836'
  • 18:04 logmsgbot: nikerabbit synchronized php-1.18/extensions/CodeReview/ui/CodeRevisionView.php 'i18ndeploy r109836'
  • 17:44 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Push changes for aswiki by Santhosh'
  • 13:29 rainman-sr: restarted search1, search3, search4 - not sure why they were dead
  • 02:22 Ryan_Lane: installing new version of nginx in eqiad
  • 02:22 Ryan_Lane: restarted nginx in pmtpa and esams
  • 02:21 Ryan_Lane: installed new version of nginx in pmtpa and esams
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Mon Jan 23 02:06:35 UTC 2012

January 22

  • 23:46 Ryan_Lane: changing nginx config to use the escaped useragent
  • 23:45 Ryan_Lane: repooling ssl4
  • 23:39 Ryan_Lane: restarting nginx servers
  • 23:31 Tim: pushed out unstripped version of wmerrors
  • 23:26 Ryan_Lane: testing new nginx package on ssl4
  • 22:41 mark: cp1042 stuck on disk i/o, rebooting
  • 22:28 mark: Restarted varnish backend on cp1041 and cp1042
  • 21:38 preilly: remove SOPA banner
  • 21:38 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'update to mobile frontend to remove sopa banner'
  • 18:42 Tim: running apt-get upgrade on searchidx2
  • 18:39 Tim: running apt-get upgrade on snapshot2 and snapshot4
  • 18:32 Tim: running apt-get upgrade on snapshot1 to get wikimedia version of php-wikidiff2
  • 06:36 Ryan_Lane: repooling ssl1004, depooling ssl4
  • 05:44 Ryan_Lane: depooling ssl1004
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Sun Jan 22 02:05:22 UTC 2012
  • 00:51 Reedy: Resent mediawiki-cvs commit emails from r109549 through r109704

January 21

  • 21:45 logmsgbot: reedy synchronized php-1.18/includes/api/ApiParse.php 'r109695'
  • 19:19 logmsgbot: reedy synchronized wmf-config/flaggedrevs.php 'More for bug 29742'
  • 19:14 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33215 - Enabling transwiki import on sa.wiktionary+'
  • 18:55 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Setting wgNamespaceRobotPolicies for th projects'
  • 18:35 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Cleanup wgEnableDnsBlacklist, enable for th projects'
  • 18:33 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Setting timezone for th projects'
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Sat Jan 21 02:05:53 UTC 2012
  • 00:27 logmsgbot: asher synchronized wmf-config/db.php 'raising db52 load to 400'
  • 00:22 logmsgbot: asher synchronized wmf-config/db.php 'raising db52 load to 200'
  • 00:17 logmsgbot: asher synchronized wmf-config/db.php 'adding db52 to enwiki, load 100'

January 20

  • 23:45 LeslieCarr: ms6 sdc is undergoing fsck due to wrong fs type, bad option, bad superblock, or other on /dev/sdc1,
  • 23:44 binasher: moving north america bits back to eqiad
  • 23:34 binasher: moved bits eqiad to pmtpa (via scenarios/normal/bits-geo.wikimedia.org)
  • 23:32 LeslieCarr: killed varnish on niobium, cpu load seems to be going down
  • 23:30 LeslieCarr: reloading arsenic
  • 23:10 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Wrap some stupidly long lines'
  • 22:52 LeslieCarr: rebooting cp3001
  • 22:30 LeslieCarr: reloading cp3001
  • 22:24 LeslieCarr: restarting networking on cp3001
  • 22:03 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33789 - Enable botadmin usergroup on ml.wikipedia'
  • 22:00 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33789 - Enable botadmin usergroup on ml.wikipedia'
  • 21:58 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33789 - Enable botadmin usergroup on ml.wikipedia'
  • 20:40 RobH: dns servers all still online after update
  • 20:40 RobH: dns update for dataset1001
  • 18:33 LeslieCarr: knsq9 has recovered post-reboot
  • 18:21 LeslieCarr: knsq9 will be rebooted as it is dead, dead, dead
  • 17:48 LeslieCarr: knsq9 is dead/overloaded
  • 15:11 mutante: knsq30 still has bad disk, powering down again
  • 15:07 mutante: powercycling knsq30 after replacing cable
  • 15:04 ^demon: fixed post-commit hook on formey email notifs to point to correct smtp server
  • 14:41 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
  • 14:27 logmsgbot: reedy synchronized php/cache/interwiki.cdb 'Updating interwiki cache'
  • 13:03 mutante: reinstalling srv199
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Fri Jan 20 02:05:21 UTC 2012
  • 01:17 awjr: updated representative/zipcode mapping and some contact info for a handful of reps/senators for CongressLookup r109598
  • 01:12 binasher: started another hotbackup of db38 to db52
  • 01:08 logmsgbot: asher synchronized wmf-config/db.php 'pulling db52'
  • 00:45 logmsgbot: asher synchronized wmf-config/db.php 'doubling db52 weight'
  • 00:38 logmsgbot: asher synchronized wmf-config/db.php 'lowering db52 weight'
  • 00:32 binasher: deployed new enwiki slave, db52
  • 00:32 logmsgbot: asher synchronized wmf-config/db.php 'setting db52 to full weight'
  • 00:19 logmsgbot: asher synchronized wmf-config/db.php 'adding new enwiki slave db52, with a low weight'
  • 00:08 preilly: push weekly mobile frontend update
  • 00:08 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'weekly update to Mobile Frontend'

January 19

  • 23:56 Ryan_Lane: changed global roles netadmins and sysadmins to be virtual static groups in ldap that autopopulate with any user that has objectclass=novauser
  • 23:15 Tim: rebuilt wikidiff2 with package name php-wikidiff2, removed lucid package php5-wikidiff2 from apt using "reprepro remove"
  • 22:52 Tim: recompiled wikidiff2 and put the new version up on apt.wikimedia.org
  • 21:51 Jeff_Green: starting conversion of fundraisingdb 'faulkner' tables from myisam to innodb, expect replication delays
  • 21:12 binasher: starting slaving db52 from db36, running hotbackup of db32 to db53
  • 20:36 RobH: dataset1001 shut down for later use
  • 20:27 RobH: dataset1001 mgmt online
  • 20:15 RobH: dataset1001.mgmt even
  • 20:15 RobH: updating dns for dataset1mgmt
  • 20:03 LeslieCarr: testing
  • 20:00 LeslieCarr: testing
  • 05:16 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'new sopa banner'
  • 05:14 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Enabling anon editing for enwiki'
  • 05:12 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Enabling page creation for users'
  • 05:08 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'new sopa banner'
  • 05:06 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'new sopa banner'
  • 05:00 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Removing all SOPA changes, excluding editing for anons, and page creation'
  • 04:57 binasher: flushing mobile varnish caches
  • 04:53 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'new sopa banner'
  • 04:47 Ryan_Lane: Preparing InitialiseSettings for re-enabling Wikipedia. DO NOT SCAP, DO NOT PUSH InitialiseSettings
  • 04:32 logmsgbot: awjrichards synchronizing Wikimedia installation... : Deploying CongressLookup changes for the lifting of the blackout
  • 03:11 Ryan_Lane: bringing virt1 back up
  • 03:01 Ryan_Lane: rebooting virt1 to ensure hardware virtualization is enabled in the bios
  • 02:30 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/SpecialCongressLookup.php 'r109477'
  • 02:29 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/CongressLookup.i18n.php 'r109477'
  • 02:06 Ryan_Lane: rebalance of gluster volume completed
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Thu Jan 19 02:05:55 UTC 2012
  • 02:05 Ryan_Lane: rebalancing instance gluster volume. network may get saturated for a while.
  • 01:55 Ryan_Lane: added virt1 and virt4 to instance volume for gluster
  • 01:17 Reedy: Leaving cleanupUploadStash.php running against commonswiki in a screen session as me on hume
  • 01:16 binasher: removing extra mobile varnish capacity - it wasn't needed
  • 01:13 awjr: updated zip code/representative data on enwiki to r109465
  • 01:01 Ryan_Lane: installed python-argparse on stat1
  • 00:54 binasher: running a hot backup of db32, streaming to db52
  • 00:22 Ryan_Lane: removing virt1 cname
  • 00:21 Ryan_Lane: rebuilding virt1 as a nova compute node
  • 00:20 LeslieCarr: changed vlan for virt1 eth0
  • 00:18 Ryan_Lane: cleared lighttpd logs on brewster and restarted squid and lighttpd
  • 00:05 logmsgbot: asher synchronized wmf-config/db.php 'returning db32 to normal weight'

January 18

  • 23:59 logmsgbot: asher synchronized wmf-config/db.php 'returning db32 at a low weight'
  • 23:50 binasher: rebooting db32 for mysql/kernel upgrades
  • 23:49 logmsgbot: asher synchronized wmf-config/db.php 'pulling db32 from s1 for mysql/kernel upgrades'
  • 23:44 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/SpecialCongressLookup.php 'r109457'
  • 23:02 maplebed: increased the size of db11's logical volume for /a from 500G to 800G.
  • 22:27 binasher: enwiki master changed to db36 - MASTER_LOG_FILE='db36-bin.000599', MASTER_LOG_POS=15773827
  • 22:26 logmsgbot: asher synchronized wmf-config/db.php 'done swapping s1 master to db36'
  • 22:25 binasher: swapping s1 master to db36
  • 22:24 logmsgbot: asher synchronized wmf-config/db.php 'starting swap of s1 master to db36, s1 in read-only'
  • 22:13 logmsgbot: asher synchronized wmf-config/db.php 'returning db36 to normal weight'
  • 22:07 logmsgbot: asher synchronized wmf-config/db.php 'returning db36 at a low weight'
  • 21:59 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/SpecialCongressLookup.php 'r109440'
  • 21:58 binasher: rebooting db36, upgrading kernel + mysql
  • 21:56 logmsgbot: asher synchronized wmf-config/db.php 'pulling db36 from s1 for mysql/kernel upgrades'
  • 21:54 Ryan_Lane: installing python-wurfl on stat1
  • 21:35 Ryan_Lane: installing geoip-bin geoip-database libgeoip1 python-geoip on stat1
  • 21:13 logmsgbot: asher synchronized wmf-config/db.php 'returning db38 at prior weight'
  • 21:05 Reedy: Run patch-ug_group-length-increase.sql on all wikis
  • 21:04 Reedy: Run patch-uploadstash_chunk.sql on all wikis
  • 21:03 Reedy: Run patch-jobs-add-timestamp.sql on all wikis
  • 20:55 awjr: update cl_zip5 table for CongressLookup to data in r 109408
  • 20:43 Reedy: Manually running cleanupUploadStash.php against commonswiki
  • 20:42 Reedy: Manually ran cleanupUploadStash.php against enwiki
  • 20:31 binasher: db38 in service at a low weight with new lucid kernel and current mysql build
  • 20:30 RobH: shutting down db17, confirmed not in db rotation and has no mysql instance active
  • 20:30 logmsgbot: asher synchronized wmf-config/db.php 'returning db38 at a lower weight'
  • 20:28 logmsgbot: asher synchronized wmf-config/db.php 'pulling db38 again'
  • 20:26 logmsgbot: asher synchronized wmf-config/db.php 'returning db38 to service'
  • 20:17 LeslieCarr: rebooting spence as it's once again gone crazy
  • 20:11 binasher: pulled db38, rebooting for kernel and mysql upgrades
  • 20:11 logmsgbot: asher synchronized wmf-config/db.php 'pulling db38 from s1 for upgrade'
  • 20:04 RobH: mw1102 coming down for mainboard replacement
  • 20:03 LeslieCarr: killing puppet processes on spence
  • 19:28 Reedy: Run patch-jobs-add-timestamp.sql on enwiki (jobs table is empty!)
  • 19:01 mutante: mw1108 - OS installed, added to puppet, finished catalog run, free for use
  • 18:37 mutante: pxe booting mw1108, OS install
  • 18:36 mutante: fixed DHCP config for mw1108 on brewster, had the string "Failed to connect to 10.65.1.108." where the MAC address should have been.
  • 18:27 RobH: searchidx1001 memory replaced per rt 2208
  • 18:20 mutante: tried to PXE boot mw1108 but no DHCP offers received
  • 18:15 RobH: searchidx1001 memory being replaced
  • 18:14 LeslieCarr: re-preffing tele2 routes
  • 18:11 RobH: db1004 hard disk replaced per rt#2140, rebuilding
  • 17:40 LeslieCarr: Draining HE to perform maintenance on the physical port
  • 16:57 logmsgbot: reedy synchronized php-1.18/extensions/CongressLookup/SpecialCongressLookup.php 'r109395'
  • 14:45 mark: Changed service IP addresses of lists.wikimedia.org in DNS to US prefixes
  • 14:40 mark: Disabled hold_domains on sodium and lily
  • 14:28 mark: Setup lily to route lists.wikimedia.org mails to sodium
  • 14:21 mark: rsync complete. Running dpkg-reconfigure mailman on sodium
  • 13:43 logmsgbot: demon synchronized php-1.18/extensions/CongressLookup/SpecialCongressLookup.php 'r109362'
  • 13:38 mark: Started rsync of selected mailman directories under /var/lib/mailman from lily to sodium
  • 13:37 mark: Removed all test messages on the exim4 queue on sodium
  • 13:37 mark: Created /var LVM snapshot on lily
  • 13:35 mark: Stopped lighttpd on lily
  • 13:34 mark: Stopped mailman on lily and sodium
  • 13:30 mark: Set hold_domains = lists.wikimedia.org on lily, to hold new lists mails on the queue
  • 13:30 mark: Starting mailman migration
  • 13:03 mutante: restarted pdns on ns0
  • 08:49 logmsgbot: neilk synchronizing Wikimedia installation... :
  • 07:45 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/SpecialCongressLookup.php 'r109336'
  • 06:28 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/SpecialCongressLookup.php 'r109324'
  • 06:24 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/SpecialCongressLookup.php 'r109322'
  • 06:08 logmsgbot: awjrichards synchronized php/extensions/CongressLookup/SpecialCongressLookup.php 'r109319'
  • 05:32 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Disabling CentralNotice for simplewiki'
  • 05:25 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Disabling moodbar on enwiki'
  • 04:59 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Disabling all editing for enwiki for SOPA blackout'
  • 04:50 Ryan_Lane: queuing up changes for totally disabling edits. DO NOT SCAP! DO NOT SYNC InitialiseSettings!
  • 04:47 Ryan_Lane: Editing enwiki's MediaWiki:Robots.txt to disallow BannerController for SOPA blackout
  • 04:45 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php
  • 04:44 Ryan_Lane: Disabling anon editing and page creation by users on enwiki for SOPA blackout
  • 04:33 logmsgbot: neilk synchronized wmf-config/InitialiseSettings.php 'enable CongressLookup on enwiki'
  • 04:01 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ApplicationTemplate.php 'update version'
  • 03:56 logmsgbot: neilk synchronizing Wikimedia installation... : deploying CongressLookup (for i18n reasons, not deploying to enwiki)
  • 03:28 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Removing restriction of display title for SOPA landing pages'
  • 02:35 binasher: cp1039-40 are now in service for mobile wikipedia
  • 02:04 logmsgbot: LocalisationUpdate completed (1.18) at Wed Jan 18 02:04:57 UTC 2012
  • 01:35 RobH: cp1040 and cp1036 ready for use
  • 01:33 RobH: cp1037, cp1038, cp1039 os installed, varnish partitions mounted, and puppet run
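
The 18:36 entry above traces the failed PXE boot of mw1108 to an error string sitting where the MAC address belonged in the DHCP config. A minimal sketch of a sanity check that would flag such a value before it is written out, assuming colon-separated MAC notation; the function name `is_mac_address` is hypothetical and not part of any tooling mentioned in this log:

```python
import re

# Six colon-separated pairs of hex digits, e.g. 00:1a:2b:3c:4d:5e.
# Assumption: only this notation is in use; other notations would need more patterns.
_MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def is_mac_address(value):
    """Return True if value looks like a colon-separated MAC address."""
    return bool(_MAC_RE.match(value.strip()))


if __name__ == "__main__":
    # The second value is the string quoted in the 18:36 entry.
    for candidate in ("00:1a:2b:3c:4d:5e", "Failed to connect to 10.65.1.108."):
        print(repr(candidate), is_mac_address(candidate))
```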

January 17

  • 22:47 binasher: ram only varnish instance now running on marmontel in front of apache/wordpress
  • 22:07 Ryan_Lane: installing memcache on marmontel
  • 20:54 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33769 - Allow bureaucrats to remove sysop rights at Bashkir Wikipedia'
  • 20:13 mutante: en.planet updates were stuck. reason was corrupted cache causing "bsddb.db.DBPageNotFoundError" which broke update script. solution was to kill stuck updates, delete files in cache dir and run update manually
  • 19:59 logmsgbot: reedy synchronized php-1.18/includes/Feed.php 'r109197'
  • 19:36 Ryan_Lane: added Cite extension to labsconsole
  • 19:29 logmsgbot: reedy synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js 'r109186'
  • 18:05 RobH: blog is instantly faster
  • 18:05 RobH: theme updated on blog along with setting limit back to 20 comments per page
  • 17:46 RobH: aware of blog slowdowns, work is being done
  • 17:35 mutante: also upgraded drac firmware on mw1081 & mw1099 (fixes mgmt console problem)
  • 16:45 mutante: upgrading drac firmware on mw1108
  • 15:54 RobH: db43 rebooting
  • 15:27 RobH: db7 shutting down for decom, not listed in db for any clusters, load .01
  • 11:05 logmsgbot: neilk synchronized wmf-config/ExtensionMessages-1.18.php 'added CongressLookup to ExtensionMessages-1.18 for i18n'
  • 11:04 logmsgbot: neilk synchronized wmf-config/extension-list 'added CongressLookup to extension-list for i18n'
  • 10:30 logmsgbot: neilk synchronizing Wikimedia installation... : deploying CongressLookup. We are not deploying to any live wiki, just test, but this is to make i18n work
  • 10:28 logmsgbot: neilk synchronized wmf-config/InitialiseSettings.php 'added CongressLookup to InitialiseSettings'
  • 10:25 logmsgbot: neilk synchronized wmf-config/CommonSettings.php 'added CongressLookup require'
  • 05:47 maplebed: marmontel has now replaced hooper as blog.wikimedia.org
  • 05:26 maplebed: installing the mysql client on marmontel to test connectivity to the DB
  • 05:16 Ryan_Lane: installing php-apc on marmontel
  • 04:52 RobH: another dns update for servermgmt
  • 04:18 Ryan_Lane: installing varnish on hooper
  • 02:29 Ryan_Lane1: that last message was in regards to hooper
  • 02:29 Ryan_Lane1: temporarily disabled puppet, since the apache configuration was manually modified
  • 02:06 logmsgbot: LocalisationUpdate completed (1.18) at Tue Jan 17 02:06:00 UTC 2012
  • 01:52 Ryan_Lane: installed w3 total cache in wordpress on hooper
  • 01:51 Ryan_Lane: installing tidy on hooper
  • 01:51 Ryan_Lane: installing php-apc on hooper
  • 01:31 Ryan_Lane: powercycling hooper
  • 00:44 neilk_: neilk just added config change to set caching for banners on testwiki to 0. Should have no effect anywhere else.
  • 00:39 logmsgbot: neilk synchronized wmf-config/CommonSettings.php

January 16

  • 23:59 logmsgbot: tstarling synchronized docroot/bits/robots.txt 'removed rule intended for the itwiki protest, left in accidentally'
  • 22:06 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Switching permissions back to normal on testwiki'
  • 22:02 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Testing disabling edits on testwiki'
  • 22:02 Ryan_Lane: testing disabling edits for testwiki
  • 21:26 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 33763 DoubleWiki on frwikiversity'
  • 20:33 Ryan_Lane: pushing floating address changes to virt0
  • 19:23 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32182 - enable articlefeedback extension on spanish wikipedia'
  • 19:14 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33759 - Change sitename for lb.wiktionary'
  • 18:48 logmsgbot: nikerabbit synchronized php-1.18/extensions/Translate/ 'i18ndeploy'
  • 18:47 logmsgbot: nikerabbit synchronized php-1.18/extensions/WebFonts/ 'i18ndeploy'
  • 18:45 logmsgbot: nikerabbit synchronized php-1.18/extensions/Narayam/resources/ext.narayam.rules.as.js 'i18ndeploy'
  • 18:44 logmsgbot: nikerabbit synchronized php-1.18/skins/common/shared.css 'i18ndeploy'
  • 17:17 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33508 - Enable Rollback group on id.wiki'
  • 17:05 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Disable autoconfirmed reupload on incubatorwiki'
  • 17:00 Reedy: srv221 and srv222 are out of space on /
  • 17:00 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Disable autoconfirmed reupload on incubatorwiki'
  • 16:41 logmsgbot: reedy synchronized wmf-config/flaggedrevs.php 'Kill labs setting stuff'
  • 16:40 logmsgbot: reedy synchronized wmf-config/flaggedrevs.php 'Kill labs setting stuff'
  • 16:37 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33662 - Change project namespace for lb.wiktionary'
  • 16:35 Reedy: Ran namespaceDupes on fawikisource
  • 16:34 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33662 - Change project namespace for lb.wiktionary'
  • 16:28 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33708 - Add alias to fa wikisource'
  • 16:26 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33708 - Add alias to fa wikisource'
  • 16:06 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33758 - Arabic numerals on Arabic Wiktionary'
  • 15:55 cmjohnson1: shutting down srv199 for main board replacement
  • 15:09 mutante: torrus was broken (RT:2279) and did not start due to corrupted berkeleydb, used db4.8_recover, service started again
  • 12:38 mark: Running puppet on freshly installed sodium
  • 09:15 apergos: cleaned up /tmp on srv223... seems like cleanup once an hour by cron isn't often enough any more, scalers are doing too much work
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Mon Jan 16 02:05:05 UTC 2012

January 15

  • 02:45 logmsgbot: reedy synchronized php-1.18/extensions/CentralNotice/CentralNotice.db.php 'r108949'
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Sun Jan 15 02:05:26 UTC 2012

January 14

  • 23:37 Ryan_Lane: shutting down virt1
  • 20:29 Ryan_Lane: stopping opendj on virt1
  • 20:07 Ryan_Lane: stopping pdns on virt1
  • 02:15 Reedy: That was me testing something
  • 02:14 logmsgbot: LocalisationUpdate failed: SVN update of extensions failed
  • 02:04 logmsgbot: LocalisationUpdate completed (1.18) at Sat Jan 14 02:04:46 UTC 2012
  • 01:00 Ryan_Lane: brought pdns on virt1 back up
  • 00:53 Ryan_Lane: stopping pdns on virt1 again to test dns
  • 00:19 Ryan_Lane: powercycling formey

January 13

  • 22:40 Ryan_Lane: force running puppet on all instances
  • 22:40 Ryan_Lane: re-generated certificates for all instances
  • 22:40 Ryan_Lane: deleted all puppet certificates on all instances
  • 22:31 LeslieCarr: restarting ganglia1001
  • 21:40 Ryan_Lane: changing virt1 to be a cname of virt0
  • 21:17 Ryan_Lane: killed pdns on virt1
  • 20:40 Ryan_Lane: stopped pdns on virt1
  • 20:32 logmsgbot: reedy synchronized php-1.18/extensions/Contest/specials/ 'r108843'
  • 20:28 Ryan_Lane: changed NS records for wmflabs.org and wmflabs to point to virt0
  • 20:28 Ryan_Lane: changed recursor to point wmflabs domain to virt0
  • 20:16 K4-713: synchronized payments cluster to r108833
  • 19:30 Ryan_Lane: shutting down virt1 to ensure migration was completed
  • 19:12 LeslieCarr: restarting gmond on cp1043
  • 18:55 LeslieCarr: hard powercycling ms1002
  • 18:48 LeslieCarr: rebooting ms1002 due to kswapd 100% cpu bug https://bugs.launchpad.net/ubuntu/+bug/721896
  • 16:44 mutante: added alswiktionary & alswikibooks to closed.dblist
  • 16:43 logmsgbot: dzahn synchronized closed.dblist
  • 16:33 mutante: syncing InitialiseSettings.php after changing as wiki namespace per bug 33507
  • 16:33 logmsgbot: dzahn synchronized ./wmf-config/InitialiseSettings.php
  • 16:07 mutante: updated blog theme and installed a plugin per RT:2271
  • 14:34 mutante: srv191 - has now fresh OS, re-issued puppet certs, ran puppet, restart memcached, etc. - all back in monitoring
  • 12:59 mutante: PXE booting srv191, installing OS
  • 07:45 Ryan_Lane: disassociated and reassociated some floating IP addresses, to fix NAT issues. Some NAT rules went missing.
  • 07:43 Ryan_Lane: added a grant for mediawiki in the database to fix labsconsole mediawiki outage
  • 07:42 Ryan_Lane: fixed memcached port in mediawiki configuration on labsconsole to fix slowness issue
  • 03:29 Ryan_Lane: switching labsconsole.wikimedia.org address to point to virt0
  • 02:58 Ryan_Lane: dns server is up on virt0
  • 02:57 Ryan_Lane: switched active ldap server in labs to virt0, for nova itself. instances still need to be re-pointed
  • 02:56 Ryan_Lane: switched rabbitmq server in labs to virt0
  • 02:56 Ryan_Lane: switched mysql masters for labs to virt0
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Fri Jan 13 02:05:29 UTC 2012
  • 01:47 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 33469 - Enable rollback function for editor group kawiki'
  • 01:39 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 33507 for aswiki'
  • 01:09 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 33556 - ArticleFeedback settings on Chinese wikipedia'
  • 01:02 logmsgbot: reedy synchronized closed.dblist 'Closing en_labswikimedia, de_labswikimedia, liquidthreads_labswikimedia (resync)'
  • 01:02 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 33468 - Email notifications for eswikibooks'
  • 00:33 Reedy: That was only touch
  • 00:33 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php
  • 00:32 Reedy: srv223 is also out of diskspace
  • 00:29 Reedy: srv219 is out of diskspace
  • 00:28 logmsgbot: reedy synchronized closed.dblist 'Closing en_labswikimedia, de_labswikimedia, liquidthreads_labswikimedia'
  • 00:17 Ryan_Lane: stopping puppet on all virt nodes

January 12

  • 21:54 Ryan_Lane: relabeled port at virt0
  • 21:54 Ryan_Lane: moved new virt0 from squid vlan to public-services2
  • 21:43 Ryan_Lane: rebuilding mobile2 as virt0
  • 21:43 Ryan_Lane: Adding back mgmt info for mobile1, changing mobile2 to virt0
  • 21:11 Ryan_Lane: rebuilding mobile1 as virt0
  • 21:08 Ryan_Lane: renaming mobile1 to virt0
  • 20:54 binasher: installing percona-toolkit on few remaining hardy dbs
  • 20:26 cmjohnson1: shutting down srv178-189 for decommissioning
  • 20:14 binasher: granted the "process" priv to nagios@localhost on all production db clusters
  • 20:07 logmsgbot: reedy synchronized php-1.18/includes/specials/SpecialSearch.php 'r108751'
  • 20:07 LeslieCarr: reassigning ports on asw-b-sdtpa
  • 17:00 notpeter: stop sodium to do manual reinstall
  • 16:33 RobH: adjusting all power strip humidity sensor 2 (floor level) to 12% humidity, as the center rack has the proper levels, floor levels always are low in humidity.
  • 16:17 mutante: after a config change to nrpe_local.cfg and puppet applying the change, the service was not restarted but for some reason all nagios-nrpe-server processes caught SIGTERM, which caused a Nagios outage until nrpe servers were started again (via dsh). manually applying the same config change does not cause problems.
  • 16:04 mutante: starting nagios-nrpe-server on ALL via dsh to speed up nagios recovery
  • 15:33 mutante: starting nagios-nrpe-server on srv's via dsh
  • 02:04 logmsgbot: LocalisationUpdate completed (1.18) at Thu Jan 12 02:04:31 UTC 2012
  • 00:48 preilly: pushing quick fix for special random
  • 00:48 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'update to mobile frontend to fix random link'
  • 00:41 LeslieCarr: added ganglia1002 and ganglia1001 to dns

January 11

  • 23:18 RobH: searchidx1001 offline and powered down until replacement memory arrives (2012-01-13) rt 2208
  • 22:56 RobH: poking searchidx1001 for memory error
  • 22:45 RobH: mw1108 online and ready for install per rt2253
  • 22:42 RobH: mw1099 repaired, ready for os install per rt2252
  • 22:39 RobH: mw1081 ready for install rt2251
  • 22:32 RobH: no it's not ;]
  • 22:16 Reedy: lists.wikimedia.org is down
  • 21:53 logmsgbot: reedy synchronized php-1.18/includes/api/ 'r108683'
  • 21:36 RobH: psw1-eqiad mgmt connected
  • 21:24 RobH: leslie is handling the ganglia not starting back up issue even though i caused it to die, yay me
  • 21:22 RobH: updated dns for neon/cobalt to ganglia1001/1002
  • 21:17 RobH: ganglia offline for a moment, sorry folks
  • 21:17 logmsgbot: catrope synchronizing Wikimedia installation... : Deploying MoodBar changes
  • 21:16 RobH: i just took nickel offline by mistake
  • 20:58 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Change shorturl prefix default'
  • 20:57 logmsgbot: reedy synchronized php-1.18/extensions/ShortUrl/ 'r108680'
  • 20:50 RoanKattouw: Applying MoodBar schema changes (index addition and column addition) on all wikis
  • 20:44 logmsgbot: catrope synchronized php-1.18/extensions/ArticleFeedbackv5/ 'Updating AFTv5 to trunk state'
  • 20:14 Jeff_Green: adjusted firewall rules on payments* to restore ganglia reporting since we switched to nickel
  • 20:11 RoanKattouw: Created AFTv5 tables on testwiki
  • 20:09 logmsgbot: catrope synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js 'r108666'
  • 19:53 cmjohnson1: shutting down srv191 for new install
  • 19:52 cmjohnson1: replaced HDD srv191
  • 19:47 logmsgbot: catrope synchronized php-1.18/resources/startup.js 'touch'
  • 19:46 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Enable AFTv5 on testwiki'
  • 19:40 RobH: mw1103 hardware issues, disregard nagios flapping
  • 19:37 RobH: mw1102 offline due to bad mainboard until replacement arrives tomorrow or next
  • 19:30 RobH: working on mw1102, disregard flapping
  • 18:40 notpeter: running authdns-update on dobson to pick up new dns temps
  • 18:28 RobH: lvs1003 repaired, now needs install and setup. rt1549 and rt 2241
  • 18:26 notpeter: gracefulling apache on spence to deactivate nmis.w.o (abandoned install of nedi)
  • 16:23 mark: Started rsync of lily:/var/lib/mailman/archives to sodium (in a screen on sodium)
  • 15:49 mark: Started rsync of lily:/var/lib/mailman/data to sodium (in a screen on sodium)
  • 15:39 logmsgbot: reedy synchronized php-1.18/includes/ 'r108626'
  • 15:34 logmsgbot: reedy synchronized php-1.18/includes/ 'revert r108625'
  • 15:32 logmsgbot: reedy synchronized php-1.18/includes/ 'r108625'
  • 15:23 logmsgbot: reedy synchronized php-1.18/extensions/CodeReview/ 'r108623'
  • 15:22 logmsgbot: reedy synchronized php-1.18/extensions/ArticleFeedbackv5/ 'r108623'
  • 15:22 logmsgbot: reedy synchronized php-1.18/extensions/ApiSandbox/ 'r108623'
  • 15:20 logmsgbot: reedy synchronized php-1.18/resources/mediawiki.action/ 'r108622'
  • 15:19 logmsgbot: reedy synchronized php-1.18/includes/ 'r108622'
  • 14:11 mutante: nagios https now serves real SSL cert
  • 14:09 mutante: fixed Apache VirtualHost warnings on spence, NameVirtualHost *:443 in ports.conf, <VirtualHost *:443> in sites-available,..
  • 02:04 logmsgbot: LocalisationUpdate completed (1.18) at Wed Jan 11 02:04:39 UTC 2012
  • 01:42 preilly: pushing updates to Zero Rated Mobile Access extension
  • 01:41 logmsgbot: preilly synchronized php-1.18/extensions/ZeroRatedMobileAccess/ 'push updates to ZeroRatedMobileAccess extension'
  • 01:18 preilly: only activate Zero Rated Mobile Access Extension for test wiki
  • 00:38 logmsgbot: preilly synchronized wmf-config/InitialiseSettings.php 'add ZeroRatedMobileAccess extension only on test'
  • 00:34 preilly: pushing ZeroRatedMobileAccess extension to production
  • 00:34 logmsgbot: preilly synchronized wmf-config/InitialiseSettings.php 'add ZeroRatedMobileAccess extension'
  • 00:34 logmsgbot: preilly synchronized wmf-config/CommonSettings.php 'add ZeroRatedMobileAccess extension'
  • 00:33 logmsgbot: preilly synchronized wmf-config/extension-list 'add ZeroRatedMobileAccess extension'
  • 00:30 preilly: push ZeroRatedMobileAccess extension
  • 00:30 logmsgbot: preilly synchronized php-1.18/extensions/ZeroRatedMobileAccess/ 'initial push of ZeroRatedMobileAccess extension'
  • 00:23 LeslieCarr: applying new loopback filter to cr1-eqiad - higher risk of issues

January 10

  • 23:58 preilly: weekly mobile frontend push
  • 23:58 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'weekly update to mobile frontend'
  • 22:10 lcarr: broke ganglia redirect on nickel, fixing with next push
  • 19:45 LeslieCarr: stopping gmetad on spence and unmounting the tmpfs drive
  • 18:32 LeslieCarr: restarting gmetad on nickel
  • 11:39 logmsgbot: nikerabbit synchronized php-1.18/extensions/Translate/MessageGroups.php 'Translate bugfix r108500'
  • 07:20 logmsgbot: nikerabbit synchronized php-1.18/extensions/Translate/TranslateEditAddons.php 'r108497'
  • 02:05 logmsgbot: LocalisationUpdate completed (1.18) at Tue Jan 10 02:05:15 UTC 2012
  • 01:31 binasher: all varnish servers have been upgraded to 3.0.2
  • 00:38 RobH: db1004 pd8 set to offline per rt 2140, will place call to dell for replacement
  • 00:23 RobH: correction for typo, mw1102, not mw1002
  • 00:23 binasher: repooled cp3002
  • 00:23 RobH: mw1002 coming down for hw testing rt 1656
  • 00:19 binasher: depooling cp3002, upgrading varnish

January 9

  • 23:57 binasher: testing varnish 3.0.2 upgrade on cp3001 (bits)
  • 23:42 binasher: adding two new mobile cache servers running varnish 3.0.2 (cp104[12]) to the m.wiki eqiad vip
  • 22:26 LeslieCarr: ganglia moved to new nickel server
  • 21:36 LeslieCarr: changing gmond source for ganglia3-tip
  • 21:19 RobH: snapshot1001-1004 mgmt online
  • 21:05 RobH: updating dns with snapshot1001-1004 primary ip info
  • 20:56 RobH: updating dns with snapshot1001-1004 mgmt
  • 20:30 logmsgbot: catrope synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js 'r108470'
  • 20:29 logmsgbot: catrope synchronized php-1.18/extensions/ArticleFeedbackv5/api/ApiArticleFeedbackv5.php 'r108470'
  • 20:25 LeslieCarr: replacing the ops@ alias with the new ops list on mchenry as people keep forgetting to email the new list
  • 19:57 logmsgbot: nikerabbit synchronized php-1.18/extensions/Translate/Translate.php 'Deploy r108469 - bugfix for Translate'
  • 19:43 RobH: torrus dead, kicking
  • 19:32 logmsgbot: nikerabbit synchronized wmf-config/CommonSettings.php 'Updating Translate config 2/2'
  • 19:32 logmsgbot: nikerabbit synchronized wmf-config/InitialiseSettings.php 'Updating Translate config 1/2'
  • 19:16 logmsgbot: nikerabbit synchronized php-1.18/includes/parser/Parser.php 'Deploying r108461'
  • 19:07 logmsgbot: catrope synchronized php-1.18/extensions/UploadWizard/resources/mw.UploadWizardLicenseInput.js 'r108459'
  • 18:55 logmsgbot: nikerabbit synchronized php-1.18/extensions/Translate/ 'Deploying translate r108451'
  • 18:54 logmsgbot: nikerabbit synchronized php-1.18/languages/messages/MessagesEn.php 'Updating messagesEn'
  • 18:42 logmsgbot: nikerabbit synchronized php-1.18/extensions/ParserFunctions/ParserFunctions.i18n.magic.php 'Deploying r108449'
  • 18:34 logmsgbot: nikerabbit synchronized php-1.18/extensions/WikimediaMessages/WikimediaGrammarForms.php 'Deploying r108433'
  • 18:30 logmsgbot: nikerabbit synchronized p/extensions/WebFonts/ 'Updating WebFonts r108447'
  • 18:21 logmsgbot: nikerabbit synchronized php-1.18/extensions/Narayam/ 'Syncing Narayam'
  • 18:18 Nikerabbit: running Narayam preference migration script
  • 17:30 jeremyb: the time is now 17:30:30 UTC
  • 17:20 RoanKattouw: Installing (!) NTP on wikitech
  • 17:16 jeremyb: the time is now 17:19:30 UTC
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Mon Jan 9 02:04:47 UTC 2012

January 8

  • 23:20 Reedy: For some reason cp1001-1042 weren't listed in CommonSettings.php XFF, but (at least) 1042 was in service, meaning edits were attributed to it
  • 23:18 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Add cp1001-cp1041'
  • 23:10 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Add cp1042 to XFF'
  • 21:47 rainman-sr: killed broken search indexer thread on searchidx1 (please note searchidx1 is no longer in use!), and restarted incremental indexing on searchidx2 which was somehow broken
  • 21:43 rainman-sr: someone started incremental updating on searchidx1 ??!!
  • 14:54 apergos: removed old puppet lockfile on brewster, ran by hand
  • 14:47 apergos: cleared out some very large squid logs on brewster, (basically all of them) plus lighty logs, disk was full. restarted squid manually
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Sun Jan 8 02:05:11 UTC 2012
  • 00:43 tfinc: killing long running show_bug.cgi procs on kaulen

January 7

  • 22:30 Reedy: Users reporting slowness while editing. dberror.log shows a few mysql errors for enwiki master and slaves. Few errors on other wikis, mainly enwiki
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Sat Jan 7 02:05:09 UTC 2012

January 6

  • 23:22 RobH: working rt1549 lvs1003 may flap, it is presently not in service due to possible hdd failure
  • 22:55 binasher: db22 is back in s4
  • 22:55 logmsgbot: asher synchronized wmf-config/db.php 'adding db22 back to s4'
  • 21:41 RobH: db1029 powering back up with ssd testing hardware installed
  • 21:35 RobH: db1029 coming down for ssd testing
  • 21:26 RobH: cp1014 and cp1019 hdd controller cables replaced (removed for testing controllers), both can be used normally
  • 21:19 binasher: restoring db22 from a live hotbackup of db1038
  • 21:18 RobH: es1002 back ready for service use per #2220: replace original RAID card in es1002
  • 21:05 binasher: putting db51 into production as an s4 slave
  • 21:05 logmsgbot: asher synchronized wmf-config/db.php 'adding db51 as an s4 slave'
  • 20:57 binasher: started slaving db51 off of db31
  • 20:21 RobH: rt2226 - redeploy db22 for asher
  • 20:19 RobH: db22 reinstalled and booting into OS. No puppet runs yet, now it's Asher's problem ;]
  • 20:04 RobH: db22 reinstalling
  • 19:24 binasher: started innodb hot backup of db1038 to db51
  • 18:43 maplebed: s4 database rotation complete. outage duration 36 minutes.
  • 18:37 maplebed: pushed out new db.php setting s4 to read-write
  • 18:37 logmsgbot: ben synchronized wmf-config/db.php
  • 18:35 maplebed: db31 made read-write as the new master for s4
  • 18:31 maplebed: old master for s4 log file db22-bin.000106 log pos 631618956
  • 18:30 maplebed: new master for s4: db31, log file db31-bin.000213 log pos is 205612709
  • 18:24 logmsgbot: asher synchronized wmf-config/db.php 'setting s4 to read only, preparing to make db31 master'
  • 18:21 Reedy: Commons having db issues, db22 (s4 master) has a disk issue
  • 16:02 apergos: restarted lighty on dataset2
  • 16:01 Reedy: HTTP server (lighttpd?) seems to be down on dataset2
  • 15:46 RoanKattouw: Removing gs_* files in /tmp on srv220 that are >30 min old
  • 15:44 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33556 - ArticleFeedback settings on Chinese wikipedia'
  • 15:43 RoanKattouw: Removed /tmp/mw-cache-1.17 and /tmp/mw-cache-1.17-test on srv220
  • 15:41 Reedy: srv220 / is at 100% usage
  • 15:41 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33556 - ArticleFeedback settings on Chinese wikipedia'
  • 14:34 mutante: saw the log about cp1043/44 being deliberately left broken, but requirement in varnish.pp also broke others, fixed on sq67,68,69 (gerrit change 1802)
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Fri Jan 6 02:05:01 UTC 2012
  • 01:25 binasher: puppet is being deliberately left broken on cp1043 and 1044 until tomorrow
  • 01:23 binasher: backend varnish instance on cp1042 running 3.0.2 is in production for 1/3 of mobile requests
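
The binlog coordinates logged at 18:30 and 18:31 are the kind of values a replica needs in order to attach to a master. As a minimal sketch (not the commands actually run that day), the following Python parses a coordinate entry in the wording used above and renders the corresponding MySQL CHANGE MASTER TO statement. The names parse_coords and change_master_sql are hypothetical, and the regular expression assumes the exact phrasing of those two log lines.

```python
import re
from typing import NamedTuple


class BinlogCoords(NamedTuple):
    host: str
    log_file: str
    log_pos: int


# Assumed phrasing, taken from the two entries above, e.g.
# "new master for s4: db31, log file db31-bin.000213 log pos is 205612709"
_COORDS_RE = re.compile(
    r"(?:(?P<host>db\d+), )?log file (?P<file>\S+) log pos (?:is )?(?P<pos>\d+)"
)


def parse_coords(entry: str) -> BinlogCoords:
    """Extract host, binlog file and position from a log entry (hypothetical helper)."""
    match = _COORDS_RE.search(entry)
    if match is None:
        raise ValueError("no binlog coordinates found in entry")
    log_file = match.group("file")
    # If the entry does not name the host, fall back to the binlog file prefix.
    host = match.group("host") or log_file.split("-bin.")[0]
    return BinlogCoords(host, log_file, int(match.group("pos")))


def change_master_sql(coords: BinlogCoords) -> str:
    """Render a CHANGE MASTER TO statement; credentials are deliberately omitted."""
    return (
        f"CHANGE MASTER TO MASTER_HOST='{coords.host}', "
        f"MASTER_LOG_FILE='{coords.log_file}', MASTER_LOG_POS={coords.log_pos};"
    )
```

Keeping the coordinates as a typed record rather than free text makes it harder to mix up the old and new master positions during a rotation.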

January 5

  • 22:15 preilly: small fix for iPhone vary support
  • 22:15 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php
  • 21:39 Ryan_Lane: rebooting virt1
  • 21:01 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'wmgShortUrlPrefix'
  • 21:01 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'wmgShortUrlPrefix'
  • 20:08 Reedy: Created ShortUrl tables on testwiki
  • 20:07 logmsgbot: reedy synchronizing Wikimedia installation... : Update extensionmessages
  • 20:05 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'wmgUseShortUrl'
  • 20:04 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'wmgUseShortUrl'
  • 20:02 logmsgbot: reedy synchronized php-1.18/extensions/ShortUrl 'Pushing ShortUrl files out'
  • 19:08 notpeter: restarting dhcpd on brewster
  • 18:45 preilly: pushing fix for js error on production
  • 18:45 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ApplicationTemplate.php
  • 18:45 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/javascripts/application.js
  • 18:00 mutante: tarin - added "#includedir /etc/sudoers.d" to sudo config, needs to read /etc/sudoers.d/nrpe for Nagios RAID check
  • 17:49 logmsgbot_: hashar: gallium: cleaned /tmp . Our test suites leak a large amount of files :D
  • 17:49 ^demon: removed chuck norris plugin from jenkins, restarted
  • 16:48 mutante: payments4 - 25 running nginx procs cause a warning - but normal and just raise limit?
  • 16:15 mutante: people claim it was "completely resolved with 2.6.38-10 backport from PPA." (add-apt-repository ppa:kernel-ppa/ppa ...). wanna try that? (or just reboot ms1002 pls)
  • 15:49 mutante: quotes on kswapd problem (that also appeared on other servers): "has nothing to do with swap space or memory".."the kernel process which swaps tasks".."means the kernel is spending more time context switching tasks than it is actually executing the tasks".."you're chasing a ghost if you're trying to tune your swap/memory environment"
  • 15:45 mutante: ms1002 - kswapd 100% CPU - but no swap used and free memory left - this looks like https://bugs.launchpad.net/ubuntu/+bug/721896 again
  • 15:39 mutante: Nagios check_ntp does stuff like: overall average offset: 0 -> NTP OK: Offset unknown| -> NTP CRITICAL: Offset unknown (even though this bug was supposed to be fixed in a version before the one we use)..sigh
  • 15:34 mutante: dataset1 - date was off by ~27 hours. known issues RT 216 & 1345 with hardware clock, additionally though Nagios NTP check is still buggy (possibly due to leap seconds ;P) -> http://tech.akom.net/archives/27-Nagios-check_ntp-quits-working-in-2009-with-Offset-unknown.html
  • 15:14 mutante: lvs1004 - puppet hadn't run for 12 hours, looked stuck, "already in progress" on every run. rm /var/lib/puppet/state/puppetdlock, restart puppet agent, finished fine in a few seconds. maybe puppet bug 2888,5246 or related
  • 14:57 mutante: magnesium - memcached runs on default port 11211, but we run all the others on 11000, this causes Nagios CRIT. Is it supposed to run here? (was also on -l 127.0.0.1 only, but init script starts it on all)
  • 14:55 Jeff_Green: searchidx1 /a reached 100%, did the "space issues" maintenance procedure from wikitech search documentation
  • 14:39 mutante: same on srv193
  • 14:35 mutante: srv290 - before restart memcached was running with -m 64 and -l 127.0.0.1 for some reason, causing Nagios CRIT, now it looks like others and recovered
  • 14:32 mutante: restarting memcached on srv290
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Thu Jan 5 02:05:03 UTC 2012
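
The 15:14 entry above clears a stuck puppet run by removing /var/lib/puppet/state/puppetdlock by hand. A minimal Python sketch of the same idea follows, assuming that a lock file untouched for longer than some threshold can be treated as stale. The helper names and the threshold are hypothetical, and file age alone does not prove that no agent is still running, so this illustrates the check rather than a safe automation.

```python
import os
import time
from typing import Optional


def is_stale(lock_path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if lock_path exists and was last modified more than max_age_seconds ago."""
    try:
        mtime = os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return False  # no lock file, nothing to clean up
    current = time.time() if now is None else now
    return (current - mtime) > max_age_seconds


def remove_if_stale(lock_path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Delete the lock file only when it is stale; return True if it was removed."""
    if not is_stale(lock_path, max_age_seconds, now):
        return False
    os.remove(lock_path)
    return True
```

For the case logged above, a call might look like remove_if_stale("/var/lib/puppet/state/puppetdlock", 12 * 3600), followed by restarting the puppet agent as the entry describes.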

January 4

  • 23:27 logmsgbot: catrope synchronizing Wikimedia installation... : Deploying MoodBar and MarkAsHelpful changes
  • 22:39 Tim: taking srv280 for action=purge slowness investigation
  • 21:20 Ryan_Lane: deploying LdapAuthentication 2.0a and OpenStackManager 1.3 to virt1
  • 21:13 RoanKattouw: Applying schema changes to moodbar_feedback_response on all wikis (drop index, create index, add column)
  • 19:36 notpeter: restarting dhcpd on brewster
  • 19:13 RobH: dns update successful and none of them fell over
  • 19:12 Reedy: r108070 even
  • 19:12 logmsgbot: reedy synchronized php-1.18/extensions/CentralAuth/specials/ 'r107070'
  • 19:11 RobH: updating dns for mgmt of ms-fe1/2 and other new servers in tampa, as well as search boxen in eqiad
  • 19:04 mutante: srv199 boots but without eth0, NIC1 is Enabled in BIOS but MAC Address "Not Present" - creating hardware ticket
  • 18:55 logmsgbot: catrope synchronized php-1.18/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js 'r108064'
  • 18:43 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Disable AFTv5 bucketing tracking again'
  • 18:38 mutante: powercycling srv199
  • 18:33 logmsgbot: catrope synchronized php-1.18/resources/startup.js 'touch'
  • 18:30 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Actually bump version number'
  • 18:28 logmsgbot: catrope synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Revert live hack'
  • 18:24 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'and bump the version number too'
  • 18:22 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Enable tracking for AFTv5 bucketing'
  • 18:06 mutante: duplicate nagios-wm instances on spence (/home/wikipedia/bin/ircecho vs. /usr/ircecho/bin/ircecho) killed them both, restarted with init.d/ircecho
  • 18:00 logmsgbot: catrope synchronized php-1.18/resources/mediawiki/mediawiki.user.js 'Live hack for tracking a percentage of bucketing events'
  • 17:52 mutante: knsq11 is broken. boots into installer, then "Dazed and confused" at hardware detection (NMI received for unknown reason 21 on CPU 0). -> RT 2206
  • 17:38 mutante: powercycling knsq11
  • 11:31 logmsgbot: catrope synchronized php-1.18/extensions/ClickTracking/ClickTracking.hooks.php 'r108017'
  • 08:44 logmsgbot: nikerabbit synchronized php-1.18/includes/specials/SpecialAllmessages.php 'r107998'
  • 07:40 Tim: fixed puppet by re-running the post-merge hook with key forwarding enabled, and then started puppet on ms6
  • 07:32 Tim: on ms6.esams: fixed proxy IP address and stopped puppet while I figure out how to fix it
  • 03:25 Tim: experimentally raised max_concurrent_checks to 128
  • 03:17 Tim: on spence in nagios.cfg, reduced service_reaper_frequency from 10 to 1, to avoid having a massive process count spike every 10 seconds as checks are started. Locally only as a test.
  • 02:27 Ryan_Lane: I should clarify that I removed 10.2.1.13 from /etc/network/interfaces, it's still properly bound to lo
  • 02:24 Tim: on spence: setting up logrotate for nagios.log and removing nagios-bloated-log.log
  • 02:22 Ryan_Lane: removing manually added 10.2.1.13 address from lvs4
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Wed Jan 4 02:04:57 UTC 2012
  • 01:43 Nemo_bis: Last week slowness: job queue backlog now cleared on !Wikimedia Commons and (almost) English !Wikipedia http://ur1.ca/77q9b
  • 01:02 logmsgbot: reedy synchronized php-1.18/includes/ 'r107978'
  • 00:45 logmsgbot: reedy synchronized php-1.18/extensions 'r107977, r107976'
  • 00:39 Tim: running purgeParserCache.php on hume, deleting objects older than 3 months
  • 00:38 logmsgbot: reedy synchronized php-1.18/includes/specials/ 'r107975'
  • 00:29 logmsgbot: tstarling synchronizing Wikimedia installation... :
  • 00:27 logmsgbot: reedy synchronized php-1.18/extensions/Nuke/ 'r107974'
  • 00:25 logmsgbot: reedy synchronized php-1.18/extensions/ 'r107970'

January 3

  • 23:00 Tim: on spence: restarting gmetad
  • 22:58 logmsgbot: reedy synchronizing Wikimedia installation... : Pushing r107953, r107955, r107956, r107957
  • 22:47 LeslieCarr: stopping and then starting apache2 on spence to try and lower load
  • 22:29 RobH: added in the lo address to lvs4, now it's working and generating thumbnails
  • 22:09 logmsgbot: reedy synchronizing Wikimedia installation... : Push r107938 r107948
  • 21:45 RobH: ganglia graphs will have missing data for past 30 to 40 minutes
  • 21:45 RobH: spence back online, ganglia and nagios confirmed operational
  • 21:38 RobH: resetting spence and dropping to serial to try to fix it
  • 21:25 RobH: nagios and ganglia down due to spence reboot, system still coming back online
  • 21:21 RobH: spence is unresponsive to ssh and serial console, rebooting
  • 21:14 LeslieCarr: resetting DRAC 5 on spence for management connectivity
  • 21:05 binasher: that fixed it. but how did that happen?
  • 21:05 binasher: ran ip addr add 10.2.1.22/32 label "lo:LVS" dev lo on lvs4
  • 19:36 logmsgbot: reedy synchronized php-1.18/skins/common/images/ 'r107930'
  • 17:36 mutante: killing more runJobs.php / nextJobDB.php processes on a bunch of servers (/home/catrope/badjobrunners)
  • 17:26 RoanKattouw: Stopping job runners on the following DECOMMISSIONED servers: srv151 srv152 srv153 srv158 srv160 srv164 srv165 srv166 srv167 srv168 srv170 srv176 srv177 srv178 srv181 srv184 srv185
  • 15:55 RobH: torrus back, took forever to recompile
  • 15:53 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33485 - Enable WikiLove in si.wikipedia'
  • 15:52 Reedy: Created wikilove tables on siwiki
  • 15:46 RobH: torrus deadlocked, kicking
  • 14:00 RoanKattouw: Restarting job runners on srv242 and mw25, those are the last ones that are stuck
  • 13:57 RoanKattouw: Restarting all job runners that are stuck
  • 13:48 RoanKattouw: Restarting job runner on srv236, seems to be stuck
  • 02:02 logmsgbot: LocalisationUpdate completed (1.18) at Tue Jan 3 02:05:21 UTC 2012
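
The lvs4 fix at 21:05 came down to a service address missing from the loopback interface. As a rough sketch of how such a gap could be spotted, the Python below takes the text output of a command such as `ip -o -4 addr show dev lo` and reports which expected addresses are not bound. The function names are hypothetical, the caller supplies the expected addresses, and the parser relies only on each address following the token "inet".

```python
from typing import Iterable, List, Set


def bound_ipv4(ip_output: str) -> Set[str]:
    """Collect IPv4 addresses (without prefix length) that follow an 'inet' token."""
    addrs: Set[str] = set()
    for line in ip_output.splitlines():
        tokens = line.split()
        for i, tok in enumerate(tokens[:-1]):
            if tok == "inet":
                addrs.add(tokens[i + 1].split("/")[0])
    return addrs


def missing_service_ips(ip_output: str, expected: Iterable[str]) -> List[str]:
    """Return the expected addresses that are not bound, sorted for stable output."""
    return sorted(set(expected) - bound_ipv4(ip_output))
```

Passing the command output in as a string keeps the check testable without touching a live load balancer.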

January 2

  • 23:36 Reedy: Seems to potentially be an issue with job runners, enwiki backed up to over 90k over the last week or so. Needs investigating
  • 23:18 logmsgbot: tstarling synchronized php-1.18/includes/parser/Parser.php 'r107856'
  • 22:58 logmsgbot: tstarling synchronizing Wikimedia installation... :
  • 18:08 logmsgbot: nikerabbit synchronized wmf-config/InitialiseSettings.php 'Bug 33368: WebFonts on bpywiki'
  • 18:05 logmsgbot: nikerabbit synchronized php-1.18/languages/messages/ 'i18ndeploy r107843'
  • 18:04 logmsgbot: nikerabbit synchronized php-1.18/extensions/WebFonts/WebFonts.i18n.php 'i18ndeploy r107843'
  • 16:58 mutante: installed SiteMap extension on Bugzilla - soon bugs should be googleable (per BZ:33406)
  • 16:33 mutante: upgraded Bugzilla from 4.0.2 to 4.0.3 (http://www.bugzilla.org/releases/4.0.3/release-notes.html#v40_point) (RT #2194)
  • 14:47 mutante: cleaned out gammu spool to stop sms bomb - sorry. daemon runs again now though..
  • 14:36 mutante: fixed gammu-smsd on spence per wikitech "Nagios#Fixing_the_USB_dongle" (sending out queued SMS now )
  • 14:30 mutante: puppet ran on spence, ganglia also seems ok despite the errors i logged before. gammu-smsd can't find device again though
  • 14:03 mutante: spence / gmetad - RRD_update .. illegal attempt to update using time .. last update time is .. (minimum one second step)
  • 13:57 mutante: gmond complains about missing kernel modules on spence when trying to start on boot
  • 13:54 mutante: spence down, no ssh, no mgmt output, powercycling it ..
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Mon Jan 2 02:04:47 UTC 2012
  • 00:08 logmsgbot: tstarling synchronized php-1.18/includes/media/SVGMetadataExtractor.php 'r107792'

January 1

  • 21:28 Ryan_Lane: restarted pdns-recursor on dobson
  • 21:26 Ryan_Lane: restarted pdns on ns2 about an hour ago
  • 09:46 apergos: restarted lucene search on srch 10, 11, then later on 3,4,9,1
  • 09:35 apergos: removed log.1 from /a/search/logs on search6, it was 35gb
  • 03:55 Tim: fixed broken package on search7 and search11
  • 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Sun Jan 1 02:04:30 UTC 2012
  • 01:36 Tim: adjusted FD limit in /etc/init.d/lsearchd on all search servers with sed
  • 01:34 Tim: increased FD limit on search6 and restarted lsearchd
  • 00:46 Tim: removed some logs on search6 to fix /a disk space exhaustion
