Memcached

(new allocations)
 
(40 intermediate revisions by 15 users not shown)
Memcached configuration.
+
The memcache setup has changed '''recently''' for the multi-homed datacenter deployment (SDTPA vs EQIAD).
  
/usr/local/bin/run-memcached.sh is run from /etc/rc.local at system startup on each machine, and starts any necessary instances of memcached.
+
When a memcached server is offline, remove it from the configuration entirely (if the outage is permanent) or comment it out (if temporary).  Unlike the old slot-format config file, the new format does not require you to keep an identical number of members in the memcached pool.  Do not add and remove many members from this list in a short period of time: with consistent hashing, a server that is unpooled and then repooled gets its keys remapped directly back to it, resulting in undesired performance.  So if a server is down, fix it ASAP.  If it cannot be rebooted and fixed ASAP, remove it.  Once removed, it may be prudent to leave it unpooled for 24 hours (Aaron advises that some items can stay cached in memcached for 24 hours).
  
==new allocations==
+
The memcached configuration PHP files are maintained in Gerrit, in the operations/mediawiki-config repository.
New size 280 megabytes each. Something between 20 and 35 megabytes of swap each appears not to cause high page fault rates.
+
<tt>mediawiki-config/wmf-config/mc-eqiad.php</tt>
 +
<tt>mediawiki-config/wmf-config/mc-pmtpa.php</tt>
  
<table border=1>
+
Memcached installation is handled via the Puppet memcached role; instances are currently deployed to dedicated memcached systems (mcX and mcXXXX).
<tr><th>server<th>IP<th>copies<th>total<th>inc code<th>swp each
+
<tr><td>Bart
+
  <td>207.142.131.227
+
  <td>4
+
  <td>512
+
  <td>1360
+
  <td>4m
+
  <td>using [[Tugela]]
+
<tr><td>Bayle
+
  <td>207.142.131.228
+
  <td>4
+
  <td>1120
+
  <td>1360
+
  <td>4m
+
  <td>using memcached
+
<tr><td>Isidore
+
  <td>207.142.131.231
+
  <td>1
+
  <td>280
+
  <td>340
+
  <td>0
+
  <td>using memcached
+
<tr><td>Moreri
+
  <td>207.142.131.232
+
  <td>1
+
  <td>280
+
  <td>340
+
  <td>0
+
  <td>using memcached
+
<tr><td>Yongle
+
  <td>207.142.131.237
+
  <td>7
+
  <td>1960
+
  <td>2380
+
  <td>0
+
  <td>using memcached
+
<tr><th>total<th>&nbsp;<th>17<th>4760<th>&nbsp;<th>&nbsp;
+
  
</table>
 
  
$wgMemCachedServers = array(
+
To see which memcached servers are functioning and which are not, run the following on fenari:
        "207.142.131.227:11000", # bart
+
<tt>cd /home/w/common/wmf-deployment/maintenance; mwscript mctest.php</tt>
        "207.142.131.227:11001",
+
        "207.142.131.227:11002",
+
        "207.142.131.227:11003",
+
        "207.142.131.228:11000", # bayle
+
        "207.142.131.228:11001",
+
        "207.142.131.228:11002",
+
        "207.142.131.228:11003",
+
        "207.142.131.231:11000", # isidore
+
        "207.142.131.232:11000", # moreri
+
        "207.142.131.237:11000", # yongle
+
        "207.142.131.237:11001",
+
        "207.142.131.237:11002",
+
        "207.142.131.237:11003",
+
        "207.142.131.237:11004",
+
        "207.142.131.237:11005",
+
        "207.142.131.237:11006"
+
);
+
  
==temporary changes==
+
Note that all servers that are up will return <tt>incr: 100  get: 100</tt> and servers that are down will return <tt>incr: 0  get: 0</tt>.
Until the memcached size can be changed from 180MB.
+
  
<table border=1>
+
== Editing mc.php ==
<tr><th>server<th>IP<th>copies<th>total<th>inc code<th>swp each
+
<tr><td>Bart
+
  <td>207.142.131.227
+
  <td>6
+
  <td>1080
+
  <td>1440
+
  <td>15m
+
<tr><td>Bayle
+
  <td>207.142.131.228
+
  <td>6
+
  <td>1080
+
  <td>1440
+
  <td>21m
+
<tr><td>Isidore
+
  <td>207.142.131.231
+
  <td>2
+
  <td>360
+
  <td>480
+
  <td>22m
+
<tr><td>Moreri
+
  <td>207.142.131.232
+
  <td>1
+
  <td>180
+
  <td>240
+
  <td>0
+
<tr><td>Yongle
+
  <td>207.142.131.237
+
  <td>11
+
  <td>1980
+
  <td>2640
+
  <td>27m
+
<tr><th>total<th>&nbsp;<th>26<th>4680<th>&nbsp;<th>&nbsp;
+
</table>
+
  
CommonSettings.php entry looks like this:
+
The memcached configuration PHP files are maintained in Gerrit, in the operations/mediawiki-config repository.
<pre>
+
<tt>mediawiki-config/wmf-config/mc-eqiad.php</tt>
$wgMemCachedServers = array(
+
<tt>mediawiki-config/wmf-config/mc-pmtpa.php</tt>
        "207.142.131.227:11000", # bart
+
        "207.142.131.227:11001",
+
        "207.142.131.227:11002",
+
        "207.142.131.227:11003",
+
        "207.142.131.227:11004",
+
        "207.142.131.227:11005",
+
        "207.142.131.228:11000", # bayle
+
        "207.142.131.228:11001",
+
        "207.142.131.228:11002",
+
        "207.142.131.228:11003",
+
        "207.142.131.228:11004",
+
        "207.142.131.228:11005",
+
        "207.142.131.231:11000", # isidore
+
        "207.142.131.231:11001",
+
        "207.142.131.232:11000", # moreri
+
        "207.142.131.237:11000", # yongle
+
        "207.142.131.237:11001",
+
        "207.142.131.237:11002",
+
        "207.142.131.237:11003",
+
        "207.142.131.237:11004",
+
        "207.142.131.237:11005",
+
        "207.142.131.237:11006",
+
        "207.142.131.237:11007",
+
        "207.142.131.237:11008",
+
        "207.142.131.237:11009",
+
        "207.142.131.237:11010"
+
);
+
</pre>
+
  
==swap analysis==
+
If you want to add a server from the spare list to the active list, please test it first by running the following on fenari:
Done just after peak time after 8 days running (for most systems). Memcached itself seems to use about 60 megabytes per copy.
+
  
<table border=1>
+
   <tt>cd /home/w/common/wmf-deployment/maintenance; mwscript mctest.php enwiki ip-address-here:11000</tt>
<tr><th>server<th>IP<th>copies<th>total<th>inc code<th>pg flt<th>CPU min<th>cached<th>MC swapped
+
<tr><td>dalembert
+
   <td>207.142.131.194
+
  <td>1
+
  <td>180
+
  <td>240
+
  <td>43000
+
  <td>12
+
  <td>214756
+
  <td>48m
+
<tr><td>Tingxi
+
  <td>207.142.131.195
+
  <td>1
+
  <td>180
+
  <td>240
+
  <td>143000
+
  <td>25
+
  <td>252928k
+
  <td>112m
+
<tr><td>Alrazi
+
  <td>207.142.131.196
+
  <td>1
+
  <td>180
+
  <td>240
+
  <td>192000
+
  <td>39
+
  <td>13232k
+
  <td>127m
+
<tr><td>Friedrich
+
  <td>207.142.131.197
+
  <td>1
+
  <td>180
+
  <td>240
+
  <td>170000
+
  <td>61
+
  <td>234780
+
  <td>147m
+
<tr><td>Harris
+
  <td>207.142.131.199
+
  <td>1
+
  <td>180
+
  <td>240
+
  <td>142000
+
  <td>31
+
  <td>304076
+
  <td>136m
+
<tr><td>Bart
+
  <td>207.142.131.227
+
  <td>6
+
  <td>1080
+
  <td>1440
+
  <td>1820
+
  <td>516
+
  <td>441232
+
  <td>87m/15e
+
<tr><td>Bayle
+
  <td>207.142.131.228
+
  <td>6
+
  <td>1080
+
  <td>1440
+
  <td>3734
+
  <td>425
+
  <td>420096
+
  <td>127m/21e
+
<tr><td>Isidore
+
  <td>207.142.131.231
+
  <td>2
+
  <td>360
+
  <td>480
+
  <td>2486
+
  <td>224
+
  <td>341100
+
  <td>44m/22e
+
<tr><td>Moreri
+
  <td>207.142.131.232
+
  <td>2
+
  <td>360
+
  <td>480
+
  <td>52000
+
  <td>693
+
  <td>352312
+
  <td>71m/35e
+
<tr><td>Yongle
+
  <td>207.142.131.237
+
  <td>12
+
  <td>2160
+
  <td>2880
+
  <td>16574
+
  <td>208
+
  <td>13640k
+
  <td>476m/40e
+
<tr><td>Browne 2GB
+
  <td>207.142.131.229
+
  <td>0
+
  <td>0
+
  <td>&nbsp;
+
  <td>broken
+
  <td>&nbsp;
+
  <td>&nbsp;
+
  <td>&nbsp;
+
<tr><td>Coronelli 3GB
+
  <td>207.142.131.230
+
  <td>0
+
  <td>0
+
  <td>&nbsp;
+
  <td>&nbsp;
+
  <td>&nbsp;
+
  <td>237668 buf 622124 cache
+
  <td>squid 514M of 2341M
+
<tr><td>Maurus 4GB
+
  <td>207.142.131.238
+
  <td>0
+
  <td>0
+
  <td>&nbsp;
+
  <td>&nbsp;
+
  <td>&nbsp;
+
  <td>264640 buf
+
  <td>squid 514M of 1936M
+
<tr><td>Rabanus 4GB
+
  <td>207.142.131.239
+
  <td>0
+
  <td>0
+
  <td>&nbsp;
+
  <td>&nbsp;
+
  <td>&nbsp;
+
  <td>267476k buf 727132 cache
+
  <td>squid 514MB of 2557M
+
<tr><th>total<th>&nbsp;<th>34<th>6120<th>&nbsp;<th>&nbsp;<th>&nbsp;<th>&nbsp;
+
  
</table>
+
(it runs on a nonstandard port). As above, servers that are up will return incr: 100 get: 100 and servers that are down will return incr: 0 get: 0.
  
 +
Once you finish updating the file, you must '''git commit''' and then '''git review'''.  Log in as your own user, as your user keys will be checked.  Then git fetch and merge on fenari in the /home/wikimedia/common/wmf-deployment directory and sync out the changes.
  
 
+
[[Category:How-To]]
==new==
+
180MB each. 200MB looked too tight for two copies each on Isidore and Moreri. Vincent wasn't used because it showed cyclic high RAM use. Switched on [[19 September]] [[2004]].
+
<table border=1>
+
<tr><th>server<th>IP<th>copies<th>total<th>notes
+
<tr><td>dalembert
+
  <td>207.142.131.194
+
  <td>1
+
  <td>180
+
  <td>&nbsp;
+
<tr><td>Tingxi
+
  <td>207.142.131.195
+
  <td>1
+
  <td>180
+
  <td>&nbsp;
+
<tr><td>Alrazi
+
  <td>207.142.131.196
+
  <td>1
+
  <td>180
+
  <td>&nbsp;
+
<tr><td>Friedrich
+
  <td>207.142.131.197
+
  <td>1
+
  <td>180
+
  <td>&nbsp;
+
<tr><td>Harris
+
  <td>207.142.131.199
+
  <td>1
+
  <td>180
+
  <td>&nbsp;
+
<tr><td>Bart
+
  <td>207.142.131.227
+
  <td>6
+
  <td>1080
+
  <td>&nbsp;
+
<tr><td>Bayle
+
  <td>207.142.131.228
+
  <td>6
+
  <td>1080
+
  <td>&nbsp;
+
<tr><td>Isidore
+
  <td>207.142.131.231
+
  <td>2
+
  <td>360
+
  <td>&nbsp;
+
<tr><td>Moreri
+
  <td>207.142.131.232
+
  <td>2
+
  <td>360
+
  <td>&nbsp;
+
<tr><td>Yongle
+
  <td>207.142.131.237
+
  <td>12
+
  <td>2160
+
  <td>&nbsp;
+
<tr><td>Avicenna
+
  <td>207.142.131.249
+
  <td>1
+
  <td>180
+
  <td>&nbsp;
+
<tr><th>total<th>&nbsp;<th>34<th>6120<th>&nbsp;
+
 
+
</table>
+
 
+
CommonSettings.php entry looks like this:
+
<pre>
+
$wgMemCachedServers = array(
+
        "207.142.131.194:11000", # dalembert
+
        "207.142.131.195:11000", # tingxi
+
        "207.142.131.196:11000", # alrazi
+
        "207.142.131.197:11000", # friedrich
+
        "207.142.131.199:11000", # harris
+
        "207.142.131.227:11000", # bart
+
        "207.142.131.227:11001",
+
        "207.142.131.227:11002",
+
        "207.142.131.227:11003",
+
        "207.142.131.227:11004",
+
        "207.142.131.227:11005",
+
        "207.142.131.228:11000", # bayle
+
        "207.142.131.228:11001",
+
        "207.142.131.228:11002",
+
        "207.142.131.228:11003",
+
        "207.142.131.228:11004",
+
        "207.142.131.228:11005",
+
        "207.142.131.231:11000", # isidore
+
        "207.142.131.231:11001",
+
        "207.142.131.232:11000", # moreri
+
        "207.142.131.232:11001",
+
        "207.142.131.237:11000", # yongle
+
        "207.142.131.237:11001",
+
        "207.142.131.237:11002",
+
        "207.142.131.237:11003",
+
        "207.142.131.237:11004",
+
        "207.142.131.237:11005",
+
        "207.142.131.237:11006",
+
        "207.142.131.237:11007",
+
        "207.142.131.237:11008",
+
        "207.142.131.237:11009",
+
        "207.142.131.237:11010",
+
        "207.142.131.237:11011",
+
        "207.142.131.249:11000", # avicenna
+
);
+
</pre>
+
 
+
==old==
+
512MB each, used prior to [[19 September]] [[2004]].
+
<table border=1>
+
<tr><th>server<th>IP<th>copies<th>total<th>notes
+
<tr><td>Rabanus
+
  <td>207.142.131.239
+
  <td>4
+
  <td>2048
+
  <td>moving to squid
+
<tr><td>Yongle
+
  <td>207.142.131.237
+
  <td>4
+
  <td>2048
+
  <td>keeping - runs stats job
+
<tr><td>Bart
+
  <td>207.142.131.227
+
  <td>2
+
  <td>1024
+
  <td>&nbsp;
+
<tr><td>Bayle
+
  <td>207.142.131.228
+
  <td>2
+
  <td>1024
+
  <td>&nbsp;
+
<tr><th>total<th>&nbsp;<th>12<th>6144<th>&nbsp;
+
</table>
+
 
+
CommonSettings.php entry looked like this:
+
<pre>
+
$wgMemCachedServers = array(
+
        "207.142.131.227:11000", # bart
+
        "207.142.131.227:11001",
+
        "207.142.131.228:11000", # bayle
+
        "207.142.131.228:11001",
+
        "207.142.131.237:11000", # yongle
+
        "207.142.131.237:11001",
+
        "207.142.131.237:11002",
+
        "207.142.131.237:11003",
+
        "207.142.131.239:11000", # rabanus
+
        "207.142.131.239:11001",
+
        "207.142.131.239:11002",
+
        "207.142.131.239:11003"
+
);
+
</pre>
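The server lists above follow a simple pattern: each host runs its copies on consecutive ports starting at 11000, and the table totals are the number of copies times the per-copy size. A minimal Python sketch, assuming that pattern holds, that generates the <tt>host:port</tt> entries and the total from a copies-per-host mapping. The function names are hypothetical and are not part of any Wikimedia tooling.

```python
def build_server_list(copies_by_ip, base_port=11000):
    """Return "ip:port" entries, one per memcached copy on each host.

    Assumes copies on a host listen on consecutive ports from base_port,
    as in the $wgMemCachedServers arrays shown above.
    """
    entries = []
    for ip, copies in copies_by_ip.items():
        for offset in range(copies):
            entries.append(f"{ip}:{base_port + offset}")
    return entries


def total_megabytes(copies_by_ip, mb_per_copy):
    """Total cache memory if every copy is given mb_per_copy megabytes."""
    return sum(copies_by_ip.values()) * mb_per_copy


# Copies per host taken from the "old" table (512MB each).
old_layout = {
    "207.142.131.227": 2,  # bart
    "207.142.131.228": 2,  # bayle
    "207.142.131.237": 4,  # yongle
    "207.142.131.239": 4,  # rabanus
}

if __name__ == "__main__":
    for entry in build_server_list(old_layout):
        print(entry)
    print(total_megabytes(old_layout, 512), "MB total")
```

Keeping the layout as data makes it easy to cross-check a table's "copies" and "total" columns against the array that is actually deployed.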
+

Latest revision as of 00:33, 12 February 2013

The memcache setup has changed recently for the multi-homed datacenter deployment (SDTPA vs EQIAD).

When a memcached server is offline, remove it from the configuration entirely (if the outage is permanent) or comment it out (if temporary). Unlike the old slot-format config file, the new format does not require you to keep an identical number of members in the memcached pool. Do not add and remove many members from this list in a short period of time: with consistent hashing, a server that is unpooled and then repooled gets its keys remapped directly back to it, resulting in undesired performance. So if a server is down, fix it ASAP. If it cannot be rebooted and fixed ASAP, remove it. Once removed, it may be prudent to leave it unpooled for 24 hours (Aaron advises that some items can stay cached in memcached for 24 hours).
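The remapping behaviour described above can be illustrated with a toy consistent-hash ring. This is a generic Python sketch, not the hashing code MediaWiki or its memcached client actually uses; the class name, the MD5-based hash, the number of ring points per server and the server addresses are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative only)."""

    def __init__(self, servers, points_per_server=100):
        self.points_per_server = points_per_server
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        # Each server owns several points so keys spread evenly.
        for i in range(self.points_per_server):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def lookup(self, key):
        if not self._ring:
            raise ValueError("no servers in the ring")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


def demo():
    # Hypothetical addresses, not real pool members.
    servers = ["192.0.2.1:11000", "192.0.2.2:11000", "192.0.2.3:11000"]
    ring = HashRing(servers)
    keys = [f"key{i}" for i in range(1000)]

    before = {k: ring.lookup(k) for k in keys}
    ring.remove(servers[1])  # unpool one server
    during = {k: ring.lookup(k) for k in keys}
    ring.add(servers[1])     # repool the same server
    after = {k: ring.lookup(k) for k in keys}

    moved = [k for k in keys if before[k] != during[k]]
    return before, during, after, moved


if __name__ == "__main__":
    before, during, after, moved = demo()
    print(len(moved), "keys moved while the server was unpooled")
    print("mapping restored after repooling:", after == before)
```

In this sketch only the keys owned by the unpooled server move elsewhere, and repooling sends exactly those keys back to it. That is why quick unpool and repool cycles are discouraged: the returning server is asked again for keys whose cached values it may have held since before it left the pool.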

The memcached configuration PHP files are maintained in Gerrit, in the operations/mediawiki-config repository.

mediawiki-config/wmf-config/mc-eqiad.php
mediawiki-config/wmf-config/mc-pmtpa.php

Memcached installation is handled via the Puppet memcached role; instances are currently deployed to dedicated memcached systems (mcX and mcXXXX).


To see which memcached servers are functioning and which are not, run the following on fenari:

cd /home/w/common/wmf-deployment/maintenance; mwscript mctest.php

Note that all servers that are up will return incr: 100 get: 100 and servers that are down will return incr: 0 get: 0.
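When the pool is large, it can help to filter the test output for servers that did not report full counts. The Python sketch below assumes each output line starts with the server address followed by the <tt>incr: N get: N</tt> counts; only the counts are documented here, so the rest of the line shape is a guess, and the function name is hypothetical. It lists every server that did not report 100 for both values.

```python
import re

# Assumed line shape: "<host:port> ... incr: <n> get: <n>".
# Only the "incr: N get: N" part comes from the text above.
LINE_RE = re.compile(
    r"^(?P<server>\S+).*?incr:\s*(?P<incr>\d+)\s+get:\s*(?P<get>\d+)"
)


def servers_not_fully_up(output, expected=100):
    """Return servers whose incr/get counts are not both `expected`."""
    flagged = []
    for line in output.splitlines():
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # skip lines that do not look like a result line
        if int(match["incr"]) != expected or int(match["get"]) != expected:
            flagged.append(match["server"])
    return flagged


if __name__ == "__main__":
    # Hypothetical sample output, not captured from a real run.
    sample = (
        "192.0.2.10:11000 incr: 100 get: 100\n"
        "192.0.2.11:11000 incr: 0 get: 0\n"
    )
    print(servers_not_fully_up(sample))
```

If the real output uses a different layout, only the regular expression needs to change.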

Editing mc.php

The memcached configuration PHP files are maintained in Gerrit, in the operations/mediawiki-config repository.

mediawiki-config/wmf-config/mc-eqiad.php
mediawiki-config/wmf-config/mc-pmtpa.php

If you want to add a server from the spare list to the active list, please test it first by running the following on fenari:

 cd /home/w/common/wmf-deployment/maintenance; mwscript mctest.php enwiki ip-address-here:11000

(it runs on a nonstandard port). As above, servers that are up will return incr: 100 get: 100 and servers that are down will return incr: 0 get: 0.

Once you finish updating the file, you must git commit and then git review. Log in as your own user, as your user keys will be checked. Then git fetch and merge on fenari in the /home/wikimedia/common/wmf-deployment directory and sync out the changes.
