Squids

From Wikitech

Latest revision as of 10:43, 21 August 2012

There are 4 clusters of squid servers, one upload and one text at each of our two locations: esams and pmtpa. Each server runs two instances of squid: a frontend squid listening on port 80, and a cache squid listening on port 3128. The purpose of the frontend squid is to distribute load to the cache squids based on URL hash, using the CARP algorithm.

LVS is used to balance incoming requests between the frontends, which then use CARP to distribute the traffic to the backends.
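The idea behind CARP selection can be illustrated with a small shell sketch. This is only an illustration of the hashing scheme, not squid's actual code; the URL and backend names are made-up examples, and real CARP also folds per-member weights into the score:

```shell
#!/bin/bash
# Illustrative sketch of CARP-style selection, NOT squid's implementation:
# combine a hash of the URL with each backend's name and pick the backend
# with the highest score.
url="http://en.wikipedia.org/wiki/Main_Page"   # example URL
best=""
best_score=-1
for backend in sq71 sq72 sq73; do              # example backend names
    # first 8 hex digits of md5(backend + url), interpreted as a number
    hex=$(printf '%s%s' "$backend" "$url" | md5sum | cut -c1-8)
    score=$((16#$hex))
    if (( score > best_score )); then
        best_score=$score
        best=$backend
    fi
done
echo "chosen backend: $best"
```

Since every frontend computes the same scores, all frontends send a given URL to the same backend, concentrating that URL's cache entries on one machine; per-member CARP weights bias this choice toward or away from particular backends.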


RE-Installation

Please note that NEW squid servers need to be set up by someone who understands the full setup, as a number of settings have to be configured. These instructions are therefore only for reinstallation.

To reinstall a previously existing squid server:

  • Reinstall the server OS.
  • After boot, copy the old SSH hostkey back using scp -o StrictHostKeyChecking=no files hostname:/etc/ssh/
  • The hostkeys for all servers are saved on tridge in /data/hostkeys/
  • You will have to run puppet a couple of times; it reports a dependency error until the Squid configuration (next step) has been deployed:
  • Deploy the Squid configuration files on fenari:
# cd /home/w/conf/squid
# ./deploy servername
  • If the system has been offline for over 2 hours, its cache will need to be cleaned with:
/etc/init.d/squid clean
  • Manually run puppet update and ensure system is still online.
  • Check the LVS server to ensure the system is fully online.
  • Check the CacheManager interface for open connections, and ensure they normalize on the reinstalled squid BEFORE taking any more squids offline.
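As a rough aide-mémoire, the steps above can be sketched as a script. The remote commands are stubbed out with echo here so the sequence can be read as a dry run; the hostname is a placeholder:

```shell
#!/bin/bash
# Sketch of the reinstall flow above. scp/ssh are stubbed so nothing real
# is touched; sq99 is a placeholder hostname.
scp() { echo "scp $*"; }
ssh() { echo "ssh $*"; }
host=sq99

# restore the saved SSH host keys (kept on tridge under /data/hostkeys/)
scp -o StrictHostKeyChecking=no /data/hostkeys/$host/* $host:/etc/ssh/

# run puppet until the dependency error clears (see Puppet#Reinstalls)
ssh $host puppetd --test

# deploy the squid configuration from fenari
echo "cd /home/w/conf/squid && ./deploy $host"

# if the box was down for more than 2 hours, wipe the stale cache
ssh $host /etc/init.d/squid clean
```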


[edit] Deploying more Squids

17:53:02  * mark extends the squid configuration (text-settings.php) with config for the new squids
17:57:38  * mark deploys the squid configs to the new squid hosts only, so puppet can do its task. old config remains on the rest of the squids, 
so they're still unaffected
17:57:52 <mark> (I promised Rob to show every step of squid deployment in case anyone's wondering ;)
17:58:23  * mark checks whether MediaWiki is setup to recognize the new squids as proxies (CommonSettings.php)
17:58:55 <mark> yes it is
18:01:42  * mark checks whether puppet has initialized the squids; i.e. both squid instances are running, and the correct LVS ip is bound
18:03:11 <mark> where puppet hasn't run yet since the squid config deploy, I trigger it with "puppetd --test"
18:04:10 <mark> they've all nicely joined ganglia as well
18:08:56 <mark> alright, both squid instances are running on the new text squids
18:09:06 <mark> time to setup statistics so we can see what's happening and we're not missing any requests in our graphs
18:09:15 <mark> both torrus and cricket
18:11:29 <mark> cricket done...
18:14:58 <mark> torrus done as well
18:15:03  * mark watches the graphs to see if they're working
18:15:22 <mark> if not, probably something went wrong earlier with puppet setup or anything
18:17:45 <mark> in the mean time, backend squids are still starting up and reading their COSS partitions (which are empty), which takes a while
18:17:48 <mark> nicely visible in ganglia
18:21:32 <mark> alright, all squids have finished reading their COSS partition, and torrus is showing reasonable values in graphs
18:21:43 <mark> so all squids are correctly configured and ready for service
18:21:50 <mark> but they have EMPTY CACHES
18:22:11 <mark> giving them the full load now, would mean that they would start off with forwarding every request they get onto the backend 
                apaches
18:22:51 <mark> I am going to seed the caches of the backend squids first
18:22:55 <mark> we have a couple of ways of doing that
18:23:19 <mark> first, I'll deploy the *new* squid config (which has all the new backend squids in it) to *one* of the frontend squids on the 
                previously existing servers
18:23:33 <mark> that way that frontend squid will start using the new servers, and filling their caches with the most common requests
18:23:44 <mark> let's use the frontend squid on sq66
18:24:38  * mark runs "./deploy sq66"
18:24:52 <mark> so only sq66 is sending traffic to sq71-78 backend squids now
18:25:02 <mark> which is why they're all using approximately 1% cpu
18:25:31 <mark> now we wait a while and watch the hit rate rise on the new backend squids
18:25:51 <mark> e.g. http://torrus.wikimedia.org/torrus/CDN?path=%2FSquids%2Fsq77.wikimedia.org%2Fbackend%2FPerformance%2FHit_ratios
18:29:26 <mark> no problems visible in the squid logs either
18:32:23 <mark> each of the new squids is serving about 1 Mbps of backend traffic
18:37:10 <mark> the majority of all requests are being forwarded to the backend... let's wait until the hit ratio is a bit higher
18:38:04 <mark> I'll deploy the config to a few more frontend squids so it goes a bit faster
18:54:02 <mark> sq77 is weird in torrus
18:54:10 <mark> it reports 100% request hit ratio and byte hit ratio
18:54:29 <mark> and is still empty in terms of swap..
18:54:33  * mark investigates
18:54:51 <mark> it's not getting traffic
18:58:27 <mark> I think that's just the awful CARP hashing algorithm :(
18:58:31 <mark> it has an extremely low CARP weight
19:04:21  * mark deploys the new squid conf to a few more frontend squids
19:05:29 <mark> they're getting some serious traffic now
19:29:03 <mark> ok
19:29:07 <mark> hit rate is up to roughly 45% now
19:29:30 <mark> swap around 800M per server, and around 60k objects
19:29:40 <mark> I'm confident enough to pool all the new backend squids
19:29:46 <mark> but with a lower CARP weight (10 instead of 30)
19:32:31 <mark> in a few days, when all the new servers have filled their caches, we can decommission sq40 and lower
19:33:02  * mark watches backend requests graphs and hit ratios
19:40:05 <mark> looks like the site is not bothered at all by the extra load - the seeding worked well
19:40:08 <mark> nice time to get some dinner
19:40:19 <mark> afterwards I'll increase the CARP weight, and pool the frontends
19:40:38 <mark> ...and then repeat that for upload squids
20:19:18 <mark> ok.. the hit ratio is not high enough to my liking, but I can go on with the frontends
20:19:26 <mark> the frontend squids are pretty much independent from the backends
20:19:35 <mark> we need to seed their caches as well, but it's quick
20:20:05 <mark> they don't like to get an instant 2000 requests/s from nothing when we pool them in LVS, so pretty much the only way we can 
                mitigate that is to pool them with low load (1)
20:29:47 <mark> ok, frontend text squids now fully deployed
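The seeding procedure mark walks through above boils down to: point one existing frontend at the new backends, wait for their hit ratio to climb, then widen the deployment, and finally pool the new backends with a low CARP weight. A minimal sketch, with deploy stubbed out and the widened frontend names invented (the log only names sq66):

```shell
#!/bin/bash
# Staged cache seeding, per the log above. The real ./deploy pushes config
# to servers; here it is stubbed with echo so the sequence is readable.
deploy() { echo "deploy $1"; }

deploy sq66                    # one frontend starts feeding the new backends
# ... watch the hit ratio on the new backends rise (torrus) ...
for fe in sq60 sq61 sq62; do   # placeholder names: widen to more frontends
    deploy "$fe"
done
# finally pool all new backends, but with a low CARP weight (10 instead of 30)
```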

Configuration

Configuration is done by editing the master files in /home/wikipedia/conf/squid, then running make to rebuild the configuration files, and ./deploy to deploy them to the remote servers. The configuration files are:

squid.conf.php: Template file for the cache (backend) instances
frontend.conf.php: Template file for the frontend instances
text-settings.php: A settings array which applies to text squids. All elements in this array become available as variables during execution of squid.conf.php and frontend.conf.php. The settings array can be used to give server-specific configuration.
upload-settings.php: Same as text-settings.php, but for upload squids
common-acls.conf: ACL directives used by both text and upload frontends. Use this to block clients from all access.
upload-acls.conf: ACL directives used by upload frontends. Use this for e.g. image referrer regex blocks.
text-acls.conf: ACL directives used by text frontends. Use this for e.g. remote loader IP blocks.
Configuration.php: Contains most of the generator code
generate.php: The script that the makefile runs
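The generator itself is PHP (Configuration.php and generate.php). As a rough illustration of the settings-plus-template idea, here is the same pattern in shell; the directive names and values are made up, not the real settings:

```shell
#!/bin/bash
# Toy version of the settings-array + template pattern used by generate.php:
# values from a settings file are substituted into a conf template.
http_port=3128          # pretend entry from text-settings.php

# template with a variable placeholder, as squid.conf.php has
template='http_port $http_port
cache_mem 3000 MB'

# expand $http_port in the template (a crude stand-in for the PHP generator)
eval "echo \"$template\""
```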

The configs are under version control using git.

The deployment script has lots of options. Run it with no arguments to get a summary.

Changing configuration

# cd /home/w/conf/squid

Edit *-settings.php or *-acls.php

# make

To see the changes in the generated configuration vs what should be already deployed, run:

$ diff -ru deployed/ generated/

If the changes look OK, you can deploy them to all servers at once or to a subset, and either quickly or slowly. See ./deploy -h for all possible options.

# ./deploy all

Using this invocation, the script copies the newly generated config files into the deployed/ directory, rsyncs that directory to the puppetmaster, and then scps the files to each server and reloads the squid process(es).
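Schematically, that deploy step does something like the following. This is a sketch only, not the real script (which lives in /home/w/conf/squid and has many more options); rsync/scp/ssh are stubbed with echo, and the puppetmaster path, server names, and remote config path are placeholders:

```shell
#!/bin/bash
# Sketch of what './deploy all' does, per the description above.
rsync() { echo "rsync $*"; }
scp()   { echo "scp $*"; }
ssh()   { echo "ssh $*"; }

mkdir -p generated deployed
echo "# generated squid.conf" > generated/squid.conf  # stand-in generated file

# 1. the newly generated files become the deployed set
cp -r generated/. deployed/

# 2. sync the deployed set to the puppetmaster (placeholder path)
rsync -a deployed/ puppetmaster:/srv/squid-conf/

# 3. push to each server and reload squid (placeholder names and path)
for server in sq71 sq72; do
    scp deployed/squid.conf "$server:/etc/squid/squid.conf"
    ssh "$server" squid -k reconfigure
done
```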

# git commit -m "A meaningful commit message"

You should always commit your changes to git to allow for history tracking and rollback.

Current problems

  • upload squids in tampa are unhappy because ms5 (thumbs) is slow to respond

Monitoring

You can get some nice stats about the squids at http://noc.wikimedia.org/cgi-bin/cachemgr.cgi (user name root; the password is in the squid configuration file). Each squid is listed twice in the drop-down, once for the frontend and once for the backend. The Peer Cache stats for the backend are especially handy.

Debugging

To see HTTP requests sent from Squids to their backend, install ngrep and run e.g.:

# ngrep -W byline port 80 and dst host ms4.wikimedia.org

HowTo

Edit ACLs

In /home/w/conf/squid, edit text-acls.conf, then run make, then run ./deploy all.
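For example, a hypothetical entry in text-acls.conf blocking a single abusive client might look like this (standard squid ACL syntax; the address is an example, so check the existing file for the local conventions):

```
# hypothetical example entry: block one abusive client IP
acl blockedclient src 203.0.113.45
http_access deny blockedclient
```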

Purge a given external (non-WMF) URL

In /home/wikipedia/common/php/maintenance, run: echo 'http://blahblahblah' | php ./purgeList.php --wiki aawiki

See also

  • MediaWiki caching -- some cache headers explained
  • Multicast HTCP purging -- new method of cache purging
  • Squid logging
  • Squid log format
