Squids

From Wikitech
Revision as of 21:25, 11 February 2010 by Mark (Talk | contribs)


There are four clusters of Squid servers: one upload and one text cluster at each of our two locations, esams and pmtpa. Each server runs two instances of Squid: a frontend squid listening on port 80, and a cache (backend) squid listening on port 3128. The purpose of the frontend squid is to distribute load across the cache squids by URL hash, using the CARP algorithm.

LVS balances incoming requests across the frontends, which in turn use CARP to distribute the traffic to the backends.
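
The frontend's selection step can be sketched as a highest-hash-wins pick. This is a simplified, rendezvous-style illustration of the CARP idea, not the exact 32-bit rolling hash and load-factor computation from the CARP draft; the backend names and weights below are invented:

```python
import hashlib

def carp_pick(url, backends):
    """Pick one backend per URL: the member with the highest
    combined (member, url) hash, scaled by its weight, wins."""
    best_name, best_score = None, -1.0
    for name, weight in backends.items():
        digest = hashlib.md5((name + "|" + url).encode()).digest()
        combined = int.from_bytes(digest[:8], "big")
        # Real CARP derives a load factor from the relative weights;
        # a plain multiply is a rough stand-in.
        score = combined * weight
        if score > best_score:
            best_name, best_score = name, score
    return best_name

backends = {"sq16": 30, "sq17": 30, "sq18": 10}  # hypothetical weights
# The same URL always maps to the same backend, so its cache stays hot:
assert carp_pick("/wiki/Squid", backends) == carp_pick("/wiki/Squid", backends)
```

Because the pick depends only on the URL and the member list, every frontend makes the same choice without coordinating, and each backend only ever caches its own slice of the URL space.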


Squid cluster distribution

Please note that puppet now handles these parameters, and they are shown on this page for informational purposes only.

        text                                        upload
pmtpa   sq16-sq40 (208.80.152.2)                    sq1-sq15, sq41-sq50 (208.80.152.3)
knams   knsq1-knsq7, knsq23-knsq30 (91.198.174.2)   knsq8-knsq22 (91.198.174.3)
yaseo   yf1000-yf1004 (203.212.189.253)             yf1005-yf1009 (203.212.189.254)

Reinstallation

Please note that NEW squid servers need to be set up by someone who understands the full setup, since a number of settings have to be configured correctly. The instructions below therefore cover reinstallation only.

To reinstall a previously existing squid server:

  • Reinstall the server OS.
  • Follow the instructions on Puppet#Reinstalls
  • Deploy the Squid configuration files on fenari:
# cd /home/w/conf/squid
# ./deploy servername
  • Leave it alone; Puppet will configure the packages and settings from there. (Yes, it just got that easy.)

Old reinstallation instructions

To reinstall a previously existing Squid server:

  1. Save the SSH hostkeys
  2. Reinstall the server using PXE as documented in Automated installation
  3. After boot, copy the old SSH hostkey back using scp -o StrictHostKeyChecking=no files hostname:/etc/ssh/
  4. Log in, and set the root password
  5. # apt-get install wikimedia-task-squid (Answer the question about the LVS service IP very carefully)
  6. From zwinger, do: cd /home/w/conf/squid/ && make && ./deploy hostname
  7. (upload squids only) Run # setup-aufs-cachedirs to set up the AUFS cache partition. This will wipe any previous AUFS partition/data!
  8. If the Squid had been down for a long time, clean the cache with /etc/init.d/squid clean
    1. If the downtime was under ~2 hours, the cache does not need to be cleaned.
  9. # /etc/init.d/squid start
    1. Wait a few minutes after starting the backend here to start the frontend.
    2. You can watch top and see when the squid process slows down, or just read /var/log/squid/cache.log
  10. # /etc/init.d/squid-frontend start
  11. Install ganglia gmond

Deploying more Squids

17:53:02  * mark extends the squid configuration (text-settings.php) with config for the new squids
17:57:38  * mark deploys the squid configs to the new squid hosts only, so puppet can do its task. old config remains on the rest of the squids, 
so they're still unaffected
17:57:52 <mark> (I promised Rob to show every step of squid deployment in case anyone's wondering ;)
17:58:23  * mark checks whether MediaWiki is setup to recognize the new squids as proxies (CommonSettings.php)
17:58:55 <mark> yes it is
18:01:42  * mark checks whether puppet has initialized the squids; i.e. both squid instances are running, and the correct LVS ip is bound
18:03:11 <mark> where puppet hasn't run yet since the squid config deploy, I trigger it with "puppetd --test"
18:04:10 <mark> they've all nicely joined ganglia as well
18:08:56 <mark> alright, both squid instances are running on the new text squids
18:09:06 <mark> time to setup statistics so we can see what's happening and we're not missing any requests in our graphs
18:09:15 <mark> both torrus and cricket
18:11:29 <mark> cricket done...
18:14:58 <mark> torrus done as well
18:15:03  * mark watches the graphs to see if they're working
18:15:22 <mark> if not, probably something went wrong earlier with puppet setup or anything
18:17:45 <mark> in the mean time, backend squids are still starting up and reading their COSS partitions (which are empty), which takes a while
18:17:48 <mark> nicely visible in ganglia
18:21:32 <mark> alright, all squids have finished reading their COSS partition, and torrus is showing reasonable values in graphs
18:21:43 <mark> so all squids are correctly configured and ready for service
18:21:50 <mark> but they have EMPTY CACHES
18:22:11 <mark> giving them the full load now, would mean that they would start off with forwarding every request they get onto the backend 
                apaches
18:22:51 <mark> I am going to seed the caches of the backend squids first
18:22:55 <mark> we have a couple of ways of doing that
18:23:19 <mark> first, I'll deploy the *new* squid config (which has all the new backend squids in it) to *one* of the frontend squids on the 
                previously existing servers
18:23:33 <mark> that way that frontend squid will start using the new servers, and filling their caches with the most common requests
18:23:44 <mark> let's use the frontend squid on sq66
18:24:38  * mark runs "./deploy sq66"
18:24:52 <mark> so only sq66 is sending traffic to sq71-78 backend squids now
18:25:02 <mark> which is why they're all using approximately 1% cpu
18:25:31 <mark> now we wait a while and watch the hit rate rise on the new backend squids
18:25:51 <mark> e.g. http://torrus.wikimedia.org/torrus/CDN?path=%2FSquids%2Fsq77.wikimedia.org%2Fbackend%2FPerformance%2FHit_ratios
18:29:26 <mark> no problems visible in the squid logs either
18:32:23 <mark> each of the new squids is serving about 1 Mbps of backend traffic
18:37:10 <mark> the majority of all requests are being forwarded to the backend... let's wait until the hit ratio is a bit higher
18:38:04 <mark> I'll deploy the config to a few more frontend squids so it goes a bit faster
18:54:02 <mark> sq77 is weird in torrus
18:54:10 <mark> it reports 100% request hit ratio and byte hit ratio
18:54:29 <mark> and is still empty in terms of swap..
18:54:33  * mark investigates
18:54:51 <mark> it's not getting traffic
18:58:27 <mark> I think that's just the awful CARP hashing algorithm :(
18:58:31 <mark> it has an extremely low CARP weight
19:04:21  * mark deploys the new squid conf to a few more frontend squids
19:05:29 <mark> they're getting some serious traffic now
19:29:03 <mark> ok
19:29:07 <mark> hit rate is up to roughly 45% now
19:29:30 <mark> swap around 800M per server, and around 60k objects
19:29:40 <mark> I'm confident enough to pool all the new backend squids
19:29:46 <mark> but with a lower CARP weight (10 instead of 30)
19:32:31 <mark> in a few days, when all the new servers have filled their caches, we can decommission sq40 and lower
19:33:02  * mark watches backend requests graphs and hit ratios
19:40:05 <mark> looks like the site is not bothered at all by the extra load - the seeding worked well
19:40:08 <mark> nice time to get some dinner
19:40:19 <mark> afterwards I'll increase the CARP weight, and pool the frontends
19:40:38 <mark> ...and then repeat that for upload squids
20:19:18 <mark> ok.. the hit ratio is not high enough to my liking, but I can go on with the frontends
20:19:26 <mark> the frontend squids are pretty much independent from the backends
20:19:35 <mark> we need to seed their caches as well, but it's quick
20:20:05 <mark> they don't like to get an instant 2000 requests/s from nothing when we pool them in LVS, so pretty much the only way we can 
                mitigate that is to pool them with low load (1)
20:29:47 <mark> ok, frontend text squids now fully deployed
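
The effect of pooling the new backends at a lower CARP weight (10 instead of 30, as in the log above) can be estimated if one assumes load roughly proportional to weight. The real CARP load-factor computation is close to, but not exactly, proportional, and the server counts here are only illustrative:

```python
def carp_share(weights):
    """Expected fraction of backend traffic per server,
    assuming load proportional to CARP weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Say 30 existing backends at weight 30 plus 8 new ones (sq71-sq78) at weight 10:
weights = {"sq%d" % i: 30 for i in range(41, 71)}
weights.update({"sq%d" % i: 10 for i in range(71, 79)})
shares = carp_share(weights)
print("sq71 share: %.1f%%" % (100 * shares["sq71"]))
```

With these numbers each new backend gets roughly 1% of backend requests while its cache fills, which lines up with the "approximately 1% cpu" observation during the seeding phase.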

Configuration

Configuration is done by editing the master files in /home/wikipedia/conf/squid, then running make to rebuild the configuration files, and ./deploy to deploy them to the remote servers. The configuration files are:

squid.conf.php 
Template file for the cache (backend) instances
frontend.conf.php 
Template file for the frontend instances
text-settings.php 
A settings array which applies to text squids. All elements in this array will become available as variables during execution of squid.conf.php and frontend.conf.php. The settings array can be used to give server-specific configuration.
upload-settings.php 
Same as text-settings.php but for upload squids
common-acls.conf 
ACL directives used by both text and upload frontends. Use this to block clients from all access.
upload-acls.conf 
ACL directives used by upload frontends. Use this for e.g. image referrer regex blocks.
text-acls.conf 
ACL directives used by text frontends. Use this for e.g. remote loader IP blocks.
Configuration.php 
Contains most of the generator code
generate.php 
The script that the Makefile runs

Feel free to check in your changes to RCS.

The deployment script has lots of options. Run it with no arguments to get a summary.
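
The settings-array mechanism described above can be pictured as plain template expansion: each key in the settings array becomes a variable available inside the template. A Python sketch of the idea only (the real generator is PHP, and the keys and template text here are invented, not taken from the actual files):

```python
from string import Template

# Stand-in for text-settings.php: every key becomes a template variable.
text_settings = {
    "frontend_port": "80",
    "backend_port": "3128",
}

# Stand-in for a fragment of squid.conf.php / frontend.conf.php:
conf_template = "http_port $frontend_port\ncache_peer 127.0.0.1 parent $backend_port 0 carp"

def generate(template, settings):
    """Expand the settings into the template, in the spirit of what
    generate.php does for the PHP configuration templates."""
    return Template(template).substitute(settings)

print(generate(conf_template, text_settings))
```

Server-specific configuration then amounts to varying the settings dictionary per host while keeping the template shared.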

Current problems

(none)

See also
