LVS

From Wikitech
Revision as of 20:49, 1 October 2007 by Mark


lvsmon is the LVS load balancer control script used between the Squids and the Apaches. In front of the Squids we're using another script, PyBal.


Squids

To install an LVS load balancer on a base Ubuntu install:

  1. apt-get install pybal (ignore the warning about the kernel not supporting IPVS)
  2. Set up the configuration in /etc/pybal/
  3. Restart PyBal and check whether it is working correctly (tail /var/log/pybal.log)
  4. Bind the LVS IP to the loopback interface:
     ip addr add <ip>/32 dev lo
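
The exact layout of /etc/pybal/ has varied between PyBal versions, so treat the following as an illustrative sketch only — the section and key names here are assumptions, not verified against this installation. A service stanza in the configuration might look something like:

```ini
; /etc/pybal/pybal.conf — hypothetical service definition (names assumed)
[apaches]
protocol = tcp
ip = 10.0.5.3
port = 80
scheduler = wlc
```

Check the actual PyBal configuration shipped with the package for the authoritative key names.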


Apache pool

Director setup

Dalembert is functioning as an LVS-DR director. Installing a new LVS director is just a matter of

yum install ipvsadm
ip addr add 10.0.5.3 dev eth0
cp ~tstarling/lvs/* /usr/local/bin/
screen
lvsmon
^AD        (Ctrl-A then D: detach from the screen session, leaving lvsmon running)
run-icpagent.sh

Apache setup

When installing new apaches, one has to be careful of the "ARP problem". If you add the LVS virtual IP to an interface of anything other than the director without setting arp_announce and arp_ignore on all ethernet interfaces, the apache may steal the IP from the director.

The procedure is as follows:

cat /home/config/others/etc/sysctl.conf.local >> /etc/sysctl.conf
sysctl -w net.ipv4.conf.eth0.arp_ignore=1
sysctl -w net.ipv4.conf.eth0.arp_announce=2
sysctl -w net.ipv4.conf.eth1.arp_ignore=1
sysctl -w net.ipv4.conf.eth1.arp_announce=2
sysctl -w net.ipv4.conf.eth2.arp_ignore=1
sysctl -w net.ipv4.conf.eth2.arp_announce=2

The last four commands will probably give you errors, since eth1 and eth2 usually don't exist, but you may as well run them anyway just in case.
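
The sysctl.conf.local fragment appended above presumably makes these settings persist across reboots; its actual contents are not reproduced here, but settings along these lines would have the same effect (illustrative, not a copy of the real file):

```ini
# /etc/sysctl.conf — don't answer or source ARP for VIPs bound on lo
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
```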

You can check the ARP status with:

sysctl net.ipv4.conf.eth0.arp_ignore
sysctl net.ipv4.conf.eth1.arp_ignore
sysctl net.ipv4.conf.eth2.arp_ignore

lvsmon

Lvsmon is 80 lines of PHP code written by Tim to monitor the apaches and configure ipvsadm accordingly. It should be run in a screen, with no arguments. It uses curl to request http://en.wikipedia.org/w/health-check.php. Because it's so short, I'd recommend you read the code if you want to know the details. One important point: it gets its list of apaches from the dsh node group, and then tests each of them on its unique 10/8 address, not on the VIP. So if a machine is running apache but is not set up for LVS rotation, it's important to remove it from the apaches node group, or else intermittent "connection refused" errors will be returned to the user.

If you kill lvsmon, LVS will keep working, it just won't notice apache state changes anymore.
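
The real lvsmon is the PHP script described above; purely as an illustration of the same idea (a hypothetical sketch, not the actual code), the core logic amounts to comparing the set of healthy realservers against the set currently pooled, and emitting ipvsadm commands to converge them:

```python
# Sketch of an lvsmon-style reconciliation step (hypothetical, not the real
# PHP code): given which realservers are pooled in ipvsadm and which just
# passed the health check, produce the ipvsadm commands to converge the pool.

VIP = "10.0.5.3:80"  # assumed virtual service address for the apache pool

def converge(pooled, healthy):
    """Return ipvsadm commands to make the pooled set match the healthy set.

    pooled  -- set of realservers currently configured in ipvsadm
    healthy -- set of realservers that just passed the health check
    """
    cmds = []
    # Add newly healthy servers: -g selects direct routing (the "Route"
    # forwarding method seen in ipvsadm -l output), -w sets the weight.
    for server in sorted(healthy - pooled):
        cmds.append(f"ipvsadm -a -t {VIP} -r {server} -g -w 10")
    # Remove servers that failed the check.
    for server in sorted(pooled - healthy):
        cmds.append(f"ipvsadm -d -t {VIP} -r {server}")
    return cmds
```

A monitor loop would run this after each round of health checks and execute the returned commands.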


Removing apaches

Apaches can be removed from the pool temporarily by simply shutting down apache. Because lvsmon runs in a single thread, checking apaches in turn, it's probably better to remove permanently dead apaches from the apache nodelist.

If a misbehaving realserver is in LVS and for some reason pybal/lvsmon is not removing it, you can remove it by running a command of the following form:

ipvsadm -d -t <VIP>:<PORT> -r <REALSERVER>

e.g.

ipvsadm -d -t 66.230.200.228:80 -r sq1.pmtpa.wmnet

Diagnosing problems

Run ipvsadm -l on the director. Healthy output looks like this:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     5202       5295
  -> sq1.pmtpa.wmnet:http         Route   10     8183       12213
  -> sq4.pmtpa.wmnet:http         Route   10     7824       13360
  -> sq5.pmtpa.wmnet:http         Route   10     7843       12936
  -> sq6.pmtpa.wmnet:http         Route   10     7930       12769
  -> sq8.pmtpa.wmnet:http         Route   10     7955       11010
  -> sq2.pmtpa.wmnet:http         Route   10     7987       13190
  -> sq7.pmtpa.wmnet:http         Route   10     8003       7953

All the servers are getting a decent amount of traffic; the differences between them are just normal variation.
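
A quick way to sanity-check this output programmatically — a hypothetical helper, not an existing tool — is to parse the realserver lines and flag any server whose ActiveConn count is far below the pool average, which is the failure signature described below:

```python
# Parse the realserver ("->") lines of `ipvsadm -l` output and flag servers
# whose ActiveConn count is far below the pool average (hypothetical helper).

def find_suspects(lines, threshold=0.2):
    """lines: rows of `ipvsadm -l` output.
    Returns hosts whose ActiveConn is under threshold * pool average."""
    rows = []
    for line in lines:
        parts = line.split()
        # Realserver rows look like:
        #   -> host:port  Forward  Weight  ActiveConn  InActConn
        if parts and parts[0] == "->":
            host = parts[1].split(":")[0]
            rows.append((host, int(parts[4])))
    if not rows:
        return []
    avg = sum(active for _, active in rows) / len(rows)
    return [host for host, active in rows if active < threshold * avg]
```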

If a realserver is refusing connections or doesn't have the VIP configured, it will look like this:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     2          151577
  -> sq1.pmtpa.wmnet:http         Route   10     2497       1014
  -> sq4.pmtpa.wmnet:http         Route   10     2459       1047
  -> sq5.pmtpa.wmnet:http         Route   10     2389       1048
  -> sq6.pmtpa.wmnet:http         Route   10     2429       1123
  -> sq8.pmtpa.wmnet:http         Route   10     2416       1024
  -> sq2.pmtpa.wmnet:http         Route   10     2389       970
  -> sq7.pmtpa.wmnet:http         Route   10     2457       1008

Active connections for the problem server are depressed, while inactive connections are normal or above normal. This problem must be fixed immediately: in wlc mode, LVS balances load based on the ActiveConn column, so a server that is down, and therefore shows few active connections, attracts most of the new traffic.
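
As a toy illustration of why this happens (simplified — the kernel's real wlc metric also weighs inactive connections), wlc picks the server with the fewest active connections per unit weight:

```python
# Simplified wlc (weighted least-connection) selection: prefer the server
# with the fewest active connections per unit of weight. A dead server that
# has shed its connections looks "least loaded" and soaks up new traffic.

def pick_server(servers):
    """servers: list of (name, weight, active_conns); returns chosen name."""
    return min(servers, key=lambda s: s[2] / s[1])[0]
```

With the numbers from the unhealthy output above, the broken sq10 (2 active connections) wins every scheduling decision against its healthy peers.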

LVS director list

Cluster            Director    VIP
pmtpa apaches      dalembert   10.0.5.3
search backend 1   diderot     10.0.5.9
search backend 2   diderot     10.0.5.10
pmtpa text         avicenna    66.230.200.100
pmtpa upload       alrazi      66.230.200.228
yaseo text         yf1018      211.115.107.162
yaseo upload       yf1018      211.115.107.163
knams text         iris        145.97.39.155
knams upload       iris        145.97.39.156