LVS
LVS installation
Service IP configuration
In the new BGP setup, service IPs should be installed as /32s bound to the loopback interface. The best way to do that is to install the (now somewhat misleadingly named) package wikimedia-lvs-realserver.
Enter the IP addresses in the dialog as prompted on installation, or edit /etc/default/wikimedia-lvs-realserver and run dpkg-reconfigure wikimedia-lvs-realserver.
Remove unnecessary networking modules
A default Ubuntu install has iptables and xtables modules loaded, which slow LVS operation down somewhat and are unnecessary. To disable them from the next boot onwards, create a file /etc/modprobe.d/iptables with the following content:
alias ip_tables off
alias iptable_filter off
alias x_tables off
Install PyBal
# apt-get install pybal
Edit the configuration in /etc/pybal/pybal.conf. This is an INI-format file, with one section for each LVS service and a [global] section with global settings. The default configuration file should provide hints.
Put a local copy of the conf files in /home/wikipedia/conf/pybal under the hostname; see the README.txt in that directory.
There is more information on Pybal if you need it.
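As an illustration, a minimal pybal.conf might look like the fragment below. The section name, IP (taken from the director list later in this page), and exact key names are assumptions for illustration; key names can vary between PyBal versions, so check the default configuration file shipped with the package.

```ini
[global]
; Enable BGP announcement of service IPs (see the BGP section below)
bgp = yes

[apaches]
; One section per LVS service
protocol = tcp
ip = 10.2.1.1
port = 80
scheduler = wlc
monitors = [ "IdleConnection", "RunCommand" ]
```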
BGP failover and load sharing
Example BGP configuration:
neighbor 10.0.0.210 remote-as 64601
neighbor 10.0.0.210 description "PyBal on lvs3.wikimedia.org"
neighbor 10.0.0.210 timers keep-alive 10 hold-time 30
neighbor 10.0.0.210 update-source loopback 1
neighbor 10.0.0.210 prefix-list LVS in
neighbor 10.0.0.210 prefix-list none out
Prefix-list LVS:
ip prefix-list LVS: 2 entries
seq 5 permit 208.80.152.0/22 ge 32
seq 10 permit 10.0.0.0/8 ge 32
SSH checking
As the Apache cluster often suffers from broken disks which break SSH but keep Apache up, I have implemented a RunCommand monitor in PyBal which can periodically run an arbitrary command and check the server's health by its return code. If the command does not return within a certain timeout, the server is marked down as well.
The RunCommand configuration is in /etc/pybal/pybal.conf:
runcommand.command = /bin/sh
runcommand.arguments = [ '/etc/pybal/runcommand/check-apache', server.host ]
runcommand.interval = 60
runcommand.timeout = 10
- runcommand.command
- The path to the command to run. Since we are using a shell script and PyBal does not invoke a shell by itself, we have to do that explicitly.
- runcommand.arguments
- A (Python) list of command arguments. This list can refer to the monitor's server object, as shown here.
- runcommand.interval
- How often to run the check (seconds).
- runcommand.timeout
- The command timeout; after this many seconds the entire process group of the command is killed (SIGKILL), and the server is marked down.
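The timeout semantics above can be sketched as follows. This is a simplified illustration of the behaviour, not PyBal's actual implementation (PyBal is built on Twisted); the function name is hypothetical. The key point is that the command runs in its own process group, so the whole group can be killed on timeout, including any children the shell script spawned.

```python
import os
import signal
import subprocess

def run_with_timeout(argv, timeout):
    """Run argv in its own process group; on timeout, SIGKILL the whole
    group and report failure (simplified sketch of RunCommand)."""
    # start_new_session=True puts the child in a new session, so its
    # process group ID equals its PID.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Kill the entire process group, then reap the child.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return False

# A fast command passes; a command exceeding the timeout is marked down.
ok = run_with_timeout(['/bin/sh', '-c', 'true'], timeout=10)
slow = run_with_timeout(['/bin/sh', '-c', 'sleep 30'], timeout=1)
print(ok, slow)  # → True False
```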
Currently we're using the following RunCommand script, in /etc/pybal/runcommand/check-apache:
#!/bin/sh
set -e

HOST=$1
SSH_USER=pybal-check
SSH_OPTIONS="-o PasswordAuthentication=no -o StrictHostKeyChecking=no -o ConnectTimeout=8"

# Open an SSH connection to the real-server. The command is overridden
# by the authorized_keys file.
ssh -i /root/.ssh/pybal-check $SSH_OPTIONS $SSH_USER@$HOST true

exit 0
The limited ssh accounts on the application servers are managed by the wikimedia-task-appserver package.
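For reference, a forced command in an authorized_keys file takes roughly the shape below. The specific forced command and key options here are illustrative assumptions; the actual entries installed by wikimedia-task-appserver may differ.

```
command="uptime",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA... pybal-check
```

Whatever command the client asks for (`true` in the check script above), sshd runs the forced command instead, so the account cannot be used for anything else.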
Old
To install an LVS load balancer, on a base Ubuntu install, do:
- apt-get install pybal (ignore the warning about the kernel not supporting IPVS)
- Set up configuration in /etc/pybal/
- Restart PyBal and check whether it is working correctly (tail /var/log/pybal.log)
- Bind the LVS IP(s) to the external interface (usually eth0); for persistence across reboots add the following line to the loopback interface block in /etc/network/interfaces:
up ip addr add <ip>/32 dev $IFACE
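For example, using the pmtpa text VIP from the director list below, the loopback block would look like this (sketch; adjust the address to the service in question):

```
auto lo
iface lo inet loopback
    up ip addr add 208.80.152.2/32 dev $IFACE
```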
Removing real-servers
Real-servers can be removed from the pool temporarily by simply shutting down apache. Because lvsmon runs in a single thread, checking apaches in turn, it's probably better to remove permanently dead apaches from the apache nodelist.
If a misbehaving realserver is in LVS and for some reason PyBal is not removing it, you can remove it by running a command of the following form:
ipvsadm -d -t <VIP>:<PORT> -r <REALSERVER>
e.g.
ipvsadm -d -t 66.230.200.228:80 -r sq1.pmtpa.wmnet
Diagnosing problems
Run ipvsadm -l on the director. Healthy output looks like this:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     5202       5295
  -> sq1.pmtpa.wmnet:http         Route   10     8183       12213
  -> sq4.pmtpa.wmnet:http         Route   10     7824       13360
  -> sq5.pmtpa.wmnet:http         Route   10     7843       12936
  -> sq6.pmtpa.wmnet:http         Route   10     7930       12769
  -> sq8.pmtpa.wmnet:http         Route   10     7955       11010
  -> sq2.pmtpa.wmnet:http         Route   10     7987       13190
  -> sq7.pmtpa.wmnet:http         Route   10     8003       7953
All the servers are getting a decent amount of traffic, there's just normal variation.
If a realserver is refusing connections or doesn't have the VIP configured, it will look like this:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     2          151577
  -> sq1.pmtpa.wmnet:http         Route   10     2497       1014
  -> sq4.pmtpa.wmnet:http         Route   10     2459       1047
  -> sq5.pmtpa.wmnet:http         Route   10     2389       1048
  -> sq6.pmtpa.wmnet:http         Route   10     2429       1123
  -> sq8.pmtpa.wmnet:http         Route   10     2416       1024
  -> sq2.pmtpa.wmnet:http         Route   10     2389       970
  -> sq7.pmtpa.wmnet:http         Route   10     2457       1008
Active connections for the problem server are depressed, while inactive connections are normal or above normal. This problem must be fixed immediately: in wlc mode, LVS balances on the ActiveConn column, meaning that servers that are down get all the traffic.
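The eyeball check above can be automated. The sketch below parses `ipvsadm -l` output and flags realservers whose ActiveConn count is far below the pool median; the function name and the 10%-of-median threshold are illustrative assumptions, not an official rule.

```python
from statistics import median

def suspect_realservers(ipvsadm_output, threshold=0.1):
    """Return realservers whose ActiveConn is below threshold * pool median."""
    servers = []  # (name, active_conns)
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        # Realserver lines look like: -> host:port Route Weight Active Inact
        if len(fields) >= 6 and fields[0] == '->' and fields[2] == 'Route':
            servers.append((fields[1], int(fields[4])))
    if not servers:
        return []
    med = median(active for _, active in servers)
    return [name for name, active in servers if active < med * threshold]

# Abridged version of the unhealthy output above: sq10 is depressed.
output = """\
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     2          151577
  -> sq1.pmtpa.wmnet:http         Route   10     2497       1014
  -> sq4.pmtpa.wmnet:http         Route   10     2459       1047
"""
print(suspect_realservers(output))  # → ['sq10.pmtpa.wmnet:http']
```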
LVS director list
| Cluster | Director | VIP |
|---|---|---|
| pmtpa apaches | lvs3 | 10.2.1.1 |
| search backend 1 | lvs3 | 10.2.1.11 |
| search backend 2 | lvs3 | 10.2.1.12 |
| search backend 3 | lvs3 | 10.2.1.13 |
| rendering | lvs3 | 10.2.1.21 |
| pmtpa text | lvs4 | 208.80.152.2 |
| pmtpa upload | lvs2 | 208.80.152.3 |
| esams text | amslvs3 | 91.198.174.232 |
| esams upload | amslvs2 | 91.198.174.234 |
| esams bits | amslvs1 | 91.198.174.233 |
A good way to generate this list is:
dsh -N ALL -f -e 'ipvsadm -l'
and look for the hosts that give you a pile of output. Because most hosts have config files for both text and upload squids, they will pretend to serve for both. You can check what they are really doing by looking at the output.
Example: output like
fuchsia: IP Virtual Server version 1.2.1 (size=1048576)
fuchsia: Prot LocalAddress:Port Scheduler Flags
fuchsia:   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
fuchsia: TCP  rr.esams.wikimedia.org:www wlc
fuchsia:   -> knsq6.esams.wikimedia.org:ww  Route   10     26707      31425
fuchsia:   -> knsq5.esams.wikimedia.org:ww  Route   10     26708      31426
fuchsia:   -> knsq24.esams.wikimedia.org:w  Route   10     26741      31116
... (more lines with lots of ActiveConn)
fuchsia: TCP  upload.esams.wikimedia.org:w wlc
fuchsia:   -> knsq17.esams.wikimedia.org:w  Route   10     0          5
fuchsia:   -> knsq13.esams.wikimedia.org:w  Route   10     0          5
fuchsia:   -> knsq19.esams.wikimedia.org:w  Route   10     0          5
... (more lines with 0 ActiveConn)
means that the host is doing lvs for rr.esams.wikimedia.org but not for upload.esams.wikimedia.org.