LVS
Note to visitors from Google: This section documents the installation of LVS on the Wikimedia Apache cluster. It is a mixed cluster of i386 and x86_64 architectures running Fedora Core 3 and 4.
Apache pool
Director setup
Dalembert is functioning as an LVS-DR director. Installing a new LVS director is just a matter of:
    yum install ipvsadm
    ip addr add 10.0.5.3 dev eth0
    cp ~tstarling/lvs/* /usr/local/bin/
    screen
    lvsmon
    ^AD
    run-icpagent.sh
Apache setup
When installing new apaches, one has to be careful of the "ARP problem". If you add the LVS virtual IP to an interface on any machine other than the director without first setting arp_announce and arp_ignore on all ethernet interfaces, the apache may steal the IP from the director. Presumably icpagent won't be running on the apache, so squid would automatically fall back to perlbal, assuming it's running, and it wouldn't be an unmitigated disaster. But it's probably best to avoid trying it out.
Procedure is as follows:
    cat /home/config/others/etc/sysctl.conf.local >> /etc/sysctl.conf
    sysctl -w net.ipv4.conf.eth0.arp_ignore=1
    sysctl -w net.ipv4.conf.eth0.arp_announce=2
    sysctl -w net.ipv4.conf.eth1.arp_ignore=1
    sysctl -w net.ipv4.conf.eth1.arp_announce=2
The last two commands will probably give you an error since eth1 usually doesn't exist, but you may as well run them anyway just in case. Now, I haven't tried this myself yet, but I think it would be sensible to run a test to make sure ARP is configured correctly. 10.0.5.4 is a reserved service IP and should not be used anywhere.
    ip addr add 10.0.5.4 dev lo
    ssh zwinger ping 10.0.5.4
This should give "destination host unreachable". This test could easily be automated and run concurrently in apache setup scripts. If you get a response, fix it before continuing to the next step, which is the scary one:
    ip addr del 10.0.5.4 dev lo
    ip addr add 10.0.5.3 dev lo
Then add the new apache to the apaches node group and restart lvsmon on the director.
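The sysctl settings and the ARP test from this section could be scripted roughly as follows. This is a minimal bash sketch, not one of the existing setup scripts: the function names and the CONF_ROOT, SYSCTL and PING_CMD override variables are made up for illustration, and it assumes the iputils ping options -c and -w are available on zwinger.

```shell
# Sketch only. set_arp_sysctls, arp_test, CONF_ROOT, SYSCTL and PING_CMD
# are illustrative names, not part of the cluster's scripts.

# Apply the ARP settings to every ethN interface that actually exists,
# which avoids the error for a missing eth1.
set_arp_sysctls() {
    local dir iface
    for dir in "${CONF_ROOT:-/proc/sys/net/ipv4/conf}"/eth*; do
        [ -d "$dir" ] || continue          # glob matched nothing
        iface=${dir##*/}                   # strip the directory part
        ${SYSCTL:-sysctl} -w "net.ipv4.conf.$iface.arp_ignore=1"
        ${SYSCTL:-sysctl} -w "net.ipv4.conf.$iface.arp_announce=2"
    done
}

# Ping the test IP from another host.
# Returns 0 if nothing answered (the desired result),
#         1 if something replied (ARP is not hidden, stop here),
#         2 if ssh itself failed, so the result is inconclusive.
# Any other non-zero ping status is treated as "no reply".
arp_test() {
    local ip="${1:-10.0.5.4}" status=0
    ${PING_CMD:-ssh zwinger ping -c 3 -w 5} "$ip" >/dev/null 2>&1 || status=$?
    case "$status" in
        0)   return 1 ;;
        255) return 2 ;;
        *)   return 0 ;;
    esac
}
```

A setup script might call set_arp_sysctls, add 10.0.5.4 to lo, and refuse to go on to 10.0.5.3 unless arp_test returns 0. Treating an ssh failure as inconclusive matters here, because otherwise an unreachable zwinger would look like a passed test.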
lvsmon
Lvsmon is 80 lines of PHP code written by Tim to monitor apaches and configure ipvsadm accordingly. It should be run in a screen, with no arguments. It uses curl to request http://en.wikipedia.org/w/health-check.php . Because it's so short, I'd recommend you read the code if you want to know the details. But here's an important point: it gets a list of apaches from the dsh node group, and then tests them with their unique 10/8 address, not with the VIP. So if you have apache running on a machine but you don't have it set up for LVS rotation, it's important to remove it from the apaches node group, or else intermittent "connection refused" errors will be returned to the user.
If you kill lvsmon, LVS will keep working; it just won't notice apache state changes anymore.
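To make the description concrete, here is a rough bash sketch of one monitoring pass of the kind lvsmon performs. It is not Tim's PHP code, and its details are guesses: the function names, the CURL override, the 5 second timeout, the Host header and the printed ipvsadm commands are all assumptions for illustration. It only prints the commands it would run, since a real monitor has to track which servers are already in the IPVS table.

```shell
# Sketch only, not the real lvsmon. All names here are illustrative.
VIP=${VIP:-10.0.5.3}

# Succeeds if the health check on one apache answers with an HTTP success.
# The request goes to the host's own address, not to the VIP.
check_apache() {
    ${CURL:-curl} -s -f -o /dev/null -m 5 \
        -H 'Host: en.wikipedia.org' \
        "http://$1/w/health-check.php" </dev/null
}

# Print the ipvsadm command that would add ($2 = up) or delete ($2 = down)
# the real server $1. -g selects direct routing, as used by LVS-DR.
pool_command() {
    case "$2" in
        up)   echo "ipvsadm -a -t $VIP:80 -r $1:80 -g" ;;
        down) echo "ipvsadm -d -t $VIP:80 -r $1:80" ;;
    esac
}

# One pass over a node group file ($1): one hostname per line,
# blank lines and # comments skipped. Hosts are checked in turn,
# so a slow host delays the checks of all hosts after it.
sweep() {
    local host
    while read -r host; do
        case "$host" in
            ''|\#*) continue ;;
        esac
        if check_apache "$host"; then
            pool_command "$host" up
        else
            pool_command "$host" down
        fi
    done < "$1"
}
```

The sketch also shows why a machine that is listed in the node group and passes the check, but is not set up for LVS rotation, is a problem: it would be added to the pool purely on the strength of the health check against its own address.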
For a copy of the source, click here
Removing apaches
Apaches can be removed from the pool temporarily by simply shutting down apache. Because lvsmon runs in a single thread, checking apaches in turn, a dead host presumably holds up the checks of all the others while its request times out, so it's probably better to remove permanently dead apaches from the apaches node group.
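Removing a dead host from the list can be done with any editor. As a small illustration, the bash function below deletes one exact hostname from a list file. It assumes the usual dsh format of one hostname per line; the function name is hypothetical, and the path to the node group file has to be supplied by the caller.

```shell
# Sketch only: remove_from_nodelist is an illustrative name.
# Usage: remove_from_nodelist HOSTNAME FILE
remove_from_nodelist() {
    local host="$1" file="$2" tmp
    [ -n "$host" ] && [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # -x matches whole lines only, so removing srv1 leaves srv10 alone;
    # -F treats the hostname as a fixed string; -v keeps non-matching lines.
    # grep exits 1 when it keeps no lines at all, which is not an error here.
    grep -vxF -- "$host" "$file" > "$tmp" || [ $? -eq 1 ] || {
        rm -f "$tmp"
        return 1
    }
    # Write back through the original file to keep its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

This would presumably be followed by restarting lvsmon on the director, as when adding a host.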
LVS director list
| Cluster | Director | VIP |
|---|---|---|
| pmtpa apaches | dalembert | 10.0.5.3 |
| upload squids | tingxi (down) and avicenna | 207.142.131.228 |
| yaseo apaches | yf1018 | 211.115.107.161 |
| yaseo squids | yf1018 | 211.115.107.162 |