LVS

'''lvsmon''' is used as LVS load balancer control script between the Squids and the Apaches. In front of the Squids we're using another script, [[PyBal]].
+
Wikimedia uses [http://en.wikipedia.org/wiki/Linux_Virtual_Server LVS] for balancing traffic over multiple servers.
  
==Apache pool==
+
== Overview ==
 +
[[Image:Esams LVS.png|thumb|right|400px]]
  
===Director setup===
+
We use [http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.LVS-DR.html LVS-DR], or ''Direct Routing''. This means that only forward (incoming) traffic is balanced by the load balancer, and return traffic does not even go through the load balancer. Essentially, the LVS balancer receives traffic for a given service IP and port, selects one out of multiple "real servers", and then forwards the packet to that real server with only a modified destination MAC address. The destination servers also listen to and accept traffic for the service IP, but don't advertise it over ARP. Return traffic is simply sent directly to the gateway/router.
  
Dalembert is functioning as an LVS-DR director. Installing a new LVS director is just a matter of
+
The LVS balancer and the real servers need to be in the same subnet for this to work.
yum install ipvsadm
+
ip addr add 10.0.5.3 dev eth0
+
cp ~tstarling/lvs/* /usr/local/bin/
+
screen
+
lvsmon
+
^AD
+
run-icpagent.sh
+
  
===Apache setup===
+
The real servers are monitored by a Python program called [[Pybal]]. It does certain kinds of health checks to determine which servers can be used, and pools and depools them accordingly. You can follow what Pybal is doing in log file <tt>/var/log/pybal.log</tt>.
  
When installing new apaches, one has to be careful of the "ARP problem". If you add the LVS virtual IP to an interface of something other than the director without setting arp_announce and arp_ignore on all ethernet interfaces, the apache may steal the IP from the director. Presumably icpagent won't be running on the apache, so squid would automatically fall back to perlbal, assuming it's running, so it wouldn't be an unmitigated disaster. But it's probably best to avoid trying it out.
+
PyBal also has an integrated [[BGP]] module that Mark has written (Twisted BGP, available in the MediaWiki SVN repository). This is used as a failover/high availability protocol between the LVS balancers (PyBal) and the routers. PyBal announces the LVS service IPs to the router(s) to indicate that it is alive and can serve traffic. This also removes the need to manually configure the service IPs on the active balancers. All LVS servers are now using this setup.
  
Procedure is as follows:
+
== HOWTO ==
 +
=== Pool or depool hosts ===
 +
Edit the files in <tt>/home/w/conf/pybal/''sitename''/</tt> and wait a minute - PyBal will fetch the file over HTTP.
  
cat /home/config/others/etc/sysctl.conf.local >> /etc/sysctl.conf
+
If you set a host to ''disabled'', PyBal will continue to monitor it but will not pool it:
  sysctl -w net.ipv4.conf.eth0.arp_ignore=1
+
  { 'host': 'knsq1.esams.wikimedia.org', 'weight': 10, 'enabled': False }
sysctl -w net.ipv4.conf.eth0.arp_announce=2
+
sysctl -w net.ipv4.conf.eth1.arp_ignore=1
+
sysctl -w net.ipv4.conf.eth1.arp_announce=2
+
  
The last two commands will probably give you an error since eth1 usually doesn't exist, but you may as well run them anyway just in case. Now, I haven't tried this myself yet, but I think it would be sensible to run a test to make sure ARP is configured correctly. 10.0.5.4 is a reserved [[service IP]] and should not be used anywhere.  
+
If you comment out the line, PyBal will forget about the host completely.
  
  ip addr add 10.0.5.4 dev lo
+
In an emergency, for example if PyBal is not working for some reason, you can do this manually using <tt>ipvsadm</tt>.
ssh zwinger ping 10.0.5.4
+
  ipvsadm -d -t ''VIP'':''PORT'' -r ''REALSERVER''
 +
Such as:
 +
ipvsadm -d -t 91.198.174.232:80 -r knsq1.esams.wikimedia.org
  
This should give "destination host unreachable". This test could easily be automated and run concurrently in apache setup scripts. If you get a response, fix it before continuing to the next step. This is the scary step.
+
Note that PyBal won't know about this, so make sure you bring the situation back in sync.
  
ip addr del 10.0.5.4 dev lo
+
=== See which LVS balancer is active for a given service ===
ip addr add 10.0.5.3 dev lo
+
  
Then add it to the apaches node group and restart lvsmon on the director.
+
If you have ssh access to the host in question, sshing to the IP address will land you in a shell on whichever system is active.
  
===lvsmon===
+
  $ ssh root@ms-fe.pmtpa.wmnet
 +
  root@lvs4:~#
  
Lvsmon is 80 lines of PHP code written by Tim to monitor apaches and configure ipvsadm accordingly. It should be run in a screen, with no arguments. It uses curl to request http://en.wikipedia.org/w/health-check.php . Because it's so short, I'd recommend you read the code if you want to know the details. But here's an important point: it gets a list of apaches from the dsh node group, and then tests them with their unique 10/8 address, not with the VIP. So if you have apache running on a machine but you don't have it set up for LVS rotation, it's important to remove it from the apaches node group, or else intermittent "connection refused" errors will be returned to the user.
+
If you don't want to connect (or can't connect) to the system, ask the directly attached routers. You can request the route for a given service IP. E.g. on Foundry:
  
If you kill lvsmon, LVS will keep working, it just won't notice apache state changes anymore.
+
<pre>
 +
csw1-esams#show ip route 91.198.174.234
 +
Type Codes - B:BGP D:Connected I:ISIS S:Static R:RIP O:OSPF; Cost - Dist/Metric
 +
Uptime - Days:Hours:Minutes:Seconds
 +
        Destination        Gateway        Port        Cost    Type Uptime
 +
1      91.198.174.234/32  91.198.174.110  ve 1        20/1    B    10:14:28:44
 +
</pre>
 +
 
 +
So 91.198.174.110 (amslvs2) is active for Upload LVS service IP 91.198.174.234.
 +
 
 +
On Juniper:
 +
 
 +
<pre>
 +
csw2-esams> show route 91.198.174.232
 +
 
 +
inet.0: 38 destinations, 41 routes (38 active, 0 holddown, 0 hidden)
 +
+ = Active Route, - = Last Active, * = Both
 +
 
 +
91.198.174.232/32  *[BGP/170] 19:38:18, localpref 100, from 91.198.174.247
 +
                      AS path: 64600 I
 +
                    > to 91.198.174.109 via vlan.100
 +
                    [BGP/170] 1w3d 14:24:52, MED 10, localpref 100
 +
                      AS path: 64600 I
 +
                    > to 91.198.174.111 via vlan.100
 +
</pre>
 +
So 91.198.174.109 (*) is active for Text LVS service IP 91.198.174.232.
 +
 
 +
=== To see all LVS servers configured for a service ===
 +
To see which servers are configured for a service, but not which server is currently active, look in the puppet configs.
 +
* configuration is stored in puppet/manifests/lvs.pp
 +
* in the <code>class lvs::configuration</code>
 +
* in the <code>$lvs_services</code> variable,
 +
* look for your service (eg 'swift' or 'upload')
 +
* look for the <code>class</code>, which will be something like <code>low-traffic</code>
 +
* back up in lvs.pp to the section defining the <code>$lvs_class_hosts</code> variable.
 +
* look for your class (eg <code>low-traffic</code>)
 +
* you should see sections for production and labs, with variables for each data center listing the lvs servers responsible.
 +
 
 +
Example showing that lvs3 and lvs4 are responsible for swift (command output excerpted for clarity):
 +
  ben@fenari:~/pp/manifests$ grep -A2 "swift" lvs.pp
 +
                "swift" => {
 +
                        'description' => "Swift object store for thumbnails",
 +
                        'class' => "low-traffic",
 +
  ben@fenari:~/pp/manifests$ grep -A5 "low-traffic" lvs.pp  | head -n 5
 +
                'low-traffic' => $::realm ? {
 +
                        'production' => $::site ? {
 +
                                'pmtpa' => [ "lvs3", "lvs4" ],
 +
                                'eqiad' => [ "lvs1003", "lvs1006" ],
 +
                                'esams' => [ ],
 +
 
 +
=== Deploy a change to an existing service ===
 +
Preconditions:
 +
* you have already made the change in puppet and pushed the change to sockpuppet.
 +
* you have tested the change on the backend real servers directly (eg if you were changing a health check URL you have already queried the backend servers for that URL successfully).
 +
 
 +
Deploy steps:
 +
* find out which LVS servers host your service (see above).  For this example, I'll use lvs 3 and 4.
 +
* find out which LVS server is active (see above).  For this example, I'll assume lvs4.
 +
* log into the inactive host twice.
 +
* in one session, tail the pybal log looking for one (or more) of your backend servers.  eg <code>tail -f /var/log/pybal.log | grep ms-fe</code>
 +
** you should see one line per server every 10 seconds or so. Look for "enabled/up/pooled". eg:
 +
  2012-03-13 19:12:20.003332 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.006 s
 +
* in the other session, run puppet and verify your change exists on the local filesystem
 +
* get a list of all IP addresses served by this LVS server - you're going to check that they all exist after your change
 +
** run <code>ip addr</code> and save the output for later
 +
* restart pybal
 +
** pybal doesn't die correctly, so a restart involves:
 +
** /etc/init.d/pybal stop
 +
** pkill -9 pybal
 +
** /etc/init.d/pybal start
 +
* check that all the expected IP addresses exist
 +
** run 'ip addr' and compare against the list you collected before making your change
 +
* in the log you're tailing, you should see a few messages like:
 +
  2012-03-13 19:21:26.015393 New enabled server ms-fe1.pmtpa.wmnet, weight 40
 +
  2012-03-13 19:21:26.015611 New enabled server ms-fe2.pmtpa.wmnet, weight 40
 +
  2012-03-13 19:21:26.015666 ['-a -t 10.2.1.27:80 -r ms-fe2.pmtpa.wmnet -w 40', '-a -t 10.2.1.27:80 -r ms-fe1.pmtpa.wmnet -w 40']
 +
  2012-03-13 19:21:26.074253 [IdleConnection] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Connection established.
 +
  2012-03-13 19:21:26.075302 [IdleConnection] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Connection established.
 +
  2012-03-13 19:21:36.022517 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.007 s
 +
  2012-03-13 19:21:36.023442 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.008 s
 +
  2012-03-13 19:21:46.030482 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.008 s
 +
* Look for "Fetch successful". This is the line that means your change is successful. Seeing 'enabled/up/pooled' is insufficient.
 +
** example of a failed change (note that it still says enabled/up/pooled for a few lines - this does '''not''' mean that it's ok!  look for the Fetch line):
 +
  2012-03-13 19:27:23.626787 [IdleConnection] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Connection established.
 +
  2012-03-13 19:27:23.632928 [IdleConnection] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Connection established.
 +
  2012-03-13 19:27:33.555879 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Fetch failed, 0.005 s
 +
  2012-03-13 19:27:33.555917 Monitoring instance ProxyFetch reports servers ms-fe2.pmtpa.wmnet (enabled/up/pooled) down: 404 Not Found
 +
  2012-03-13 19:27:33.556022 ['-d -t 10.2.1.27:80 -r ms-fe2.pmtpa.wmnet']
 +
  2012-03-13 19:27:33.562458 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Fetch failed, 0.012 s
 +
  2012-03-13 19:27:33.562533 Monitoring instance ProxyFetch reports servers ms-fe1.pmtpa.wmnet (enabled/up/pooled) down: 404 Not Found
 +
  2012-03-13 19:27:33.562589 Could not depool server ms-fe1.pmtpa.wmnet because of too many down!
 +
  2012-03-13 19:27:43.561745 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/partially up/not pooled): Fetch failed, 0.002 s
 +
  2012-03-13 19:27:43.565608 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/partially up/pooled): Fetch failed, 0.003 s
 +
* if your change was successful, repeat the procedure on the active host.
 +
** when you stop pybal on the active host, traffic will immediately fail over to the standby host.
 +
** when you restart pybal on the formerly active host, traffic will immediately fail back (the LVS pairs are configured with a default and a standby so traffic always flows to the default if it's up).
 +
 
 +
=== Add a new load balanced service ===
 +
First choose whether it's a 'high-traffic' (aka public-facing) or 'low-traffic' (aka internal post-cache)
 +
==== DNS changes ====
 +
* allocate an IP address per colo to serve your content
 +
* internal addresses should have names *.svc.$colo.wmnet:
 +
** [[pmtpa]] should be in the 10.2.1.0/24 range
 +
** [[eqiad]] should be in the 10.2.2.0/24 range
 +
* external addresses:
 +
** These need to be allocated from the (small!) public IP address pool, and may need specific configuration on the routers. Talk to the network admins first (Mark/Leslie).
 +
 
 +
==== Puppet Changes ====
 +
* in manifests/lvs.pp
 +
** add a stanza to $lvs_service_ips defining the IP addresses mimicking an existing entry
 +
** add a stanza to $lvs_services mimicking an existing entry
 +
*** use your choice of high or low volume traffic here
 +
* in manifests/site.pp
 +
** add the IP address to $lvs_balancer_ips in the node definition for the lvs servers that will serve your service
 +
** add the IP address and an include for lvs::realserver to your node definition
 +
  $lvs_realserver_ips = [ "10.2.1.xx" ]
 +
  include lvs::realserver
 +
 
 +
==== noc changes ====
 +
* create a file in /home/w/conf/pybal/$colo/ for your service listing which hosts are available for the backend
 +
* see [[PyBal]] for more detail on how this part works
 +
 
 +
==== Deploy your changes ====
 +
* see instructions above (Deploy a change to an existing service)
 +
 
 +
== LVS installation ==
 +
LVS now uses ''[[Puppet]]'' and ''automatic BGP failover''. Puppet arranges the service IP configuration, and installation of packages. To configure the service IPs that an LVS balancer should serve (both primary and backup!), set the <tt>$lvs_balancer_ips</tt> variable:
 +
 
 +
<pre>
 +
node /amslvs[1-4]\.esams\.wikimedia\.org/ {
 +
        $cluster = "misc_esams"
 +
 
 +
        $lvs_balancer_ips = [ "91.198.174.2", "91.198.174.232", "91.198.174.233", "91.198.174.234" ]
 +
 
 +
        include base,
 +
                ganglia,
 +
                lvs::balancer
 +
}
 +
</pre>
 +
 
 +
In this setup, all 4 hosts amslvs1-amslvs4 are configured to accept all service IPs, although in practice every service IP is only ever serviced by one out of two hosts due to the router configuration.
 +
 
 +
Puppet uses the (now misleadingly named) <tt>wikimedia-lvs-realserver</tt> package to bind these IPs to the ''loopback'' (!) interface. This is to make sure that a server ''answers'' on these IPs, but does not announce them via ARP - we'll use BGP for that.
 +
 
 +
=== LVS service configuration ===
 +
In file <tt>lvs.pp</tt> the services themselves are configured, from which the PyBal configuration file <tt>/etc/pybal/pybal.conf</tt> is generated by Puppet.
 +
 
 +
Most configuration is in a large associative hash, <tt>$lvs_services</tt>. Each key in this hash is the name of one LVS service, and points to a hash of PyBal configuration variables:
 +
; description : Textual description of the LVS service.
 +
; class : The ''class'' the LVS service belongs to; i.e. on which LVS balancers it is active (see below).
 +
; ip : A hash of service IP addresses for the service. All IP addresses are aliases, and are translated to separate LVS services in pybal.conf, but with identical configuration.
 +
The other configuration variables are described in the [[PyBal]] article.
 +
 
 +
Global PyBal configuration options can be specified in the <tt>$pybal</tt> hash.
 +
 
 +
==== Classes ====
 +
 
 +
The <tt>$lvs_class_hosts</tt> variable determines, for each class, which hosts should carry the services of that class. The pybal.conf template uses it to generate the LVS services. The following classes are used to distribute traffic over the LVS balancer hosts:
 +
 
 +
* high-traffic1 (text, bits)
 +
* high-traffic2 (text, upload)
 +
* https (HTTPS services corresponding to the 'high-traffic' HTTP services; should be active on all hosts that carry either class)
 +
* specials (special LVS services, especially those that do not have BGP enabled)
 +
* low-traffic (internal load balancing, e.g. from the Squids to the Apaches)
 +
 
 +
=== BGP failover and load sharing ===
 +
Previously, the LVS balancer that had a certain service IP bound to its <tt>eth0</tt> interface was active for that IP. To do failovers, the IP had to be moved manually.
 +
 
 +
In the new setup, multiple servers announce the service IP(s) via BGP to the router(s), which then pick which server(s) to use based on BGP routing policy.
 +
 
 +
==== PyBal BGP configuration ====
 +
In the global section, the following BGP related settings typically exist:
 +
bgp = yes
 +
 
 +
Enables bgp globally, but can be overridden per service.
 +
 
 +
bgp-local-asn =  64600
 +
 
 +
The ASN to use while communicating to the routers. All prefixes will get this ASN as AS path.
 +
 
 +
bgp-peer-address = 91.198.174.247
 +
 
 +
The IP of the router this PyBal instance speaks BGP to.
 +
 
 +
#bgp-as-path = 64600 64601
 +
 
 +
An optional modified AS path. Can be used e.g. to make the AS path longer and thus less attractive (on a backup balancer).
 +
 
 +
==== Example BGP configuration for Foundry ====
 +
<pre>
 +
router bgp
 +
neighbor 91.198.174.109 remote-as 64600
 +
neighbor 91.198.174.109 description "PyBal on amslvs1"
 +
neighbor 91.198.174.109 timers  keep-alive 10  hold-time 30
 +
neighbor 91.198.174.109 update-source loopback 1
 +
neighbor 91.198.174.110 remote-as 64600
 +
neighbor 91.198.174.110 description "PyBal on amslvs2"
 +
neighbor 91.198.174.110 timers  keep-alive 10  hold-time 30
 +
neighbor 91.198.174.110 update-source loopback 1
 +
 
 +
neighbor 91.198.174.244 description "iBGP to csw2-esams"
 +
neighbor 91.198.174.244 timers  keep-alive 10  hold-time 30
 +
neighbor 91.198.174.244 update-source loopback 1
 +
 
 +
 
 +
neighbor 91.198.174.109 prefix-list LVS in
 +
neighbor 91.198.174.109 prefix-list none out
 +
neighbor 91.198.174.110 maximum-prefix 10 teardown
 +
neighbor 91.198.174.110 prefix-list LVS in                     
 +
neighbor 91.198.174.110 prefix-list none out
 +
 
 +
neighbor 91.198.174.244 maximum-prefix 10 teardown
 +
neighbor 91.198.174.244 prefix-list LVS in
 +
neighbor 91.198.174.244 prefix-list LVS out
 +
neighbor 91.198.174.244 unsuppress-map LVS-IBGP-EXCHANGE
 +
!
 +
 
 +
ip prefix-list  LVS seq 5 permit 91.198.174.0/25 ge 32
 +
ip prefix-list  LVS seq 10 permit 91.198.174.232/30 ge 32
 +
 
 +
 
 +
route-map  LVS-IBGP-EXCHANGE permit  10
 +
match ip address prefix-list LVS
 +
route-map  LVS-IBGP-EXCHANGE deny  100
 +
</pre>
 +
 
 +
==== Example BGP configuration for Juniper ([[csw2-esams]]) ====
 +
<pre>
 +
root@csw2-esams> show configuration protocols bgp   
 +
group PyBal {
 +
    type external;
 +
    multihop {
 +
        ttl 1;
 +
    }
 +
    local-address 91.198.174.244;
 +
    hold-time 30;
 +
    import LVS_import;
 +
    family inet {
 +
        unicast {
 +
            prefix-limit {
 +
                maximum 10;
 +
                teardown;
 +
            }
 +
        }
 +
    }
 +
    export NONE;
 +
    peer-as 64600;
 +
    neighbor 91.198.174.111;
 +
    neighbor 91.198.174.112;
 +
}
 +
group iBGP {
 +
    type internal;
 +
    peer-as 43821;
 +
    neighbor 91.198.174.247 {
 +
        import LVS_exchange;
 +
        export LVS_exchange;
 +
    }
 +
}
 +
 
 +
root@csw2-esams> show configuration policy-options
 +
prefix-list LVS {
 +
    91.198.174.0/25;
 +
    91.198.174.232/30;
 +
}
 +
policy-statement LVS_exchange {
 +
    term 1 {
 +
        from {
 +
            prefix-list-filter LVS longer;
 +
        }
 +
        then accept;
 +
    }
 +
    from protocol bgp;
 +
}
 +
policy-statement LVS_import {
 +
    term 1 {
 +
        from {
 +
            protocol bgp;
 +
            prefix-list-filter LVS longer;
 +
        }
 +
        then {
 +
            metric add 10;
 +
            accept;
 +
        }
 +
    }
 +
}
 +
</pre>
 +
 
 +
The ''LVS_import'' policy adds metric 10 to the "routes" (service IPs) received from the ''secondary'' (backup) LVS balancers. This means that the router will regard them as less preferred.
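
The effect of the added metric can be illustrated with a small model of route selection. This is a simplified sketch, not the routers' actual decision process: the <tt>Route</tt> type is hypothetical, only three tie-breakers are modelled, and treating a missing MED as 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str
    local_pref: int = 100
    as_path_len: int = 1
    med: int = 0  # a missing MED is treated as 0 here (an assumption)


def best_route(routes):
    """Pick a route: highest local-pref, then shortest AS path, then lowest MED.

    Only these three tie-breakers are modelled; real BGP implementations
    apply further steps that this sketch leaves out.
    """
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len, r.med))


primary = Route(next_hop="91.198.174.109")
backup = Route(next_hop="91.198.174.111", med=10)  # after "metric add 10"
chosen = best_route([backup, primary])
```

With both routes present the primary wins on MED. If the primary stops announcing the prefix, the backup is the only candidate left and is selected.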
 +
 
 +
At esams, Foundry router [[csw1-esams]] and JUNOS router [[csw2-esams]] exchange the service IPs over iBGP.
 +
 
 +
=== SSH checking ===
 +
As the Apache cluster is often suffering from broken disks which break SSH but keep Apache up, I have implemented a ''RunCommand'' monitor in PyBal which can periodically run an arbitrary command, and check the server's health by the return code. If the command does not return within a certain timeout, the server is marked ''down'' as well.
 +
 
 +
The ''RunCommand'' configuration is in <tt>/etc/pybal/pybal.conf</tt>:
 +
<pre>
 +
runcommand.command = /bin/sh
 +
runcommand.arguments = [ '/etc/pybal/runcommand/check-apache', server.host ]
 +
runcommand.interval = 60
 +
runcommand.timeout = 10
 +
</pre>
 +
 
 +
; runcommand.command : The path to the command which is being run. Since we are using a shell script and PyBal does not invoke a shell by itself, we have to do that explicitly.
 +
; runcommand.arguments : A (Python) list of command arguments. This list can refer to the monitor's ''server'' object, as shown here.
 +
; runcommand.interval : How often to run the check (seconds).
 +
; runcommand.timeout : The command timeout; after this many seconds the entire process group of the command is KILLed and the server is marked down.
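
The behaviour described by these settings (exit code decides health, and a timeout kills the whole process group) can be sketched as follows. This is not PyBal's implementation, only a minimal standalone illustration for POSIX systems; <tt>run_check</tt> is a hypothetical helper and the commands it runs here are stand-ins for the real check script.

```python
import os
import signal
import subprocess
import sys


def run_check(command, arguments, timeout):
    """Return True if the command exits 0 within `timeout` seconds.

    The command is started in its own session (and thus its own process
    group) so that the whole group can be killed on timeout. POSIX only.
    """
    proc = subprocess.Popen([command, *arguments], start_new_session=True)
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the killed process
        return False


# Stand-in commands; a real check would run the script configured above.
ok = run_check(sys.executable, ["-c", "raise SystemExit(0)"], timeout=10)
failed = run_check(sys.executable, ["-c", "raise SystemExit(1)"], timeout=10)
hung = run_check(sys.executable, ["-c", "import time; time.sleep(30)"], timeout=1)
```

Killing the process group rather than the single process matters when the check is a shell script that spawns children such as <tt>ssh</tt>.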
 +
 
 +
Currently we're using the following RunCommand script, in <tt>/etc/pybal/runcommand/check-apache</tt>:
 +
<pre>
 +
#!/bin/sh
 +
 
 +
set -e
 +
 
 +
HOST=$1
 +
SSH_USER=pybal-check
 +
SSH_OPTIONS="-o PasswordAuthentication=no -o StrictHostKeyChecking=no -o ConnectTimeout=8"
 +
 
 +
# Open an SSH connection to the real-server. The command is overridden by the authorized_keys file.
 +
ssh -i /root/.ssh/pybal-check $SSH_OPTIONS $SSH_USER@$HOST true
 +
 
 +
exit 0
 +
</pre>
  
[[lvsmon|For a copy of the source, click here]]
+
The limited ssh accounts on the application servers are managed by the <tt>wikimedia-task-appserver</tt> package.
  
===Removing apaches===
+
=== Monitoring ===
 +
Nagios monitoring of LVS services is managed by Puppet as well, at the bottom of the <tt>lvs.pp</tt> file. For example:
  
Apaches can be removed from the pool temporarily by simply shutting down apache. Because lvsmon runs in a single thread, checking apaches in turn, it's probably better to remove permanently dead apaches from the apache nodelist.
+
monitor_service_lvs_http { "wikimedia-lb.pmtpa.wikimedia.org":
 +
    ip_address => "208.80.152.200",
 +
    check_command => "check_http_lvs!meta.wikimedia.org!/wiki/Main_Page"
 +
}
 +
monitor_service_lvs_https { "wikimedia-lb.pmtpa.wikimedia.org":
 +
    ip_address => "208.80.152.200",
 +
    check_command => "check_https_url!meta.wikimedia.org!/wiki/Main_Page"
 +
}
  
===Diagnosing problems===
+
==Diagnosing problems==
  
 
Run <tt>ipvsadm -l</tt> on the director. Healthy output looks like this:
 
 
Active connections for the problem server are depressed, inactive connections normal or above normal. This problem must be fixed immediately, because in wlc mode, LVS load balances based on the ActiveConn column, meaning that servers that are down get all the traffic.
 
  
==LVS director list==
+
=== Incorrectly bound interfaces ===
 +
Don't ever bind IP addresses directly to lo in /etc/network/interfaces.  If you do, stuff breaks.  (This applies not just to LVS servers but any real server as well.  Anything with the wikimedia-lvs-realserver package will break if you bind addresses manually.)
 +
 
 +
When it's broken, it looks like this.  Notice that all the balanced IP addresses are tagged lo:LVS except 10.2.1.13.  13 is broken and causes the ifup script that reloads the IPs to be broken. 
 +
<pre>
 +
    inet 127.0.0.1/8 scope host lo
 +
    inet 10.2.1.13/32 scope global lo
 +
    inet 10.2.1.1/32 scope global lo:LVS
 +
    inet 10.2.1.11/32 scope global lo:LVS
 +
    inet 10.2.1.12/32 scope global lo:LVS
 +
    inet6 ::1/128 scope host
 +
</pre>
 +
 
 +
The solution here is to delete the broken interface (<code>ip addr del 10.2.1.13/32 dev lo</code>), then run <code>dpkg-reconfigure wikimedia-lvs-realserver</code>.  This triggers the scripts that will re-add all the IP addresses.
 +
 
 +
Happy <code>ip addr</code> output looks like this:
 +
<pre>
 +
root@lvs4:/etc/network# ip addr
 +
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
 +
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 +
    inet 127.0.0.1/8 scope host lo
 +
    inet 10.2.1.1/32 scope global lo:LVS
 +
    inet 10.2.1.11/32 scope global lo:LVS
 +
    inet 10.2.1.12/32 scope global lo:LVS
 +
    inet 10.2.1.13/32 scope global lo:LVS
 +
    inet 10.2.1.21/32 scope global lo:LVS
 +
    inet 10.2.1.22/32 scope global lo:LVS
 +
    inet 10.2.1.27/32 scope global lo:LVS
 +
    inet6 ::1/128 scope host
 +
      valid_lft forever preferred_lft forever
 +
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
 +
    etc...
 +
</pre>
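
Spotting the broken case by eye is easy to get wrong on a host with many service IPs. The following is a small sketch of an automated check, assuming the <tt>ip addr</tt> line layout shown in the two examples above; <tt>unlabelled_lo_addresses</tt> is a hypothetical helper, not part of the wikimedia-lvs-realserver package.

```python
def unlabelled_lo_addresses(ip_addr_output):
    """Return global-scope addresses on lo that lack the lo:LVS label."""
    broken = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Expected shape: inet <addr>/<len> scope <scope> <label>
        if len(fields) >= 5 and fields[0] == "inet" and fields[2] == "scope":
            address, scope, label = fields[1], fields[3], fields[4]
            if scope == "global" and label == "lo":
                broken.append(address)
    return broken


BROKEN_SAMPLE = """\
    inet 127.0.0.1/8 scope host lo
    inet 10.2.1.13/32 scope global lo
    inet 10.2.1.1/32 scope global lo:LVS
    inet6 ::1/128 scope host
"""

broken = unlabelled_lo_addresses(BROKEN_SAMPLE)
```

Any address this returns is a candidate for the <code>ip addr del</code> and <code>dpkg-reconfigure</code> fix described above.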
  
{| border=1
!  Cluster              || Director                    || VIP
|-
| pmtpa apaches        || dalembert                    || 10.0.5.3
|-
| upload squids        || avicenna                    || 207.142.131.228
|-
| yaseo apaches        || yf1018                      || 211.115.107.161
|-
| yaseo squids          || yf1018                      || 211.115.107.162
|}

[[Category:Network]]

Latest revision as of 23:21, 10 July 2012

Wikimedia uses LVS for balancing traffic over multiple servers.


Overview

Esams LVS.png

We use LVS-DR, or Direct Routing. This means that only forward (incoming) traffic is balanced by the load balancer, and return traffic does not even go through the load balancer. Essentially, the LVS balancer receives traffic for a given service IP and port, selects one out of multiple "real servers", and then forwards the packet to that real server with only a modified destination MAC address. The destination servers also listen to and accept traffic for the service IP, but don't advertise it over ARP. Return traffic is simply sent directly to the gateway/router.

The LVS balancer and the real servers need to be in the same subnet for this to work.
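
The forwarding step can be illustrated with a toy model. This is a minimal sketch, not how the kernel's IPVS code works: the Frame type, the MAC addresses, and the round-robin choice are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Frame:
    """Toy Ethernet frame: only the fields relevant to LVS-DR."""
    dst_mac: str
    dst_ip: str
    dst_port: int


def make_director(real_server_macs):
    """Return a forward() function that balances over the given MACs.

    Round-robin is used here only to keep the example short.
    """
    next_mac = cycle(real_server_macs)

    def forward(frame: Frame) -> Frame:
        # Direct Routing: the IP header (service IP and port) is left
        # untouched; only the destination MAC address is rewritten.
        return replace(frame, dst_mac=next(next_mac))

    return forward


# Hypothetical MAC addresses for the director and two real servers.
DIRECTOR_MAC = "00:00:5e:00:53:01"
REAL_SERVERS = ["00:00:5e:00:53:11", "00:00:5e:00:53:12"]

forward = make_director(REAL_SERVERS)
incoming = Frame(dst_mac=DIRECTOR_MAC, dst_ip="91.198.174.232", dst_port=80)
first = forward(incoming)
second = forward(incoming)
```

Because only the MAC address changes, the frame can reach the real server only at layer 2, which is why the balancer and the real servers have to share a subnet.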

The real servers are monitored by a Python program called Pybal. It does certain kinds of health checks to determine which servers can be used, and pools and depools them accordingly. You can follow what Pybal is doing in log file /var/log/pybal.log.

PyBal also has an integrated BGP module that Mark has written (Twisted BGP, available in the MediaWiki SVN repository). This is used as a failover/high availability protocol between the LVS balancers (PyBal) and the routers. PyBal announces the LVS service IPs to the router(s) to indicate that it is alive and can serve traffic. This also removes the need to manually configure the service IPs on the active balancers. All LVS servers are now using this setup.

HOWTO

Pool or depool hosts

Edit the files in /home/w/conf/pybal/sitename/ and wait a minute - PyBal will fetch the file over HTTP.

If you set a host to disabled, PyBal will continue to monitor it but will not pool it:

{ 'host': 'knsq1.esams.wikimedia.org', 'weight': 10, 'enabled': False }

If you comment out the line, PyBal will forget about the host completely.
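
The three states (pooled, disabled, commented out) can be sketched in a few lines of Python. This is not PyBal's own parser; it assumes one Python dict literal per line and '#' as the comment marker, and the hosts knsq2 and knsq3 are made up for the example.

```python
import ast


def parse_pool_file(text):
    """Split pool-file lines into pooled and monitored-only hosts.

    Assumes one Python dict literal per line and '#' as the comment
    marker; both are assumptions based on the example line above.
    """
    pooled, monitored_only = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Commented-out hosts are forgotten completely.
            continue
        entry = ast.literal_eval(line)
        if entry.get("enabled", False):
            pooled.append(entry["host"])
        else:
            # Disabled hosts are still monitored, just not pooled.
            monitored_only.append(entry["host"])
    return pooled, monitored_only


SAMPLE = """\
{ 'host': 'knsq1.esams.wikimedia.org', 'weight': 10, 'enabled': False }
{ 'host': 'knsq2.esams.wikimedia.org', 'weight': 10, 'enabled': True }
#{ 'host': 'knsq3.esams.wikimedia.org', 'weight': 10, 'enabled': True }
"""

pooled, monitored_only = parse_pool_file(SAMPLE)
```

Running a check like this before saving the file is a cheap way to catch a syntax error that would otherwise only show up in the PyBal log.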

In an emergency, for example if PyBal is not working for some reason, you can do this manually using ipvsadm.

ipvsadm -d -t VIP:PORT -r REALSERVER

Such as:

ipvsadm -d -t 91.198.174.232:80 -r knsq1.esams.wikimedia.org

Note that PyBal won't know about this, so make sure you bring the situation back in sync.

See which LVS balancer is active for a given service

If you have ssh access to the host in question, sshing to the IP address will land you in a shell on whichever system is active.

 $ ssh root@ms-fe.pmtpa.wmnet
 root@lvs4:~#

If you don't want to connect (or can't connect) to the system, ask the directly attached routers. You can request the route for a given service IP. E.g. on Foundry:

csw1-esams#show ip route 91.198.174.234
Type Codes - B:BGP D:Connected I:ISIS S:Static R:RIP O:OSPF; Cost - Dist/Metric
Uptime - Days:Hours:Minutes:Seconds 
        Destination        Gateway         Port        Cost     Type Uptime
1       91.198.174.234/32  91.198.174.110  ve 1        20/1     B    10:14:28:44

So 91.198.174.110 (amslvs2) is active for Upload LVS service IP 91.198.174.234.

On Juniper:

csw2-esams> show route 91.198.174.232 

inet.0: 38 destinations, 41 routes (38 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

91.198.174.232/32  *[BGP/170] 19:38:18, localpref 100, from 91.198.174.247
                      AS path: 64600 I
                    > to 91.198.174.109 via vlan.100
                    [BGP/170] 1w3d 14:24:52, MED 10, localpref 100
                      AS path: 64600 I
                    > to 91.198.174.111 via vlan.100

So 91.198.174.109 (*) is active for Text LVS service IP 91.198.174.232.
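
Reading the active next hop out of the Juniper output can be scripted. The following is a rough sketch that assumes the output layout shown above; active_next_hop is a hypothetical helper and has not been checked against other JUNOS versions or route types.

```python
import re

SAMPLE = """\
91.198.174.232/32  *[BGP/170] 19:38:18, localpref 100, from 91.198.174.247
                      AS path: 64600 I
                    > to 91.198.174.109 via vlan.100
                    [BGP/170] 1w3d 14:24:52, MED 10, localpref 100
                      AS path: 64600 I
                    > to 91.198.174.111 via vlan.100
"""


def active_next_hop(show_route_output):
    """Return the next hop of the route marked '*', or None."""
    in_active_route = False
    for line in show_route_output.splitlines():
        if "[BGP/" in line:
            # A new route entry starts; it is active if flagged with '*'.
            in_active_route = "*[BGP/" in line
        match = re.search(r">\s+to\s+(\S+)", line)
        if match and in_active_route:
            return match.group(1)
    return None


active = active_next_hop(SAMPLE)
```

The second route in the sample carries MED 10 and no '*', so it is skipped even though it also has a "> to" line.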

To see all LVS servers configured for a service

To see which servers are configured for a service, but not which server is currently active, look in the puppet configs.

  • configuration is stored in puppet/manifests/lvs.pp
  • in the class lvs::configuration
  • in the $lvs_services variable,
  • look for your service (eg 'swift' or 'upload')
  • look for the class, which will be something like low-traffic
  • back up in lvs.pp to the section defining the $lvs_class_hosts variable.
  • look for your class (eg low-traffic)
  • you should see sections for production and labs, with variables for each data center listing the lvs servers responsible.

Example showing that lvs3 and lvs4 are responsible for swift (command output excerpted for clarity):

 ben@fenari:~/pp/manifests$ grep -A2 "swift" lvs.pp 
               "swift" => {
                       'description' => "Swift object store for thumbnails",
                       'class' => "low-traffic",
 ben@fenari:~/pp/manifests$ grep -A5 "low-traffic" lvs.pp  | head -n 5
               'low-traffic' => $::realm ? {
                       'production' => $::site ? {
                               'pmtpa' => [ "lvs3", "lvs4" ],
                               'eqiad' => [ "lvs1003", "lvs1006" ],
                               'esams' => [ ],
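
The lookup is a two-step indirection: service to class, then class to hosts for a given realm and site. A minimal sketch, using a hypothetical Python mirror of the two Puppet hashes that contains only the entries visible in the grep output above:

```python
# Hypothetical Python mirror of the two Puppet hashes; only the entries
# shown in the grep output above are filled in.
LVS_SERVICES = {
    "swift": {
        "description": "Swift object store for thumbnails",
        "class": "low-traffic",
    },
}

LVS_CLASS_HOSTS = {
    "low-traffic": {
        "production": {
            "pmtpa": ["lvs3", "lvs4"],
            "eqiad": ["lvs1003", "lvs1006"],
            "esams": [],
        },
    },
}


def lvs_hosts_for(service, realm, site):
    """Resolve service -> class -> balancer hosts for one realm and site."""
    lvs_class = LVS_SERVICES[service]["class"]
    return LVS_CLASS_HOSTS[lvs_class][realm][site]


pmtpa_hosts = lvs_hosts_for("swift", "production", "pmtpa")
```

Note that this tells you which balancers are configured for the service, not which one is currently active.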

Deploy a change to an existing service

Preconditions:

  • you have already made the change in puppet and pushed the change to sockpuppet.
  • you have tested the change on the backend real servers directly (eg if you were changing a health check URL you have already queried the backend servers for that URL successfully).

Deploy steps:

  • find out which LVS servers host your service (see above). For this example, I'll use lvs 3 and 4.
  • find out which LVS server is active (see above). For this example, I'll assume lvs4.
  • log into the inactive host twice.
  • in one session, tail the pybal log looking for one (or more) of your backend servers. eg tail -f /var/log/pybal.log | grep ms-fe
    • you should see one line per server every 10 seconds or so. Look for "enabled/up/pooled". eg:
 2012-03-13 19:12:20.003332 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.006 s
  • in the other session, run puppet and verify your change exists on the local filesystem
  • get a list of all IP addresses served by this LVS server - you're going to check that they all exist after your change
    • run ip addr and save the output for later
  • restart pybal
    • pybal doesn't die correctly, so a restart involves:
    • /etc/init.d/pybal stop
    • pkill -9 pybal
    • /etc/init.d/pybal start
  • check that all the expected IP addresses exist
    • run 'ip addr' and compare against the list you collected before making your change
  • in the log you're tailing, you should see a few messages like:
 2012-03-13 19:21:26.015393 New enabled server ms-fe1.pmtpa.wmnet, weight 40
 2012-03-13 19:21:26.015611 New enabled server ms-fe2.pmtpa.wmnet, weight 40
 2012-03-13 19:21:26.015666 ['-a -t 10.2.1.27:80 -r ms-fe2.pmtpa.wmnet -w 40', '-a -t 10.2.1.27:80 -r ms-fe1.pmtpa.wmnet -w 40']
 2012-03-13 19:21:26.074253 [IdleConnection] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Connection established.
 2012-03-13 19:21:26.075302 [IdleConnection] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Connection established.
 2012-03-13 19:21:36.022517 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.007 s
 2012-03-13 19:21:36.023442 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.008 s
 2012-03-13 19:21:46.030482 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Fetch successful, 0.008 s
  • Look for "Fetch successful". This is the line that means your change is successful. Seeing 'enabled/up/pooled' is insufficient.
    • example of a failed change (note that it still says enabled/up/pooled for a few lines - this does not mean that it's ok! look for the Fetch line):
 2012-03-13 19:27:23.626787 [IdleConnection] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Connection established.
 2012-03-13 19:27:23.632928 [IdleConnection] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Connection established.
 2012-03-13 19:27:33.555879 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Fetch failed, 0.005 s
 2012-03-13 19:27:33.555917 Monitoring instance ProxyFetch reports servers ms-fe2.pmtpa.wmnet (enabled/up/pooled) down: 404 Not Found
 2012-03-13 19:27:33.556022 ['-d -t 10.2.1.27:80 -r ms-fe2.pmtpa.wmnet']
 2012-03-13 19:27:33.562458 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/up/pooled): Fetch failed, 0.012 s
 2012-03-13 19:27:33.562533 Monitoring instance ProxyFetch reports servers ms-fe1.pmtpa.wmnet (enabled/up/pooled) down: 404 Not Found
 2012-03-13 19:27:33.562589 Could not depool server ms-fe1.pmtpa.wmnet because of too many down!
 2012-03-13 19:27:43.561745 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/partially up/not pooled): Fetch failed, 0.002 s
 2012-03-13 19:27:43.565608 [ProxyFetch] ms-fe1.pmtpa.wmnet (enabled/partially up/pooled): Fetch failed, 0.003 s
  • if your change was successful, repeat the procedure on the active host.
    • when you stop pybal on the active host, traffic will immediately fail over to the standby host.
    • when you restart pybal on the formerly active host, traffic will immediately fail back (the LVS pairs are configured with a default and a standby so traffic always flows to the default if it's up).
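
The "look for Fetch successful, not enabled/up/pooled" rule from the steps above can be sketched as a small log check. The regular expression is written against the log excerpts on this page only, and the function names are hypothetical; this is not part of PyBal.

```python
import re

# Matches ProxyFetch result lines as they appear in the log excerpts above.
FETCH_RE = re.compile(
    r"\[ProxyFetch\] (?P<host>\S+) \((?P<state>[^)]*)\): "
    r"Fetch (?P<result>successful|failed)"
)


def latest_fetch_results(log_lines):
    """Map each backend host to its most recent ProxyFetch result."""
    results = {}
    for line in log_lines:
        match = FETCH_RE.search(line)
        if match:
            results[match.group("host")] = match.group("result")
    return results


def change_looks_good(log_lines, expected_hosts):
    """True only if every expected host's latest fetch was successful."""
    results = latest_fetch_results(log_lines)
    return all(results.get(host) == "successful" for host in expected_hosts)


sample = [
    "2012-03-13 19:27:23.626787 [IdleConnection] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Connection established.",
    "2012-03-13 19:27:33.555879 [ProxyFetch] ms-fe2.pmtpa.wmnet (enabled/up/pooled): Fetch failed, 0.005 s",
]
print(change_looks_good(sample, ["ms-fe2.pmtpa.wmnet"]))
```

A host that has not produced a ProxyFetch line yet counts as not good, which matches the advice to wait for the Fetch line before moving on to the active host.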

Add a new load balanced service

First choose whether the service is 'high-traffic' (aka public-facing) or 'low-traffic' (aka internal, post-cache).

DNS changes

  • allocate an IP address per colo to serve your content
  • internal addresses should have names *.svc.$colo.wmnet:
    • pmtpa should be in the 10.2.1.0/24 range
    • eqiad should be in the 10.2.2.0/24 range
  • external addresses:
    • These need to be allocated from the (small!) public IP address pool, and may need specific configuration on the routers. Talk to the network admins first (Mark/Leslie).
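
As a quick sanity check before committing an internal address, the per-colo ranges above can be tested with Python's ipaddress module. The helper name in_colo_range is hypothetical; the ranges are the two listed above.

```python
import ipaddress

# Internal service ranges per colo, as listed above.
INTERNAL_RANGES = {
    "pmtpa": ipaddress.ip_network("10.2.1.0/24"),
    "eqiad": ipaddress.ip_network("10.2.2.0/24"),
}


def in_colo_range(address, colo):
    """True if the address falls inside the internal service range of the colo."""
    return ipaddress.ip_address(address) in INTERNAL_RANGES[colo]


print(in_colo_range("10.2.1.27", "pmtpa"))
```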

Puppet Changes

  • in manifests/lvs.pp
    • add a stanza to $lvs_service_ips defining the IP addresses mimicking an existing entry
    • add a stanza to $lvs_services mimicking an existing entry
      • set the class to match your high-traffic or low-traffic choice here
  • in manifests/site.pp
    • add the IP address to $lvs_balancer_ips in the node definition for the lvs servers that will serve your service
    • add the IP address and an include for lvs::realserver to your node definition
  $lvs_realserver_ips = [ "10.2.1.xx" ]
  include lvs::realserver

noc changes

  • create a file in /home/w/conf/pybal/$colo/ for your service listing which hosts are available for the backend
  • see PyBal for more detail on how this part works

Deploy your changes

  • see instructions above (Deploy a change to an existing service)

LVS installation

LVS now uses Puppet and automatic BGP failover. Puppet arranges the service IP configuration, and installation of packages. To configure the service IPs that an LVS balancer should serve (both primary and backup!), set the $lvs_balancer_ips variable:

node /amslvs[1-4]\.esams\.wikimedia\.org/ {
        $cluster = "misc_esams"

        $lvs_balancer_ips = [ "91.198.174.2", "91.198.174.232", "91.198.174.233", "91.198.174.234" ]

        include base,
                ganglia,
                lvs::balancer
}

In this setup, all 4 hosts amslvs1-amslvs4 are configured to accept all service IPs, although in practice every service IP is only ever serviced by one out of two hosts due to the router configuration.

Puppet uses the (now misleadingly named) wikimedia-lvs-realserver package to bind these IPs to the loopback (!) interface. This is to make sure that a server answers on these IPs, but does not announce them via ARP - we'll use BGP for that.

LVS service configuration

The services themselves are configured in lvs.pp, from which Puppet generates the PyBal configuration file /etc/pybal/pybal.conf.

Most configuration is in a large associative hash, $lvs_services. Each key in this hash is the name of one LVS service, and points to a hash of PyBal configuration variables:

description 
Textual description of the LVS service.
class 
The class the LVS service belongs to, i.e. on which LVS balancers it is active (see below).
ip 
A hash of service IP addresses for the service. All IP addresses are aliases, and are translated to separate LVS services in pybal.conf, but with identical configuration.

The other configuration variables are described in the PyBal article.

Global PyBal configuration options can be specified in the $pybal hash.
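
The way one entry with several service IPs becomes several LVS services with identical configuration can be sketched as follows. This only illustrates the idea: the real expansion happens in the Puppet template, the section naming scheme (name_alias) is an assumption, and the example service with its addresses is hypothetical.

```python
def expand_service(name, service):
    """Return one PyBal-style section per service IP, sharing all other settings."""
    shared = {key: value for key, value in service.items() if key != "ip"}
    sections = {}
    for alias, address in service["ip"].items():
        section = dict(shared)
        section["ip"] = address
        # Naming the section "<service>_<alias>" is an assumption of this sketch.
        sections[f"{name}_{alias}"] = section
    return sections


# Hypothetical service with two aliased service IPs.
example = {
    "description": "Example service",
    "class": "low-traffic",
    "ip": {"a": "10.2.1.98", "b": "10.2.1.99"},
}
for section_name, config in expand_service("example", example).items():
    print(section_name, config)
```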

Classes

The $lvs_class_hosts variable determines, for each class, which hosts should carry the services of that class. The pybal.conf template uses it to generate the LVS services for each host. The following classes are used to distribute traffic over the LVS balancer hosts:

  • high-traffic1 (text, bits)
  • high-traffic2 (text, upload)
  • https (HTTPS services corresponding to the 'high-traffic' HTTP services; should be active on all hosts that carry either class)
  • specials (special LVS services, especially those that do not have BGP enabled)
  • low-traffic (internal load balancing, e.g. from the Squids to the Apaches)
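
The rule for the https class (active on every host that carries either high-traffic class) amounts to a union of the two host lists. A minimal sketch, assuming a hypothetical assignment of the amslvs hosts to classes; the real assignment lives in $lvs_class_hosts.

```python
def hosts_for_https(class_hosts):
    """Hosts that carry either high-traffic class, in a stable order."""
    hosts = []
    for lvs_class in ("high-traffic1", "high-traffic2"):
        for host in class_hosts.get(lvs_class, []):
            if host not in hosts:
                hosts.append(host)
    return hosts


# Hypothetical assignment for one site, for illustration only.
example_class_hosts = {
    "high-traffic1": ["amslvs1", "amslvs3"],
    "high-traffic2": ["amslvs2", "amslvs4"],
}
print(hosts_for_https(example_class_hosts))
```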

BGP failover and load sharing

Previously, the LVS balancer that had a certain service IP bound to its eth0 interface was active for that IP. To do failovers, the IP had to be moved manually.

In the new setup, multiple servers announce the service IP(s) via BGP to the router(s), which then pick which server(s) to use based on BGP routing policy.

PyBal BGP configuration

In the global section, the following BGP related settings typically exist:

bgp = yes

Enables BGP globally, but can be overridden per service.

bgp-local-asn =  64600

The ASN to use when communicating with the routers. All announced prefixes carry this ASN as their AS path.

bgp-peer-address = 91.198.174.247

The IP of the router this PyBal instance speaks BGP to.

#bgp-as-path = 64600 64601

An optional modified AS path. Can be used e.g. to make the AS path longer and thus less attractive (on a backup balancer).
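
To see how these settings fit together, here is a sketch that reads them with Python's configparser. It is not PyBal's own parser: the section name "global" and the fallback of the AS path to the local ASN alone are assumptions based on the descriptions above.

```python
import configparser

# Section name "global" and the ini layout are assumptions for this sketch.
SAMPLE = """
[global]
bgp = yes
bgp-local-asn =  64600
bgp-peer-address = 91.198.174.247
#bgp-as-path = 64600 64601
"""


def read_bgp_settings(text):
    """Parse the BGP-related options from a pybal.conf-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["global"]
    local_asn = section.getint("bgp-local-asn")
    # Without an explicit bgp-as-path, assume the path is just the local ASN.
    as_path = section.get("bgp-as-path", fallback=str(local_asn))
    return {
        "enabled": section.getboolean("bgp"),
        "local_asn": local_asn,
        "peer_address": section["bgp-peer-address"],
        "as_path": [int(asn) for asn in as_path.split()],
    }


print(read_bgp_settings(SAMPLE))
```

Uncommenting the bgp-as-path line yields a two-hop path, which is what makes a backup balancer less attractive to the router.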

Example BGP configuration for Foundry

router bgp
 neighbor 91.198.174.109 remote-as 64600
 neighbor 91.198.174.109 description "PyBal on amslvs1"
 neighbor 91.198.174.109 timers  keep-alive 10  hold-time 30
 neighbor 91.198.174.109 update-source loopback 1
 neighbor 91.198.174.110 remote-as 64600
 neighbor 91.198.174.110 description "PyBal on amslvs2"
 neighbor 91.198.174.110 timers  keep-alive 10  hold-time 30
 neighbor 91.198.174.110 update-source loopback 1

 neighbor 91.198.174.244 description "iBGP to csw2-esams"
 neighbor 91.198.174.244 timers  keep-alive 10  hold-time 30
 neighbor 91.198.174.244 update-source loopback 1


 neighbor 91.198.174.109 prefix-list LVS in
 neighbor 91.198.174.109 prefix-list none out
 neighbor 91.198.174.110 maximum-prefix 10 teardown
 neighbor 91.198.174.110 prefix-list LVS in                       
 neighbor 91.198.174.110 prefix-list none out

 neighbor 91.198.174.244 maximum-prefix 10 teardown
 neighbor 91.198.174.244 prefix-list LVS in
 neighbor 91.198.174.244 prefix-list LVS out
 neighbor 91.198.174.244 unsuppress-map LVS-IBGP-EXCHANGE
!

ip prefix-list  LVS seq 5 permit 91.198.174.0/25 ge 32 
ip prefix-list  LVS seq 10 permit 91.198.174.232/30 ge 32 


route-map  LVS-IBGP-EXCHANGE permit  10 
 match ip address prefix-list LVS
route-map  LVS-IBGP-EXCHANGE deny  100
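
The effect of the two "ge 32" prefix-list entries is that only host routes (/32) inside the listed ranges are accepted from PyBal. A rough Python model of that match, using the ipaddress module; the function name permitted is hypothetical and this models the matching rule only, not the router.

```python
import ipaddress

# (covering prefix, minimum prefix length), mirroring the two "ge 32" entries.
LVS_PREFIX_LIST = [
    (ipaddress.ip_network("91.198.174.0/25"), 32),
    (ipaddress.ip_network("91.198.174.232/30"), 32),
]


def permitted(route):
    """True if the announced route matches an entry of the LVS prefix list."""
    net = ipaddress.ip_network(route)
    return any(
        net.prefixlen >= min_len and net.subnet_of(covering)
        for covering, min_len in LVS_PREFIX_LIST
    )


print(permitted("91.198.174.232/32"))
print(permitted("91.198.174.0/25"))
```

The covering prefix itself is rejected because its length (25) is below the required minimum of 32.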

Example BGP configuration for Juniper (csw2-esams)

root@csw2-esams> show configuration protocols bgp    
group PyBal {
    type external;
    multihop {
        ttl 1;
    }
    local-address 91.198.174.244;
    hold-time 30;
    import LVS_import;
    family inet {
        unicast {
            prefix-limit {
                maximum 10;
                teardown;
            }
        }
    }
    export NONE;
    peer-as 64600;
    neighbor 91.198.174.111;
    neighbor 91.198.174.112;
}
group iBGP {
    type internal;
    peer-as 43821;
    neighbor 91.198.174.247 {
        import LVS_exchange;
        export LVS_exchange;
    }
}

root@csw2-esams> show configuration policy-options 
prefix-list LVS {
    91.198.174.0/25;
    91.198.174.232/30;
}
policy-statement LVS_exchange {
    term 1 {
        from {
            prefix-list-filter LVS longer;
        }
        then accept;
    }
    from protocol bgp;
}
policy-statement LVS_import {
    term 1 {
        from {
            protocol bgp;
            prefix-list-filter LVS longer;
        }
        then {
            metric add 10;
            accept;
        }
    }
}

The LVS_import policy adds metric 10 to the "routes" (service IPs) received from the secondary (backup) LVS balancers. This means that the router will regard them as less preferred.

At esams, Foundry router csw1-esams and JUNOS router csw2-esams exchange the service IPs over iBGP.
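
Why the added metric makes the backup less preferred can be illustrated with a heavily simplified model of BGP best-path selection. It uses only three tie-breakers (highest local preference, then shortest AS path, then lowest MED); real routers apply more steps. The MED of 0 for the primary is an assumption, the other values are taken from the examples on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    next_hop: str
    as_path: List[int]
    med: int = 0
    local_pref: int = 100


def best_route(routes):
    """Pick a route using three of BGP's tie-breakers, in order."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, len(r.as_path), r.med),
    )


primary = Route("91.198.174.109", [64600], med=0)   # MED 0 is assumed
backup = Route("91.198.174.111", [64600], med=10)   # metric add 10 on import
print(best_route([backup, primary]).next_hop)
```

A backup announcing the longer AS path 64600 64601 would lose one step earlier, on path length, regardless of its MED.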

SSH checking

The Apache cluster often suffers from broken disks that break SSH but keep Apache up. To catch this, I have implemented a RunCommand monitor in PyBal, which periodically runs an arbitrary command and checks the server's health by its return code. If the command does not return within a certain timeout, the server is marked down as well.

The RunCommand configuration is in /etc/pybal/pybal.conf:

runcommand.command = /bin/sh
runcommand.arguments = [ '/etc/pybal/runcommand/check-apache', server.host ]
runcommand.interval = 60
runcommand.timeout = 10
runcommand.command 
The path to the command which is being run. Since we are using a shell script and PyBal does not invoke a shell by itself, we have to do that explicitly.
runcommand.arguments 
A (Python) list of command arguments. This list can refer to the monitor's server object, as shown here.
runcommand.interval 
How often to run the check (seconds).
runcommand.timeout 
The command timeout; after this many seconds the entire process group of the command will be KILLed, and the server is marked down.
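
The behaviour described by these options (run a command, judge health by its return code, kill the whole process group on timeout) can be sketched with the standard subprocess module. This is not PyBal's actual code, and the function name check_server is hypothetical.

```python
import os
import signal
import subprocess


def check_server(command, arguments, timeout):
    """Return True if the command exits 0 within `timeout` seconds."""
    proc = subprocess.Popen(
        [command] + list(arguments),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own process group, so killpg reaches children
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited just before we tried to kill it
        proc.wait()  # reap the killed process
        return False


print(check_server("/bin/sh", ["-c", "exit 0"], timeout=10))
```

Killing the process group rather than the single process matters here because the check is a shell script that spawns ssh; killing only the shell could leave the ssh child hanging.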

Currently we're using the following RunCommand script, in /etc/pybal/runcommand/check-apache:

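#!/bin/sh

# Exit non-zero as soon as any command fails, so PyBal marks the server down.
set -e

HOST=$1
SSH_USER=pybal-check
# Expanded unquoted below on purpose, so each option becomes its own argument.
SSH_OPTIONS="-o PasswordAuthentication=no -o StrictHostKeyChecking=no -o ConnectTimeout=8"

# Open an SSH connection to the real-server. The command is overridden by the authorized_keys file.
ssh -i /root/.ssh/pybal-check $SSH_OPTIONS "$SSH_USER@$HOST" true

exit 0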

The limited ssh accounts on the application servers are managed by the wikimedia-task-appserver package.

Monitoring

Nagios monitoring of LVS services is managed by Puppet as well, at the bottom of the lvs.pp file. For example:

monitor_service_lvs_http { "wikimedia-lb.pmtpa.wikimedia.org":
    ip_address => "208.80.152.200",
    check_command => "check_http_lvs!meta.wikimedia.org!/wiki/Main_Page"
}
monitor_service_lvs_https { "wikimedia-lb.pmtpa.wikimedia.org":
    ip_address => "208.80.152.200",
    check_command => "check_https_url!meta.wikimedia.org!/wiki/Main_Page"
}

Diagnosing problems

Run ipvsadm -l on the director. Healthy output looks like this:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     5202       5295
  -> sq1.pmtpa.wmnet:http         Route   10     8183       12213
  -> sq4.pmtpa.wmnet:http         Route   10     7824       13360
  -> sq5.pmtpa.wmnet:http         Route   10     7843       12936
  -> sq6.pmtpa.wmnet:http         Route   10     7930       12769
  -> sq8.pmtpa.wmnet:http         Route   10     7955       11010
  -> sq2.pmtpa.wmnet:http         Route   10     7987       13190
  -> sq7.pmtpa.wmnet:http         Route   10     8003       7953

All the servers are getting a decent amount of traffic; the differences are just normal variation.

If a realserver is refusing connections or doesn't have the VIP configured, it will look like this:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     2          151577
  -> sq1.pmtpa.wmnet:http         Route   10     2497       1014
  -> sq4.pmtpa.wmnet:http         Route   10     2459       1047
  -> sq5.pmtpa.wmnet:http         Route   10     2389       1048
  -> sq6.pmtpa.wmnet:http         Route   10     2429       1123
  -> sq8.pmtpa.wmnet:http         Route   10     2416       1024
  -> sq2.pmtpa.wmnet:http         Route   10     2389       970
  -> sq7.pmtpa.wmnet:http         Route   10     2457       1008

On the problem server, active connections are depressed while inactive connections are normal or above normal. This problem must be fixed immediately: in wlc (weighted least-connection) mode, LVS balances on the ActiveConn column, so a server that is down has the fewest active connections and gets all the traffic.
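
Spotting this pattern by eye gets harder with many realservers, so here is a sketch that parses ipvsadm -l output and flags servers whose ActiveConn is far below the median. The 10% threshold and the function names are assumptions chosen for this example, not an established rule.

```python
import re
from statistics import median

# Realserver rows: "-> host:port  Forward  Weight  ActiveConn  InActConn"
REALSERVER_RE = re.compile(r"^\s*->\s+(\S+)\s+\S+\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_realservers(output):
    """Return (host, weight, active, inactive) tuples from ipvsadm -l output."""
    servers = []
    for line in output.splitlines():
        match = REALSERVER_RE.match(line)
        if match:
            host, weight, active, inactive = match.groups()
            servers.append((host, int(weight), int(active), int(inactive)))
    return servers


def suspicious_servers(output, ratio=0.1):
    """Hosts whose ActiveConn is below `ratio` times the median ActiveConn."""
    servers = parse_realservers(output)
    if not servers:
        return []
    typical = median(active for _, _, active, _ in servers)
    return [host for host, _, active, _ in servers if active < ratio * typical]


BROKEN = """\
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  upload.pmtpa.wikimedia.org:h wlc
  -> sq10.pmtpa.wmnet:http        Route   10     2          151577
  -> sq1.pmtpa.wmnet:http         Route   10     2497       1014
  -> sq4.pmtpa.wmnet:http         Route   10     2459       1047
  -> sq5.pmtpa.wmnet:http         Route   10     2389       1048
"""
print(suspicious_servers(BROKEN))
```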

Incorrectly bound interfaces

Don't ever bind IP addresses directly to lo in /etc/network/interfaces. If you do, stuff breaks. (This applies not just to LVS servers but any real server as well. Anything with the wikimedia-lvs-realserver package will break if you bind addresses manually.)

When it's broken, it looks like this. Notice that all the balanced IP addresses are labelled lo:LVS except 10.2.1.13. That address was bound manually, and it breaks the ifup script that reloads the IPs.

    inet 127.0.0.1/8 scope host lo
    inet 10.2.1.13/32 scope global lo
    inet 10.2.1.1/32 scope global lo:LVS
    inet 10.2.1.11/32 scope global lo:LVS
    inet 10.2.1.12/32 scope global lo:LVS
    inet6 ::1/128 scope host 

The solution here is to delete the broken interface (ip addr del 10.2.1.13/32 dev lo), then run dpkg-reconfigure wikimedia-lvs-realserver. This triggers the scripts that will re-add all the IP addresses.
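
Finding the offending address can be automated by scanning ip addr output for global addresses on lo that lack the lo:LVS label. A minimal sketch; the regular expression is written against the output format shown on this page, and the function name manually_bound is hypothetical.

```python
import re

# Matches lines such as "inet 10.2.1.13/32 scope global lo".
INET_RE = re.compile(r"^\s*inet\s+(\S+)/\d+\s+scope\s+(\S+)\s+(\S+)\s*$")


def manually_bound(ip_addr_lines):
    """Return global addresses labelled plain 'lo' rather than 'lo:LVS'."""
    bad = []
    for line in ip_addr_lines:
        match = INET_RE.match(line)
        if match:
            address, scope, label = match.groups()
            if scope == "global" and label == "lo":
                bad.append(address)
    return bad


broken_output = [
    "    inet 127.0.0.1/8 scope host lo",
    "    inet 10.2.1.13/32 scope global lo",
    "    inet 10.2.1.1/32 scope global lo:LVS",
    "    inet6 ::1/128 scope host ",
]
print(manually_bound(broken_output))
```

Each address it reports is a candidate for ip addr del followed by the dpkg-reconfigure step described above.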

Happy ip addr output looks like this:

root@lvs4:/etc/network# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet 10.2.1.1/32 scope global lo:LVS
    inet 10.2.1.11/32 scope global lo:LVS
    inet 10.2.1.12/32 scope global lo:LVS
    inet 10.2.1.13/32 scope global lo:LVS
    inet 10.2.1.21/32 scope global lo:LVS
    inet 10.2.1.22/32 scope global lo:LVS
    inet 10.2.1.27/32 scope global lo:LVS
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    etc...