BGP/old setup

From Wikitech
Latest revision as of 22:59, 1 November 2010

This page may be outdated and could contain incorrect details. Please update it if you can.

The pmtpa cluster has layer 3 failover using two separate core routers and two BGP links to upstream.

To survive an uplink failure or the loss of a router (ours or upstream), and to increase our available bandwidth from 1 to 2 Gbit/s, we implemented failover and connection load sharing using BGP. Each of our routers, csw1-pmtpa and csw4-pmtpa, has its own fiber connection to one of two PowerMedium routers.

This makes the network redundant and unaffected by the loss of either PowerMedium router. To keep the internal network equally unaffected, all downstream switches and hosts should be redundantly connected to both multilayer switches. HSRP makes internal routing redundant by offering all hosts a virtual gateway IP, which is actually served by whichever of the two multilayer switches is currently active.


Diagram

Wikimedia-core.png

BGP

As we don't have our own AS number, we use one from the range of AS numbers reserved for private use (64512 to 65534): AS 64600.

Synchronization with an IGP (Interior Gateway Protocol) is turned off, as we don't run one. The router ID (used to identify the router within BGP) is set to 207.142.131.240 for csw1-pmtpa and 207.142.131.244 for csw4-pmtpa. Logging of BGP neighbor state changes is turned on. Because failover should happen quickly when a link fails, the global keepalive time is reduced to 5 seconds and the hold time to 15 seconds (the IOS defaults are 60 and 180). Note that the per-neighbor timers 10 30 statements in the sessions configured below override these global values for those neighbors.

csw1-pmtpa:

router bgp 64600
  no synchronization

  bgp router-id 207.142.131.240
  bgp log-neighbor-changes
  timers bgp 5 15

csw4-pmtpa:

router bgp 64600
  no synchronization

  bgp router-id 207.142.131.244
  bgp log-neighbor-changes
  timers bgp 5 15
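The interaction between these global timers and the per-neighbor timers 10 30 statements can be sketched as a small Python model. It assumes RFC 4271 hold-time negotiation (each side proposes a hold time and the session uses the smaller one) and assumes, hypothetically, that the PowerMedium side runs the IOS defaults of 60/180 seconds, which this page does not state. It is a model, not router code.

```python
# Illustrative model of BGP timer selection, not router code.
# Assumption: the upstream (PowerMedium) side uses the IOS defaults 60/180.

GLOBAL_TIMERS = (5, 15)      # "timers bgp 5 15": (keepalive, hold) in seconds
NEIGHBOR_TIMERS = (10, 30)   # "neighbor ... timers 10 30"
ASSUMED_REMOTE_HOLD = 180    # hypothetical: IOS default hold time on the peer


def proposed_timers(global_timers, neighbor_timers=None):
    """Timers a router proposes to one neighbor: a per-neighbor
    'timers' statement takes precedence over the global 'timers bgp'."""
    return neighbor_timers if neighbor_timers is not None else global_timers


def negotiated_hold(local_hold, remote_hold):
    """RFC 4271: the session hold time is the smaller of the two proposals."""
    return min(local_hold, remote_hold)


keepalive, hold = proposed_timers(GLOBAL_TIMERS, NEIGHBOR_TIMERS)
session_hold = negotiated_hold(hold, ASSUMED_REMOTE_HOLD)
print(f"proposed keepalive={keepalive}s hold={hold}s, session hold={session_hold}s")
```

Under these assumptions a dead peer is detected within roughly 30 seconds rather than 15; the global 5/15 values only apply to neighbors without their own timers statement.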

On both routers, the following network prefixes are announced:

router bgp 64600
  ! Network prefixes to be announced
  ! Wikimedia
  network 207.142.131.192 mask 255.255.255.192
  ! Wikia
  network 84.40.25.224 mask 255.255.255.224
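The network ... mask ... statements use dotted-decimal masks, while the outbound prefix-list further down uses prefix lengths. This short Python check, using only the standard ipaddress module, converts one notation into the other and confirms that both describe the same address blocks:

```python
import ipaddress

# (network, mask) pairs copied from the "network ... mask ..." statements
NETWORK_STATEMENTS = {
    "Wikimedia": ("207.142.131.192", "255.255.255.192"),
    "Wikia": ("84.40.25.224", "255.255.255.224"),
}


def to_prefix(network: str, mask: str) -> ipaddress.IPv4Network:
    # strict=True raises ValueError if host bits are set, i.e. if the
    # address is not the first address of the block described by the mask.
    return ipaddress.ip_network(f"{network}/{mask}", strict=True)


for name, (net, mask) in NETWORK_STATEMENTS.items():
    prefix = to_prefix(net, mask)
    print(f"{name}: {prefix} ({prefix.num_addresses} addresses)")
```

This prints 207.142.131.192/26 (64 addresses) and 84.40.25.224/27 (32 addresses), the same two prefixes permitted by bgp-outfilter.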

iBGP

iBGP (internal BGP, i.e. BGP sessions within the same AS) is set up between csw1-pmtpa and csw4-pmtpa so they can share each other's routes. A dedicated 1 Gbit/s link between the two multilayer switches is used for this, with both ports in routed mode. The IP subnet used is 10.10.0.4/30.

csw1-pmtpa:

interface GigabitEthernet0/36
  description iBGP link to csw4-pmtpa
  no switchport
  ip address 10.10.0.5 255.255.255.252

csw4-pmtpa:

interface GigabitEthernet0/47
  description iBGP link to csw1-pmtpa
  no switchport
  ip address 10.10.0.6 255.255.255.252
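A /30 leaves exactly two usable host addresses, one for each end of the point-to-point link. The Python standard library can confirm that 10.10.0.5 and 10.10.0.6 are those two addresses and that the mask configured on both ports matches the subnet:

```python
import ipaddress

IBGP_LINK = ipaddress.ip_network("10.10.0.4/30")
CSW1_ADDR = ipaddress.ip_address("10.10.0.5")  # GigabitEthernet0/36 on csw1-pmtpa
CSW4_ADDR = ipaddress.ip_address("10.10.0.6")  # GigabitEthernet0/47 on csw4-pmtpa

# hosts() excludes the network (.4) and broadcast (.7) addresses
usable = list(IBGP_LINK.hosts())
print(IBGP_LINK.netmask, usable)
```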

On both routers, neighbor statements are added for an iBGP session with each other:

csw1-pmtpa:

router bgp 64600
  neighbor 10.10.0.6 remote-as 64600
  neighbor 10.10.0.6 description iBGP session to csw4-pmtpa

  neighbor 10.10.0.6 soft-reconfiguration inbound

  ! Fast failover
  neighbor 10.10.0.6 timers 10 30

  ! Close the session if we receive more than 10 prefixes, which shouldn't happen
  neighbor 10.10.0.6 maximum-prefix 10
 
  ! Originate a default route if we received one through BGP
  neighbor 10.10.0.6 default-originate route-map bgp-pm-default

  ! Only distribute and accept a default route
  neighbor 10.10.0.6 prefix-list bgp-default in
  neighbor 10.10.0.6 prefix-list bgp-default out

  ! Replace the next-hop address by our address to iBGP peers
  neighbor 10.10.0.6 next-hop-self

csw4-pmtpa:

router bgp 64600
  neighbor 10.10.0.5 remote-as 64600
  neighbor 10.10.0.5 description iBGP session to csw1-pmtpa

  neighbor 10.10.0.5 soft-reconfiguration inbound

  ! Fast failover
  neighbor 10.10.0.5 timers 10 30

  ! Close the session if we receive more than 10 prefixes, which shouldn't happen
  neighbor 10.10.0.5 maximum-prefix 10

  ! Originate a default route if we received one through BGP
  neighbor 10.10.0.5 default-originate route-map bgp-pm-default

  ! Only distribute and accept a default route
  neighbor 10.10.0.5 prefix-list bgp-default in
  neighbor 10.10.0.5 prefix-list bgp-default out

  ! Replace the next-hop address by our address to iBGP peers
  neighbor 10.10.0.5 next-hop-self

default-originate

A default route (0.0.0.0/0) is not announced over BGP by default. An explicit neighbor ... default-originate is therefore configured, so that a default route is announced to that neighbor. However, it turns out that the default route is announced regardless of where the router's own default comes from, even if it is a statically entered route or one received from the same peer it is being announced to, which leads to routing loops. (Cisco's command reference states that default-originate does not even require a default route to be present on the local router.) Therefore, a default route is only originated if one was received over eBGP from AS 30217 (PowerMedium), using the route-map bgp-pm-default:

! Route map match list, matching AS ^30217$ (PowerMedium originated)
ip as-path access-list 1 permit ^30217$
! Route map used as the default-originate condition. Only originate
! a default route if we got one from AS 30217 (PowerMedium)
route-map bgp-pm-default permit 10
  match as-path 1
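The condition can be modeled in Python to show which AS paths satisfy as-path access-list 1. This is a simplified sketch: it treats the route-map as satisfied when any route in a hypothetical BGP table has an AS path matching ^30217$, following this page's description rather than IOS internals, and it renders AS paths as space-separated strings, which is how IOS applies as-path regular expressions.

```python
import re

# ip as-path access-list 1 permit ^30217$
AS_PATH_ACL_1 = re.compile(r"^30217$")


def as_path_string(as_path):
    """Render an AS path as IOS regexes see it: space-separated AS numbers."""
    return " ".join(str(asn) for asn in as_path)


def originate_default(bgp_table):
    """Simplified model of 'default-originate route-map bgp-pm-default'.

    Only the AS path is examined; the regex cannot tell which session
    delivered a route.
    """
    return any(AS_PATH_ACL_1.search(as_path_string(r["as_path"])) for r in bgp_table)


# Hypothetical BGP tables
from_powermedium = [{"prefix": "0.0.0.0/0", "as_path": (30217,)}]
static_only = [{"prefix": "0.0.0.0/0", "as_path": ()}]             # locally injected
transit_path = [{"prefix": "0.0.0.0/0", "as_path": (30217, 65001)}]  # passed on by 30217

for name, table in [("from_powermedium", from_powermedium),
                    ("static_only", static_only),
                    ("transit_path", transit_path)]:
    print(name, originate_default(table))
```

Only a path consisting of exactly AS 30217 matches; a locally injected default (empty path) or a route that PowerMedium merely passed on from another AS does not.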

prefix filters

To make sure we neither announce nor accept unexpected prefixes, filters are used. In this setup iBGP is only used to distribute a default route, so the following prefix-list is used:

ip prefix-list bgp-default description BGP filter that allows just a default route
ip prefix-list bgp-default seq 10 deny 0.0.0.0/0 ge 1
ip prefix-list bgp-default seq 20 permit 0.0.0.0/0
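The evaluation rules of such a prefix-list can be modeled in a few lines of Python. This sketch assumes the usual IOS semantics: entries are tried in sequence order, the first match decides, an entry without ge/le matches only its exact prefix length, ge alone extends the range up to /32, and anything unmatched is implicitly denied.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefixListEntry:
    seq: int
    action: str                      # "permit" or "deny"
    prefix: ipaddress.IPv4Network
    ge: Optional[int] = None
    le: Optional[int] = None

    def matches(self, route: ipaddress.IPv4Network) -> bool:
        # The route must lie within the entry's prefix (same leading bits)...
        if not route.subnet_of(self.prefix):
            return False
        # ...and its length must fall in the allowed range. Without ge/le
        # only the exact prefix length matches; "ge" alone extends to /32.
        low = self.ge if self.ge is not None else self.prefix.prefixlen
        if self.le is not None:
            high = self.le
        elif self.ge is not None:
            high = 32
        else:
            high = self.prefix.prefixlen
        return low <= route.prefixlen <= high


def evaluate(entries, route):
    """First matching entry (lowest seq) wins; no match means deny."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.matches(route):
            return entry.action
    return "deny"


ANY = ipaddress.ip_network("0.0.0.0/0")
BGP_DEFAULT = [
    PrefixListEntry(10, "deny", ANY, ge=1),   # seq 10 deny 0.0.0.0/0 ge 1
    PrefixListEntry(20, "permit", ANY),       # seq 20 permit 0.0.0.0/0
]

for r in ("0.0.0.0/0", "207.142.131.192/26", "128.0.0.0/1"):
    print(r, evaluate(BGP_DEFAULT, ipaddress.ip_network(r)))
```

Evaluating BGP_DEFAULT[1:] (seq 20 on its own) against any longer prefix also returns "deny", because an entry without ge or le matches only the exact prefix length. Seq 10 therefore does not change the result; it makes the intent explicit.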

next-hop-self

By default, a route learned over eBGP is passed on to iBGP peers with its next hop unchanged, i.e. still set to the IP address of the eBGP peer it was learned from. The receiving router may not have a route to that address, so next-hop-self is used to set the next hop to the announcing router's own address instead. The iBGP peer can always reach that address, since the session itself runs over it.
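The effect can be illustrated with a small Python model that only considers directly connected networks (ignoring the static routes added later for load sharing). csw4-pmtpa's connected networks are assembled from addresses on this page; its uplink /30 is inferred from the static routes in the load-sharing section and should be treated as an assumption.

```python
import ipaddress

# Hypothetical view of the networks directly connected to csw4-pmtpa, using
# addresses from this page. The uplink /30 is inferred, not confirmed.
CSW4_CONNECTED = [ipaddress.ip_network(n) for n in (
    "10.10.0.4/30",        # iBGP link to csw1-pmtpa
    "207.142.131.192/26",  # Vlan1 (public)
    "10.0.0.0/16",         # Vlan2 (private)
    "84.40.25.220/30",     # csw4-pmtpa's own PowerMedium uplink (inferred)
)]


def advertise_over_ibgp(route, own_address, next_hop_self):
    """Return the route as sent to an iBGP peer."""
    # Without next-hop-self the eBGP next hop is passed on unchanged.
    next_hop = own_address if next_hop_self else route["next_hop"]
    return {**route, "next_hop": next_hop}


def resolvable(next_hop, connected):
    """True if the next hop lies in one of the receiver's connected networks."""
    address = ipaddress.ip_address(next_hop)
    return any(address in network for network in connected)


# Default route csw1-pmtpa learned from its PowerMedium peer
learned = {"prefix": "0.0.0.0/0", "next_hop": "64.156.25.241"}

for nhs in (False, True):
    sent = advertise_over_ibgp(learned, "10.10.0.5", nhs)
    print(f"next-hop-self={nhs}: next hop {sent['next_hop']}, "
          f"resolvable on csw4-pmtpa: {resolvable(sent['next_hop'], CSW4_CONNECTED)}")
```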

eBGP

To actually make failover work, each router is set up for an eBGP session to the PowerMedium router it's directly connected to.

csw1-pmtpa:

router bgp 64600
  neighbor 64.156.25.241 remote-as 30217
  neighbor 64.156.25.241 description BGP session to PowerMedium

  ! Use soft reconfiguration for this peer, less disruptive updates
  neighbor 64.156.25.241 soft-reconfiguration inbound

  ! Fast failover
  neighbor 64.156.25.241 timers 10 30

  ! Close the session if we receive more than 10 prefixes, which shouldn't happen
  neighbor 64.156.25.241 maximum-prefix 10

  ! Allow only routes that originate here to be announced
  neighbor 64.156.25.241 filter-list 2 out

  ! Also filter on IP prefixes for outgoing updates
  neighbor 64.156.25.241 prefix-list bgp-outfilter out

  ! Allow only an incoming default route prefix
  neighbor 64.156.25.241 prefix-list bgp-default in

csw4-pmtpa:

router bgp 64600
  neighbor 84.40.25.221 remote-as 30217
  neighbor 84.40.25.221 description BGP session to PowerMedium

  ! Use soft reconfiguration for this peer, less disruptive updates
  neighbor 84.40.25.221 soft-reconfiguration inbound

  ! Fast failover
  neighbor 84.40.25.221 timers 10 30

  ! Close the session if we receive more than 10 prefixes, which shouldn't happen
  neighbor 84.40.25.221 maximum-prefix 10

  ! Allow only routes that originate here to be announced
  neighbor 84.40.25.221 filter-list 2 out

  ! Also filter on IP prefixes for outgoing updates
  neighbor 84.40.25.221 prefix-list bgp-outfilter out

  ! Allow only an incoming default route prefix
  neighbor 84.40.25.221 prefix-list bgp-default in

filter-list

Announcements are limited to prefixes with an empty AS path, meaning that the prefix was not received from another AS and thus must have originated here. This makes sure that we don't announce unintended routes or offer transit to peers:

! BGP filter list. Send only prefixes with an empty AS
! path (our AS is prepended when they are sent), so routes
! have to originate here
ip as-path access-list 2 permit ^$

distribute-list

For incoming announcements we only expect a default route, so the bgp-default filter described above can be reused. Outbound, we want to announce only our own network prefixes:

ip prefix-list bgp-outfilter description BGP outbound filter
ip prefix-list bgp-outfilter seq 10 permit 207.142.131.192/26
ip prefix-list bgp-outfilter seq 20 permit 84.40.25.224/27
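Because each eBGP neighbor has both filter-list 2 out and prefix-list bgp-outfilter out, a route is only announced if it passes both filters. The following Python sketch models that combination; the example routes are hypothetical.

```python
import ipaddress
import re

# ip as-path access-list 2 permit ^$  (implicit deny for everything else)
AS_PATH_ACL_2 = re.compile(r"^$")

# ip prefix-list bgp-outfilter: entries without ge/le match exact prefixes only
BGP_OUTFILTER = {
    ipaddress.ip_network("207.142.131.192/26"),
    ipaddress.ip_network("84.40.25.224/27"),
}


def exported_to_powermedium(prefix, as_path):
    """A route is announced only if it passes BOTH outbound filters."""
    path = " ".join(str(asn) for asn in as_path)
    passes_filter_list = AS_PATH_ACL_2.search(path) is not None
    passes_prefix_list = ipaddress.ip_network(prefix) in BGP_OUTFILTER
    return passes_filter_list and passes_prefix_list


print(exported_to_powermedium("207.142.131.192/26", ()))   # True: local and listed
print(exported_to_powermedium("0.0.0.0/0", (30217,)))      # False: learned from upstream
print(exported_to_powermedium("10.0.0.0/16", ()))          # False: local but not listed
```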

Load sharing: multihop eBGP

With the above setup, failover works fine, but each router only uses its own outbound default route. Since only one of the two routers sees any substantial amount of traffic, this is not a satisfactory solution if we want to balance traffic across both links.

BGP supports load sharing, also called multipath: if allowed, a router installs several announcements of the same prefix with identical attributes and spreads traffic across them. This is enabled with the maximum-paths keyword:

router bgp 64600
  maximum-paths 2

However, in the above setup this doesn't work. Every router does see two copies of the default route, one from its direct eBGP peer and one over iBGP from the other router, but these routes are not equal: one is external and one is internal. Cisco IOS requires the paths to be either all external or all internal for multipath to work.
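A heavily simplified Python model shows why the iBGP copy is never combined with the eBGP path, and why adding a second eBGP path fixes this. Path attributes are taken from the show ip bgp output on this page (local-pref 100, AS path 30217); MED is assumed to be 0 for all paths, and the best-path logic covers only a few of the real IOS steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    as_path: tuple
    local_pref: int
    med: int
    external: bool  # True if learned over eBGP, False if over iBGP


def best_path(paths):
    # Heavily simplified best-path choice: highest local-pref, then shortest
    # AS path, then eBGP over iBGP. Real IOS applies many more steps.
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path), not p.external))


def attributes(p):
    # Attributes that must match the best path for multipath, including
    # the external/internal type that IOS requires to be the same.
    return (p.local_pref, p.as_path, p.med, p.external)


def multipath_set(paths, maximum_paths):
    best = best_path(paths)
    extra = [p for p in paths if p is not best and attributes(p) == attributes(best)]
    return [best] + extra[: maximum_paths - 1]


ebgp_direct = Path("64.156.25.241", (30217,), 100, 0, True)
ibgp_copy = Path("10.10.0.6", (30217,), 100, 0, False)
ebgp_multihop = Path("84.40.25.221", (30217,), 100, 0, True)

before = multipath_set([ebgp_direct, ibgp_copy], maximum_paths=2)
after = multipath_set([ebgp_direct, ibgp_copy, ebgp_multihop], maximum_paths=2)
print([p.next_hop for p in before])   # ['64.156.25.241']
print([p.next_hop for p in after])    # ['64.156.25.241', '84.40.25.221']
```

The real router output on this page marks a different eBGP path as "best" because IOS applies further tie-breakers that this model ignores; the resulting multipath set is the same.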

To solve this, each router was additionally set up with an eBGP session to the other PowerMedium router, with which it previously had no session. Because the routers are not directly connected to these peers, the sessions have to cross an extra hop, using a BGP feature known as multihop. The maximum number of hops between two eBGP peers (default: 1) is configured with the ebgp-multihop neighbor parameter.

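In addition, each router needs a route to the /30 interconnect subnet between the other switch and its PowerMedium router in order to reach that peer. For simplicity and stability, this is done with static routes over the dedicated iBGP link. A higher-distance backup route via the other switch's public VLAN address, and a Null0 route as a last resort, are added as well: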

csw1-pmtpa:

ip route 84.40.25.220 255.255.255.252 10.10.0.6
! Backup route
ip route 84.40.25.220 255.255.255.252 207.142.131.244 250
! Make sure we don't route this traffic externally if the internal links are down
ip route 84.40.25.220 255.255.255.252 Null0 251

csw4-pmtpa:

ip route 64.156.25.240 255.255.255.252 10.10.0.5
! Backup route
ip route 64.156.25.240 255.255.255.252 207.142.131.240 250
! Make sure we don't route this traffic externally if the internal links are down
ip route 64.156.25.240 255.255.255.252 Null0 251
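The three static routes per router form a chain of floating static routes: the one with the lowest administrative distance wins as long as its next hop is usable. This Python sketch models that selection for csw1-pmtpa; "reachability" is reduced to a set of next hops supplied by the caller, which is a simplification of how IOS resolves next hops.

```python
# Model of floating static routes on csw1-pmtpa: several static routes for
# 84.40.25.220/30 with different administrative distances. Only the usable
# route with the lowest distance is installed.

STATIC_DEFAULT_DISTANCE = 1  # IOS default administrative distance for static routes

CSW1_ROUTES_TO_84_40_25_220_30 = [
    ("10.10.0.6", STATIC_DEFAULT_DISTANCE),  # via the dedicated iBGP link
    ("207.142.131.244", 250),                # backup via csw4-pmtpa's Vlan1 address
    ("Null0", 251),                          # last resort: discard the traffic
]


def installed_route(routes, reachable_next_hops):
    """Pick the lowest-distance route whose next hop is currently reachable.
    Null0 is always usable; it simply drops packets."""
    usable = [r for r in routes if r[0] == "Null0" or r[0] in reachable_next_hops]
    return min(usable, key=lambda r: r[1])


print(installed_route(CSW1_ROUTES_TO_84_40_25_220_30, {"10.10.0.6", "207.142.131.244"}))
print(installed_route(CSW1_ROUTES_TO_84_40_25_220_30, {"207.142.131.244"}))
print(installed_route(CSW1_ROUTES_TO_84_40_25_220_30, set()))
```

With everything up, the iBGP link is used; if that link fails, traffic falls back to the public VLAN; if both are gone, packets are discarded at Null0 instead of following the default route out through the local PowerMedium uplink.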

The routers need to use the correct source address for their BGP sessions. By default the address of the outgoing interface is used, which in this case would be a private 10.10.0.x address on the iBGP link. The source can be set explicitly with neighbor ... update-source.

The multihop eBGP sessions on each router are almost identical to the corresponding single-hop ones.

csw1-pmtpa:

router bgp 64600
  neighbor 84.40.25.221 remote-as 30217
  neighbor 84.40.25.221 description Multihop BGP session to PowerMedium

  ! For multihop
  neighbor 84.40.25.221 update-source GigabitEthernet0/50
  neighbor 84.40.25.221 ebgp-multihop 2

  ! Use soft reconfiguration for this peer, less disruptive updates
  neighbor 84.40.25.221 soft-reconfiguration inbound

  ! Fast failover
  neighbor 84.40.25.221 timers 10 30

  ! Close the session if we receive more than 10 prefixes, which shouldn't happen
  neighbor 84.40.25.221 maximum-prefix 10

  ! Allow only routes that originate here to be announced
  neighbor 84.40.25.221 filter-list 2 out

  ! Also filter on IP prefixes for outgoing updates
  neighbor 84.40.25.221 prefix-list bgp-outfilter out

  ! Allow only an incoming default route prefix
  neighbor 84.40.25.221 prefix-list bgp-default in 

csw4-pmtpa:

router bgp 64600
  neighbor 64.156.25.241 remote-as 30217
  neighbor 64.156.25.241 description Multihop BGP session to PowerMedium

  ! For multihop
  neighbor 64.156.25.241 update-source GigabitEthernet0/52
  neighbor 64.156.25.241 ebgp-multihop 2

  ! Use soft reconfiguration for this peer, less disruptive updates
  neighbor 64.156.25.241 soft-reconfiguration inbound

  ! Fast failover
  neighbor 64.156.25.241 timers 10 30

  ! Close the session if we receive more than 10 prefixes, which shouldn't happen
  neighbor 64.156.25.241 maximum-prefix 10

  ! Allow only routes that originate here to be announced
  neighbor 64.156.25.241 filter-list 2 out

  ! Also filter on IP prefixes for outgoing updates
  neighbor 64.156.25.241 prefix-list bgp-outfilter out

  ! Allow only an incoming default route prefix
  neighbor 64.156.25.241 prefix-list bgp-default in

Once this is set up, each router has two equal external default routes and can use multipath:

csw1-pmtpa#sh ip bgp 0.0.0.0
BGP routing table entry for 0.0.0.0/0, version 26
Paths: (3 available, best #1, table Default-IP-Routing-Table)
Multipath: eBGP
  Advertised to update-groups:
     1
  30217, (received & used)
    84.40.25.221 from 84.40.25.221 (84.40.24.249)
      Origin IGP, metric 0, localpref 100, valid, external, multipath, best
  30217, (received & used)
    10.10.0.6 from 10.10.0.6 (207.142.131.244)
      Origin IGP, metric 0, localpref 100, valid, internal
  30217, (received & used)
    64.156.25.241 from 64.156.25.241 (64.156.25.241)
      Origin IGP, localpref 100, valid, external, multipath

csw1-pmtpa#sh ip route

   [snip]

B*   0.0.0.0/0 [20/0] via 84.40.25.221, 01:36:53
               [20/0] via 64.156.25.241, 23:30:23

HSRP

To provide first-hop failover, i.e. failover of the default gateway of all hosts in the network, HSRP has been implemented on VLANs 1 and 2 on both routers, csw1-pmtpa and csw4-pmtpa. Both routers share a "virtual IP", the gateway IP configured on all hosts. One of the two routers is active for that IP; the other is in standby and takes over within seconds when the active router fails. Besides the virtual IP, each router has its own unique IP in the respective subnet.

VLAN 1 uses HSRP group 1 and VLAN 2 uses HSRP group 2. In both groups csw1-pmtpa has the higher priority (150, versus the default of 100 on csw4-pmtpa) and is therefore the preferred active router. Preemption is not configured, however, so csw1-pmtpa won't force itself back into the active role after recovering from a failure.

Configuration csw1-pmtpa

The relevant configuration bits are:

interface Vlan1
  description Public VLAN / interface
  ip address 207.142.131.240 255.255.255.192
  standby 1 ip 207.142.131.193
  standby 1 priority 150
end
interface Vlan2
  description Private VLAN: Apache
  ip address 10.0.0.201 255.255.0.0
  standby 2 ip 10.0.0.200
  standby 2 priority 150
end

Configuration csw4-pmtpa

The relevant configuration bits are:

interface Vlan1
  description Public VLAN
  ip address 207.142.131.244 255.255.255.192
  standby 1 ip 207.142.131.193
end
interface Vlan2
  description Private VLAN
  ip address 10.0.0.202 255.255.0.0
  standby 2 ip 10.0.0.200
end
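The effect of the priorities and of leaving preemption off can be illustrated with a simplified Python model of the HSRP election for one group. It assumes IOS defaults not spelled out on this page (priority 100 when unset, ties broken by the highest interface IP) and ignores HSRP timers and message exchange entirely.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

DEFAULT_PRIORITY = 100  # IOS HSRP priority when "standby N priority" is not set


@dataclass
class Router:
    name: str
    ip: str
    priority: int = DEFAULT_PRIORITY
    up: bool = True


def elect_active(routers, current_active: Optional[Router] = None, preempt=False):
    """Simplified HSRP active-router election for one group."""
    candidates = [r for r in routers if r.up]
    if not candidates:
        return None
    # Without preemption a working active router keeps its role.
    if current_active is not None and current_active.up and not preempt:
        return current_active
    # Otherwise: highest priority wins, ties broken by highest interface IP.
    return max(candidates, key=lambda r: (r.priority, ipaddress.ip_address(r.ip)))


# VLAN 2 / HSRP group 2, values from the configuration above
csw1 = Router("csw1-pmtpa", "10.0.0.201", priority=150)
csw4 = Router("csw4-pmtpa", "10.0.0.202")

active = elect_active([csw1, csw4])          # both up: csw1-pmtpa
csw1.up = False
active = elect_active([csw1, csw4], active)  # csw1 fails: csw4-pmtpa
csw1.up = True
print(elect_active([csw1, csw4], active).name)                # csw4-pmtpa (no preempt)
print(elect_active([csw1, csw4], active, preempt=True).name)  # csw1-pmtpa
```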

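External links

* Configuring BGP on Cisco 3560: http://www.cisco.com/en/US/products/hw/switches/ps5528/products_configuration_guide_chapter09186a00803a9a8e.html#wp1226539
* Configuring HSRP on Cisco 3560: http://www.cisco.com/en/US/products/hw/switches/ps5528/products_configuration_guide_chapter09186a00803a9a27.html#wp1061629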
