BGP/old setup
Revision as of 19:04, 8 May 2006
The pmtpa cluster has L3 failover using two separate core routers and two BGP links to upstream.
To survive an uplink failure or the loss of a router (ours or upstream), and to increase our available bandwidth from 1 to 2 Gbit/s, we implemented failover and connection load sharing using BGP. Our two routers, csw1-pmtpa and csw4-pmtpa, each have a separate fiber connection to one of two PowerMedium routers.
This setup makes the network redundant and unaffected by the loss of either of the two PowerMedium routers. To make sure that the internal network is also unaffected, all downstream switches and hosts should be redundantly connected to both multilayer switches. HSRP is used to make internal routing redundant by offering all hosts a virtual gateway failover IP, which in reality is served by either of the two multilayer switches.
Diagram
BGP
As we don't have our own AS number, we use one out of the range of reserved AS numbers for private use, AS 64600.
Synchronisation with an IGP (Interior Gateway Protocol) is turned off, as we don't run one. The router-id (used for identification within BGP) is set to 207.142.131.240 for csw1-pmtpa, and 207.142.131.244 for csw4-pmtpa. Logging of events concerning BGP peers/neighbors is turned on. Because we want failover to happen quickly when a link fails, the keepalive time is reduced to 5 seconds, and the BGP hold time to 15 seconds.
csw1-pmtpa:
 router bgp 64600
  no synchronization
  bgp router-id 207.142.131.240
  bgp log-neighbor-changes
  timers bgp 5 15
csw4-pmtpa:
 router bgp 64600
  no synchronization
  bgp router-id 207.142.131.244
  bgp log-neighbor-changes
  timers bgp 5 15
For both routers, the network prefixes that we shall announce are:
 ! Network prefixes to be announced
 ! Wikimedia
 network 207.142.131.192 mask 255.255.255.192
 ! Wikia
 network 84.40.25.224 mask 255.255.255.224
iBGP
iBGP (internal BGP, i.e. BGP sessions within the same AS) is set up between csw1-pmtpa and csw4-pmtpa so they can share each other's routes. A dedicated 1 Gbit/s link is set up for this between the two multilayer switches, with both ports in routed mode. The IP subnet used is 10.10.0.4/30.
csw1-pmtpa:
 interface GigabitEthernet0/36
  description iBGP link to csw4-pmtpa
  no switchport
  ip address 10.10.0.5 255.255.255.252
csw4-pmtpa:
 interface GigabitEthernet0/47
  description iBGP link to csw1-pmtpa
  no switchport
  ip address 10.10.0.6 255.255.255.252
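As a quick sanity check on the addressing, a /30 leaves exactly two usable host addresses, which is why it suits a point-to-point link like this one. The following is only an illustrative sketch using Python's standard `ipaddress` module; the variable names are made up for the example.

```python
import ipaddress

# The subnet used for the dedicated iBGP link.
link = ipaddress.ip_network("10.10.0.4/30")

# hosts() skips the network and broadcast addresses.
usable = [str(host) for host in link.hosts()]

print("network:  ", link.network_address)
print("usable:   ", usable)
print("broadcast:", link.broadcast_address)
```

The two usable addresses are the ones assigned to the routed ports above.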
On both routers, neighbor statements are added for an iBGP session with each other:
csw1-pmtpa:
 neighbor 10.10.0.6 remote-as 64600
 neighbor 10.10.0.6 description iBGP session to csw4-pmtpa
 neighbor 10.10.0.6 soft-reconfiguration inbound

 ! Fast failover
 neighbor 10.10.0.6 timers 10 30

 ! Originate a default route if we received one through BGP
 neighbor 10.10.0.6 default-originate route-map bgp-pm-default

 ! Only distribute and accept a default route
 neighbor 10.10.0.6 prefix-list bgp-default in
 neighbor 10.10.0.6 prefix-list bgp-default out

 ! Replace the next-hop address by our address to iBGP peers
 neighbor 10.10.0.6 next-hop-self
csw4-pmtpa:
 neighbor 10.10.0.5 remote-as 64600
 neighbor 10.10.0.5 description iBGP session to csw1-pmtpa
 neighbor 10.10.0.5 soft-reconfiguration inbound

 ! Fast failover
 neighbor 10.10.0.5 timers 10 30

 ! Originate a default route if we received one through BGP
 neighbor 10.10.0.5 default-originate route-map bgp-pm-default

 ! Only distribute and accept a default route
 neighbor 10.10.0.5 prefix-list bgp-default in
 neighbor 10.10.0.5 prefix-list bgp-default out

 ! Replace the next-hop address by our address to iBGP peers
 neighbor 10.10.0.5 next-hop-self
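Note that these neighbor statements carry their own timers (10 30) next to the global timers bgp 5 15. The sketch below models how the effective values could be worked out, assuming that a per-neighbor timers statement takes precedence over the global one and that the hold time in use on a session is the smaller of the two peers' values, as the BGP specification describes. The function names and the 180-second peer value are hypothetical, chosen only for illustration.

```python
from typing import Optional, Tuple

Timers = Tuple[int, int]  # (keepalive seconds, hold time seconds)


def configured_timers(global_timers: Timers,
                      neighbor_timers: Optional[Timers]) -> Timers:
    """Per-neighbor timers, when present, replace the global ones (assumption)."""
    return neighbor_timers if neighbor_timers is not None else global_timers


def negotiated_hold_time(local_hold: int, peer_hold: int) -> int:
    """Both ends of a session use the smaller of the two hold times."""
    return min(local_hold, peer_hold)


GLOBAL = (5, 15)     # timers bgp 5 15
NEIGHBOR = (10, 30)  # neighbor ... timers 10 30

keepalive, hold = configured_timers(GLOBAL, NEIGHBOR)
print("local keepalive/hold:", keepalive, hold)

# iBGP session: the other router is configured with 30 seconds as well.
print("iBGP hold time:", negotiated_hold_time(hold, 30))

# Hypothetical peer that offers a 180 second hold time.
print("hold time with a 180 s peer:", negotiated_hold_time(hold, 180))
```

Under these assumptions a dead peer on these sessions is detected after at most 30 seconds, not 15.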
default-originate
A default route, 0.0.0.0/0, is not announced over BGP by default. Therefore, an explicit neighbor ... default-originate statement is given so that a default route is announced if the router has one in its routing table. It turns out, however, that any default route gives rise to that behaviour, even a statically entered route or one received from the very peer it is being announced to, which results in routing loops. We therefore originate a default route only if one was received over eBGP from AS 30217 (PowerMedium), using the route-map bgp-pm-default:
 ! Route map match list, matching AS ^30217$ (PowerMedium originated)
 ip as-path access-list 1 permit ^30217$

 ! Route map as bgp originate-default conditional. Only originate
 ! a default route if we got it from AS 30217 (PowerMedium)
 route-map bgp-pm-default permit 10
  match as-path 1
prefix filters
To make sure we don't actually announce or receive prefixes we don't expect, filters are used. iBGP in this setup is only used to distribute a default route, so the following prefix-list is used:
 ip prefix-list bgp-default description BGP filter that allows just a default route
 ip prefix-list bgp-default seq 10 deny 0.0.0.0/0 ge 1
 ip prefix-list bgp-default seq 20 permit 0.0.0.0/0
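To see why these two entries let through nothing but the default route, the prefix-list can be modelled in a few lines of Python. This is a simplified sketch of the matching rules as commonly documented (first matching entry wins, an entry without ge/le matches the exact prefix only, ge N matches prefix lengths from N up to 32, and anything unmatched is denied); the `Entry` type and function names are invented for the example.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    seq: int
    action: str  # "permit" or "deny"
    prefix: str
    ge: Optional[int] = None
    le: Optional[int] = None


def entry_matches(entry: Entry, route: str) -> bool:
    base = ipaddress.ip_network(entry.prefix)
    net = ipaddress.ip_network(route)
    if not net.subnet_of(base):
        return False
    if entry.ge is None and entry.le is None:
        # No ge/le: only the exact prefix length matches.
        return net.prefixlen == base.prefixlen
    low = entry.ge if entry.ge is not None else base.prefixlen
    high = entry.le if entry.le is not None else 32
    return low <= net.prefixlen <= high


def evaluate(entries: List[Entry], route: str) -> str:
    # Entries are checked in sequence order; the first match decides.
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry_matches(entry, route):
            return entry.action
    return "deny"  # implicit deny at the end of the list


BGP_DEFAULT = [
    Entry(10, "deny", "0.0.0.0/0", ge=1),
    Entry(20, "permit", "0.0.0.0/0"),
]

for route in ("0.0.0.0/0", "10.0.0.0/8", "207.142.131.192/26"):
    print(route, "->", evaluate(BGP_DEFAULT, route))
```

In this model the seq 10 entry rejects every prefix longer than /0, and seq 20 then permits the default route itself.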
next-hop-self
By default, a route announced over iBGP keeps its next hop field set to the IP address of the eBGP peer that announced the route, and the receiving router does not necessarily have a route to that address. With next-hop-self, the next hop field is instead set to the announcing iBGP router's own address. Any iBGP session peer can reach that address by definition.
eBGP
To actually make failover work, each router is set up for an eBGP session to the PowerMedium router it's directly connected to.
csw1-pmtpa:
 neighbor 64.156.25.241 remote-as 30217
 neighbor 64.156.25.241 description BGP session to PowerMedium

 ! Use soft reconfiguration for this peer, less disruptive updates
 neighbor 64.156.25.241 soft-reconfiguration inbound

 ! Fast failover
 neighbor 64.156.25.241 timers 10 30

 ! Allow only routes that originate here to be announced
 neighbor 64.156.25.241 filter-list 2 out

 ! Also filter on IP prefixes for outgoing updates
 neighbor 64.156.25.241 prefix-list bgp-outfilter out

 ! Allow only an incoming default route prefix
 neighbor 64.156.25.241 prefix-list bgp-default in
csw4-pmtpa:
 neighbor 84.40.25.221 remote-as 30217
 neighbor 84.40.25.221 description BGP session to PowerMedium

 ! Use soft reconfiguration for this peer, less disruptive updates
 neighbor 84.40.25.221 soft-reconfiguration inbound

 ! Fast failover
 neighbor 84.40.25.221 timers 10 30

 ! Allow only routes that originate here to be announced
 neighbor 84.40.25.221 filter-list 2 out

 ! Also filter on IP prefixes for outgoing updates
 neighbor 84.40.25.221 prefix-list bgp-outfilter out

 ! Allow only an incoming default route prefix
 neighbor 84.40.25.221 prefix-list bgp-default in
filter-list
Announcement of prefixes is limited to prefixes having an empty AS path, meaning that the prefix was not received from another AS and thus must have originated here. This makes sure that we don't announce unintended routes and/or offer transit to peers:
 ! BGP filter list. Send only prefixes with an empty AS
 ! list to which our AS will be added, so routes have to
 ! originate here
 ip as-path access-list 2 permit ^$
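The two as-path access-lists on this page (list 1 with ^30217$ for default-originate, list 2 with ^$ here) are both fully anchored patterns, so their effect can be illustrated with Python's `re` module. This is only an approximation: router regular expressions differ from Python's in other respects, and the helper name and the sample AS number 64512 are assumptions made for the example.

```python
import re
from typing import Sequence


def as_path_permits(pattern: str, as_path: Sequence[int]) -> bool:
    """Match a pattern against the AS path written as space-separated numbers."""
    text = " ".join(str(asn) for asn in as_path)
    return re.search(pattern, text) is not None


LOCAL_ONLY = r"^$"            # as-path access-list 2
FROM_POWERMEDIUM = r"^30217$" # as-path access-list 1

# A route that originated here has an empty AS path before it is sent out.
print(as_path_permits(LOCAL_ONLY, []))                    # locally originated
print(as_path_permits(LOCAL_ONLY, [30217]))               # learned from PowerMedium
print(as_path_permits(FROM_POWERMEDIUM, [30217]))         # originated by PowerMedium
print(as_path_permits(FROM_POWERMEDIUM, [30217, 64512]))  # only passed through PowerMedium
```

Only the first and third checks succeed, which matches the intent of both lists: announce what originates here, and trust a default route only when AS 30217 itself originated it.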
distribute-list
For incoming announcements we expect just a default-route prefix, so the filter bgp-default described above can be used. For outgoing announcements we want to announce only our own network prefixes:
 ip prefix-list bgp-outfilter description BGP outbound filter
 ip prefix-list bgp-outfilter seq 10 permit 207.142.131.192/26
 ip prefix-list bgp-outfilter seq 20 permit 84.40.25.224/27
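The outbound filter is written in prefix-length notation while the network statements in the BGP section use dotted masks, so it is worth checking that the two describe the same prefixes. A small sketch with Python's standard `ipaddress` module, with variable names chosen for illustration:

```python
import ipaddress

# network statements from the "router bgp" section: (address, mask)
network_statements = [
    ("207.142.131.192", "255.255.255.192"),
    ("84.40.25.224", "255.255.255.224"),
]

# permit entries of prefix-list bgp-outfilter
outfilter = ["207.142.131.192/26", "84.40.25.224/27"]

announced = {ipaddress.ip_network(f"{addr}/{mask}")
             for addr, mask in network_statements}
permitted = {ipaddress.ip_network(prefix) for prefix in outfilter}

# Anything announced but not permitted would be dropped by the filter.
blocked = announced - permitted

print("announced:", sorted(str(net) for net in announced))
print("blocked by bgp-outfilter:", sorted(str(net) for net in blocked))
```

Both sets are identical, so every announced network passes the filter and nothing else does.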
Load sharing: multihop eBGP
HSRP
In order to have first hop failover, i.e. failover of the default gateway of all hosts in the network, HSRP has been implemented on VLANs 1 and 2 on both routers, csw1-pmtpa and csw4-pmtpa. Both routers share a "virtual IP", the gateway IP that has been configured on all hosts. One of the two routers is active for the IP; the other is in standby and takes over within seconds when the active router fails. Besides the virtual IP, each router has its own unique IP in the respective subnet.
VLAN 1 makes use of HSRP group 1, VLAN 2 uses HSRP group 2. In both cases csw1-pmtpa has the highest priority and is therefore the default router, but preemption is disabled, so the default router won't force itself to be the active router once it comes back up.
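The effect of disabled preemption can be illustrated with a toy model of the active-router choice. This is a deliberately simplified sketch, not the HSRP state machine: it assumes both routers start at the same time, ignores tie-breaking, and uses priority 150 for csw1-pmtpa and the HSRP default of 100 for csw4-pmtpa, which has no priority configured. The function name is invented for the example.

```python
from typing import Dict, Optional, Set


def next_active(current: Optional[str], up: Set[str],
                priorities: Dict[str, int],
                preempt: bool = False) -> Optional[str]:
    """Pick the active router after a change in which routers are up."""
    if not up:
        return None
    if current in up and not preempt:
        # Without preemption, a working active router keeps its role.
        return current
    return max(up, key=lambda router: priorities[router])


PRIORITIES = {"csw1-pmtpa": 150, "csw4-pmtpa": 100}

active = next_active(None, {"csw1-pmtpa", "csw4-pmtpa"}, PRIORITIES)
print("both up:        ", active)

active = next_active(active, {"csw4-pmtpa"}, PRIORITIES)
print("csw1 fails:     ", active)

active = next_active(active, {"csw1-pmtpa", "csw4-pmtpa"}, PRIORITIES)
print("csw1 comes back:", active)
```

In this model csw4-pmtpa stays active after csw1-pmtpa returns, which avoids a second gateway switch-over; calling the function with `preempt=True` would hand the role back to csw1-pmtpa.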
Configuration csw1-pmtpa
The relevant configuration bits are:
 interface Vlan1
  description Public VLAN / interface
  ip address 207.142.131.240 255.255.255.192
  standby 1 ip 207.142.131.193
  standby 1 priority 150
 end

 interface Vlan2
  description Private VLAN: Apache
  ip address 10.0.0.201 255.255.0.0
  standby 2 ip 10.0.0.200
  standby 2 priority 150
 end
Configuration csw4-pmtpa
The relevant configuration bits are:
 interface Vlan1
  description Public VLAN
  ip address 207.142.131.244 255.255.255.192
  standby 1 ip 207.142.131.193
 end

 interface Vlan2
  description Private VLAN
  ip address 10.0.0.202 255.255.0.0
  standby 2 ip 10.0.0.200
 end
