Network design

Revision as of 16:42, 23 October 2004

The purpose of this page is to give an overview of the current design of the network of the Wikimedia servers, and to provide a place to develop a new and improved network scheme.

Current situation

Wikimedia servers reside in two racks along with Bomis servers, hosted at Candidhosting. Wikimedia/Bomis have a dedicated IP range, 207.142.131.192/26. There are two gateways, 207.142.131.193 and 207.142.131.225, but both resolve to the same MAC address, so they are almost certainly the same router. Total burstable bandwidth is 200 Mbit/s, delivered through two separate 100BaseTx uplinks, both connected to the same broadcast domain, which is shared with other customers.
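
As a quick sanity check of the addressing above, the following sketch uses Python's standard ipaddress module to confirm that both gateways fall inside the /26. It also shows that each gateway is the first usable host of one /27 half of the block. Splitting the block into /27s is only an illustrative assumption; nothing on this page says the provider configures it that way.

```python
import ipaddress

# Values taken from the text above.
BLOCK = ipaddress.ip_network("207.142.131.192/26")
GATEWAYS = [
    ipaddress.ip_address("207.142.131.193"),
    ipaddress.ip_address("207.142.131.225"),
]


def summarize(network):
    """Return (first address, last address, number of addresses)."""
    return network[0], network[-1], network.num_addresses


def first_hosts_of_halves(network):
    """Split the block in two and return the first usable host of each half.

    Treating the /26 as two /27s is an assumption made for illustration.
    """
    halves = network.subnets(prefixlen_diff=1)
    return [next(iter(half.hosts())) for half in halves]


if __name__ == "__main__":
    print(summarize(BLOCK))
    print(all(gateway in BLOCK for gateway in GATEWAYS))
    print(first_hosts_of_halves(BLOCK) == GATEWAYS)
```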

Wikimedia owns three switches. As the two uplinks are not allowed to create a loop, they must be connected to different switches that are not connected to each other (unless STP is used), which is not an ideal situation. A third switch is currently used to connect internal servers that don't have public IPs and should not be accessible from the Internet. The IP range used for this internal network is 10.0.0.0/8.

Problems

The current network setup is suboptimal in several ways, which are described below.

Multiple uplinks

Recently, Wikimedia traffic spiked to 100 Mbit/s multiple times, which is the limit of a single 100BaseTx connection. Average outgoing traffic is currently about 45 Mbit/s, so it is clear that Wikimedia was slowly becoming network limited. However, the colo provider charges $400 per month just to provide us with a Gigabit uplink, unless we commit to 60 Mbit/s average traffic or higher. Instead, they decided to give us a second 100BaseTx uplink for free.
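
The arithmetic behind this trade-off can be written out as a small sketch. The figures are the ones quoted above; the function names and the yearly total are only illustrative.

```python
# Figures quoted in the text above.
LINK_CAPACITY_MBIT = 100         # one 100BaseTx uplink
AVERAGE_TRAFFIC_MBIT = 45        # current average outgoing traffic
COMMIT_THRESHOLD_MBIT = 60       # average needed to avoid the Gigabit fee
GIGABIT_FEE_USD_PER_MONTH = 400  # fee charged below the commit threshold


def utilization(traffic_mbit, capacity_mbit):
    """Fraction of a link's capacity used by the given traffic."""
    return traffic_mbit / capacity_mbit


def shortfall_to_commit(average_mbit, threshold_mbit):
    """How much average traffic is missing to reach the commit level."""
    return max(0, threshold_mbit - average_mbit)


if __name__ == "__main__":
    print(utilization(AVERAGE_TRAFFIC_MBIT, LINK_CAPACITY_MBIT))
    print(shortfall_to_commit(AVERAGE_TRAFFIC_MBIT, COMMIT_THRESHOLD_MBIT))
    print(GIGABIT_FEE_USD_PER_MONTH * 12)
```

On average a single uplink is less than half used, and the spikes are what hit the 100 Mbit/s ceiling. The average is still 15 Mbit/s short of the commit level that would waive the fee.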

This does pose some problems though. Because the two uplinks are connected from the same broadcast domain, we cannot connect them internally, or we would create a loop. One solution to this problem is to connect the uplinks to different switches that are not connected, but this means that hosts on the two different switches can only exchange traffic with each other through the uplinks. This traffic is graphed and billed twice, and is a bottleneck, as it has to traverse both relatively slow uplinks.
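
The loop problem can be illustrated with a minimal sketch that models the layer 2 topology as an undirected graph and looks for a cycle using union-find. The node names (provider_lan, switch_a, switch_b) are made up for this example, and the model ignores STP, which would block one of the links instead of letting the loop form.

```python
def has_loop(links):
    """Return True if the undirected links contain a cycle (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Hypothetical names: the provider's shared broadcast domain and our two
# uplink switches, each attached through one of the two uplinks.
CURRENT_LINKS = [("provider_lan", "switch_a"), ("provider_lan", "switch_b")]
WITH_CROSS_LINK = CURRENT_LINKS + [("switch_a", "switch_b")]

if __name__ == "__main__":
    print(has_loop(CURRENT_LINKS))
    print(has_loop(WITH_CROSS_LINK))
```

Without the cross link there is no loop, but the only path between the two switches runs through provider_lan, which is why traffic between them crosses both uplinks.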

Shared broadcast domain

It appears that, even though Wikimedia has a dedicated IP range, the broadcast domain is shared with other customers. Running tethereal shows a lot of non-Wikimedia traffic. It's odd that Wikimedia doesn't have its own broadcast domain (probably implemented as a separate VLAN at the upstream provider), as there doesn't seem to be a reason not to.

Within a shared broadcast domain, other customers can snoop Wikimedia traffic, spoof our IPs, and cause unnecessary traffic through our uplinks.
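
One way to quantify how much foreign traffic shows up on the wire is to classify captured source addresses against our own range. The sketch below assumes the addresses have already been extracted from a capture as strings. The sample addresses are hypothetical, with the foreign ones taken from the reserved documentation ranges.

```python
import ipaddress

# Dedicated range quoted earlier on this page.
WIKIMEDIA_RANGE = ipaddress.ip_network("207.142.131.192/26")


def foreign_sources(source_addresses, own_range=WIKIMEDIA_RANGE):
    """Return the source addresses that are outside our own range."""
    return [
        address
        for address in source_addresses
        if ipaddress.ip_address(address) not in own_range
    ]


if __name__ == "__main__":
    # Hypothetical sample, not taken from a real capture.
    sample = ["207.142.131.200", "192.0.2.17", "198.51.100.4", "207.142.131.245"]
    print(foreign_sources(sample))
```

A non-empty result does not prove snooping, since the uplink also carries legitimate traffic from Internet clients. Only frames that were never addressed to us, such as another customer's broadcasts, are evidence of a shared broadcast domain.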

Inflexible internal network setup

The Wikimedia network was recently split in two parts: the external, publicly visible network containing machines that need to be accessed from the Internet (the Squids, mostly), and an internal network for machines that are only accessed by other Wikimedia servers (Apaches, DB servers, management devices). Some servers, like the Squids, need to be in both networks because they serve as gateways between the Internet and the internal machines.

The internal network is currently implemented as a physically separate switch. This switch is not connected to the other two, and the only paths to the external network are through the servers that are on both networks. These servers use separate interfaces to connect to the different networks (eth0 for internal, eth1 for external).

Using physically separate switches for different networks is inflexible. This design does not permit efficient use of resources like switch ports and bandwidth. It requires extra switches when the internal network is full, even if the switches for the external network have plenty of ports free. The switches currently in use already support VLANs (including 802.1Q), with all of their advantages, so it would be good to use them.

Plan is to switch to a VLAN once we find out what's connected to each switch port - Kate
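
The port argument can be made concrete with a small sketch. The switch names and free-port counts are hypothetical, chosen only to show the difference between dedicating a switch to a network and assigning ports to networks through VLANs.

```python
# Hypothetical free-port counts per switch; not measured values.
FREE_PORTS = {"external_1": 10, "external_2": 8, "internal": 0}


def usable_ports(free_ports, eligible_switches):
    """Count the free ports on the switches a host may be plugged into."""
    return sum(free_ports[name] for name in eligible_switches)


def can_place(hosts_needed, free_ports, eligible_switches):
    """Return True if enough eligible ports are free for the new hosts."""
    return hosts_needed <= usable_ports(free_ports, eligible_switches)


if __name__ == "__main__":
    # Physically separate networks: a new internal host needs the internal switch.
    print(can_place(4, FREE_PORTS, ["internal"]))
    # VLANs: any port on any switch can be assigned to the internal VLAN.
    print(can_place(4, FREE_PORTS, list(FREE_PORTS)))
```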

Failover default routing using BGP

Because the internal servers are not directly connected to the Internet, both Zwinger and Albert are set up to Source NAT traffic originating from these internal servers, to allow them to access Internet servers for management purposes.

Two hosts are configured as routers, to provide failover support. This, however, is done using BGP and Quagga on all boxes. This seems to be a bit excessive, as better and easier solutions exist for this job: VRRP and the Common Address Redundancy Protocol (CARP). These solutions only need to be implemented on the routers, and don't require complicated daemons and protocols to run on each host.
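
The idea behind VRRP-style failover can be sketched as a simple election. The routers share one virtual gateway address, and the live router with the highest priority answers for it, with ties going to the higher interface address as in VRRP. This is a simplified model and not an implementation of either protocol. The router names reuse the two NAT hosts mentioned above, and the addresses and priorities are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    address: str   # router's own interface address (hypothetical)
    priority: int  # higher wins, as in VRRP
    alive: bool = True


def elect_master(routers):
    """Pick the live router that should own the shared virtual gateway IP.

    Highest priority wins; ties go to the higher interface address.
    Returns None when no router is alive.
    """
    candidates = [router for router in routers if router.alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda router: (router.priority, ipaddress.ip_address(router.address)),
    )


if __name__ == "__main__":
    # Hypothetical addresses and priorities inside the internal 10.0.0.0/8 range.
    zwinger = Router("zwinger", "10.0.0.2", priority=200)
    albert = Router("albert", "10.0.0.3", priority=100)
    print(elect_master([zwinger, albert]).name)

    # If the preferred router fails, the backup takes over the same virtual IP.
    failed = Router("zwinger", "10.0.0.2", priority=200, alive=False)
    print(elect_master([failed, albert]).name)
```

The internal hosts only need a static default route to the virtual address, so nothing changes on them during a failover. That is the main simplification compared to running a routing daemon on every box.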

Limited switch features

Proposed solutions

Proposed design

-- Mark 15:46, 22 Oct 2004 (UTC)
