Multicast HTCP purging

From Wikitech
(Difference between revisions)
Jump to: navigation, search
(Networking changes)
(Update to remove obsolete information. Could use a knowledgable review)
 
(3 intermediate revisions by one user not shown)
Line 1: Line 1:
''This page was heavily edited by RobLa on 2013-01-28, and could use a review by a knowledgeable opsen''
  
{{old}}
'''Multicast HTCP purging''' is a method of [[Squid]] and [[Varnish]] purging by using [[w:multicast|multicast]] [[w:HTCP|HTCP]] packets.
  
== Request flow ==
  
* A MediaWiki instance in [[Eqiad]] detects that a purge is needed. It sends an HTCP purge request to a multicast group for each individual URI that needs to be purged.
* Native multicast routing is enabled in [[Eqiad]] and [[Pmtpa]], and multicast packets ''should'' natively route between the two datacenters
* Multicast is sent to [[Esams]] via a multicast->unicast->multicast relay located in Pmtpa (as of 2013-01-28)
* All Squid/Varnish caches subscribe to the multicast feed
  
Note that multicast HTCP is a one-way protocol: requests are fired and forgotten. If there is a problem anywhere in the system, the HTCP origin has no way of knowing there was a failure, and thus assumes that the request went through.
  
=== HTCP modifications to Squid ===
  
Mark Bergsma modified the HTCP support in Squid to do the following:
 
* work without requiring HTCP CLR responses
* work at all when not requesting HTCP CLR responses
* use a different store searching algorithm instead of htcpCheckHit(), which was intended for finding cache entries for URI hits instead of URI purges
* allow the simultaneous removal of both HEAD and GET entries with a single HTCP request, by specifying ''NONE'' as the HTTP method
The Squids are all configured with the following line:
 mcast_groups 239.128.0.112
to have them join the relevant multicast group and receive all the purge requests.
=== Varnish ===
Varnish relies on a separate listener daemon (varnishhtcpd) to listen for purge requests and act on them.
  
 
=== MediaWiki ===
 
MediaWiki was extended with a SquidUpdate::HTCPPurge method, which takes an HTCP multicast group address, an HTCP port number, and a multicast TTL (see <tt>DefaultSettings.php</tt>), and sends all URLs to be purged to that group. It can't make use of persistent sockets, but the overhead of setting up a UDP socket is minimal, and it doesn't have to worry about handling responses.
  
All Apaches are configured through <tt>CommonSettings.php</tt> to send HTCP purge requests to the multicast group address '''239.128.0.112'''. The purges use a multicast [[w:Time to live|Time To Live]] of '''2''' (instead of the default, 1) because the messages need to cross a single subnet/router.
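As a rough illustration of the sending side, the fire-and-forget multicast send with TTL 2 can be sketched in Python. This is only a sketch of the pattern: the real HTCP CLR wire format (RFC 2756) is not reproduced here, and the function name and payload are illustrative, not MediaWiki's actual code.

```python
import socket

# Group, port and TTL as described on this page
GROUP, PORT, TTL = "239.128.0.112", 4827, 2

def send_purge(payload, group=GROUP, port=PORT, ttl=TTL, iface=None):
    """Fire-and-forget: send one datagram to the purge group, return bytes sent.

    `payload` stands in for an encoded HTCP CLR packet; this sketch does not
    implement the RFC 2756 wire format.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 2 so the datagram can cross one router, as described above
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    if iface:
        # Pin the outgoing interface (handy when testing on loopback)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                     socket.inet_aton(iface))
    try:
        return s.sendto(payload, (group, port))
    finally:
        s.close()
```

No response is ever read, matching the one-way nature of multicast HTCP described above.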
  
=== udpmcast relay ===
'''udpmcast''' is a small application-level multicast tool written in Python. It joins a given multicast group on startup, listens on a specified UDP port, and then forwards all received packets to a given set of (unicast or multicast) destinations.
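The behaviour described above can be sketched as follows. This is a simplified illustration, not the actual udpmcast.py; the destination list is hypothetical.

```python
import socket

GROUP, PORT = "239.128.0.112", 4827    # group/port used on this page
DESTS = [("192.0.2.10", 4827)]         # hypothetical unicast destinations

def open_receiver(group=GROUP, port=PORT, iface="0.0.0.0"):
    """Bind the UDP port and join the multicast group, as udpmcast does on startup."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    # Membership request: group address followed by the local interface address
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                 socket.inet_aton(group) + socket.inet_aton(iface))
    return s

def relay_forever(dests=DESTS):
    """Forward every received datagram, unchanged, to each destination."""
    recv = open_receiver()
    out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        data, _src = recv.recvfrom(65535)
        for dest in dests:
            out.sendto(data, dest)
```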
  
Its options can be found by running it with the <tt>-h</tt> argument.
  
As of January 2013, dobson is running udpmcast via /etc/rc.local and sending to hooft. The group is 239.128.0.112, port 4827.
  
udpmcast.py supports ''forwarding rules'', where it selects the destination address list based on the ''source address'' that sent the packet. These forwarding rules can be specified as a ''Python dictionary'' on the command line.
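The rule lookup amounts to a dictionary get with a default. This is a hypothetical sketch of that selection, not the real udpmcast.py option handling; the addresses are example values.

```python
# Rules map a packet's source address to its destination list; any other
# source falls through to the default list. Addresses are example values.
RULES = {"211.115.107.158": ["239.128.0.112"]}
DEFAULT = ["212.85.150.133", "212.85.150.132"]

def pick_destinations(src_addr, rules=RULES, default=DEFAULT):
    """Return the forwarding destinations for a datagram from src_addr."""
    return rules.get(src_addr, default)
```

With these example rules, a packet from 211.115.107.158 is forwarded to the multicast group, while anything else goes to the default unicast list.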
  
=== Multicast breakage troubleshooting ===
 
(current as of January 2013)

This is for troubleshooting the UDP multicast-to-unicast proxy that enables purges to work in Esams.
 
First, tcpdump on dobson:

 tcpdump -n -v udp port 4827 and host 239.128.0.112

Is there a large amount of traffic? If yes, it's not the network on the US side! If no, it is the network on the US side.
  
 
If there is a lot of traffic, then tcpdump on hooft:

 tcpdump -n -v udp port 4827 and host 208.80.152.173
  
Do you see a large amount of traffic? If yes, it's not the network! Let's say that dobson has no traffic.
  
After that, make sure it is listening:

 root@dobson:/var/log# netstat -nl | grep 4827
 udp        0      0 0.0.0.0:4827            0.0.0.0:*
  
Then check whether dobson can receive multicast traffic on the correct group. Start iperf on dobson:
  
 iperf -s -B 239.128.0.112 -u -p 1337 -i 5
  
Then go to a Varnish machine (such as cp1041) and start iperf:
  
 iperf -c 239.128.0.112 -b 50K -t 300 -T 5 -u -p 1337 -i 5
  
Note that the port is NOT one used by a real service. This is important.
  
You should see output on dobson like:
  
 root@cp1044:~# iperf -s -B 239.128.0.112 -u -p 8648 -i 5
 ------------------------------------------------------------
 Server listening on UDP port 1337
 Binding to local address 239.128.0.112
 Joining multicast group  239.128.0.112
 Receiving 1470 byte datagrams
 UDP buffer size:  122 KByte (default)
 ------------------------------------------------------------
 [  3] local 239.128.0.112 port 1337 connected with 10.64.0.169 port 8442
 [ ID] Interval      Transfer    Bandwidth      Jitter  Lost/Total Datagrams
 [  3]  0.0- 5.0 sec  30.1 KBytes  49.4 Kbits/sec  0.038 ms    0/  21 (0%)
 [  3]  5.0-10.0 sec  30.1 KBytes  49.4 Kbits/sec  0.025 ms    0/  21 (0%)
 [  3] 10.0-15.0 sec  30.1 KBytes  49.4 Kbits/sec  0.023 ms    0/  21 (0%)
  
If you do not see this output, multicast has gone wrong.
 
Try this step again, but change the group address (e.g. to 239.128.0.115). If it still does not work, multicast is broken between the datacenters. If it does work, the multicast forwarding table has gone bad again. You can fix it by disabling PIM for a minute, but you should probably call Juniper and report the bug.
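If iperf is unavailable, a local sanity check (can this host send and receive on a multicast group at all?) can be sketched in Python over the loopback interface. This is a hypothetical helper for illustration, not an existing tool on these hosts.

```python
import socket

GROUP, PORT = "239.128.0.112", 1337  # group from this page; port unused by real services

def loopback_mcast_check(group=GROUP, port=PORT):
    """Send one datagram to the group via loopback and report whether it came back."""
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    recv.bind(("", port))
    # Join the group on the loopback interface
    recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    socket.inet_aton(group) + socket.inet_aton("127.0.0.1"))
    recv.settimeout(2.0)

    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Send via loopback; IP_MULTICAST_LOOP is on by default, so local group
    # members receive a copy of our own datagram
    send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton("127.0.0.1"))
    send.sendto(b"ping", (group, port))

    try:
        data, _addr = recv.recvfrom(1024)
        return data == b"ping"
    except socket.timeout:
        return False
    finally:
        recv.close()
        send.close()
```

A True result only proves the local multicast stack works; inter-datacenter reachability still needs the tcpdump/iperf steps above.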
 
== History ==
Previous methods of Squid purging implemented in MediaWiki, SquidUpdate::purge and SquidUpdate::fastPurge, used HTTP PURGE requests over unicast TCP connections from all Apaches to all Squids. This had a few drawbacks:
* All Apaches needed to be able to connect to all Squids
* There was the overhead of handling Squid's replies, plus TCP connection overhead
The biggest drawback was that it was plain slow. Some profiling runs show that the current method is about '''8000 times''' faster than the older fastPurge method.
  
 
== External links ==
 

Latest revision as of 19:27, 28 January 2013