Event logging
This article describes the network architecture for event tracking, or the recording of user interactions with web content for analytics purposes. An event is the anonymized record of a user action on the site, such as previewing an edit, or collapsing the "Toolbox" section in the sidebar. We capture and analyze event data in aggregate to better understand how readers and editors interact with our site, to identify usability problems, and to provide feedback for feature engineers.
TL;DR
Data flows in one direction: front ⇒ back.

- The browser captures a UI event as a key-value map.
- The browser requests http[s]://bits.wikimedia.org/event.gif?key=val&foo=bar.
- Varnish recognizes the URL, returns an empty response, and logs the request.
- On each Bits server, varnishlog filters for /event.gif? requests, piping them into zpub.
- zpub uses ZeroMQ to publish event log data to subscribers on port 8006/tcp.
- Subscribers store event log data for analysis.
Client-Side JavaScript
Events are tracked on the client by calling mw.experiments.track(). track() serializes each event into a percent-encoded query string, which it affixes to the base URL for event tracking (bits.wikimedia.org/event.gif). It then creates an <img> element and sets the full URL as its source, triggering an asynchronous request.
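The serialization step can be sketched in Python, assuming an event is a flat key-value map (the real implementation is client-side JavaScript; `beacon_url` is a hypothetical name for illustration):

```python
from urllib.parse import urlencode

BEACON_URL = "http://bits.wikimedia.org/event.gif"

def beacon_url(event):
    """Serialize a flat key-value event into a percent-encoded
    query string and affix it to the event-tracking base URL."""
    return BEACON_URL + "?" + urlencode(event)

url = beacon_url({"action": "preview", "section": "Toolbox"})
print(url)  # http://bits.wikimedia.org/event.gif?action=preview&section=Toolbox
```

Because the URL is set as an image source, the browser issues the request without blocking page interaction.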
Varnish
Varnish is configured to intercept requests for /event.gif. Rather than attempt to resolve the request to an object, Varnish bails out early with an HTTP 204 ("No Content") response, indicating that the request was received and understood but that no message-body is forthcoming (see RFC 2616, section 10.2.5). This response is semantically correct, widely supported by browsers old and new, and minimizes bandwidth and latency.
The behavior is specified using VCL ("Varnish Configuration Language") and is managed by Puppet (see source link above):
if (req.url ~ "^/event\.gif") { error 204; }
You can test this configuration easily using curl -i, or in-browser using HTTP Viewer:
$ curl -i http://bits.wikimedia.org/event.gif?foo=bar
HTTP/1.1 204 No Content
Server: Varnish
Accept-Ranges: bytes
Date: Sat, 18 Aug 2012 05:23:59 GMT
X-Varnish: 547318191
Age: 0
Via: 1.1 varnish
Connection: close
Note that the request requires no cache or back-end look-ups: it is handled entirely within Varnish, with microsecond performance. Moreover, Varnish does not parse, aggregate or transport event data, and it has no direct knowledge of the components that do these things. All that Varnish has to do is record the request in its log, which is its default behavior.
Loose coupling provides a measure of safety and security for Varnish, isolating it from failures occurring in other subsystems. Varnish will not block client requests while waiting for log data to be collected. When its log buffer is exhausted, Varnish simply rewinds its pointer and starts writing from the top. It is the responsibility of interested subscribers to keep up.
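The rewind-and-overwrite behavior can be sketched as a fixed-size ring buffer (illustrative only; Varnish's actual log lives in a shared memory segment written in C):

```python
class RingLog:
    """Fixed-capacity log: when the buffer is exhausted, writing
    wraps back to the top, silently overwriting the oldest records.
    Readers that fall behind simply lose data."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.pos = 0  # next write slot

    def write(self, record):
        self.buf[self.pos] = record
        self.pos = (self.pos + 1) % len(self.buf)  # rewind when full

log = RingLog(3)
for record in ["a", "b", "c", "d"]:
    log.write(record)
print(log.buf)  # ['d', 'b', 'c'] -- 'a' was silently overwritten
```

The writer never blocks and never waits for readers, which is exactly what keeps slow subscribers from affecting client-facing latency.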
A consequence of this approach is that errors occurring further back in the event tracking data pipeline cannot propagate back to the user. This is desirable, however, since event-tracking is orthogonal to the user's immediate interests and experience. Event tracking ought never interfere with or complicate the servers' primary function, which is to serve content.
The absence of forward dependencies (front-facing components needing to know what happens behind them) isolates failure and promotes modularity. It is possible to thoroughly alter how we aggregate or store event data without changing the configuration of Varnish. We are therefore free to develop these subsystems iteratively, in response to evolving needs, without compromising the availability of critical services.
VSM
Varnish assumes as little as possible about what log data you'll need and what you'll want to do with it. Rather than log to a file, Varnish logs to a shared memory segment. The mechanism and internal API for reading and writing from this memory segment is known as VSM ("Varnish Shared Memory"). The VSM log can be tailed using a suite of tools distributed with Varnish, the most useful of which are varnishlog, which simply streams the contents of the log buffer to standard output, and varnishncsa, which provides finer controls over the output format of Varnish's request log. Both varnishlog and varnishncsa can optionally filter for log records matching a particular pattern or type.
We can pass varnishncsa the following command-line arguments to stream tracked events:
$ /usr/bin/varnishncsa -m RxURL:^/event\.gif -F '%q'
The format placeholder "%q" is rendered as the request's query string (or an empty line). Output is unbuffered.
The output of this exact invocation is piped to another tool, zpub, which makes log data available as a streaming network service using ZeroMQ.
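A downstream consumer therefore receives one percent-encoded query string per line. A hypothetical decoder (illustrative; not part of the actual pipeline) could turn each line back into an event map using the standard library:

```python
from urllib.parse import parse_qsl

def decode_event(line):
    """Decode one line of varnishncsa '%q' output (a percent-encoded
    query string, possibly with a leading '?') into an event dict."""
    return dict(parse_qsl(line.strip().lstrip("?")))

print(decode_event("?key=val&foo=bar"))  # {'key': 'val', 'foo': 'bar'}
```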
ZeroMQ
ZeroMQ is a free and open source networking library that lets you write brokerless message-oriented middleware using a socket-like API. It is performant, well-tested, and broadly used. It exposes common messaging patterns (like pub/sub, fanout, etc.) as different "types" of sockets. zpub is a small application that reads lines from stdin and streams them using a ZeroMQ publisher socket in a fan-out fashion to all connected peers.
A subscriber may identify itself to the publisher using a short identifier string called a socket identity. If the connection between a named subscriber and a publisher drops, the publisher will buffer messages for that subscriber until the subscriber re-connects or the number of messages in the buffer reaches the high-water mark (by default, 1,000 messages).
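The pub/sub pattern described above can be exercised directly. Here is a minimal sketch using the Python pyzmq bindings (an assumption for illustration; zpub and zsub themselves are standalone tools) of a publisher fanning a message out to a subscriber over local TCP:

```python
import time
import zmq  # pyzmq bindings (assumed installed)

ctx = zmq.Context()

# Publisher: fans each message out to every connected subscriber.
pub = ctx.socket(zmq.PUB)
port = pub.bind_to_random_port("tcp://127.0.0.1")

# Subscriber: ZeroMQ filters by string prefix; "" subscribes to everything.
sub = ctx.socket(zmq.SUB)
sub.setsockopt_string(zmq.SUBSCRIBE, "")
sub.setsockopt(zmq.RCVTIMEO, 5000)  # fail instead of blocking forever
sub.connect("tcp://127.0.0.1:%d" % port)

time.sleep(0.3)  # give the subscription time to propagate ("slow joiner")
pub.send_string("key=val&foo=bar")
msg = sub.recv_string()
print(msg)  # key=val&foo=bar

sub.close()
pub.close(linger=0)
ctx.term()
```

Passing a non-empty string to zmq.SUBSCRIBE delivers only messages whose bytes begin with that prefix, which is the filter mechanism listed in the comparison table below.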
ZeroMQ's pub/sub pattern reproduces Varnish's message-oriented architecture on the network:
| | Varnish | ZeroMQ |
|---|---|---|
| Publishers | varnishd | zpub, many others |
| Subscribers | varnishlog, varnishncsa, etc. | zsub, many others |
| Transport | shared memory segment | network link |
| I/O | non-blocking, unidirectional | non-blocking, unidirectional |
| Filters | tags, regexps | string prefix |
| Max. queue | SHM_WORKSPACE (default: 8 kB) | ZMQ_HWM (default: 1,000 messages) |
| Queue exhaustion | silently overwrite | silently drop |
TL;DR: varnishd : varnishlog :: zpub : zsub
zpub/sub has been packaged for both Lucid and Precise and is available in the Wikimedia repository (apt-get install zpubsub).
zpub usage
zpub performance might increase if it were refactored to read the VSM log directly using the public API (see include/vapi/vsm.h), rather than rely on the output of varnishlog or varnishncsa. The cost of this optimization would be the forfeiture of a supported and production-hardened tool in favor of a custom implementation. We should do this only if it proves to be a bottleneck.

$ zpub -h
zpub -- pipe lines from stdin to zmq pub socket
Usage: zpub [options]
  -p <port>       Server port (default: 8006)
  -i <interface>  Interface to bind (default: *)
  -m <count>      ZeroMQ high-water mark (default: 1000)
  -h              Show this message and exit
zsub usage
$ zsub -h
zsub -- pipe zmq sub socket to stdout
Usage: zsub [options] host:port
  -h              Show this message and exit
  -f <filter>     Subscribe to filter
  -s <identity>   Set socket identity
Debugging and Benchmarking
zpub/sub
The following script will stream a file in a loop using zpub, and listen to it using zsub. It uses pv (pipe viewer), which you may have to install.
#!/usr/bin/env bash
# zbench
#
# Benchmark throughput of ZeroMQ pub/sub over local TCP/IP
# socket.
#
# Usage: zbench [/dev/shm/inputfile]
# Reads input file in a loop; outputs LINES / second.
set -em
trap 'kill %% &>/dev/null' EXIT
[[ "$#" == 1 ]] || { echo >&2 "Usage: ${0} inputfile" ; exit 1 ; }
while :; do cat "$1" ; done | zpub &
zsub localhost:8006 | pv -Walr -N "Messages/sec (cur, avg)" >/dev/null
wait

On vanadium, a modest machine, results have ranged from 70k to 115k messages/sec, depending on load. Keep in mind that in production, a machine is only ever responsible for one role (publisher or subscriber), not both.
Varnish
Benchmark
To measure the amount of time Varnish is taking to service event tracking requests, run varnishlog with the following arguments and curl the URL:
$ varnishlog -I^/event\.gif -iReqEnd -iRxURL
   11 SessionOpen  c 75.101.52.39 57737 :80
   11 RxURL        c /event.gif
   11 ReqEnd       c 1886474086 1345278127.152711868 1345278127.152907133 0.000069857 0.000093460 0.000101805
Subtract the second number of the ReqEnd line from the third to get the total time (in decimal seconds) it took Varnish to service the request. (In this case, 195 microseconds.) For additional guidelines on interpreting the output of varnishlog, see the page Varnishlog tag explanation on the Varnish wiki.
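The arithmetic, using the ReqEnd record above (Decimal avoids floating-point noise at this scale):

```python
from decimal import Decimal

# Second and third numbers on the ReqEnd line: the timestamps (epoch
# seconds) at which Varnish started and finished handling the request.
t_start = Decimal("1345278127.152711868")
t_end = Decimal("1345278127.152907133")

micros = (t_end - t_start) * Decimal("1e6")
print(micros, "microseconds")  # 195.265 microseconds
```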
Compile VCL
To see the C code a VCL file compiles to, run:
$ varnishd -d -f bits.inc.vcl -C
Useful Stats
Statistics about the shared memory segment can be viewed using varnishstat:
$ varnishstat -1 -f shm*
shm_records        121368    0.67 SHM records
shm_writes         120391    0.67 SHM writes
shm_flushes             0    0.00 SHM flushes due to overflow
shm_cont                0    0.00 SHM MTX contention
shm_cycles              0    0.00 SHM cycles through buffer
If the shm_flushes count is high, the size of the shared memory segment can be increased by raising the value of the shm_workspace run-time parameter, or by specifying it on the varnishd command line with -l. See the "Run-Time Parameters" section of the varnishd documentation.
Further reading
Varnish
- VSM: Shared Memory Logging and Statistics
- Varnishlog tags
- Using VCL for Webmasters
- varnishlog bugs in trac
- Proof that varnishncsa doesn't buffer output