Event logging
This article describes the network architecture for event tracking, or the recording of user interactions with web content for analytics purposes. A single event is the anonymized record of a user action on the site, such as previewing an edit, or collapsing the Toolbox widget in the sidebar. We capture and analyze event data in aggregate to better understand how readers and editors interact with our site, to identify usability problems, and to provide feedback for features engineers.
The behavior and implementation of event tracking is described in detail below.
Contents |
TL;DR
- Browser captures UI event as a key-value map.
- Browser requests http://bits.wikimedia.org/event.gif?key=val&foo=bar.
- Varnish recognizes URL, returns empty response, and logs request.
- On each Bits,
varnishlogfilters for/event.gif?requests, piping them intozpub. -
zpubuses ZeroMQ to publish event log data to subscribers on port 8006/tcp. - Subscribers store event log data for analysis.
Client-Side JavaScript
mw.experiments.track(). track() serializes them them into a percent-encoded query string along with some universal keys (a timestamp and an anonymous token). It appends this query string to a request to a special URL (http://bits.wikimedia.org/event.gif), which the browser requests asynchronously.
Varnish
/event.gif. Rather than attempt to resolve the request to an object, Varnish bombs out with an HTTP 204 ("No Content") response, indicating the request was received and understood but that no message-body is forthcoming (see RFC 2616, section 10.2.5). This response is semantically correct, widely supported by browsers old and new, and minimizes bandwidth and latency.
The behavior is specified using VCL ("Varnish Configuration Language") and is managed by Puppet (see source link above):
if (req.url ~ "^/event\.gif") { error 204; }
You can test this configuration easily using curl -i, or in-browser using
HTTP Viewer:
$ curl -i http://bits.wikimedia.org/event.gif?foo=bar HTTP/1.1 204 No Content Server: Varnish Accept-Ranges: bytes Date: Sat, 18 Aug 2012 05:23:59 GMT X-Varnish: 547318191 Age: 0 Via: 1.1 varnish Connection: close X-Cache: strontium miss (0)
Note that the request is handled entirely within Varnish and results in no cache look-ups or back-end connections. Moreover, Varnish is not expected to parse, aggregate or transport event data. All Varnish has to do is log the request in its transaction log, which is its default behavior. This approach exemplifies a major design principle of event tracking, which is that event tracking ought never interfere with or complicate the servers' primary function, which is to serve content. This design also exhibits loose coupling: Varnish is not expected to know about other components of event tracking, and is therefore not affected by their failure. As a consequence, errors occurring further back in the event tracking data pipeline cannot propagate back to the user. This is desirable, however, since event-tracking is orthogonal to the user's immediate interests and experience.
Varnish attempts to assume as little as possible about what log data you'll need and what you'll want to do with it. Rather than log to a file, Varnish logs to a shared memory segment. The SHM log can be tailed using a suite of tools distributed with Varnish, the most useful of which is varnishlog, which simply streams the contents of the log buffer to standard output. It can be configured via command-line arguments to filter for records by pattern or type.
The following invocation of varnishlog will stream the full URL of event tracking requests:
$ /usr/bin/varnishlog -Ou -I^/event\.gif -iRxURL
The output of this exact invocation is piped to another tool, zpub, which makes log data available as a streaming network service using ZeroMQ.
ZeroMQ
Template:Hatnote
ZeroMQ is a free and open source networking library that lets you write brokerless message-oriented middleware using a socket-like API. It is performant, well-tested, and broadly used. It exposes common messaging patterns (like pub/sub, fanout, etc.) as different "types" of sockets. zpub is a small application that reads lines from stdin and streams them using a ZeroMQ publisher socket as its transport layer.
zpub/sub has been packaged for both Lucid and Precise and is available in the Wikimedia repository (apt-get install zpubsub).
zpub usage
ZMQ_PUB socket enters an exceptional state due to having reached the high water mark for a subscriber, then any messages that would be sent to the subscriber in question shall instead be dropped until the exceptional state ends.
$ zpub -h zpub -- pipe lines from stdin to zmq pub socket Usage: zpub [options] -p <port> Server port (default: 8006) -i <interface> Interface to bind (default: *) -m <count> ZeroMQ high-water mark (default: 1000) -h Show this message and exit
zsub usage
$ zsub -h zsub -- pipe zmq sub socket to stdout Usage: zsub [options] host:port -h Show this message and exit -f <filter> Subscribe to filter -s <identity> Set socket identity
Debugging and Benchmarking
zpub/sub
The following script will stream a file in a loop using zpub, and listen to it using zsub. It uses pv (pipe viewer), which you may have to install.
#!/usr/bin/env bash
# zbench
#
# Benchmark throughput of ZeroMQ pub/sub over local TCP/IP
# socket.
#
# Usage: zbench [/dev/shm/inputfile]
# Reads input file in a loop; outputs LINES / second.
set -em
trap 'kill %% &>/dev/null' EXIT
[[ "$#" == 1 ]] || { echo >&2 "Usage: ${0} inputfile" ; exit 1 ; }
while :; do cat "$1" ; done | zpub &
zsub localhost:8006 | pv -Walr -N "Messages/sec (cur, avg)" >/dev/null
waitOn vanadium, a modest machine, results have ranged from 70k to 115k messages/sec, depending on load. Keep in mind that in production, a machine is only ever responsible for one role (publisher or subscriber), not both.
Varnish
Benchmark
To measure the amount of time Varnish is taking to service event tracking requests, run varnishlog with the following arguments and curl the URL:
$ varnishlog -I^/event\.gif -iReqEnd -iRxURL 11 SessionOpen c 75.101.52.39 57737 :80 11 RxURL c /event.gif 11 ReqEnd c 1886474086 1345278127.152711868 1345278127.152907133 0.000069857 0.000093460 0.000101805
Subtract the second number of the ReqEnd line from the third to get the total time (in decimal seconds) it took Varnish to service the request. (In this case, 195 microseconds.) For additional guidelines on interpreting the output of varnishlog, see the page Varnishlog tag explanation on the Varnish wiki.
Compile VCL
To see the C code a VCL file compiles to, run:
$ varnishd -d -f bits.inc.vcl -C
Useful Stats
Statistics about the shared memory segment can be viewed using varnishstat:
$ varnishstat -1 -f shm* shm_records 121368 0.67 SHM records shm_writes 120391 0.67 SHM writes shm_flushes 0 0.00 SHM flushes due to overflow shm_cont 0 0.00 SHM MTX contention shm_cycles 0 0.00 SHM cycles through buffer
If the shm_flushes count is high, the size of the shared memory segment can be increased by upping the value of the shm_workspace run-time parameter, or by specifying it in the varnishd command-line, using -l. See section "Run-Time Parameters" in the varnishd documentation.