Event logging

From Wikitech
Revision as of 04:15, 27 August 2012 by Ori.livneh (Talk | contribs)

Jump to: navigation, search
Event tracking sequence diagram

This article describes the network architecture for event tracking, or the recording of user interactions with web content for analytics purposes. A single event is the anonymized record of a user action on the site, such as previewing an edit, or collapsing the Toolbox widget in the sidebar. We capture and analyze event data in aggregate to better understand how readers and editors interact with our site, to identify usability problems, and to provide feedback for features engineers.

Contents

TL;DR

  • Browser captures UI event as a key-value map.
  • Browser requests http://bits.wikimedia.org/event.gif?key=val&foo=bar.
  • Varnish recognizes URL, returns empty response, and logs request.
  • On each Bits, varnishlog filters for /event.gif? requests, piping them into zpub.
  • zpub uses ZeroMQ to publish event log data to subscribers on port 8006/tcp.
  • Subscribers store event log data for analysis.

Client-Side JavaScript

Source code: lib/track.js
Many user interactions do not result in page navigation and thus cannot be gleaned from the servers' request logs. To record these interactions, we rely on client-side JavaScript code: JavaScript event handlers represent each interaction event of interest as a simple key-value map and pass it to mw.experiments.track(). track() serializes them them into a percent-encoded query string along with some universal keys (a timestamp and an anonymous token). It appends this query string to a request to a special URL (http://bits.wikimedia.org/event.gif), which the browser requests asynchronously.

Varnish

The Varnish configuration for bits machines recognizes (and special-cases) requests for /event.gif. Rather than attempt to resolve the request to an object, Varnish bombs out with an HTTP 204 ("No Content") response, indicating the request was received and understood but that no message-body is forthcoming (see RFC 2616, section 10.2.5). This response is semantically correct, widely supported by browsers old and new, and minimizes bandwidth and latency.

The behavior is specified using VCL ("Varnish Configuration Language") and is managed by Puppet (see source link above):

if (req.url ~ "^/event\.gif") {
        error 204;
}

You can test this configuration easily using curl -i, or in-browser using HTTP Viewer:

$ curl -i http://bits.wikimedia.org/event.gif?foo=bar
HTTP/1.1 204 No Content
Server: Varnish
Accept-Ranges: bytes
Date: Sat, 18 Aug 2012 05:23:59 GMT
X-Varnish: 547318191
Age: 0
Via: 1.1 varnish
Connection: close

Note that the request requires no cache or back-end look-ups; it is handled entirely within Varnish, with microsecond performance. Moreover, Varnish is not responsible for parsing, aggregating or transporting event data. All Varnish must do is record the request in its log, which is its default behavior.

This approach insulates Varnish from failures and vulnerabilities occurring elsewhere in the event data pipeline, reflecting the principle that event tracking ought never interfere with or complicate the servers' primary function, which is to serve content.

This design also exhibits loose coupling: Varnish is not expected to know about other components of event tracking, and is therefore not affected by their failure. As a consequence, errors occurring further back in the event tracking data pipeline cannot propagate back to the user. This is desirable, however, since event-tracking is orthogonal to the user's immediate interests and experience.

Varnish assumes as little as possible about what log data you'll need and what you'll want to do with it. Rather than log to a file, Varnish logs to a shared memory file. The mechanism and internal API for reading and writing from this shared memory file is known as VSM ("Varnish Shared Memory"). The SHM log can be tailed using a suite of tools distributed with Varnish, the most useful of which is varnishlog, which simply streams the contents of the log buffer to standard output. It can be configured via command-line arguments to filter for records by pattern or type.

The following invocation of varnishlog will stream the full URL of event tracking requests:

  $ /usr/bin/varnishlog -Ou -I^/event\.gif -iRxURL

The output of this exact invocation is piped to another tool, zpub, which makes log data available as a streaming network service using ZeroMQ.

ZeroMQ

Template:Hatnote ZeroMQ is a free and open source networking library that lets you write brokerless message-oriented middleware using a socket-like API. It is performant, well-tested, and broadly used. It exposes common messaging patterns (like pub/sub, fanout, etc.) as different "types" of sockets. zpub is a small application that reads lines from stdin and streams them using a ZeroMQ publisher socket as its transport layer.

zpub/sub has been packaged for both Lucid and Precise and is available in the Wikimedia repository (apt-get install zpubsub).

zpub usage

Source code: zpubsub/zpub.c
Messages sent are distributed in a fan out fashion to all connected peers. When a ZMQ_PUB socket enters an exceptional state due to having reached the high water mark for a subscriber, then any messages that would be sent to the subscriber in question shall instead be dropped until the exceptional state ends.
$ zpub -h
zpub -- pipe lines from stdin to zmq pub socket
 
Usage: zpub [options]
  -p <port>        Server port (default: 8006)
  -i <interface>   Interface to bind (default: *)
  -m <count>       ZeroMQ high-water mark (default: 1000)
  -h               Show this message and exit

zsub usage

Source code: zpubsub/zsub.c
Used by a subscriber to subscribe to data distributed by a publisher.
$ zsub -h
zsub -- pipe zmq sub socket to stdout
 
Usage: zsub [options] host:port
  -h               Show this message and exit
  -f <filter>      Subscribe to filter
  -s <identity>    Set socket identity

Debugging and Benchmarking

zpub/sub

The following script will stream a file in a loop using zpub, and listen to it using zsub. It uses pv (pipe viewer), which you may have to install.

#!/usr/bin/env bash
 
# zbench
#
# Benchmark throughput of ZeroMQ pub/sub over local TCP/IP
# socket.
#
# Usage: zbench [/dev/shm/inputfile]
# Reads input file in a loop; outputs LINES / second.
 
set -em
trap 'kill %% &>/dev/null' EXIT
 
[[ "$#" == 1 ]] || { echo >&2 "Usage: ${0} inputfile" ; exit 1 ; }
 
while :; do cat "$1" ; done | zpub &
    zsub localhost:8006 | pv -Walr -N "Messages/sec (cur, avg)" >/dev/null
wait

On vanadium, a modest machine, results have ranged from 70k to 115k messages/sec, depending on load. Keep in mind that in production, a machine is only ever responsible for one role (publisher or subscriber), not both.

Varnish

Benchmark

To measure the amount of time Varnish is taking to service event tracking requests, run varnishlog with the following arguments and curl the URL:

$ varnishlog -I^/event\.gif -iReqEnd -iRxURL
   11 SessionOpen  c 75.101.52.39 57737 :80
   11 RxURL        c /event.gif
   11 ReqEnd       c 1886474086 1345278127.152711868 1345278127.152907133 0.000069857 0.000093460 0.000101805

Subtract the second number of the ReqEnd line from the third to get the total time (in decimal seconds) it took Varnish to service the request. (In this case, 195 microseconds.) For additional guidelines on interpreting the output of varnishlog, see the page Varnishlog tag explanation on the Varnish wiki.

Compile VCL

To see the C code a VCL file compiles to, run:

 $ varnishd -d -f bits.inc.vcl -C

Useful Stats

Statistics about the shared memory segment can be viewed using varnishstat:

$ varnishstat -1 -f shm*
shm_records             121368         0.67 SHM records
shm_writes              120391         0.67 SHM writes
shm_flushes                  0         0.00 SHM flushes due to overflow
shm_cont                     0         0.00 SHM MTX contention
shm_cycles                   0         0.00 SHM cycles through buffer

If the shm_flushes count is high, the size of the shared memory segment can be increased by upping the value of the shm_workspace run-time parameter, or by specifying it in the varnishd command-line, using -l. See section "Run-Time Parameters" in the varnishd documentation.

Further reading

Personal tools
Namespaces

Variants
Actions
Navigation
Ops documentation
Wiki
Toolbox