Fundraising Analytics/Impression Stats

Banner impression and landing page stats are collected from squid logs via udp2log, running on locke.
From there, log files are periodically moved to storage3 by a job in file_mover@locke's crontab. Logs are moved uncompressed because excess CPU utilization on locke interferes with UDP log collection.
Once on storage3, log files are compressed by a cron job running as logmover@storage3 and then archived.
Finally, files are parsed from storage3 by Faulkner's analytics scripts.

udp2log proxy log collection

To enable:
SSH to locke and uncomment the fundraising-related lines in /etc/udp2log/squid so that they look like this:

...
## Fundraising
# Landing pages
pipe 1 /a/squid/fundraising/lp-filter >> /a/squid/fundraising/logs/landingpages.log

# Banner Impressions
pipe 1 /a/squid/fundraising/bi-filter >> /a/squid/fundraising/logs/bannerImpressions.log
...

Then HUP udp2log:

awjrichards@locke:~$ /home/file_mover/scripts/resetudp2log 
Sending SIGHUP to udp2log...

To disable:
SSH to locke and comment out the fundraising-related lines in /etc/udp2log/squid.

Then HUP udp2log:

awjrichards@locke:~$ /home/file_mover/scripts/resetudp2log 
Sending SIGHUP to udp2log...
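
Before sending the HUP, it can help to confirm which fundraising lines are currently active in the config. The following is a minimal sketch, not part of the existing scripts: the function name is hypothetical, and it assumes that disabled lines are commented out with a leading "#".

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) fundraising pipe
# lines from a udp2log config file. Assumes disabled lines start with "#".
active_fundraising_pipes() {
    conf="$1"
    # Match lines that begin with "pipe" (optionally indented) and
    # reference the fundraising directory; commented lines do not match.
    grep -E '^[[:space:]]*pipe[[:space:]].*/a/squid/fundraising/' "$conf"
}
```

Running `active_fundraising_pipes /etc/udp2log/squid` on locke should print both pipe lines when collection is enabled and nothing when it is disabled.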

proxy log copy to storage3

To enable:
Enable this crontab entry for file_mover@locke:

*/15 * * * * /home/file_mover/scripts/rotate_fundraising_logs

To disable:
Comment out this crontab entry for file_mover@locke:

#*/15 * * * * /home/file_mover/scripts/rotate_fundraising_logs

proxy log compression on storage3

To enable:
Enable this crontab entry for logmover@storage3:

*/5 * * * * /home/logmover/scripts/gzip_incoming_logs.pl

To disable:
Comment out this crontab entry for logmover@storage3:

#*/5 * * * * /home/logmover/scripts/gzip_incoming_logs.pl

monitoring and debugging

Both cron scripts log fairly verbosely; /var/log/syslog shows which files they touch, the actions they take, and any errors. When the scripts run successfully, they produce no output except in the logs. When there is an error, they print to stdout so that cron sends mail.
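
To pull the relevant entries out of syslog, something like the following may be enough. It is a sketch under one stated assumption: that the syslog lines written by the two cron scripts contain the script name. The actual syslog tag is not documented on this page, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show syslog lines that mention either cron script.
# Assumption: the scripts' log lines include the script name; adjust the
# pattern if they log under a different tag.
fundraising_syslog_lines() {
    logfile="${1:-/var/log/syslog}"
    grep -E 'rotate_fundraising_logs|gzip_incoming_logs' "$logfile"
}
```

For example, `fundraising_syslog_lines | tail -n 20` shows the most recent matching entries on the host where it is run.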

Under normal operation, you should see this sequence:

  1. logs collect in real time at locke:/a/squid/fundraising/logs/*.log
  2. every 15 min, logs rotate to locke:/a/squid/fundraising/logs/destined_for_storage3 and that dir is rsync'd to storage3:/archive/incoming_udplogs
  3. every 5 min, storage3:/archive/incoming_udplogs is polled and any files found are compressed and moved to storage3:/archive/udplogs for long-term archiving
  4. once files have been moved out of storage3:/archive/incoming_udplogs, they are deleted from locke:/a/squid/fundraising/logs/destined_for_storage3 by the log rotation script
  5. if storage3 is inaccessible, locke:/a/squid/fundraising/logs/destined_for_storage3 will continue to collect files until storage3 is able to process them
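
One way to spot a stalled pipeline is to count files that have sat in the staging directory on locke longer than expected. The sketch below is not part of the existing scripts: the function name is hypothetical and the 30-minute threshold in the usage note is an assumption (roughly two rotation intervals), not a documented limit.

```shell
#!/bin/sh
# Hypothetical helper: count regular files under a directory that were
# last modified more than N minutes ago.
# Note: -mmin is supported by GNU and BSD find but is not in POSIX.
count_backlog() {
    dir="$1"
    minutes="$2"
    find "$dir" -type f -mmin +"$minutes" | wc -l | tr -d ' '
}
```

For example, `count_backlog /a/squid/fundraising/logs/destined_for_storage3 30` on locke should normally print 0. A count that keeps growing between runs suggests that storage3 is unreachable or that the compression job there is not running.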