Fundraising Monitoring
From Wikitech
This page documents the monitoring infrastructure that exists for fundraising and tracks desired monitoring functionality.
Existing monitoring infrastructure
Log monitoring
Minfraud log
We currently monitor the 'minfraud' log, which is aggregated from the payments cluster to Loudon.
- Ganglia (http://ganglia.wikimedia.org/?r=hour&c=Miscellaneous&h=loudon.wikimedia.org)
- Average risk score (as reported by MinFraud) (Ganglia)
- Number of successful transactions (Ganglia)
- Number of failed transactions (Ganglia)
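As a rough illustration of how the three metrics above could be derived and handed to Ganglia, here is a minimal sketch. It assumes the log has already been parsed into `(risk_score, succeeded)` pairs; the actual minfraud log format is not described on this page, so the parsing step is left out. The function names and the metric names are hypothetical. The only external tool referenced is Ganglia's `gmetric` command, and the sketch only builds its argument list rather than running it.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def summarize(records: Iterable[Tuple[float, bool]]) -> Dict[str, Optional[float]]:
    """Aggregate parsed log records into the three monitored metrics.

    Each record is a (risk_score, succeeded) pair. This shape is an
    assumption for illustration, not the real minfraud log format.
    """
    total_score = 0.0
    succeeded = 0
    failed = 0
    for risk_score, ok in records:
        total_score += risk_score
        if ok:
            succeeded += 1
        else:
            failed += 1
    count = succeeded + failed
    # With no records there is no meaningful average, so report None.
    average = total_score / count if count else None
    return {
        "average_risk_score": average,
        "successful_transactions": succeeded,
        "failed_transactions": failed,
    }


def gmetric_command(name: str, value: float, value_type: str, units: str) -> List[str]:
    """Build (but do not run) a gmetric invocation for one metric."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", value_type,
        "--units", units,
    ]
```

A periodic job could call `summarize()` on the records seen since the last run and pass each resulting command to `subprocess.run`.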
Monitoring wishlist
See RT ticket #405
- Hudson
- Nagios check for alive-ness
- Nagios check for failed builds
- Note: some scripts run by Hudson need to be modified to return a non-zero exit status when they don't complete properly (e.g. the send/receive mail scripts for civimail)
- Ganglia graphs for build success/failure
- Nagios check for too many files in build folders (if the limit of 63999 gets hit, builds will fail)
- ActiveMQ
- Nagios check for queues filling up too fast
- Ganglia graphs for message volume in various queues
- Service communication times
- Ganglia graphs for communication timings with PayPal and MaxMind
- Nagios checks for timeouts/unacceptably high communication times
- Third-party service accessibility from the payments cluster
- Nagios check for communications access to MaxMind/PayPal
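To make one of the wishlist items concrete, the following is a minimal sketch of what a Nagios-style check for overfull build folders might look like. It relies on the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The 63999 limit comes from the list above; the warning and critical thresholds, the function names and the message format are assumptions chosen for illustration, not an existing check.

```python
import os
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Limit quoted in the wishlist; builds fail once it is reached.
HARD_LIMIT = 63999


def classify(count: int, warn: int, crit: int) -> int:
    """Map an entry count to a Nagios status code."""
    if count >= crit:
        return CRITICAL
    if count >= warn:
        return WARNING
    return OK


def check_entry_count(path: str, warn: int = 50000, crit: int = 60000) -> Tuple[int, str]:
    """Count the entries in a build folder and return (status, message).

    The default thresholds are hypothetical values below HARD_LIMIT.
    """
    try:
        with os.scandir(path) as entries:
            count = sum(1 for _ in entries)
    except OSError as exc:
        return UNKNOWN, f"BUILD_DIR UNKNOWN - cannot read {path}: {exc}"
    status = classify(count, warn, crit)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    return status, f"BUILD_DIR {label} - {count} of {HARD_LIMIT} entries in {path}"
```

A wrapper script would print the message and call `sys.exit()` with the status code so that Nagios can interpret the result. The same exit-code discipline applies to the scripts run by Hudson: a failed run has to end with a non-zero status for a failed-build check to notice it.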