UDP based profiling
From Wikitech
Latest revision as of 03:27, 29 June 2012
What and Where
- $wgProfiler = new ProfilingSimpleUDP; (in index or Settings)
- $wgUDPProfilerHost = '10.0.6.30'; (in Settings)
- Running on professor:
  - /usr/udpprofile/sbin/collector (svn root: http://svn.wikimedia.org/viewvc/mediawiki/trunk/udpprofile/)
    - listens on udp:3811 for profiling packets, provides XML dumps on tcp:3811
  - /usr/udpprofile/sbin/profiler-to-carbon
    - polls the collector every minute, inserts count and timing (in ms) data into whisper DBs
  - /opt/graphite/bin/carbon-cache.py
    - updates whisper db files for graphite
- Graphite-based web interface (uses labs LDAP for auth): http://graphite.wikimedia.org/dashboard
- Aggregate report web interface: http://noc.wikimedia.org/cgi-bin/report.py
  - svn root: http://svn.wikimedia.org/viewvc/mediawiki/trunk/udpprofile/web/
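The hand-off from profiler-to-carbon to carbon-cache.py can be pictured with a minimal sketch. It assumes carbon's plaintext protocol (one "path value timestamp" line per metric) on carbon's default plaintext port 2003. The page does not say how profiler-to-carbon actually talks to carbon, so the function names, host, port, and metric paths below are illustrative assumptions.

```python
import socket
import time


def format_carbon_lines(metrics, timestamp):
    """Render {metric_path: value} as carbon plaintext, one metric per line."""
    return "".join(
        f"{path} {value} {int(timestamp)}\n"
        for path, value in sorted(metrics.items())
    )


def send_to_carbon(metrics, host="127.0.0.1", port=2003, timestamp=None):
    """Send one batch of metrics to a carbon-cache plaintext listener.

    host and port are assumptions; 2003 is carbon's default plaintext port.
    """
    when = time.time() if timestamp is None else timestamp
    payload = format_carbon_lines(metrics, when)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

A poller running once a minute would call send_to_carbon with the count and timing values read from the collector for that interval.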
Using The Graphite Dashboard
Finding Metrics
- The left sidebar of the graphite dashboard provides two drop-down menus: "Metric Type", which provides shortcuts or aliases to certain metrics (hardcoded in the dashboard.conf located in puppet/files/graphite), and "Category". Below them is a hierarchical finder of everything under the chosen category. This is all straightforward, except that when Category = * the finder is limited to a single level of the hierarchy, which you don't want. If Metric Type == Everything, make sure to select a class in Category.
- The Dashboard menu option allows sets of graphs to be saved as a named dashboard. Share provides a direct URL to a saved dashboard, and Finder lists all shared dashboards.
Combining Metrics
- Just drag graphs on top of each other to combine.
Types of Metrics
- count - the number of calls made in the last minute. Note that MediaWiki profiles 100% of requests for a few request types, but most are sampled at about 1.5%.
- tavg - average time in ms over everything collected in the sampling period (total time / count)
- tp50 - 50th percentile in ms, calculated from a bucket of 300 samples
- tp90 - 90th percentile
- tp99 - 99th percentile
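As a rough illustration of how these values relate, here is a minimal sketch that derives them from one minute of data. The nearest-rank percentile method and the scaling of count by the sample rate are assumptions made for the example; the page does not say how the collector computes percentiles.

```python
import math


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(total_time_ms, count, bucket, sample_rate=1.0):
    """Build the per-minute metrics from a total, a call count, and a sample bucket.

    bucket stands in for the 300-sample bucket mentioned above;
    sample_rate is the fraction of requests profiled (e.g. 0.015).
    """
    samples = sorted(bucket)
    return {
        "count": count,
        # Hypothetical extra field: estimated real call volume.
        "estimated_calls": count / sample_rate,
        "tavg": total_time_ms / count,
        "tp50": percentile(samples, 50),
        "tp90": percentile(samples, 90),
        "tp99": percentile(samples, 99),
    }
```

With a 1.5% sample rate, a count of 300 corresponds to roughly 20,000 real calls in that minute, which is worth keeping in mind when comparing counts across request types profiled at different rates.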
Examples
- When the Job Queue depth alerts but jobs appear to be running, it has been difficult to know whether the insertion rate has spiked or execution has slowed. Plotting the pop rate over the insert rate provides an easy-to-parse picture of health.
- URL to generate: http://graphite.wikimedia.org/render?width=800&from=-4hours&until=now&height=600&target=*.job-pop.count&target=*.job-insert.count
- 99th percentile ParserCache get times, with cluster deploys overlaid as vertical lines
- URL to generate: http://graphite.wikimedia.org/render?from=-24hours&until=now&width=800&height=600&target=ParserCache.get.tp99&target=drawAsInfinite(deploy.any)
  - Overlaying metrics is as simple as appending multiple target options. "&target=drawAsInfinite(deploy.any)" can be added to any graph for the deploy lines.
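Since overlays are just repeated target parameters, render URLs like the ones above are easy to build programmatically. The following is a small sketch using only the Python standard library; the helper name and its defaults are made up for illustration.

```python
from urllib.parse import urlencode

GRAPHITE_RENDER = "http://graphite.wikimedia.org/render"


def build_render_url(targets, frm="-24hours", until="now", width=800, height=600):
    """Build a render URL with one target parameter per metric expression."""
    params = [("from", frm), ("until", until), ("width", width), ("height", height)]
    params += [("target", t) for t in targets]
    # Leave *, ( and ) unescaped so the URL stays readable; commas become %2C.
    return GRAPHITE_RENDER + "?" + urlencode(params, safe="*()")


url = build_render_url(["ParserCache.get.tp99", "drawAsInfinite(deploy.any)"])
print(url)
```

Appending "drawAsInfinite(deploy.any)" to the targets list of any other graph adds the deploy lines to it in the same way.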
- The 8 slowest Parser functions, based on time averages. (99th percentiles here are too scary.) This shows how to group by and limit across lots of stats.
- URL to generate: http://graphite.wikimedia.org/render?width=800&from=-8hours&until=now&height=600&vtitle=time_ms&target=highestMax(Parser.*.*.tavg%2C8)&title=8_Slowest_Parser_Functions_Avg
  - Times are in ms, so 10k = 10 seconds.
  - This takes the tavg for everything profiled under the Parser class and selects the 8 series with the highest maximum value in the last 8 hours.
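To make the selection concrete, here is a minimal sketch of what a highestMax-style filter does: keep the N series whose peak value in the window is largest. It imitates the behaviour described above and is not Graphite's own code; the series names in the usage lines are invented.

```python
def highest_max(series, n):
    """Return the names of the n series with the largest maximum datapoint.

    series maps a metric name to a list of values; None marks a gap,
    and series with no data at all are skipped.
    """
    peaks = {}
    for name, points in series.items():
        values = [v for v in points if v is not None]
        if values:
            peaks[name] = max(values)
    return sorted(peaks, key=peaks.get, reverse=True)[:n]


# Hypothetical tavg series in ms.
example = {
    "Parser.a.x.tavg": [12.0, None, 40.0],
    "Parser.b.y.tavg": [900.0, 15.0, 20.0],
    "Parser.c.z.tavg": [None, None, None],
    "Parser.d.w.tavg": [5.0, 6.0, 7.0],
}
slowest = highest_max(example, 2)
```

Because the ranking uses each series' peak, a function with a single slow minute can outrank one that is consistently moderately slow.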
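- 8 slowest db write queries, by 99th percentile time
- URL to generate (note that, as written, it passes 10 rather than 8 to highestMax): http://graphite.wikimedia.org/render?width=800&from=-2hours&until=now&height=600&drawNullAsZero=true&target=highestMax(group(query-m.UPDATE.*.tp99%2Cquery-m.REPLACE.*.tp99%2Cquery-m.INSERT.*.tp99%2Cquery-m.DELETE.*.tp99)%2C10)&title=Top_Master_Writes_By_Time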