Ganglia

From Wikitech
Latest revision as of 03:17, 29 June 2012

Installing

On (almost) all servers this is handled by Puppet. Every node sets the $cluster variable, which determines which Ganglia group the server belongs to. An

include ganglia

statement then makes sure that ganglia gets installed with the right configuration file.

The aggregators for each cluster should additionally set the variable:

$ganglia_aggregator = "true"

Configuration background

gmond

gmond is configured via the master file /home/wikipedia/conf/gmond/gmond.conf.master and the configuration generator script /home/wikipedia/conf/gmond/conf.php. The whole /home/wikipedia/conf/gmond directory is copied to every server running gmond, as /etc/gmond. The ./sync script is a shortcut for the relevant rsync command. Each server then has a symlink from /etc/gmond.conf to the relevant specialised configuration file in /etc/gmond/*.conf.
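The symlink step can be sketched in Python as follows. This is only an illustration, not the script used on the cluster: the function name link_gmond_conf and the configuration file names in the usage note are hypothetical, and the directory is passed in as a parameter so the sketch can be tried outside /etc.

```python
import os


def link_gmond_conf(etc_dir, conf_name):
    """Point <etc_dir>/gmond.conf at <etc_dir>/gmond/<conf_name>.

    Any existing link is replaced atomically, so gmond never sees a
    missing configuration file.
    """
    target = os.path.join(etc_dir, "gmond", conf_name)
    if not os.path.isfile(target):
        raise FileNotFoundError(target)

    link = os.path.join(etc_dir, "gmond.conf")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)          # clear a leftover from an interrupted run
    os.symlink(target, tmp)     # build the new link under a temporary name
    os.replace(tmp, link)       # rename over the old link in one step
    return link
```

For example, link_gmond_conf("/etc", "search_aggr.conf") would select an aggregator configuration, assuming a file of that (hypothetical) name exists in /etc/gmond.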

Each cluster has its own multicast channel. Channels are allocated automatically by conf.php: the first cluster in the $clusters array is given the IP address 239.192.0.1, the second is given 239.192.0.2, and so on. 239.192.0.0/24 should be considered reserved for this purpose, as documented at IP addresses.
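The allocation rule is simple enough to restate in a few lines of Python. The real logic lives in conf.php; the sketch below only mirrors the rule described above, and the function name and the cluster names in the usage note are made up for illustration.

```python
import ipaddress

# Base of the range reserved for Ganglia multicast channels.
MCAST_BASE = ipaddress.IPv4Address("239.192.0.0")


def allocate_channels(clusters):
    """Map each cluster name to 239.192.0.N, N being its 1-based position."""
    if len(clusters) > 255:
        raise ValueError("more clusters than addresses in 239.192.0.0/24")
    return {
        name: str(MCAST_BASE + position)
        for position, name in enumerate(clusters, start=1)
    }
```

Calling allocate_channels(["appservers", "search"]) assigns 239.192.0.1 to the first name and 239.192.0.2 to the second. Because the address depends on the position in the list, inserting a cluster anywhere but the end shifts the channels of every cluster after it.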

The *_aggr.conf configuration files are "aggregator" configuration files. These configure ganglia in non-deaf mode, allowing it to listen to the multicast channel, aggregate the state of the cluster in memory and respond to XML requests from gmetad. The remaining configuration files use deaf mode, which saves a small amount of CPU time and memory for those servers.
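To check what an aggregator is holding in memory, the XML it serves can be fetched and summarised by hand. The following is a minimal sketch, assuming the aggregator accepts TCP connections on gmond's default port 8649 (the port configured here may differ) and that the dump uses the usual GANGLIA_XML / CLUSTER / HOST / METRIC layout; the function names are illustrative.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond sends to any client that connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:        # gmond closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_gmond_xml(xml_bytes):
    """Return {host name: {metric name: value string}} from a gmond dump."""
    root = ET.fromstring(xml_bytes)
    hosts = {}
    for host in root.iter("HOST"):
        hosts[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return hosts
```

A server running in deaf mode would report little more than itself, whereas an aggregator should list every host on its multicast channel, which makes len(parse_gmond_xml(fetch_gmond_xml(name))) a quick sanity check.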

gmetad

There is one instance of gmetad running on hooft. It aggregates data from the misc hosts in the esams cloud (apparently). Its RRD files are written to /var/lib/ganglia/rrds/. Its config file lives in /etc/ganglia/gmetad.conf.

Another instance runs on streber. It appears to aggregate all the rest of the data. Data and config files live in the same locations as above.

(Really? Where is the data for misc pmtpa? Could someone fill in the missing bits please?)
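One way to answer questions like this is to list which clusters and hosts actually have RRD files on a given gmetad host. The sketch below assumes the common gmetad layout of one directory per cluster and one per host, with one .rrd file per metric; that layout has not been verified on hooft or streber, and the function name is illustrative.

```python
import os


def list_rrds(root="/var/lib/ganglia/rrds"):
    """Return {cluster: {host: [metric, ...]}} for the .rrd files under root."""
    tree = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 2:
            continue            # only look at <cluster>/<host> directories
        cluster, host = parts
        metrics = sorted(f[:-4] for f in filenames if f.endswith(".rrd"))
        if metrics:
            tree.setdefault(cluster, {})[host] = metrics
    return tree
```

Running sorted(list_rrds()) on each gmetad host shows which clusters that instance is recording.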


Puppet

The Puppet recipes for ganglia can be found under manifests/ganglia.pp.

Web frontend

We run a customised copy of the web frontend with a document root at /home/wikipedia/htdocs/ganglia. There is a symlink from pmtpa to . (the document root itself), so both URLs serve the same code; conf.php detects the request URI and reads from either gmetad_aggr or gmetad_pmtpa as appropriate.
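The selection can be pictured with a short Python sketch. The actual code is PHP inside conf.php and was not consulted for this; the sketch assumes that a request whose path contains a pmtpa segment is served from gmetad_pmtpa and everything else from gmetad_aggr, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def pick_gmetad(request_uri):
    """Choose the gmetad instance to query for a given request URI."""
    # Only the path is inspected; a "pmtpa" in the query string is ignored.
    segments = urlsplit(request_uri).path.split("/")
    return "gmetad_pmtpa" if "pmtpa" in segments else "gmetad_aggr"
```

Under these assumptions pick_gmetad("/pmtpa/index.php") selects gmetad_pmtpa, while pick_gmetad("/index.php?c=pmtpa") still selects gmetad_aggr.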

gmetricd

A Python daemon called gmetricd collects the diskio_* metrics. It should be running on every server that runs gmond. It should be possible to extend it with other metrics if desired. The code is in SVN in the ganglia_metrics directory, and there are some RPMs in /home/wikipedia/rpms/ganglia/ganglia_metrics/. It's also available in the APT repository: package ganglia-metrics.
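For anyone extending the daemon, the general shape of a disk I/O collector looks roughly like the sketch below. It is not gmetricd's code: it assumes the counters come from /proc/diskstats, that values are published with the gmetric command-line tool, and the metric name diskio_read_bytes in the usage note is a made-up example of the diskio_* family.

```python
def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue            # skip blank or short lines
        # fields: major minor name reads merged SECTORS_READ ms
        #         writes merged SECTORS_WRITTEN ms in_flight ms weighted_ms
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def byte_rates(prev, curr, interval_s, sector_bytes=512):
    """Turn two counter samples into (read, write) bytes per second."""
    rates = {}
    for dev, (read1, written1) in curr.items():
        if dev not in prev:
            continue            # device appeared between samples
        read0, written0 = prev[dev]
        rates[dev] = (
            (read1 - read0) * sector_bytes / interval_s,
            (written1 - written0) * sector_bytes / interval_s,
        )
    return rates


def gmetric_command(name, value, units="bytes/s"):
    """Build the gmetric invocation that publishes one float metric."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", "float", "--units", units]
```

A collector loop would read /proc/diskstats every few seconds, pass consecutive samples to byte_rates, and hand each result to subprocess.run(gmetric_command("diskio_read_bytes", rate)).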
