Ganglia

From Wikitech
Latest revision as of 03:17, 29 June 2012


Installing

On (almost) all servers this is handled by Puppet. Every node sets the $cluster variable, which determines the Ganglia group the server belongs to. An

include ganglia

statement then makes sure that ganglia gets installed with the right configuration file.

The aggregators for each cluster should additionally set the variable:

$ganglia_aggregator = "true"

Configuration background

gmond

gmond is configured from the master file /home/wikipedia/conf/gmond/gmond.conf.master and the configuration generator script /home/wikipedia/conf/gmond/conf.php. The whole /home/wikipedia/conf/gmond directory is copied to every server running gmond as /etc/gmond; the ./sync script is a shortcut for the relevant rsync command. Each server then has a symlink from /etc/gmond.conf to the relevant specialised configuration file in /etc/gmond/*.conf.

Each cluster has its own multicast channel. Channels are allocated automatically by conf.php: the first cluster in the $clusters array is given the IP address 239.192.0.1, the second is given 239.192.0.2, and so on. 239.192.0.0/24 should be considered reserved for this purpose, as documented on the IP addresses page.
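
The allocation rule above can be illustrated with a short Python sketch. This is not the code in conf.php, only a restatement of the rule as described; the function name and the cluster names in the usage note are hypothetical.

```python
import ipaddress

# Reserved range named in the text above.
RESERVED_NET = ipaddress.ip_network("239.192.0.0/24")


def allocate_channels(clusters):
    """Map each cluster name to a multicast address, in list order.

    The first cluster gets 239.192.0.1, the second 239.192.0.2, and so on.
    """
    base = RESERVED_NET.network_address
    # Skipping .0 leaves 255 usable addresses inside the /24.
    max_clusters = RESERVED_NET.num_addresses - 1
    if len(clusters) > max_clusters:
        raise ValueError("more clusters than addresses in the reserved range")
    return {name: str(base + index) for index, name in enumerate(clusters, start=1)}
```

For example, `allocate_channels(["appservers", "squids"])` maps the first (made-up) cluster to 239.192.0.1 and the second to 239.192.0.2. Because the address depends only on list position, inserting a cluster in the middle of the list would renumber every cluster after it.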

The *_aggr.conf configuration files are "aggregator" configuration files. These configure ganglia in non-deaf mode, allowing it to listen to the multicast channel, aggregate the state of the cluster in memory and respond to XML requests from gmetad. The remaining configuration files use deaf mode, which saves a small amount of CPU time and memory for those servers.
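
To check what an aggregator is reporting, its XML dump can be read over TCP, much as gmetad does. The following is a minimal sketch, assuming gmond's default TCP port 8649 (the port actually configured here is not stated on this page); the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends when a client connects.

    8649 is gmond's default port; it is an assumption for this setup.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Return {cluster name: sorted host names} from a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): sorted(host.get("NAME") for host in cluster.findall("HOST"))
        for cluster in root.iter("CLUSTER")
    }
```

Calling `hosts_by_cluster(fetch_gmond_xml(aggregator))` with the hostname of an aggregator should list every host that aggregator has heard from on its multicast channel.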

gmetad

There is one instance of gmetad running on hooft. It aggregates data from the misc hosts in the esams cloud (apparently). Its RRD files are written to /var/lib/ganglia/rrds/. Its config file lives in /etc/ganglia/gmetad.conf.

Another instance runs on streber. It appears to aggregate all the rest of the data. Data and config files live in the same locations as above.
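
To see which clusters and hosts a given gmetad instance is actually recording, the RRD directory can be walked. This is a sketch that assumes the usual gmetad layout of one directory per cluster, one per host, and one .rrd file per metric; the layout on hooft and streber has not been verified here, and the function name is hypothetical.

```python
from pathlib import Path


def list_rrds(root="/var/lib/ganglia/rrds"):
    """Return sorted (cluster, host, metric) tuples for every RRD under root.

    Assumes the layout <root>/<cluster>/<host>/<metric>.rrd.
    """
    found = []
    for rrd in Path(root).glob("*/*/*.rrd"):
        cluster, host = rrd.parts[-3], rrd.parts[-2]
        found.append((cluster, host, rrd.stem))
    return sorted(found)
```

Running this on each gmetad host and comparing the cluster names in the output is one way to answer the question of where a particular cluster's data ends up.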

(Really? Where is the data for misc pmtpa? Could someone fill in the missing bits please?)


Puppet

The Puppet recipes for ganglia can be found under manifests/ganglia.pp.

Web frontend

We run a customised copy of the web frontend with a document root at /home/wikipedia/htdocs/ganglia. There is a symlink from pmtpa to . (the document root itself), so both paths serve the same code; conf.php detects the request URI and reads from either gmetad_aggr or gmetad_pmtpa as appropriate.
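
The selection logic can be pictured with the Python sketch below. It is a guess at the behaviour described, not a port of conf.php: the rule that a "pmtpa" path segment selects gmetad_pmtpa is an assumption, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def pick_gmetad(request_uri):
    """Choose a gmetad instance name from the request URI.

    Assumption: a "pmtpa" path segment (reached through the symlink)
    selects gmetad_pmtpa; anything else falls back to gmetad_aggr.
    """
    segments = [s for s in urlsplit(request_uri).path.split("/") if s]
    return "gmetad_pmtpa" if "pmtpa" in segments else "gmetad_aggr"
```

Only the path is inspected, so a query string that happens to mention pmtpa does not change which instance is selected.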

gmetricd

A Python daemon called gmetricd collects the diskio_* metrics. It should be running on every server that runs gmond, and it should be possible to extend it with other metrics if desired. The code is in SVN in the ganglia_metrics directory, and there are some RPMs in /home/wikipedia/rpms/ganglia/ganglia_metrics/. It is also available in the APT repository as the package ganglia-metrics.
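
As a rough illustration of how disk I/O metrics of this kind can be derived on Linux, the sketch below parses two samples of /proc/diskstats and turns the counter differences into per-second rates. It is not the gmetricd code; the function names and the metric names (other than the diskio_ prefix) are hypothetical.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text):
    """Parse the contents of /proc/diskstats into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        stats[fields[2]] = {
            "sectors_read": int(fields[5]),
            "sectors_written": int(fields[9]),
            "io_ms": int(fields[12]),
        }
    return stats


def diskio_rates(before, after, interval_s):
    """Per-second rates from two samples taken interval_s seconds apart."""
    rates = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue  # device appeared between samples
        rates[dev] = {
            # Metric names below are hypothetical examples.
            "diskio_read_bytes": (new["sectors_read"] - old["sectors_read"]) * SECTOR_BYTES / interval_s,
            "diskio_write_bytes": (new["sectors_written"] - old["sectors_written"]) * SECTOR_BYTES / interval_s,
            "diskio_busy_fraction": (new["io_ms"] - old["io_ms"]) / (interval_s * 1000.0),
        }
    return rates
```

A daemon built on this would read /proc/diskstats at a fixed interval, keep the previous sample, and hand each computed rate to gmond. How gmetricd itself publishes its values is not described on this page.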
