Ganglia
Latest revision as of 03:17, 29 June 2012
Installing
On (almost) all servers this is handled by Puppet. Every node sets the $cluster variable, which determines the Ganglia group the server belongs to. An
include ganglia
statement then makes sure that ganglia gets installed with the right configuration file.
The aggregators for each cluster should additionally set the variable:
$ganglia_aggregator = "true"
Configuration background
gmond
gmond is configured via the master file /home/wikipedia/conf/gmond/gmond.conf.master and the configuration generator script /home/wikipedia/conf/gmond/conf.php. The whole /home/wikipedia/conf/gmond directory is copied to every server running gmond, as /etc/gmond. The ./sync script is a shortcut to the relevant rsync command. Each server then has a symlink from /etc/gmond.conf to the relevant specialised configuration file in /etc/gmond/*.conf.
Each cluster has its own multicast channel. Channels are allocated automatically by conf.php: the first cluster in the $clusters array is given the IP address 239.192.0.1, the second is given 239.192.0.2, and so on. 239.192.0.0/24 should be considered reserved for this purpose, as documented at IP addresses.
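The allocation rule can be illustrated with a short sketch. This is not the code in conf.php (which is PHP and is not reproduced on this page). It is a Python restatement of the rule described above, and the function name and the example cluster names are hypothetical.

```python
import ipaddress

# The range this page reserves for Ganglia multicast channels.
GANGLIA_NET = ipaddress.ip_network("239.192.0.0/24")


def allocate_channels(clusters):
    """Map each cluster name to a multicast address by list position.

    The first cluster gets 239.192.0.1, the second 239.192.0.2, and so on.
    """
    channels = {}
    for index, name in enumerate(clusters, start=1):
        address = GANGLIA_NET.network_address + index
        # Refuse to hand out addresses outside the reserved /24.
        if address not in GANGLIA_NET:
            raise ValueError("too many clusters for 239.192.0.0/24")
        channels[name] = str(address)
    return channels


if __name__ == "__main__":
    # Hypothetical cluster names, for illustration only.
    for cluster, channel in allocate_channels(["appservers", "databases"]).items():
        print(cluster, channel)
```

Because the address depends only on position, inserting a cluster in the middle of the array would shift the channel of every later cluster, so under this rule new clusters are probably best appended at the end.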
The *_aggr.conf configuration files are "aggregator" configuration files. These configure ganglia in non-deaf mode, allowing it to listen to the multicast channel, aggregate the state of the cluster in memory and respond to XML requests from gmetad. The remaining configuration files use deaf mode, which saves a small amount of CPU time and memory for those servers.
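To make the "XML requests" part more concrete, here is a minimal sketch of reading the kind of XML state dump an aggregator serves. The element and attribute names (GANGLIA_XML, CLUSTER, HOST, METRIC, NAME, VAL) follow the Ganglia XML format; the sample document, host name and values are invented for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of a gmond XML dump (not real data).
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="example" LOCALTIME="1340939820">
    <HOST NAME="host1.example.org" IP="10.0.0.1" REPORTED="1340939815">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="204800" TYPE="float" UNITS="KB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def parse_cluster_state(xml_text):
    """Return {host name: {metric name: raw value string}} from an XML dump."""
    root = ET.fromstring(xml_text)
    state = {}
    for host in root.iter("HOST"):
        # Values are kept as strings; the TYPE attribute says how to interpret them.
        state[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return state


if __name__ == "__main__":
    print(parse_cluster_state(SAMPLE_XML))
```

Only a non-deaf node holds this state for the whole cluster, which is why gmetad has to be pointed at the aggregators rather than at arbitrary cluster members.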
gmetad
There is one instance of gmetad running on hooft. It aggregates data from the misc hosts in the esams cloud (apparently). Its RRD files are written to /var/lib/ganglia/rrds/. Its config file lives in /etc/ganglia/gmetad.conf.
Another instance runs on streber. It appears to aggregate all the rest of the data. Data and config files live in the same locations as above.
(Really? Where is the data for misc pmtpa? Could someone fill in the missing bits please?)
Puppet
The Puppet recipes for ganglia can be found under manifests/ganglia.pp.
Web frontend
We run a customised copy of the web frontend with a document root at /home/wikipedia/htdocs/ganglia. There is a symlink from pmtpa to ., and conf.php detects the request URI and reads from either gmetad_aggr or gmetad_pmtpa as appropriate.
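The selection logic might look roughly like the sketch below. The real logic lives in the frontend's conf.php and is not shown on this page, so this is only a guess at the behaviour, written in Python: it assumes that a pmtpa path segment in the URI (made possible by the symlink) selects gmetad_pmtpa and that anything else selects gmetad_aggr. The function name is hypothetical.

```python
def select_gmetad_source(request_uri):
    """Pick the gmetad instance to read from, based on the request path."""
    # Ignore the query string; only the path is inspected.
    path = request_uri.split("?", 1)[0]
    segments = [segment for segment in path.split("/") if segment]
    # Assumption: requests that arrive through the pmtpa symlink carry a
    # "pmtpa" path segment; everything else gets the aggregate view.
    if "pmtpa" in segments:
        return "gmetad_pmtpa"
    return "gmetad_aggr"


if __name__ == "__main__":
    print(select_gmetad_source("/pmtpa/index.php"))
    print(select_gmetad_source("/index.php"))
```

The symlink means both URLs are served by the same files on disk, so a single copy of the frontend can present either data source.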
gmetricd
A Python daemon called gmetricd collects the diskio_* metrics. It should be running on every server that runs gmond. It should be possible to extend it with other metrics if desired. The code is in SVN in the ganglia_metrics directory, and there are some RPMs in /home/wikipedia/rpms/ganglia/ganglia_metrics/. It's also available in the APT repository: package ganglia-metrics.
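As a rough idea of what publishing an extra metric involves, the sketch below builds a command line for Ganglia's gmetric tool. This page does not say how gmetricd actually submits its metrics, so using gmetric here is an assumption made for illustration, and the function name and the metric name diskio_example are hypothetical.

```python
# Value types accepted by gmetric's --type option.
VALID_TYPES = {
    "string", "int8", "uint8", "int16", "uint16",
    "int32", "uint32", "float", "double",
}


def build_gmetric_command(name, value, value_type, units=""):
    """Return the argument list for publishing one metric via gmetric."""
    if value_type not in VALID_TYPES:
        raise ValueError("unsupported gmetric type: %s" % value_type)
    command = [
        "gmetric",
        "--name=%s" % name,
        "--value=%s" % value,
        "--type=%s" % value_type,
    ]
    # Units are optional; only pass them when given.
    if units:
        command.append("--units=%s" % units)
    return command


if __name__ == "__main__":
    # Hypothetical metric; nothing is executed here, the command is only printed.
    print(" ".join(build_gmetric_command("diskio_example", 1024, "uint32", "bytes/s")))
```

On a host where gmetric is installed, the returned list could be passed to subprocess.run(command, check=True) to publish the value.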