Ganglia

From Wikitech
Revision as of 15:38, 6 November 2009 by Mark

Installing

On (almost) all servers this is handled by Puppet. Every node sets the $cluster variable, which determines the Ganglia group the server belongs to. An

include ganglia

statement then makes sure that ganglia gets installed with the right configuration file.

The aggregators for each cluster should additionally set the variable:

$ganglia_aggregator = "true"

Configuration background

gmond

gmond is configured via the master file /home/wikipedia/conf/gmond/gmond.conf.master and the configuration generator script /home/wikipedia/conf/gmond/conf.php. The whole /home/wikipedia/conf/gmond directory is copied to every server running gmond, as /etc/gmond. The ./sync script is a shortcut to the relevant rsync command. Each server then has a symlink from /etc/gmond.conf to the relevant specialised configuration file in /etc/gmond/*.conf.

Each cluster has its own multicast channel. Channels are allocated automatically by conf.php. The first cluster in the $clusters array is given the IP address 239.192.0.1, the second is given 239.192.0.2, and so on. 239.192.0.0/24 should be considered reserved for this purpose, as documented at IP addresses.
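The allocation rule above can be sketched in Python. This is an illustration of the rule only, not the code in conf.php (which is PHP and is not reproduced here); the function name, the cluster names and the 254-cluster limit are assumptions made for the example.

```python
import ipaddress

# Reserved range from the text: 239.192.0.0/24, first cluster gets 239.192.0.1.
MCAST_BASE = ipaddress.IPv4Address("239.192.0.0")
MAX_CLUSTERS = 254  # assumption: stay inside the /24, skipping .0 and .255


def allocate_channels(clusters):
    """Map each cluster name to a multicast address by its list position."""
    if len(clusters) > MAX_CLUSTERS:
        raise ValueError("more clusters than addresses in 239.192.0.0/24")
    return {
        name: str(MCAST_BASE + index)
        for index, name in enumerate(clusters, start=1)
    }


if __name__ == "__main__":
    # Hypothetical cluster names, for illustration only.
    channels = allocate_channels(["appservers", "squids", "databases"])
    for name, address in channels.items():
        print(name, address)
```

Because the address depends only on position, inserting a cluster in the middle of the list would shift the channel of every cluster after it; under this scheme, appending new clusters at the end keeps existing channels stable.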

The *_aggr.conf configuration files are "aggregator" configuration files. These configure ganglia in non-deaf mode, allowing it to listen to the multicast channel, aggregate the state of the cluster in memory and respond to XML requests from gmetad. The remaining configuration files use deaf mode, which saves a small amount of CPU time and memory for those servers.

gmetad

There are two instances of gmetad running on zwinger: gmetad_aggr and gmetad_pmtpa. The instances run via symlinks at /usr/sbin/gmetad_aggr and /usr/sbin/gmetad_pmtpa, so the specialised name will appear in ps. There are two init.d services which start these instances. They do not use PID files; instead, they signal the processes by name.

gmetad_aggr is responsible for aggregating data from the pmtpa and esams "clouds". Its resource requirements are small.

gmetad_pmtpa aggregates data from all of the servers in the pmtpa cloud. Its resource requirements are much larger. Its RRD files are stored on a tmpfs mounted at /mnt/ganglia_tmp, because the disk write rate was too high for them to be written directly to disk. Once per hour, the in-memory RRDs are synced to disk via /usr/local/bin/save-gmetad-rrds. On startup, /usr/local/bin/restore-gmetad-rrds is executed, which loads the RRDs from disk into the tmpfs.
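The save and restore steps amount to copying the RRD tree between the tmpfs and a directory on disk. The contents of save-gmetad-rrds and restore-gmetad-rrds are not shown on this page, so the following Python is only a sketch of the idea. The on-disk path DISK_DIR and all function names are hypothetical; only /mnt/ganglia_tmp comes from the text.

```python
import shutil
from pathlib import Path

TMPFS_DIR = Path("/mnt/ganglia_tmp")            # from the text
DISK_DIR = Path("/var/lib/ganglia/rrds-saved")  # hypothetical on-disk location


def copy_rrds(source, destination):
    """Copy every *.rrd file under source to the same relative path
    under destination. Returns the number of files copied."""
    source, destination = Path(source), Path(destination)
    copied = 0
    for rrd in source.rglob("*.rrd"):
        if not rrd.is_file():
            continue
        target = destination / rrd.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copy to a temporary name, then rename, so an interrupted copy
        # does not leave a truncated RRD at the final path.
        partial = target.with_name(target.name + ".partial")
        shutil.copy2(rrd, partial)
        partial.replace(target)
        copied += 1
    return copied


def save_rrds():
    """Hourly step: tmpfs -> disk."""
    return copy_rrds(TMPFS_DIR, DISK_DIR)


def restore_rrds():
    """Startup step: disk -> tmpfs."""
    return copy_rrds(DISK_DIR, TMPFS_DIR)
```

With an hourly sync, an unclean shutdown can lose up to an hour of samples, which is the trade-off accepted in exchange for keeping the write load off the disk.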

The gmetad configuration files are /etc/gmetad_aggr.conf and /etc/gmetad_pmtpa.conf, with a copy in /home/wikipedia/conf/gmond.

Puppet

The Puppet recipes for Ganglia can be found under manifests/ganglia.pp.

Web frontend

We run a customised copy of the web frontend with a document root at /home/wikipedia/htdocs/ganglia. There is a symlink from pmtpa to . (the document root itself), so the same code is served under both paths. conf.php detects the request URI and reads from either gmetad_aggr or gmetad_pmtpa as appropriate.
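The routing decision can be illustrated as follows. This is a Python sketch of the idea, not the PHP in conf.php; it assumes the instance is chosen by whether "pmtpa" appears as a path segment of the request URI, which matches the symlink layout described above but is not confirmed by this page. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def select_gmetad(request_uri):
    """Return the name of the gmetad instance that should serve a request.

    Assumption: requests that pass through the pmtpa symlink carry
    "pmtpa" as a path segment; everything else goes to the aggregate view.
    """
    path = urlsplit(request_uri).path
    segments = [segment for segment in path.split("/") if segment]
    if "pmtpa" in segments:
        return "gmetad_pmtpa"
    return "gmetad_aggr"


if __name__ == "__main__":
    print(select_gmetad("/pmtpa/?c=Apaches"))
    print(select_gmetad("/index.php"))
```

Only the path is inspected, so a query string that happens to mention pmtpa does not change which instance is read.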

gmetricd

A Python daemon called gmetricd collects the diskio_* metrics. It should be running on every server that runs gmond. It should be possible to extend it with other metrics if desired. The code is in SVN in the ganglia_metrics directory, and there are some RPMs in /home/wikipedia/rpms/ganglia/ganglia_metrics/. It is also available in the APT repository as the package ganglia-metrics.
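As an illustration of the kind of collection involved, the sketch below derives disk I/O rates from two samples of /proc/diskstats on Linux. It is not the gmetricd source: the function names and the metric names it produces (diskio_<device>_read_bytes_per_sec and so on) are hypothetical, and it does not show how values are handed to gmond.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Parse /proc/diskstats content into
    {device: (sectors_read, sectors_written)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        # fields: major minor name reads merged sectors_read ms_read
        #         writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def diskio_rates(previous, current, interval_seconds):
    """Turn two samples into per-device byte rates (bytes per second)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for device, (read_now, written_now) in current.items():
        if device not in previous:
            continue
        read_before, written_before = previous[device]
        read_delta = read_now - read_before
        write_delta = written_now - written_before
        if read_delta < 0 or write_delta < 0:
            continue  # counters reset or wrapped; skip this interval
        # Hypothetical metric names, chosen only to match the diskio_* prefix.
        rates["diskio_%s_read_bytes_per_sec" % device] = (
            read_delta * SECTOR_BYTES / interval_seconds)
        rates["diskio_%s_write_bytes_per_sec" % device] = (
            write_delta * SECTOR_BYTES / interval_seconds)
    return rates
```

A daemon built this way would read /proc/diskstats at a fixed interval, keep the previous sample, and publish the computed rates each cycle; further metrics could be added as extra parse-and-rate function pairs.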
