Add a server

From Wikitech

Latest revision as of 22:23, 27 September 2011

For any MediaWiki installation:

  • Install Ubuntu
  • Install wikimedia-task-appserver
  • Add the hostname to /usr/local/dsh/node_groups/mediawiki-installation on the ssh bastion host (fenari)
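
The node group step can be made safe to repeat. The following is a minimal sketch, assuming the node group file is a plain list with one hostname per line; `add_to_node_group` is a hypothetical helper, not an existing tool, and `srv123` is a placeholder hostname.

```shell
#!/bin/sh
# Sketch: append a hostname to a dsh node group file only if it is
# not already listed. add_to_node_group is a hypothetical helper.
add_to_node_group() {
    group_file=$1
    host=$2
    # -x matches the whole line, -F treats the hostname as a literal string
    if grep -qxF -- "$host" "$group_file" 2>/dev/null; then
        echo "$host already in $group_file"
    else
        printf '%s\n' "$host" >> "$group_file"
        echo "added $host to $group_file"
    fi
}

# Example, run on the bastion host (srv123 is a placeholder):
# add_to_node_group /usr/local/dsh/node_groups/mediawiki-installation srv123
```

Running it twice for the same host leaves a single entry, which avoids duplicate lines in the group file.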

Additionally, for a non-apache batch server:

  • Stop apache: /etc/init.d/apache2 stop
  • Disable apache by running update-rc.d -f apache2 remove
  • Add to ganglia with:
apt-get -y --no-remove install gmond ganglia-metrics
cp -r /home/wikipedia/conf/gmond /etc
rm /etc/gmond.conf
ln -s gmond/misc.conf /etc/gmond.conf
/etc/init.d/gmond restart
/etc/init.d/gmetricd restart
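
After the ganglia steps it may be worth confirming that the symlink ended up where intended. A minimal sketch, assuming the relative target `gmond/misc.conf` used above; `check_gmond_link` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a gmond.conf path is a symlink pointing at
# the expected target. check_gmond_link is a hypothetical helper.
check_gmond_link() {
    conf=$1
    expected=$2
    if [ ! -L "$conf" ]; then
        echo "not a symlink: $conf"
        return 1
    fi
    # readlink prints the target exactly as stored in the link
    actual=$(readlink "$conf")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $conf -> $actual"
    else
        echo "unexpected target: $conf -> $actual"
        return 1
    fi
}

# Example for a batch server:
# check_gmond_link /etc/gmond.conf gmond/misc.conf
```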

Additionally, for a main pool apache server:

  • Add the hostname to /usr/local/dsh/node_groups/apaches
  • Add to nagios with cd /home/wikipedia/conf/nagios && ./sync
  • Add to ganglia with:
apt-get -y --no-remove install gmond ganglia-metrics
cp -r /home/wikipedia/conf/gmond /etc
rm /etc/gmond.conf
/home/wikipedia/conf/gmond/make-apache-symlink
/etc/init.d/gmond restart
/etc/init.d/gmetricd restart
  • Add the server to /etc/pybal/apache on the LVS director for the apaches, currently lvs3. The weight should be proportional to the CPU count.
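
One way to derive a weight that is proportional to the CPU count is sketched below. The scaling factor is an assumption (this page does not state one), `pybal_weight` is a hypothetical helper, and the format of the pybal file itself is not shown here.

```shell
#!/bin/sh
# Sketch: compute a weight proportional to the CPU count.
# The per-CPU factor (second argument, default 1) is an assumption.
pybal_weight() {
    cpus=$1
    per_cpu=${2:-1}
    echo $(( cpus * per_cpu ))
}

# Example, run on the new server to read its online CPU count:
# pybal_weight "$(getconf _NPROCESSORS_ONLN)"
```

Whatever factor is chosen should match the one already used for the other servers in the pool, so that the weights stay comparable.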

List of things that will break if you try to install MediaWiki without following this procedure

  • Not in mediawiki-installation node group: server doesn't get sync-file/scap updates, so misses out on DB server and similar changes. Thus it goes rogue and destroys the cluster.
  • No wikimedia-nis-client: non-roots can't sync. Server misses out on updates, goes rogue.
  • No sudoers (from wikimedia-task-appserver): non-roots can't sync.
  • No upload NFS mounts (from wikimedia-task-appserver): MediaWiki will spew errors when it tries to access uploads, corrupting your data and potentially corrupting the shared caches.
  • No latex, djvulibre, ploticus, etc. (from wikimedia-task-appserver): Bad output, corrupted caches.
  • Apache not in apaches node group: apache-restart-all etc. fail, and the server goes unmonitored in nagios.
  • Server not in ganglia: by the time you realise you need it, you will have missed hours or days of important performance data.
  • Running apt-get -y unattended without the --no-remove option: due to some subtle error in the repository or sources.list, apt-get declares your entire installation "conflicting", right down to glibc, and removes it, bricking the server.
  • Apache not in nagios: Tim gets upset.
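
As an extra guard alongside --no-remove, a simulated run can be inspected before the real one. The following is a minimal sketch, assuming the usual `apt-get -s` output in which planned removals appear as lines beginning with `Remv`; `count_removals` and `safe_to_install` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: inspect `apt-get -s` (simulate) output read from stdin.
# Both function names are hypothetical helpers.

# Print the number of packages the simulated run would remove.
count_removals() {
    # grep -c prints 0 but exits non-zero when nothing matches
    grep -c '^Remv ' || true
}

# Succeed only if the simulated run on stdin lists no removals.
safe_to_install() {
    removals=$(count_removals)
    if [ "$removals" -gt 0 ]; then
        echo "refusing: simulation would remove $removals package(s)" >&2
        return 1
    fi
}

# Example:
# apt-get -s install gmond ganglia-metrics | safe_to_install \
#     && apt-get -y --no-remove install gmond ganglia-metrics
```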