Job queue

From Wikitech
Latest revision as of 15:55, 19 January 2012

Overview

Job queue runners run on the hosts in the job-runners node group. To install a new job runner, apply the application::jobrunner class to the host in Puppet.

The daemon is controlled via /etc/init.d/mw-job-runner and will start on boot by default.
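A minimal sketch of driving the daemon by hand follows. This page only confirms the script's path. The sketch assumes that mw-job-runner accepts the conventional init-script verbs (start, stop, restart, status) and that dsh is run from a host that can reach the job-runners group. The helper names `runner_ctl`, `runner_ctl_all`, `valid_verb` and the `DRY_RUN` switch are hypothetical, added so commands can be previewed before they run.

```shell
#!/bin/bash
# Hypothetical helpers around the init script named on this page.
# Assumption: mw-job-runner accepts the usual start/stop/restart/status verbs.
INIT_SCRIPT=/etc/init.d/mw-job-runner

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reject anything other than the assumed init-script verbs.
valid_verb() {
    case "$1" in
        start|stop|restart|status) return 0 ;;
        *) echo "usage: start|stop|restart|status" >&2; return 2 ;;
    esac
}

# Control the job runner on the local host (run as root).
runner_ctl() {
    valid_verb "$1" || return
    run "$INIT_SCRIPT" "$1"
}

# Apply the same action to every host in the job-runners dsh group.
runner_ctl_all() {
    valid_verb "$1" || return
    run dsh -g job-runners "$INIT_SCRIPT" "$1"
}
```

For example, `DRY_RUN=1 runner_ctl_all stop` prints the dsh command that would stop every runner. Run it again without DRY_RUN to execute it. Because the verb check lives in one place, a typo fails locally instead of being sent to every host.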

See also http://www.mediawiki.org/wiki/Manual:Job_queue

Emergency kill

If there is an urgent need to kill the job runners everywhere, for example because they are causing cluster-wide swap death:
- target the job-runners group (see /home/config/others/usr/local/dsh/node_groups);
- work as root;
- run dsh -g job-runners pkill -9 -f obs. The pattern "obs" matches jobs-loop, runJobs and a few other processes. Our PHP scripts don't trap signals, so there is no need to kill them gracefully;
- if you end up having to power-cycle unresponsive hosts, rerun the dsh command afterward to make sure the job runners are stopped there as well, until someone can investigate the memory issue.
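The procedure above can be wrapped in a small script, sketched here under stated assumptions. The page only names the node_groups directory, so the sketch assumes the group file is node_groups/job-runners, with one host per line. It also adds a `pgrep -f` preview so you can see what the `pkill` pattern matches before and after the kill. The function names, the `GROUP_FILE` override and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper for the emergency kill; run as root.
# Assumption: the dsh group file is named after the group inside the
# node_groups directory named on this page, one host per line.
GROUP=job-runners
GROUP_FILE=${GROUP_FILE:-/home/config/others/usr/local/dsh/node_groups/$GROUP}
PATTERN=obs   # same pattern as the pkill command on this page

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the hosts that will be hit, skipping blank lines and # comments.
list_hosts() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$GROUP_FILE"
}

# Show the PIDs matching the pattern on each host, without killing anything.
preview() {
    run dsh -g "$GROUP" pgrep -f "$PATTERN"
}

# Kill with SIGKILL: the PHP scripts do not trap signals, so a graceful
# shutdown gains nothing.
kill_all() {
    run dsh -g "$GROUP" pkill -9 -f "$PATTERN"
}
```

A typical sequence is list_hosts, preview, kill_all, then preview again to confirm nothing matches. After power-cycling unresponsive hosts, run kill_all once more, as described above.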
