Job queue

From Wikitech

Latest revision as of 15:55, 19 January 2012

Overview

Job queue runners run on the job-runners node group. To install a new job runner, apply the application::jobrunner class in Puppet.

The daemon is controlled via /etc/init.d/mw-job-runner and will start on boot by default.
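
As a rough illustration of controlling the daemon, here is a minimal sketch of a wrapper around the init script. The function name job_runner_ctl and the DRY_RUN variable are hypothetical, and the start/stop/restart/status actions are the conventional init-script actions, assumed here rather than confirmed for this particular script. By default the wrapper only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: job_runner_ctl and DRY_RUN are hypothetical names.
# The action list is the conventional init-script set and is an assumption.

INIT_SCRIPT="/etc/init.d/mw-job-runner"   # path given in the text

job_runner_ctl() {
    action="$1"
    case "$action" in
        start|stop|restart|status) ;;
        *)
            echo "usage: job_runner_ctl {start|stop|restart|status}" >&2
            return 2
            ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): show the command instead of running it.
        echo "$INIT_SCRIPT $action"
    else
        "$INIT_SCRIPT" "$action"
    fi
}
```

For example, job_runner_ctl stop prints the stop command, and DRY_RUN=0 job_runner_ctl stop would actually run it on the local host.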

See also http://www.mediawiki.org/wiki/Manual:Job_queue

Emergency kill

If there is an urgent need to kill job queues everywhere, for example because they are causing cluster-wide swap death:
- work on the job-runners group (see /home/config/others/usr/local/dsh/node_groups)
- as root, run:
- dsh -g job-runners pkill -9 -f obs (the pattern "obs" matches jobs-loop, RunJobs and a few others; our PHP scripts don't trap signals, so there is no need to kill gracefully)
- if you have to power cycle nonresponsive hosts, rerun the dsh command afterward: the daemon starts on boot by default, so this makes sure the job runners stay stopped there as well until someone can investigate the memory issue
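
The steps above can be sketched as a small shell helper. This is a sketch under assumptions, not existing tooling: kill_job_runners, would_match and DRY_RUN are hypothetical names, and would_match only approximates pkill -f by testing a sample command line against the same pattern with grep -E. The helper prints the dsh command by default so it can be reviewed before anything is killed.

```shell
#!/bin/sh
# Sketch of the emergency kill above. kill_job_runners, would_match and
# DRY_RUN are hypothetical names used for illustration only.

GROUP="job-runners"   # dsh node group from the text
PATTERN="obs"         # substring shared by jobs-loop and RunJobs

# Print the command by default; set DRY_RUN=0 to actually run it (as root).
kill_job_runners() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "dsh -g $GROUP pkill -9 -f $PATTERN"
    else
        dsh -g "$GROUP" pkill -9 -f "$PATTERN"
    fi
}

# Approximate check of what pkill -f would match: -f tests the pattern
# against the full command line as an extended regular expression.
would_match() {
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}
```

Running would_match "jobs-loop" or would_match "RunJobs" succeeds, which is why the short pattern catches both. After power cycling a host, calling kill_job_runners again with DRY_RUN=0 repeats the same dsh command.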
