vtk_job

vtk_job_control

Usage
vtk_job_control slaveId action jobId opt1 opt2 opt3 -why TEXT_REASON
vtk_job_control slaveId STOP jobId [Options] -why TEXT_REASON
vtk_job_control slaveId SUSPEND jobId [Options] -why TEXT_REASON
Description
This is the main procedure for sending control signals and modifications to jobs running on remote slaves. The argument slaveId can be a valid VovId, the string "ALL", or the number 0, which is equivalent to "ALL".
The argument action can be one of: STOP, KILL, DEQUEUE, SUSPEND, RESUME, SIGUSR1, SIGUSR2, SIGTSTP, CHECK, EXT, MODIFY.
  • If jobId is 0, then all jobs on the specified slave are affected.
  • If jobId is the string "SLAVE", then the slave is stopped gracefully.
  • If jobId is the string "SLAVE/FORCE", then the slave is stopped with force. Slaves can only be stopped.
The optional argument -why "Optional Reason" can be used to pass a reason to the WHYSTATUS field and property of affected nodes.
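The special jobId values and the -why argument can be combined as in the following minimal sketch. The procedure name stop_slave and its force flag are hypothetical and not part of the VOV API; only the vtk_job_control arguments come from the description above.

```tcl
# Hypothetical helper (not part of the VOV API): stop a slave and record
# the reason in the WHYSTATUS field of the affected nodes.
proc stop_slave {slaveId reason {force 0}} {
    # "SLAVE" requests a graceful stop, "SLAVE/FORCE" a forced one.
    set target [expr {$force ? "SLAVE/FORCE" : "SLAVE"}]
    vtk_job_control $slaveId STOP $target -why $reason
}

# Example calls (21221 is an illustrative slave id):
# stop_slave 21221 "Scheduled maintenance"
# stop_slave 21221 "Host unresponsive" 1
```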
Options for action EXT, MODIFY:
Table 1.
Option   Action EXT          Action MODIFY
opt1     signalName          fieldName
opt2     procNameIncludeRx   newValue
opt3     procNameExcludeRx   unused
Options for action STOP, SUSPEND:
  • -signals comma-separated list of signals
  • -include comma-separated list of process names to include in sending signals
  • -exclude comma-separated list of process names to exclude in sending signals
  • -delay delay in seconds between successive signals in the list
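As a rough sketch of how the -signals and -delay options fit together, the hypothetical wrapper below stops a single job with an explicit signal list. The name stop_job_cascade and its parameters are assumptions made for illustration. The signal names follow the INT,HUP form used in the Examples section.

```tcl
# Hypothetical wrapper (not part of the VOV API): stop one job with an
# explicit signal list and a pause between signals.
proc stop_job_cascade {jobId signals delaySec reason} {
    # slaveId 0 is equivalent to "ALL".
    # signals is a Tcl list such as {INT HUP}; it is joined with commas
    # because -signals expects a comma-separated list.
    vtk_job_control 0 STOP $jobId \
        -signals [join $signals ","] \
        -delay $delaySec \
        -why $reason
}

# Example call (23344 is an illustrative job id):
# stop_job_cascade 23344 {INT HUP} 5 "Exceeded runtime budget"
```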
Note:
  • STOP and KILL are equivalent.
  • The include and exclude lists used by STOP and SUSPEND are mutually exclusive; if both are specified, the exclude list applies.
  • DEQUEUE does not reach the slave and is processed by vovserver.
  • CHECK forces the slaves to scan the process table, gather information about the processes, and send it to the vovserver.
  • EXT uses the script vovjobctrl to execute the job control. Check the documentation about vovjobctrl for more information about this type of job control.
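The DEQUEUE and CHECK notes can be illustrated with two small wrappers. This is a sketch under assumptions: the procedure names are hypothetical, slaveId 0 is passed to DEQUEUE because the request is handled by vovserver rather than a slave, and jobId 0 is passed to CHECK on the assumption that it covers all jobs on the slave.

```tcl
# Hypothetical helpers (not part of the VOV API).

# Remove a job from the queue. DEQUEUE is processed by vovserver, so
# slaveId 0 (equivalent to "ALL") is used here.
proc dequeue_job {jobId reason} {
    vtk_job_control 0 DEQUEUE $jobId -why $reason
}

# Ask one slave to rescan its process table and report to vovserver.
# Passing jobId 0 (all jobs on the slave) is an assumption.
proc refresh_slave_processes {slaveId} {
    vtk_job_control $slaveId CHECK 0
}

# Example calls (ids are illustrative):
# dequeue_job 23344 "Superseded by a newer run"
# refresh_slave_processes 21221
```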
Examples
# Stop job 23344:
vtk_job_control 0 STOP 23344 -why "Custom script stopped job"
# Stop all jobs on slave 21221:
vtk_job_control 21221 STOP 0
# Modify autokill on running job 24455 to 120s:
vtk_job_control 0 MODIFY 24455 autokill 120 -why "Autokill job from job control"
# Suspend all sleep jobs using SIGINT followed by SIGHUP on slave 12345:
vtk_job_control 12345 SUSPEND 0 -signals INT,HUP -include sleep
# Kill all jobs except sleep jobs using the default signal cascade with 2 seconds between signals on slave 23456:
vtk_job_control 23456 STOP 0 -exclude sleep -delay 2