vovtaskermgr

The vovtaskermgr command is the main way to start, configure, and stop taskers. It acts on the VOV project that is enabled in the shell where the command is issued.

The file taskers.tcl in the project.swd directory stores the configuration information used by this command.
Note: Changes made to taskers.tcl are not automatically propagated to the running vovtaskers. To propagate them, use the update subcommand.

A vovtasker listed in the taskers.tcl file may be running or stopped. The show subcommand gives information on the running vovtaskers currently connected to the vovserver. The list subcommand gives the names of all the vovtaskers defined in taskers.tcl, whether running or stopped.
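
The edit-then-update workflow described above can be sketched as a small shell function. This is an illustration only: the function name apply_tasker_config and the VOVTASKERMGR override variable are hypothetical (they are not part of VOV), and the override exists only so the sketch can be dry-run without a live project. The subcommands and the -down option are taken from the usage message.

```shell
# Hypothetical wrapper: push taskers.tcl edits to the running taskers,
# then report which taskers are configured and which are down.
# VOVTASKERMGR is an assumed override, not a VOV variable; it defaults
# to the real command.
apply_tasker_config() {
    vtm="${VOVTASKERMGR:-vovtaskermgr}"
    "$vtm" update || return 1   # propagate taskers.tcl changes to running taskers
    "$vtm" list                 # names of all taskers in taskers.tcl
    "$vtm" show -down           # configured taskers that are down
}
```

Run it from a shell in which the project is enabled, after saving your changes to taskers.tcl.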


vovtaskermgr: Usage Message
  
  USAGE:
      vovtaskermgr <SUBCOMMAND> [options] [taskerList]
  
      Subcommand is case-insensitive.
  
      The taskerList consists of tasker names or tasker id's.
  
  SUBCOMMAND is one of:
      LIST           -- To list all hosts named in the taskers.tcl file.
      RESTART        -- Same as STOP followed by START.
      REFRESH        -- Refresh cached environments and equivalences.
                        The default behavior is for taskers to obtain the
                        equivalences from the server. If changes are made to the
                        equiv.tcl file, the server will need to be instructed to
                        reread the file using the "vovproject reread" command
                        prior to requesting a tasker refresh.
                        If VOVEQUIV_CACHE_FILE is set to "legacy", a host-based
                        equivalence cache file will be created and updated in
                        the SWD/equiv.caches directory. If VOVEQUIV_CACHE_FILE
                        is set to a file path, the specified file will be used
                        instead.
      SHOW           -- Show info about connected or down taskers.
      PRINTSTATUS    -- Tell taskers to print their status in their log file.
  
      START          -- Start configured taskers.  If a list of hosts is
                        given, start taskers only on those hosts.  Otherwise,
                        start all configured taskers that are not running.
      UPDATE         -- Update configuration of running taskers.
  
      RESERVE        -- To reserve specified taskers.
      RESERVESHOW    -- Show current tasker reservations.
      CONFIGURE      -- To reconfigure the specified taskers on-the-fly.
                        Changes only persist until the tasker is stopped.
      STOP           -- Stop taskers; let jobs finish, unless -force is given.
  
      CANCELSHUTDOWN -- Revert stopped but still running taskers to normal
                        so they continue running and accept new jobs.
      ROTATELOG      -- To recreate new log files for specified taskers
                        if log files are missing, create tasker log directories
                        if needed, and have no impact on tasker startup logs.
      CLOSE [MSG]    -- Close taskers from accepting jobs. Closed taskers will
                        start and run, but will do so in a suspended state,
                        displaying the closure message, until opened by the
                        administrator. The default closure message is
                        'Closed by administrator'.
      OPEN [MSG]     -- Open taskers to accept jobs. The accompanying message
                        will be displayed on running taskers until another
                        message is generated during the course of normal
                        operation. Taskers that are not running will not display
                        the message after starting. The default opening message
                        is an empty string.
  Global Options are:
      -l            -- Use longer format with LIST (may be repeated).
      -v            -- Increase verbosity of messages.
      -cfgfile      -- Specify path to tasker config file, relative to SWD.
                       Default: taskers.tcl
      -failover     -- Restrict operation to dedicated failover taskers only.
  
  Options for SHOW are:
      -nameonly     -- Show only the names of the connected taskers.
      -nameid       -- Show only the names and id's of the connected taskers.
      -resourceonly -- Show only the resources of the connected taskers.
      -down         -- Show names of configured taskers that are down.
      -license      -- Show licensed capabilities of connected taskers.
      -taskergroups  -- Show tasker group for each connected tasker.
  
  Options for START and RESTART are:
      -server      -- Start the taskers by rsh/ssh from the vovserver host.
                      By default, the taskers are started
                      by the host that executes this script.
      -random      -- Start taskers in random order.
                      This is useful to start a large pool of tasker,
                      by running multiple concurrent commands like:
                        % vovtaskermgr start -random &
                        % vovtaskermgr start -random &
                        % vovtaskermgr start -random &
      -nolog       -- Redirect tasker output to /dev/null.
                      Useful to avoid huge log files in /usr/tmp
  
  Options for RESERVE are:
      -user        -- Reserve the tasker(s) for given list of users
                      (comma separated list)
      -group       -- Reserve the tasker(s) for given list of fairshare groups
                      (comma separated list)
      -jobclass    -- Reserve the tasker(s) for given list of jobclasses
                      (comma separated list)
      -jobproj     -- Reserve the tasker(s) for given list of job projects
                      (comma separated list)
      -osgroup     -- Reserve the tasker(s) for given list of Unix groups
                      (comma separated list)
      -bucketid    -- Reserve the tasker(s) for given list of queue buckets
                      (comma separated list)
      -id          -- Reserve the tasker(s) for given list of jobs
                      (comma separated list of job ids)
  
      -start       -- Reservation start time
      -end         -- Reservation end time
      -duration    -- Reservation duration (VOV timespec)
      -cancel      -- Cancel the reservation on tasker(s)
  
  Options for STOP are:
      -force       -- Stop taskers with force. BEWARE: kills running jobs.
      -noconfirm   -- Do not prompt for confirmation. Default is to prompt.
      -all         -- Stop all running taskers.
      -sick [TIMESPEC]
                   -- Stop all taskers that have been sick for
                      at least N seconds.
                      N is compared against the last time a heartbeat was
                      received by the server for each sick tasker.
                      All jobs running on a sick tasker being stopped will be
                      marked as failed in the server, even if the job does,
                      or has, completed successfully while the tasker is sick.
                      It is recommended to check tasker host connectivity before
                      using this function and allow for the tasker to reconnect
                      and send a heartbeat in case connectivity is restored.
  
  Parameters for CONFIGURE are:
      -allowcoredump <bool>    -- Control core-dump behavior.
      -autokillmethod <d|n|v>  -- Control autokill method.
      -capacity <CAP>[MAXCAP]  -- Specify capacity and optionally the
                                  max-capacity of the tasker. The capacity is
                                  the maximum number of jobs that can be run by
                                  tasker. The max_capacity is the maximum slots
                                  a tasker can be expanded to have when jobs are
                                  suspended. The default value for capacity is
                                  equal to the number of CORES present. The
                                  default value for max_capacity is 2*CAPACITY.
                                  Use N, N/N, CORES[-+*/]N, CORES[-+*/]N/N,
                                  N/CORES[-+*/]N, CORES[-+*/]N/CORES[-+*/]N to
                                  make adjustments from the default.
                                  Examples: 4, 4/8, CORES-2, CORES*0.8,
                                            CORES+0/20, CORES+2/CORES*2
      -cpus       <bool>       -- Number of CPU's in this machine.
      -debugcontainers <bool>  -- Enable debug logging of container activity.
      -debugjobcontrol <bool>  -- Enable debug logging of job control activity.
      -debugmultienv   <bool>  -- Enable debug logging of environment switching.
      -debugnuma       <bool>  -- Enable debug logging of NUMA activity.
      -debugusageinfo  <bool>  -- Enable debug logging of memory usage analysis.
      -maxload    <MAXLOAD>    -- Maximum load above which new jobs are refused.
                                  The default value for max_load is
                                  CAPACITY+0.5.
                                  Use 0 or less than 0 to specify default value.
                                  Use N or CAPACITY[-+*/]N to make adjustments
                                  from the default.
                                  Examples: 12.0, CAPACITY+2, CAPACITY*2
      -maxwaitnostart <N>      -- How long to wait for a job to start.
      -message    <string>     -- Set vovtasker message.
      -numabindtonode <bool>   -- Bind to entire NUMA node or individual cores.
                                  Default is to bind to entire NUMA node.
      -resources  <string>     -- vovtasker resources.
      -taskergroup <string>    -- The tasker group.
      -minramfree <N>          -- Minimum amount of free RAM in MB.
      -name       <string>     -- Name of vovtasker.
      -ramsentry  <bool>       -- Activate/Deactivate RAM SENTRY.
      -efftotram  <N>          -- Effective total RAM in MB.
      -retrychdir <N>          -- Specify number of retries for failed chdirs.
      -retrychdirsleep <N>     -- Specify the sleep interval time between
                                  retries for failed chdirs.
      -retrychdirbackoff <N>   -- Specify the factor multiplied to the sleep
                                  interval to increase sleep interval between
                                  retries for failed chdirs.
      -liverecorder on|off     -- Enable/disable Live Recorder debugging
                                  capability (linux64 only).
      -liverecorder.logdir <string>
                               -- Specify the directory in which the Live
                                  Recorder recording file should be saved. The
                                  directory must exist. Default is "/tmp".
      -liverecorder.logsize <N> --
                                  Specify the Live Recorder log size in MB.
                                  Default: 256, Min: 256, Max: 65536.
      -liverecorder.mode <string>
                               -- Specify the Live Recorder mode, which is one of
                                  the following: tasker, subtasker, both.
                                  Note that enabling subtasker recording results
                                  in a recording file for each job executed on
                                  the tasker.
                                  Default: tasker.
      -rawpower                -- Specify a raw power figure for initial tasker
                                  startup.
      -mindisk                 -- Specify minimum /tmp disk in MB or
                                  percentage (0%-99%, for example, 10%)
                                  required for tasker startup.
      -coeff                   -- Specify a scaling factor from 0.01-100.0
                                  used to derate tasker power.
      -sendenv  <name>         -- Send a named environment to a tasker.
      -setenv   VAR=VALUE      -- Set a variable in the tasker environment.
                                  ("VAR=VALUE" must be quoted on Windows)
      -taskerheartbeat <N>     -- Specify the heartbeat for a tasker.
      -unsetenv VAR            -- Unset a variable in the tasker environment.
  
  EXAMPLES:
      % vovtaskermgr show
      % vovtaskermgr show -nameid
      % vovtaskermgr start
      % vovtaskermgr start unix1
      % vovtaskermgr start -random            -- Start taskers in random order.
      % vovtaskermgr update
      % vovtaskermgr restart
      % vovtaskermgr stop                     -- Stop all taskers, let running
                                                 jobs finish.
      % vovtaskermgr stop -noconfirm          -- Like above, no confirmation
                                                 required.
      % vovtaskermgr stop -force              -- Kill running jobs now
                                                 (-noconfirm implied).
      % vovtaskermgr reserve -user john \\
               -duration 3h jupiter           -- Reserve tasker jupiter for user
                                                 john for 3h from now
      % vovtaskermgr configure -message "shutdown 1PM" farm11 farm12
      % vovtaskermgr printstatus farm11
      % vovtaskermgr rotatelog                -- Recreate missing log files for
                                                 all connected taskers
      % vovtaskermgr rotatelog farm2 farm11   -- Recreate missing log files for
                                                 tasker farm2 farm11
  
      % vovtaskermgr configure jupiter -sendenv BASE
                                              -- send the BASE environment to
                                                 tasker jupiter
  

Starting Many Taskers in Parallel

Starting hundreds of taskers with a single command can take some time. You can speed up the process by running several vovtaskermgr start commands concurrently, each with the -random option, which starts the taskers in random order.

For example:
% vovtaskermgr start -random &
% vovtaskermgr start -random &
% vovtaskermgr start -random &
% vovtaskermgr start -random &
% vovtaskermgr start -random &
% vovtaskermgr start -random &
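
The repeated commands above can also be written as a loop. The following is a minimal sketch for a POSIX shell, not an official script: the function name start_taskers_parallel and the VOVTASKERMGR override variable are hypothetical, and the default of four concurrent commands is an arbitrary choice.

```shell
# Hypothetical helper: launch N concurrent "vovtaskermgr start -random"
# commands and wait for all of them to finish.
# VOVTASKERMGR is an assumed override, not a VOV variable.
start_taskers_parallel() {
    count="${1:-4}"                      # number of concurrent start commands
    vtm="${VOVTASKERMGR:-vovtaskermgr}"
    i=0
    while [ "$i" -lt "$count" ]; do
        "$vtm" start -random &           # each command runs in the background
        i=$((i + 1))
    done
    wait                                 # block until every start command exits
}
```

For example, start_taskers_parallel 6 is equivalent to the six background commands shown above, followed by a wait.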

Tasker Configuration on the Fly

Many vovtasker characteristics can be changed on the fly using vovtaskermgr configure. For example, you can change the capacity of a tasker, i.e. the maximum number of jobs that the tasker can take, with:
% vovtaskermgr configure -capacity 8 pluto
Setting the capacity to zero effectively disables the tasker. You can also set a message on the tasker to record the reason:
% vovtaskermgr configure -capacity 0 pluto
% vovtaskermgr configure -message "Temporarily disabled by John" pluto
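
The disable-and-annotate pattern above can be wrapped in a pair of helper functions. This is a sketch under stated assumptions: the names disable_tasker and enable_tasker and the VOVTASKERMGR override variable are hypothetical, and the capacity to restore is supplied by the caller rather than looked up.

```shell
# Hypothetical helpers around "vovtaskermgr configure".
# VOVTASKERMGR is an assumed override, not a VOV variable.

# disable_tasker TASKER REASON: set capacity to 0 and record why.
disable_tasker() {
    vtm="${VOVTASKERMGR:-vovtaskermgr}"
    "$vtm" configure -capacity 0 "$1" &&
    "$vtm" configure -message "$2" "$1"
}

# enable_tasker TASKER CAPACITY: restore a caller-supplied capacity.
enable_tasker() {
    vtm="${VOVTASKERMGR:-vovtaskermgr}"
    "$vtm" configure -capacity "$2" "$1"
}
```

For example, disable_tasker pluto "Temporarily disabled by John" issues the two commands shown above, and enable_tasker pluto 8 restores a capacity of 8. As the usage message notes, changes made with configure persist only until the tasker is stopped.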

Tasker Capacity

The cores and capacity of a vovtasker can be overridden manually. By default, the capacity follows the core count, but it can also be set manually via the -T option or by defining the SLOTS/N consumable resource via the -r option, where N is a positive integer. In all cases, the capacity directly affects the number of slot licenses that will be requested.

Tasker Reservation

The following example uses vovtaskermgr to set a reservation on a tasker. In this case, the tasker called 'pluto' is reserved for user 'john' for 2 days:
% vovtaskermgr reserve -user john -duration 2d pluto

If you want the vovtaskers to be reserved when they start, use the -reserve option in the taskers.tcl file.
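
The reservation command for 'pluto' can be generalized with the other RESERVE options listed in the usage message. The sketch below is illustrative: the function names reserve_tasker and cancel_reservation and the VOVTASKERMGR override variable are hypothetical, and the placement of -cancel before the tasker names is an assumption based on the usage message rather than a confirmed invocation.

```shell
# Hypothetical helpers around "vovtaskermgr reserve".
# VOVTASKERMGR is an assumed override, not a VOV variable.

# reserve_tasker USERS DURATION TASKER...
# USERS is a comma-separated list; DURATION is a VOV timespec such as 2d.
reserve_tasker() {
    vtm="${VOVTASKERMGR:-vovtaskermgr}"
    users="$1"
    duration="$2"
    shift 2
    "$vtm" reserve -user "$users" -duration "$duration" "$@"
}

# cancel_reservation TASKER...
# Assumes -cancel precedes the tasker list.
cancel_reservation() {
    vtm="${VOVTASKERMGR:-vovtaskermgr}"
    "$vtm" reserve -cancel "$@"
}
```

For example, reserve_tasker john,mary 2d pluto reserves 'pluto' for two users. Current reservations can be checked with the reserveshow subcommand.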