Scheduled Jobs

Jobs that cannot be dispatched immediately because of a resource shortage, such as CPUs or software licenses, are placed in the job queue.

Jobs are scheduled using the following rules:
  • Scheduling is first determined by the FairShare mechanism. All active FairShare groups (that is, all groups with queued jobs) are ranked based on their distance from their target share of computing resources and their current number of running jobs.

    The FairShare group that is farthest behind its target has rank 0 (zero) and is selected first for scheduling. If none of the jobs from the rank 0 FairShare group can be dispatched, Accelerator looks at the jobs of the FairShare group with rank 1, and so on.

  • Within a given FairShare group, jobs with higher priority are scheduled ahead of jobs with lower priority.
  • Within a given FairShare group and a given priority, jobs are scheduled on a first-come, first-served basis.
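
The three rules above can be sketched as a single ordering function. This is only an illustration, not Accelerator's implementation: the Job fields, the share dictionaries, and the function names are hypothetical. The ranking here uses only the gap between target and actual share, whereas the real ranking also takes the number of running jobs into account. It is also assumed that a larger number means a higher priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    group: str        # FairShare group name (hypothetical field)
    priority: int     # assumption: larger value = higher priority
    submitted: float  # submission time, used for first-come, first-served


def rank_groups(target, actual, queued):
    """Return active groups in rank order; index 0 is farthest behind its target."""
    active = {job.group for job in queued}  # only groups with queued jobs are ranked
    # Simplified distance: target share minus actual share (bigger = farther behind).
    return sorted(active, key=lambda g: (-(target[g] - actual[g]), g))


def consideration_order(target, actual, queued):
    """Order in which queued jobs would be considered for dispatch."""
    order = []
    for group in rank_groups(target, actual, queued):
        jobs = [job for job in queued if job.group == group]
        # Higher priority first; within one priority, earliest submission first.
        jobs.sort(key=lambda job: (-job.priority, job.submitted))
        order.extend(jobs)
    return order
```

With two groups that each have a target share of 0.5, where group "b" currently holds 0.3 and group "a" holds 0.7, all jobs of "b" are considered before any job of "a", regardless of their priorities.
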
To check the status of the jobs in the queue, use the command nc summary or open the /jobqueue?action=buckets page in the browser. This page gives a report on all the classes of queued jobs (known as buckets):
  • The characteristics of the bucket: user, group, priority, and tool.
  • The number of jobs in the bucket and the age of the bucket: how long ago a job from that bucket was successfully dispatched.
  • The resources the jobs are waiting for.

Understand Why Jobs are Queued

In addition to the overall information about the job queue and its buckets, you can also query individual jobs or sets, using the CLI, GUI, or browser.
  • From the command line:
    % nc why jobID
  • From the Accelerator GUI, double-click a job and navigate to the Why tab.
  • From the browser, use the Jobs in Queue link from the Workload area of the home page.

The nc why command tries to explain why the object it is given, whether a job or a file, is in its current state. For example, a job might be waiting for FairShare, or for hardware or software resources. A job could be 'Invalid' because a predecessor dependency has failed, or because it was descheduled after submission but before it was executed.

The reason reported as the main reason may not be the only reason a job is waiting. For example, if a job requests both License:foo and Limit:bar, and both are exhausted, it is hard to tell which is the main wait reason. To save CPU cycles, the NC vovserver stops checking the resource list for additional wait reasons once it encounters the first one.
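
The effect of stopping at the first wait reason can be illustrated with a small sketch. This is not the vovserver's code: the function name, the list-of-pairs request format, and the availability dictionary are all hypothetical, and the idea that the reported reason follows the order of the resource list is an assumption made for the example.

```python
def first_wait_reason(requested, available):
    """Return the first requested resource that cannot be satisfied, or None.

    requested: list of (resource_name, amount) pairs, checked in order.
    available: dict mapping resource_name to the amount currently free.
    """
    for name, amount in requested:
        if available.get(name, 0) < amount:
            # Stop here: resources later in the list are never examined,
            # so any further shortages go unreported.
            return name
    return None
```

When both License:foo and Limit:bar are exhausted, this sketch reports whichever of the two it checks first, and the other shortage only becomes visible once the first one clears.
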

nc why

Show why a job is in the state it is.

vnc: Usage Message
          Show why a job is in the state it is.
  % nc why [OPTIONS] [ID]
          -h                   -- This help
          -json                -- Format output as JSON (valid for QUEUED or
                                  SCHEDULED jobs only).
          -jsondoc             -- Documentation for the output of the -json
                                  command-line argument.
          % nc why 12345
          % nc why -json 12345
          % nc why -jsondoc
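
For scripting, the -json option can be wrapped in a few lines of Python. The following is a minimal sketch under stated assumptions: nc is on the PATH, the command prints a single JSON document on standard output, and the job is QUEUED or SCHEDULED (the only states for which the usage message says -json is valid). The function names are hypothetical, and the structure of the output is not assumed here; run nc why -jsondoc for its documentation.

```python
import json
import subprocess


def build_why_command(job_id):
    """Build the argument list for 'nc why -json <job_id>'."""
    return ["nc", "why", "-json", str(job_id)]


def why_json(job_id, run=subprocess.run):
    """Run 'nc why -json' and return the parsed output.

    The 'run' parameter exists only so that the call can be replaced in tests.
    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = run(build_why_command(job_id), capture_output=True, text=True, check=True)
    # Assumption: stdout holds exactly one JSON document.
    return json.loads(result.stdout)
```

A call such as why_json(12345) corresponds to the command-line example % nc why -json 12345 above.
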