Implement Starving Jobs

Track the amount of time a job has been waiting to run and then mark the job as starving if this time has passed a specified limit.

Starving Jobs

Overview of starving jobs including parameters.

PBS can keep track of the amount of time a job has been waiting to run, and then mark the job as starving if this time has passed a specified limit. You can use this starving status in calculating both execution and preemption priority.

You enable tracking of starving jobs with the Help Starving Jobs parameter. It is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or you can specify it for all of the time.

You specify the amount of time required for a job to be considered starving in the Job Starving Time parameter. The default for this parameter is 24 hours.

PBS can use one of the following kinds of time to determine whether a job is starving:

  • The job’s eligible wait time.
  • The amount of time the job has been queued.

Starving Jobs Parameters

Help Starving Jobs
Setting this option enables starving job support. Once jobs have waited for the amount of time given by Job Starving Time they are considered starving. If a job is considered starving, no lower-priority jobs will run until the starving job can be run, unless backfilling is also specified. To use this option, the Job Starving Time parameter needs to be set as well.
Job Starving Time
The amount of time before a job is considered starving. This parameter is used only if Help Starving Jobs is enabled.
Jobs Starve by Eligible Time
Controls which wait time is used to decide whether a job is starving. When enabled, each job’s eligible time value is used as its wait time for starving. If disabled, the amount of time the job has been queued is used as its wait time for starving.
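To make the relationship between the three parameters concrete, here is a minimal Python sketch. It is not PBS code or a PBS API: the class and field names are hypothetical and only mirror the parameter descriptions above, using the documented defaults (24 hours for Job Starving Time, disabled for Jobs Starve by Eligible Time).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class StarvingJobSettings:
    """Illustrative container for the three parameters; not a PBS data structure."""

    help_starving_prime: bool      # Help Starving Jobs, Prime
    help_starving_non_prime: bool  # Help Starving Jobs, Non-Prime
    job_starving_time: timedelta = timedelta(hours=24)  # documented default
    starve_by_eligible_time: bool = False               # documented default: disabled

    def tracking_enabled(self, in_prime_time: bool) -> bool:
        """Return whether starving-job support applies in the given period."""
        if in_prime_time:
            return self.help_starving_prime
        return self.help_starving_non_prime


# Example: starving-job support during prime time only.
settings = StarvingJobSettings(help_starving_prime=True, help_starving_non_prime=False)
print(settings.tracking_enabled(in_prime_time=True))
print(settings.tracking_enabled(in_prime_time=False))
```

Keeping the prime and non-prime switches as separate fields reflects that Help Starving Jobs is a primetime option, while the other two parameters apply regardless of the period.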

Using Job’s Eligible Wait Time to Determine a Job is Starving

PBS provides a method for tracking how long a job that is eligible to run has been waiting to run. By “eligible to run”, we mean that the job could run if the required resources were available. The time that a job waits while it is not running can be classified as “eligible” or “ineligible”. Roughly speaking, a job accrues eligible wait time when it is blocked due to a resource shortage, and accrues ineligible wait time when it is blocked due to project, user, or group limits.
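The rough distinction between eligible and ineligible wait time can be sketched as a small classification function. This is only an illustration of the rule of thumb in the paragraph above; the enum names are hypothetical, and the real accounting in PBS covers more cases than these four.

```python
from enum import Enum


class BlockReason(Enum):
    """Hypothetical reasons a queued job is not running."""

    RESOURCE_SHORTAGE = "resource shortage"
    PROJECT_LIMIT = "project limit"
    USER_LIMIT = "user limit"
    GROUP_LIMIT = "group limit"


class Accrual(Enum):
    ELIGIBLE = "eligible"
    INELIGIBLE = "ineligible"


def accrual_for(reason: BlockReason) -> Accrual:
    """Classify wait time, roughly: resource shortage is eligible, limits are not."""
    if reason is BlockReason.RESOURCE_SHORTAGE:
        return Accrual.ELIGIBLE
    return Accrual.INELIGIBLE


for reason in BlockReason:
    print(f"{reason.value}: {accrual_for(reason).value}")
```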

PBS can use the job's eligible wait time to determine whether the job is starving. A starving job is one whose wait time has exceeded a configurable maximum, which is set by the Job Starving Time parameter. You can use this starving status in calculating both execution and preemption priority.

When Jobs Starve by Eligible Time is enabled, each job’s eligible time value is used as its wait time for starving. If Jobs Starve by Eligible Time is disabled, the amount of time the job has been queued is used as its wait time for starving.

If Jobs Starve by Eligible Time is disabled, the following rules apply:
  • The amount of time the job has been queued is used as its wait time for starving.
  • Jobs lose their queue wait time whenever they are requeued, as with the qrerun command. This includes when they are checkpointed or requeued (but not suspended) during preemption.
  • Suspended jobs do not lose their queue wait time. However, when a job becomes suspended, all of the time since it was submitted is counted towards its queue wait time. For example, with Job Starving Time set to 24 hours, a job that was submitted, remained queued for 1 hour, ran for 26 hours, and was then suspended has a queue wait time of 27 hours, so it becomes starving.
If Jobs Starve by Eligible Time is enabled, the following rules apply:
  • The job’s eligible time value is used as its wait time for starving.
  • Jobs do not lose their eligible time when they are requeued.
  • Jobs do not lose their eligible time when they are suspended.
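The two sets of rules above can be summarized in a short Python sketch. It is an illustration, not the scheduler's implementation: the `JobWait` record and its field names are hypothetical, "passed the limit" is assumed to mean strictly greater than Job Starving Time, and the 1 hour of eligible time in the example is an assumed value for the hour the job spent queued.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class JobWait:
    """Hypothetical per-job record; field names are illustrative, not PBS attributes."""

    eligible_time: timedelta    # kept across requeue and suspend
    queue_wait_time: timedelta  # reset on requeue; counted from submission on suspend


def wait_time_for_starving(job: JobWait, starve_by_eligible_time: bool) -> timedelta:
    """Pick the wait time selected by Jobs Starve by Eligible Time."""
    return job.eligible_time if starve_by_eligible_time else job.queue_wait_time


def is_starving(
    job: JobWait,
    job_starving_time: timedelta = timedelta(hours=24),
    starve_by_eligible_time: bool = False,
) -> bool:
    # Assumption: the limit is "passed" only when the wait time is strictly greater.
    return wait_time_for_starving(job, starve_by_eligible_time) > job_starving_time


# Suspended-job example: queued 1 hour, ran 26 hours, then suspended.
suspended = JobWait(
    eligible_time=timedelta(hours=1),     # assumed: eligible only while queued
    queue_wait_time=timedelta(hours=27),  # all time since submission
)
print(is_starving(suspended))                                # uses queue wait time
print(is_starving(suspended, starve_by_eligible_time=True))  # uses eligible time
```

With these assumed values, the same job is starving when queue wait time is used and not starving when eligible time is used, which is the practical difference between the two settings.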

Enable Starving Jobs

Enable tracking whether jobs are starving.

Important: The Help Starving Jobs and Job Starving Time parameters can only be updated if the cluster was added by a user with passwordless sudo permissions and therefore may not appear as a Scheduling parameter.
  1. Click the Configure tab.
  2. Choose the HPC to configure.

    Figure 1. Choose an HPC
  3. Click Scheduling from the PBS Professional menu located on the left-hand side of the web page.
  4. Scroll down to the second list of Scheduling parameters.
  5. Click the icon located to the right of Help Starving Jobs.
  6. Choose one of the following options to enable or disable starving job support:
    • To enable tracking of starving jobs for both prime and non-prime time, enable Help Starving Jobs and enable both Prime and Non-Prime.
    • To enable tracking of starving jobs for only prime time, enable Help Starving Jobs and enable Prime and disable Non-Prime.
    • To enable tracking of starving jobs for only non-prime time, enable Help Starving Jobs and enable Non-Prime and disable Prime.
    • To disable tracking of starving jobs, disable Help Starving Jobs.
  7. Click the icon to save the changes.
  8. Scroll up to the first list of Scheduling parameters.
  9. Click the icon located to the right of Job Starving Time.
  10. For Job Starving Time, specify the amount of time required for a job to be considered starving.
    The duration can be entered as an integer in seconds or in the format: [[HH:]MM:]SS[.milliseconds]. Default is 24 hours.
  11. Click the icon to save the change.
  12. Optional: Click the icon located to the right of Jobs Starve by Eligible Time.
  13. Optional: For Jobs Starve by Eligible Time, choose one of the following options:
    • Enable this option to use the job's eligible time as its wait time for starving.
      Note: Jobs accrue eligible time or ineligible time or run time as appropriate. A job’s eligible time is used for starving calculation starting with the next scheduling cycle.
    • Disable this option to use the amount of time the job has been queued as its wait time for starving.
      Note: By default this parameter is disabled.
  14. Click the icon to save the change.
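The Job Starving Time field accepts either an integer number of seconds or the format [[HH:]MM:]SS[.milliseconds]. The following Python sketch shows one way to read such a value and check what it amounts to in seconds before entering it. The function name is hypothetical, the product's own validation may differ, and the part after the decimal point is assumed to be a fraction of a second.

```python
import re

# Optional hours and minutes, mandatory seconds, optional fractional part.
_DURATION = re.compile(r"^(?:(?:(\d+):)?(\d+):)?(\d+)(?:\.(\d+))?$")


def parse_duration(text: str) -> float:
    """Convert '[[HH:]MM:]SS[.milliseconds]' or plain seconds to seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid duration: {text!r}")
    hours, minutes, seconds, fraction = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    # Assumption: digits after the dot are a decimal fraction of a second.
    return total + float(f"0.{fraction or 0}")


# The 24-hour default, written in both accepted forms.
print(parse_duration("24:00:00"))
print(parse_duration("86400"))
```

Both calls describe the same 24-hour limit, so either form can be used in step 10.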