Change How Long to Requeue Jobs after a Vnode Failure

Specify how long the server waits after it loses contact with the primary execution host before deleting or requeueing jobs.

Control how long the server waits before requeueing or deleting a job when it loses contact with the MoM on the job’s primary execution host. The delay starts when the server determines that the primary execution host cannot be contacted and ends when it requeues or deletes the job. It does not include the time the server takes to determine that the host is out of contact.
  1. Click the Configure tab.
  2. Choose the HPC to configure.

    Figure 1. Choose an HPC
  3. Click Server Settings from the PBS Professional menu located on the left-hand side of the web page.
  4. Click the icon located in the upper right-hand corner of the web page.
  5. Enable the Display Advanced Settings check box.
  6. Click the icon located to the right of Requeue Jobs on Vnode Failure.
  7. For Requeue Jobs on Vnode Failure, enter the time the server waits before requeueing or deleting a job when it loses contact with the MoM on the job’s primary execution host.
    Enter the duration as an integer number of seconds.
    Valid options:
    • Greater than zero - The server waits the specified number of seconds after losing contact with a primary execution host, then attempts to contact the host again. If it still cannot, it requeues any jobs that can be rerun and deletes any jobs that cannot be rerun.
    • Zero - Jobs are left in the Running state whether or not the server has contact with their primary execution host.
    • Less than zero - The attribute is treated as if it were set to 1, and jobs are deleted or requeued after the server has been out of contact with the primary execution host for 1 second.
  8. Click to save the change.
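
The valid options listed in step 7 can be summarized with a small Python sketch. It is only an illustration of the documented rules, not the server's actual code; the function names `effective_requeue_delay` and `action_on_lost_contact` are hypothetical and do not correspond to any PBS Professional API.

```python
from typing import Optional


def effective_requeue_delay(setting: int) -> Optional[int]:
    """Return the wait in seconds, or None when jobs are never requeued or deleted.

    Hypothetical helper that mirrors the three documented cases.
    """
    if setting > 0:
        # Greater than zero: wait the specified number of seconds.
        return setting
    if setting == 0:
        # Zero: jobs stay in the Running state regardless of contact.
        return None
    # Less than zero: treated as if the value were 1.
    return 1


def action_on_lost_contact(setting: int, rerunnable: bool) -> str:
    """Describe what happens to a job once contact with its primary host is lost."""
    delay = effective_requeue_delay(setting)
    if delay is None:
        return "leave running"
    # Jobs that can be rerun are requeued; jobs that cannot are deleted.
    verb = "requeue" if rerunnable else "delete"
    return f"{verb} after {delay}s"


if __name__ == "__main__":
    print(action_on_lost_contact(300, rerunnable=True))   # requeue after 300s
    print(action_on_lost_contact(0, rerunnable=True))     # leave running
    print(action_on_lost_contact(-5, rerunnable=False))   # delete after 1s
```

The delay in this sketch starts only after the server has already determined that the host is out of contact, so the total time a user observes is longer than the configured value.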