Specify how long the server waits after it loses contact with the primary execution
host before deleting or requeueing jobs.
Control how long the server waits before requeueing or deleting a job when it loses
contact with the MoM on the job’s primary execution host. This is the delay between the
time the server determines that the primary execution host cannot be contacted and the
time it requeues the job, and does not include the time it takes to determine that the
host is out of contact.
-
Click the Configure tab.
-
Choose the HPC to configure.
-
Click Server Settings from the
PBS Professional menu located on the left-hand side of the
web page.
-
Click the icon located in the upper right-hand corner of the web page.
-
Enable the Display Advanced Settings check box.
-
Click the icon located to the right of Requeue Jobs on
Vnode Failure.
-
For Requeue Jobs on Vnode Failure, enter the time the
server waits before requeueing or deleting a job when it loses contact with the
MoM on the job’s primary execution host.
Enter the duration as an integer number of seconds.
Valid options:
- Greater than zero - The server waits for the specified number of seconds
after losing contact with a primary execution host, then attempts to
contact the host again. If it still cannot, it requeues any jobs
that can be rerun and deletes any jobs that cannot be rerun.
- Zero - Jobs are left in the Running state whether or not the server has
contact with their primary execution host.
- Less than zero - The attribute is treated as if it were set to 1, and
jobs are deleted or requeued after the server has been out of contact
with the primary execution host for 1 second.
-
Click to save the change.
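The three cases listed under Valid options can be summarized in a short Python sketch. The names RequeuePolicy and interpret_requeue_setting are hypothetical and are not part of PBS Professional or its web interface; the sketch only models how an entered value is interpreted according to the descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequeuePolicy:
    """Hypothetical model of how the server treats the entered value."""
    acts_on_lost_contact: bool   # True if jobs are requeued or deleted
    wait_seconds: Optional[int]  # delay before acting; None if jobs are left running


def interpret_requeue_setting(value: int) -> RequeuePolicy:
    """Map a Requeue Jobs on Vnode Failure value to its effect (illustrative only)."""
    # The field accepts an integer number of seconds; reject anything else.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("the duration must be an integer number of seconds")
    if value > 0:
        # Wait the given number of seconds, then requeue or delete jobs.
        return RequeuePolicy(acts_on_lost_contact=True, wait_seconds=value)
    if value == 0:
        # Jobs stay in the Running state regardless of contact.
        return RequeuePolicy(acts_on_lost_contact=False, wait_seconds=None)
    # Negative values are treated as if the setting were 1.
    return RequeuePolicy(acts_on_lost_contact=True, wait_seconds=1)


if __name__ == "__main__":
    # 300 is an arbitrary example value, not a recommended setting.
    for entered in (300, 0, -5):
        print(entered, interpret_requeue_setting(entered))
```

Note that zero is the only value that leaves jobs running indefinitely; a negative value does not disable the behavior but shortens the wait to 1 second.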