Job Runtime - Monitor and Profile

When a job runs on a vovtasker, the tasker automatically monitors the RAM and CPU utilization of the job, including all of its children.

Job statistics are sampled about once a minute. This sampling rate does not reliably capture jobs that complete in less time than the sampling period.
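A simplified model shows why short jobs can slip through. This sketch is an illustration only. It assumes samples are taken on fixed ticks every 60 seconds (t = 0, 60, 120, ...), while the tasker's real schedule is described only as "about once a minute". The function samples_during is hypothetical and counts the ticks that fall inside a job's run interval.

```python
import math

# Hypothetical model: samples are taken at t = 0, 60, 120, ... seconds.
SAMPLE_PERIOD_S = 60


def samples_during(start_s: float, end_s: float, period_s: float = SAMPLE_PERIOD_S) -> int:
    """Count sampling ticks t = k * period_s with start_s <= t < end_s."""
    if end_s <= start_s:
        return 0
    first_tick = math.ceil(start_s / period_s)    # first tick at or after the start
    last_tick = math.ceil(end_s / period_s) - 1   # last tick strictly before the end
    return max(0, last_tick - first_tick + 1)


print(samples_during(10, 50))   # 0: a 40-second job between two ticks
print(samples_during(50, 70))   # 1: a 20-second job that straddles the tick at 60
print(samples_during(10, 190))  # 3: ticks at 60, 120 and 180
```

Under this model, only a job that runs for at least one full sampling period is guaranteed to be sampled. A shorter job is seen only if it happens to straddle a tick.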

MAXRAM is expressed in megabytes (MB), where 1 MB = 1<<20 bytes (shifting the binary value 1 left by 20 places gives 2^20 = 1,048,576, roughly one million). CPU time is stored in milliseconds (ms) but is reported in seconds (s).
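The following short sketch spells out the arithmetic behind these units. The constant and helper names are hypothetical and serve only to illustrate the conversions.

```python
# 1 MB as used for MAXRAM: 1 << 20 = 2**20 = 1,048,576 bytes.
BYTES_PER_MB = 1 << 20


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to MB using the binary definition above."""
    return n_bytes / BYTES_PER_MB


def cputime_ms_to_s(cputime_ms: int) -> float:
    """Convert a CPU time stored in milliseconds to seconds for display."""
    return cputime_ms / 1000


print(BYTES_PER_MB)               # 1048576
print(bytes_to_mb(2 * 1024**3))   # 2048.0
print(cputime_ms_to_s(125_500))   # 125.5
```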

CPU Progress and Run Status Indicators

Accelerator monitors CPU and RAM utilization for all the running jobs. The CPU utilization information is available in four fields:
CPUTIME
The total accumulated CPU time in milliseconds.
CPUPROGRESS
The percentage of CPU time accumulated per unit of elapsed time. For example, if a job accumulates 60 seconds of CPU time during 60 seconds of elapsed time, its CPUPROGRESS is 100. This field can be 0 (zero) for jobs that are stuck, meaning they hold the CPU resource without running, which makes that CPU unavailable to other jobs. This field can also exceed 100 for multi-threaded jobs.
LASTCPUPROGRESS
A timestamp recording the last time the job's CPU time increased. It is used to identify stuck jobs.
RUNSTATUS
A descriptive text field that summarizes how well the job is progressing. Typical values include Good, Paging, and NoCpu. The complete list of values is shown in Table 1.
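The following minimal sketch shows how a CPUPROGRESS-style percentage and a LASTCPUPROGRESS-style timestamp could be derived from two consecutive samples. It assumes that each sample records the wall-clock time and the accumulated CPU time in milliseconds. The Sample class, the helper functions, and the 10-minute "stuck" threshold are all hypothetical. Accelerator's internal computation and thresholds are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    wall_ms: int     # wall-clock time when the sample was taken, in ms
    cputime_ms: int  # accumulated CPU time of the job (CPUTIME), in ms


def cpu_progress(prev: Sample, curr: Sample) -> Optional[float]:
    """Percentage of CPU time accumulated per unit of elapsed time.

    Returns None when no elapsed time separates the samples (no basis to judge).
    """
    elapsed_ms = curr.wall_ms - prev.wall_ms
    if elapsed_ms <= 0:
        return None
    return 100.0 * (curr.cputime_ms - prev.cputime_ms) / elapsed_ms


def update_last_cpu_progress(prev: Sample, curr: Sample, last_progress_ms: int) -> int:
    """Move the timestamp forward only if the job's CPU time increased."""
    return curr.wall_ms if curr.cputime_ms > prev.cputime_ms else last_progress_ms


def looks_stuck(now_ms: int, last_progress_ms: int, threshold_ms: int = 10 * 60 * 1000) -> bool:
    """Hypothetical rule: flag a job whose CPU time has not grown for threshold_ms."""
    return now_ms - last_progress_ms >= threshold_ms


a = Sample(wall_ms=0, cputime_ms=0)
b = Sample(wall_ms=60_000, cputime_ms=60_000)    # 60 s of CPU in 60 s
c = Sample(wall_ms=120_000, cputime_ms=300_000)  # 240 s of CPU in 60 s (multi-threaded)
print(cpu_progress(a, b))  # 100.0
print(cpu_progress(b, c))  # 400.0
```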
Table 1. Values of the RUNSTATUS Field
Value    Meaning
n/a      Insufficient information to determine CPU progress. Typical for jobs that have just started.
Good     CPU progress is greater than 70%.
Medium   CPU progress is between 10% and 70%.
Poor     CPU progress is less than 10%, and the job is not paging.
Paging   CPU progress is less than 10%, and the job is paging at a rate greater than 1000 pages per second.
NoCpu    The job is not accumulating any CPU time.
Susp     The job is suspended.
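The thresholds in Table 1 can be read as a decision procedure. The sketch below shows one hypothetical way to apply them. Several details are assumptions, not documented behavior:

- the order of the checks (suspension first, then missing data, then no CPU growth);
- the treatment of exactly 10% and exactly 70% as Medium;
- the reading of "not paging" as a paging rate at or below 1000 pages per second.

```python
from typing import Optional

# Paging-rate threshold from Table 1, in pages per second.
PAGING_RATE_LIMIT = 1000


def run_status(progress: Optional[float], cpu_increased: bool,
               pages_per_s: float, suspended: bool) -> str:
    """Map CPU progress and paging rate to a RUNSTATUS-style label (hypothetical)."""
    if suspended:
        return "Susp"
    if progress is None:          # not enough samples yet
        return "n/a"
    if not cpu_increased:         # CPU time did not grow at all
        return "NoCpu"
    if progress > 70:
        return "Good"
    if progress >= 10:
        return "Medium"
    if pages_per_s > PAGING_RATE_LIMIT:
        return "Paging"
    return "Poor"


print(run_status(95, True, 0, False))    # Good
print(run_status(5, True, 2500, False))  # Paging
```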

Job Profiling

When job profiling is activated, Accelerator tracks memory usage, CPU usage and license usage over the lifetime of a job.

The output of job profiling is a set of plots as shown below:

Figure 1. Job profile plots (RAM, CPU, and license usage over time)

The first plot from the top shows RAM usage over time. The second plot shows CPU usage over time. The third plot shows usage of the licenses that were requested at submission time. The fourth plot shows usage of a license that was not requested ("Requested/Used").

To activate profiling for a single job, use the -profile option of nc run:
% nc run -profile myJob

To view a profile, open the job's page in the browser interface.

To activate job profiling for a job class, add the following setting to the job class definition:
# In a job class definition
set VOV_JOB_DESC(profile) 1
To activate job profiling for all jobs, add the following line to the file $VOVDIR/local/vncrun.config.tcl:
# In the file $VOVDIR/local/vncrun.config.tcl
...
set VOV_JOB_DESC(profile) 1
...