The vovserver marks a vovtasker SICK when it has not received the vovtasker's heartbeat message for three consecutive update cycles.
Possible causes include:
- The machine has crashed
- The machine has been disconnected from the network
- The top-level vovtaskerroot process has crashed or was killed
Since there is no single solution to this problem, here is a short debugging guide.
- Is vovtasker SICK?
  If vovtasker is SICK, use:
  % nc cmd vovtaskermgr stop name-of-SICK-vovtasker
  % nc cmd vovtaskermgr start name-of-SICK-vovtasker
  Otherwise, vovtasker will not start.
- Is the machine running?
  - No: you have a network problem; call IT
  - Yes: continue
- Is vovtasker/vovtaskerroot stuck?
  - On Linux, check the process status with:
    # root privilege is needed
    % strace -p PID
    % pstack PID
    where PID is the process ID of the vovtasker/vovtaskerroot process.
  - No: continue
  - Yes: the output of strace and pstack often helps diagnose the problem (e.g. a bad NFS mount, an unresponsive LDAP server, ...).
Sometimes you may not be able to figure out what is holding up the vovtasker. In that case, submit a support request at Altair Community for assistance.
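Before attaching strace or pstack, a quick look at the kernel's process state can hint at whether the process is stuck. The following is a minimal sketch, not part of the product: the function names classify_state and check_pid are hypothetical, and the mapping of state codes to labels is an illustrative assumption based on the standard state codes printed by ps on Linux. A state beginning with D (uninterruptible sleep) often points to blocked I/O such as a hung NFS mount.

```shell
#!/bin/sh
# Hypothetical helper: map a ps(1) state code to a short label.
# Only the first letter matters; trailing flags (s, l, +, <) are ignored.
classify_state() {
    case "$1" in
        "")       echo "not-running" ;;   # ps printed nothing: no such PID
        D*)       echo "blocked-on-io" ;; # uninterruptible sleep, e.g. hung NFS
        Z*)       echo "zombie" ;;        # exited but not yet reaped
        T*|t*)    echo "stopped" ;;       # stopped by a signal or a tracer
        R*|S*|I*) echo "normal" ;;        # running, sleeping, or idle
        *)        echo "unknown" ;;
    esac
}

# Hypothetical helper: print "PID label" for one process ID.
check_pid() {
    # "-o stat=" prints only the state column, without a header line.
    state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    echo "$1 $(classify_state "$state")"
}

# Check every PID given on the command line (does nothing with no arguments).
for pid in "$@"; do
    check_pid "$pid"
done
```

If the script is saved under a name of your choice, for example check_state.sh, run it as `sh check_state.sh PID` with the PID of the vovtasker or vovtaskerroot process. A "blocked-on-io" or "stopped" result suggests that the strace and pstack output is worth collecting; a "not-running" result means the process no longer exists.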