Troubleshooting a SICK Tasker

The vovserver marks a vovtasker SICK when it has not received the vovtasker's heartbeat message for three consecutive update cycles.

Possible causes include:
  • The machine has crashed
  • The machine has been disconnected from the network
  • The top-level vovtaskerroot process has crashed or was killed

Since there is no single solution to this problem, here is a short debugging guide.

  1. Is vovtasker SICK?
    If vovtasker is SICK, stop it and then start it again:
    % nc cmd vovtaskermgr stop name-of-SICK-vovtasker
    % nc cmd vovtaskermgr start name-of-SICK-vovtasker

    Run the stop command first; otherwise, vovtasker will not start.

  2. Is the machine running?
    • No: the machine is down or unreachable over the network; contact IT
    • Yes: continue
  3. Is vovtasker/vovtaskerroot stuck?
    On Linux, check the process status with:
      # root privilege is needed
      % strace -p PID
      % pstack PID
      where PID is the PID of the vovtasker/vovtaskerroot process.
    • No: continue
    • Yes: the output of strace and pstack often helps diagnose the problem (e.g. a bad NFS mount, an unresponsive LDAP server, ...).
    If you cannot determine what is holding up the vovtasker, submit a support request at http://www.altairone.com for assistance.
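
Steps 2 and 3 of the guide can be partly scripted. The following is a minimal sketch, not an Altair tool: the function names host_reachable, proc_state and is_io_blocked are hypothetical helpers introduced here for illustration. It assumes a Linux host with /proc mounted and the iputils version of ping. A process that stays in state D (uninterruptible I/O wait) across repeated checks is often blocked on something like a bad NFS mount, which is the kind of problem strace and pstack then help to pin down.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of any Altair tool.

# Step 2: return 0 if the host answers a single ping within 2 seconds.
# Assumes Linux iputils ping, where -W is the reply timeout in seconds.
host_reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Step 3: print the one-letter Linux state of a process
# (R running, S sleeping, D uninterruptible I/O wait, T stopped, Z zombie).
# Reads /proc, so this works on Linux only. Returns 1 if the PID does not exist.
proc_state() {
    pid=$1
    [ -r "/proc/$pid/status" ] || return 1
    awk '/^State:/ { print $2; exit }' "/proc/$pid/status"
}

# Return 0 if the process is currently in uninterruptible I/O wait (state D).
# A single sample is only a hint; repeat the check before drawing conclusions.
is_io_blocked() {
    [ "$(proc_state "$1")" = "D" ]
}
```

After sourcing these functions, run host_reachable with the tasker's host name from another machine, and run proc_state with the PID of the vovtasker/vovtaskerroot process on the tasker's host. Reading /proc does not require root privilege, so this check can be done before escalating to strace or pstack.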