NPCD System Status Issue
Posted: Tue Apr 21, 2015 11:35 pm
While this might seem a lot like some of my other recent topics, it is something completely different. Last night I noticed my graphs had stopped updating. I checked, and npcd was reported as running; further investigation showed it was not. This is yet another case where the XI system status is not reporting the true state of the system.


Checking from the command line
[root@nagios nagios]# /etc/init.d/npcd status
NPCD running (pid 1519).
[root@nagios nagios]# ps -ef | grep 1519 | grep -v grep
root 1519 1 0 Apr21 ? 00:00:02 crond
root 64932 1519 0 00:23 ? 00:00:00 CROND
root 64933 1519 0 00:23 ? 00:00:00 CROND
root 64934 1519 0 00:23 ? 00:00:00 CROND
root 64935 1519 0 00:23 ? 00:00:00 CROND
root 64936 1519 0 00:23 ? 00:00:00 CROND
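The `ps` output above shows that PID 1519, which the init script reports as NPCD, actually belongs to crond. A PID file only stores a number, and once the original process exits the kernel is free to reuse that number. A minimal sketch of the extra check being asked for, assuming a Linux host with `/proc`, compares the process name in `/proc/<pid>/comm` with the expected name. The function name `pid_is` is hypothetical and is not part of the stock npcd init script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if PID $1 exists AND its process
# name (from /proc/<pid>/comm, Linux only) is exactly $2.
pid_is() {
    local pid="$1" expected="$2" actual
    # Reject empty or non-numeric values (e.g. an empty or corrupt PID file).
    [[ "$pid" =~ ^[0-9]+$ ]] || return 1
    # No /proc entry means no process with that PID.
    [ -r "/proc/$pid/comm" ] || return 1
    # The process may exit between the check above and this read; treat that as "not running".
    { read -r actual < "/proc/$pid/comm"; } 2>/dev/null || return 1
    [ "$actual" = "$expected" ]
}
```

Used as `pid_is "$(cat "$PIDFILE")" npcd`, where `PIDFILE` is whatever path your init script reads (a placeholder here), this would have returned failure for PID 1519 because its name is `crond`, not `npcd`. Note that `comm` is truncated to 15 characters, which is fine for a short name like `npcd`.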
Of course, after clicking the gear and restarting npcd, you can guess what happened next: the restart killed PID 1519, which was crond, so cron jobs stopped running and every nagios cron process stopped with it.
[root@nagios nagios]# ps -ef | grep crond | grep -v grep
[root@nagios nagios]#
Yet again, sysstat.php (what drives those green checks) reported bogus info, and the XI interface killed off a critical system component because it looked at a PID in a file without bothering to check whether that PID really was the npcd process. These system health indicators need to do more than trust the init script. Improving the init.d script is the first step, but if stale performance data is queuing up and not being processed, maybe the performance grapher is not running and doesn't deserve a green check mark.
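The stale-data idea gives a second signal that does not depend on the PID file at all: if npcd is really consuming perfdata, files should not sit in its spool directory for long. A minimal sketch, assuming GNU `find` and an arbitrary 10-minute threshold; the function name `spool_is_stale` is hypothetical, and the spool path is an assumption you should take from the `perfdata_spool_dir` setting in your npcd.cfg.

```shell
#!/usr/bin/env bash
# Hypothetical check: exit 0 if any regular file directly under spool
# directory $1 is older than $2 minutes (perfdata piling up, npcd likely
# not running), 1 if nothing is that old, 2 if the directory is missing.
spool_is_stale() {
    local dir="$1" max_age_min="$2" old_file
    [ -d "$dir" ] || return 2
    # -print -quit (GNU find) stops at the first old file found.
    old_file=$(find "$dir" -maxdepth 1 -type f -mmin +"$max_age_min" -print -quit)
    [ -n "$old_file" ]
}
```

For example, `spool_is_stale /usr/local/nagios/var/spool/perfdata 10 && echo "perfdata is backing up"` (path assumed) would flag the situation described here even while the init script still claims NPCD is running.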