Performance Grapher Component Status Wrong

Posted: Tue Apr 07, 2015 2:48 pm
by rseiwert
Since this is the same sort of problem, only different, I thought I would start a new thread.
Noticed today that my performance data was not reaching the graphs, but the XI System Component Status for the Performance Grapher showed a green check, and hovering over it said "NPCD running (pid 1529)". Normally I fix this by restarting the service, but today I wondered why. I verified that NPCD was definitely not running, yet /etc/init.d/npcd status said it was. The init.d status check only verifies that the process ID in the PID file is running, not that the process is actually npcd. See below.

[root@nagios subsys]# date
Tue Apr 7 15:37:57 EDT 2015
[root@nagios subsys]# ps -ef | grep npcd | grep -v grep
[root@nagios subsys]# ls -l npcd.pid
-rw-r--r-- 1 root root 4 Apr 6 18:35 npcd.pid
[root@nagios subsys]# /etc/init.d/npcd start
NPCD already started
[root@nagios subsys]# cat npcd.pid
1529
[root@nagios subsys]# ps -p 1529
PID TTY TIME CMD
1529 ? 00:00:00 httpd
Another side effect of not checking what the process ID belongs to: when I ran /etc/init.d/npcd restart, it killed the root httpd process, taking down the web interface for all users!
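
As a rough illustration of the missing check, a status routine could compare the command name behind the recorded PID with the expected daemon name before reporting success. This is only a sketch: `pid_matches_name` is a hypothetical helper, not part of the shipped npcd init script, and it assumes the daemon's command name is `npcd`.

```shell
#!/bin/sh
# Hypothetical helper (not from the real npcd init script): succeed only if
# the PID stored in a PID file belongs to a live process whose command name
# matches the expected one.
pid_matches_name() {
    pidfile=$1
    expected=$2

    # No readable PID file means nothing to verify.
    [ -r "$pidfile" ] || return 1

    # Strip whitespace and reject empty or non-numeric content.
    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Prefer /proc on Linux; fall back to ps where /proc is unavailable.
    actual=$(cat "/proc/$pid/comm" 2>/dev/null \
        || ps -p "$pid" -o comm= 2>/dev/null \
        || true)

    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

With the PID file from the transcript above, `pid_matches_name npcd.pid npcd` would fail, because PID 1529 belongs to httpd rather than npcd.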

Re: Performance Grapher Component Status Wrong

Posted: Tue Apr 07, 2015 3:29 pm
by abrist
rseiwert wrote:
[root@nagios subsys]# ls -l npcd.pid
-rw-r--r-- 1 root root 4 Apr 6 18:35 npcd.pid
The ownership of the npcd.pid file looks wrong (root:root rather than nagios:nagios). From my test server:

Code:

[root@localhost ~]# ls -la $(locate npcd.pid)
-rw-r--r-- 1 nagios nagios 5 Jan 19 15:16 /usr/local/nagiosxi/var/subsys/npcd.pid
What is your system umask?

Code:

umask

Re: Performance Grapher Component Status Wrong

Posted: Tue Apr 07, 2015 3:33 pm
by cmerchant
Interesting. I could duplicate the same behavior, and I believe the stale PID file was left over from a crash of my XI server.

Code:

 service npcd status
NPCD running (pid 1362).
[root@nagiosd1 var]# ps -efw | grep npcd | grep -v grep
[root@nagiosd1 var]# ps -p 1362
  PID TTY          TIME CMD
 1362 ?        00:00:10 rrdcached
[root@nagiosd1 var]# ps -efw | grep 1362 | grep -v grep
nagios    1362     1  0 Apr03 ?        00:00:10 /usr/bin/rrdcached -p /var/rrdtool/rrdcached/rrdcached.pid -s nagios -m 0660 -l unix:/var/rrdtool/rrdcached/rrdcached.sock -F -w 900 -z 90 -j /tmp/ -b /var/rrdtool/rrdcached -P FLUSH,PENDING
Which is why, when I whacked the PID file:

Code:

rm /usr/local/nagiosxi/var/subsys/npcd.pid 
I received an error message:

Code:

rm: cannot remove `/usr/local/nagiosxi/var/subsys/npcd.pid': No such file or directory
NPCD was not running.
And everything was happy again with:

Code:

service npcd start
NPCD started.
[root@nagiosd1 ~]# service npcd status
NPCD running (pid 51571).
This behavior can result from a crash, or from killing the process without going through the npcd init script (service npcd stop | restart). I think service npcd status should actually check whether the PID belongs to a process with the name of the service. A reboot would typically include a /var/run/*pid cleanup routine, but the init script should be smarter. I will post a bug.
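
One way a smarter init script might handle this is to clear a stale PID file before starting. The sketch below is an assumption about how that could look, not the actual fix: `clear_stale_pidfile` is a hypothetical function name, and the expected command name `npcd` is assumed.

```shell
#!/bin/sh
# Hypothetical pre-start cleanup (not from the real npcd init script).
# Removes a PID file that no longer points at a live process with the
# expected command name.
# Returns 0 if the file was absent or stale (and removed),
# returns 1 if the recorded PID really belongs to the expected daemon.
clear_stale_pidfile() {
    pidfile=$1
    expected=$2

    # Nothing to clean up.
    [ -e "$pidfile" ] || return 0

    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        # Empty or non-numeric content can never be a valid PID.
        ''|*[!0-9]*) rm -f "$pidfile"; return 0 ;;
    esac

    # Look up the command name behind the PID (empty if no such process).
    actual=$(cat "/proc/$pid/comm" 2>/dev/null \
        || ps -p "$pid" -o comm= 2>/dev/null \
        || true)

    if [ "$actual" = "$expected" ]; then
        return 1    # daemon is genuinely running; leave the file alone
    fi

    # PID is dead or has been reused by another program: file is stale.
    rm -f "$pidfile"
    return 0
}
```

A start routine could call `clear_stale_pidfile /usr/local/nagiosxi/var/subsys/npcd.pid npcd` first and only report "already started" when it returns 1.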

Re: Performance Grapher Component Status Wrong

Posted: Tue Apr 07, 2015 4:09 pm
by cmerchant
Just another update about umask and permissions: if the process was restarted from the GUI, the PID file will have nagios:nagios as the owner, so umask is not the issue.