NPCD and Server Reboot
Posted: Thu Nov 10, 2016 11:50 am
by SteveBeauchemin
Not that I like rebooting, but sometimes after the quarterly OS patching the Nagios host needs a reboot. After a reboot we always have to start npcd manually. I had just been dealing with this each time without looking for a more permanent, hands-off solution. Now I have to train other folks on the system, and this is an issue I need resolved. So here is my question.
Is there a specific recommended stop and start order/sequence for Nagios components? I know there is, but I think it is time for me to revisit it, and I want to hear the latest answer.
These are the init.d items I stop and start with a script.
crond tomcat snmptt snmptrapd nagiosxi nagios ndo2db mod-gearman2-worker
gearmand npcd rrdcached postgresql mysqld httpd
Those items are what I consider all the Nagios XI related processes. To me, the order in /etc/rc3.d is not okay, otherwise npcd would come up after a reboot.
So, stop is one order, start is another order. What is the proper order for these? Some matter, others do not. This is RedHat 6 OS.
Okay, 2 questions...
What happens when I move to RedHat 7? How do I ensure the startup sequence there succeeds?
Please advise.
Thank You
Steve B
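On the RedHat 7 question: RHEL 7 boots with systemd, where start order is declared with After= in the unit files rather than with S/K numbers. The following is only a minimal sketch, assuming the services ship as units named npcd.service and nagios.service (check with systemctl list-unit-files). The helper name write_order_dropin and the file name order.conf are hypothetical.

```shell
#!/bin/sh
# Write a systemd drop-in that orders one unit after another.
# Usage: write_order_dropin <dropin-root> <unit> <after-unit>
# The function name and the order.conf file name are illustrative only.
write_order_dropin() {
    root=$1
    unit=$2
    after=$3
    # Drop-ins for a unit live in a directory named <unit>.d
    mkdir -p "$root/$unit.d" || return 1
    cat > "$root/$unit.d/order.conf" <<EOF
[Unit]
After=$after
EOF
}
```

Assuming those unit names, it would be used as write_order_dropin /etc/systemd/system npcd.service nagios.service, followed by systemctl daemon-reload. After= only controls ordering; it does not by itself cause the other unit to be started.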
Re: NPCD and Server Reboot
Posted: Thu Nov 10, 2016 1:09 pm
by avandemore
You have a nice bit of detail in your sig but it's missing Nagios OS type/version.
Guessing CentOS 6:
chkconfig npcd on
Should look like this:
Code:
# ls -la /etc/rc*/*npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc0.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc1.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc2.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc3.d/S94npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc4.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc5.d/S94npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc6.d/K06npcd -> ../init.d/npcd
Re: NPCD and Server Reboot
Posted: Thu Nov 10, 2016 1:24 pm
by SteveBeauchemin
So if npcd starts on S94 - same as mine... and Nagios starts on S99 - then since npcd needs nagios and Ramdisk running first, it will fail.
Hence my original question... what is the proper sequence of all those things? Is there an official stance on this, or is it up to us users to figure out?
Thanks
Steve B
(sig updated... thanks)
Re: NPCD and Server Reboot
Posted: Thu Nov 10, 2016 1:58 pm
by avandemore
The documented methods of installing XI with or without a ramdisk produce an init process as shown:
Code:
# ls -la /etc/rc*/*npcd /etc/rc*/*nagios
lrwxrwxrwx. 1 root root 16 Oct 29 03:06 /etc/rc0.d/K01nagios -> ../init.d/nagios
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc0.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 16 Oct 29 03:06 /etc/rc1.d/K01nagios -> ../init.d/nagios
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc1.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 16 Oct 29 03:06 /etc/rc2.d/K01nagios -> ../init.d/nagios
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc2.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc3.d/S94npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 16 Oct 29 03:06 /etc/rc3.d/S99nagios -> ../init.d/nagios
lrwxrwxrwx. 1 root root 16 Oct 29 03:06 /etc/rc4.d/K01nagios -> ../init.d/nagios
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc4.d/K06npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc5.d/S94npcd -> ../init.d/npcd
lrwxrwxrwx. 1 root root 16 Oct 29 03:06 /etc/rc5.d/S99nagios -> ../init.d/nagios
lrwxrwxrwx. 1 root root 16 Oct 29 03:06 /etc/rc6.d/K01nagios -> ../init.d/nagios
lrwxrwxrwx. 1 root root 14 Oct 29 03:06 /etc/rc6.d/K06npcd -> ../init.d/npcd
It is working in multiple instances for us internally as well as many other customers. Do you have something particular about your system which may be causing this issue?
Can you attach /usr/local/nagios/var/npcd.log?
Re: NPCD and Server Reboot
Posted: Thu Nov 10, 2016 2:24 pm
by SteveBeauchemin
Here is the pertinent part of the log file.
The npcd at S94 starts before Nagios at S99, and the Ramdisk does not exist yet.
Code:
[11-10-2016 04:27:33] NPCD: npcd Daemon (0.4.14) started with PID=3404
[11-10-2016 04:27:33] NPCD: Please have a look at 'npcd -V' to get license information
[11-10-2016 04:27:33] NPCD: HINT: load_threshold is enabled - ('20.000000')
[11-10-2016 04:27:33] NPCD: Error while get file list from spooldir (/var/nagiosramdisk/spool/perfdata/) - No such file or directory
[11-10-2016 04:27:33] NPCD: Exiting...
[11-10-2016 04:27:33] NPCD: Daemon ended. PID was '3404'
Then, 20 minutes later, once we noticed it was not running, we started it manually. More log data:
Code:
[11-10-2016 04:47:55] NPCD: npcd Daemon (0.4.14) started with PID=9941
[11-10-2016 04:47:55] NPCD: Please have a look at 'npcd -V' to get license information
[11-10-2016 04:47:55] NPCD: HINT: load_threshold is enabled - ('20.000000')
Spool Dir is not there until Ramdisk is running...
If I start it manually after nagios is running and the ramdisk exists, it runs fine until the next patch / reboot cycle.
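As a stopgap for the missing spool directory, one option is to wait for it before starting npcd. This is a minimal sketch, not an official fix: the function name wait_for_dir and the timeout value are assumptions, and the path is the spooldir from the log above.

```shell
#!/bin/sh
# Wait until a directory exists, polling once per second.
# Usage: wait_for_dir <dir> <timeout-seconds>
# Returns 0 as soon as the directory exists, 1 if the timeout runs out.
wait_for_dir() {
    dir=$1
    remaining=$2
    while [ ! -d "$dir" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, wait_for_dir /var/nagiosramdisk/spool/perfdata 60 && service npcd start could be run late in boot. It only hides the ordering problem, so fixing the start sequence is still the cleaner answer.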
Actually, I can provide more information.
This is the boot start sequence I have today - just for the items I listed before.
/etc/rc3.d/S50snmptrapd
/etc/rc3.d/S51snmptt
/etc/rc3.d/S64mysqld
/etc/rc3.d/S64postgresql
/etc/rc3.d/S80tomcat
/etc/rc3.d/S85gearmand
/etc/rc3.d/S85httpd
/etc/rc3.d/S85mod-gearman2-worker
/etc/rc3.d/S90crond
/etc/rc3.d/S90rrdcached
/etc/rc3.d/S94npcd
/etc/rc3.d/S97ndo2db
/etc/rc3.d/S99nagios
/etc/rc3.d/S99nagiosxi
And the Stop sequence
/etc/rc6.d/K01nagios
/etc/rc6.d/K01nagiosxi
/etc/rc6.d/K01ndo2db
/etc/rc6.d/K06npcd
/etc/rc6.d/K10rrdcached
/etc/rc6.d/K15gearmand
/etc/rc6.d/K15httpd
/etc/rc6.d/K15mod-gearman2-worker
/etc/rc6.d/K20tomcat
/etc/rc6.d/K36mysqld
/etc/rc6.d/K36postgresql
/etc/rc6.d/K49snmptt
/etc/rc6.d/K50snmptrapd
/etc/rc6.d/K60crond
I am thinking that the start sequence should be modified slightly for just these 3 like this.
/etc/rc3.d/S92ndo2db
/etc/rc3.d/S93nagios
/etc/rc3.d/S93nagiosxi
Then npcd comes as 94. Or something like that.
I guess I'm just looking for some guidance, or opinion about this.
Thoughts anyone? I just want a reboot to be happy and start all the items as expected.
Sequence matters...
Thanks
Steve B
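To confirm a reordering like the one proposed above, the S-numbers can be compared with a small script. This is a rough sketch: the function names start_priority and starts_before are made up for illustration, and it only looks at the SNN prefix of the links in one runlevel directory.

```shell
#!/bin/sh
# Print the two-digit start priority of a service in a SysV runlevel directory.
# Usage: start_priority <rc-dir> <service>
start_priority() {
    rc_dir=$1
    svc=$2
    # The glob matches e.g. S99nagios but not S99nagiosxi
    for link in "$rc_dir"/S[0-9][0-9]"$svc"; do
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}                  # e.g. S94npcd
        prio=${name#S}                    # e.g. 94npcd
        printf '%s\n' "${prio%"$svc"}"    # e.g. 94
        return 0
    done
    return 1
}

# Succeed only if <first> has a lower start number than <second>.
# Usage: starts_before <rc-dir> <first> <second>
starts_before() {
    a=$(start_priority "$1" "$2") || return 2
    b=$(start_priority "$1" "$3") || return 2
    # Strip one leading zero so the values compare as plain decimals
    [ "${a#0}" -lt "${b#0}" ]
}
```

With the current layout, starts_before /etc/rc3.d nagios npcd fails because nagios is S99 and npcd is S94. On RedHat 6 the numbers come from the "# chkconfig: <runlevels> <start> <stop>" header line in each init script; after editing that line, chkconfig <name> resetpriorities rebuilds the links.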
Re: NPCD and Server Reboot
Posted: Thu Nov 10, 2016 2:37 pm
by avandemore
Please attach /etc/sysconfig/nagios and /etc/init.d/nagios.
If those don't reveal anything useful:
In /etc/sysconfig/init you can set LOGLEVEL=8 and reboot. The console should then show detailed information.