Error restarting nagios after pnp update

Posted: Thu Nov 30, 2017 8:35 am
by majed
Peace. My Nagios version is 4.3.3, and pnp was updated to 0.6.26-r9 (emerged on Gentoo). I changed some settings and wanted to restart Nagios, but got:

Code: Select all

 # /etc/init.d/nagios status
 * status: crashed
nagios ~ # /etc/init.d/nagios restart
 * Verifying config files ...                                                                                                  [ ok ]
 * Stopping nagios ...
 * Failed to stop nagios                                                                                                       [ !! ]
 * ERROR: nagios failed to stop
I tried uninstalling Nagios and re-emerging it, but that didn't help. Now I have to reboot to apply any change to the Nagios config. Nagios works nevertheless. I'd rather not upgrade from source.
What should I do?

Re: Error restarting nagios after pnp update

Posted: Thu Nov 30, 2017 3:18 pm
by dwhitfield
Please post (or PM) your /etc/init.d/nagios, your nagios.cfg, and npcd.cfg

It may also be useful to get a tail of the nagios.log and npcd.log (please put these tails in code blocks).

Are you using the PNP4Nagios Broker Module?
PNP4Nagios Broker Module npcdmod.o is not compatible with Nagios Core 4.x
- https://docs.pnp4nagios.org/start

What version of PNP4Nagios were you using previously?

You may also want to contact PNP4Nagios: https://sourceforge.net/projects/pnp4nagios/support

UPDATE: init and two .cfg files shared with techs

Re: Error restarting nagios after pnp update

Posted: Thu Nov 30, 2017 3:30 pm
by tmcdonald
While researching the issue I only came upon one other post, and that is the one you made on the Gentoo forums - https://forums.gentoo.org/viewtopic-p-8148290.html

We haven't had any reports of this behavior aside from yours, which leads me to believe that if an answer is found it will likely be from the Gentoo forum members as this does not appear to strictly be an issue with the Nagios codebase, but rather the Gentoo package/atom/whatever the term is. We're also more of a CentOS/RHEL and Debian/Ubuntu forum generally, so our Gentoo-specific knowledge is not as great as theirs will be. As such, if the file paths or commands we post are inaccurate for a Gentoo system we apologize in advance.

To add on to what @dwhitfield posted, if you are using PNP as a module does the issue still occur if you disable it? That would help narrow down where the problem stems from.
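
To see whether a broker module is loaded at all, something along these lines may help. It is only a sketch: it assumes the main config is /etc/nagios/nagios.cfg (the path used elsewhere in this thread, which may differ on your system), and the function name `active_broker_modules` is made up for illustration.

```shell
#!/bin/sh
# List the active (uncommented) broker_module lines in a nagios.cfg-style file.
# Exits non-zero if there are none, which is grep's normal behaviour.
active_broker_modules() {
    grep -E '^[[:space:]]*broker_module[[:space:]]*=' "$1"
}

# Example (path is an assumption):
#   active_broker_modules /etc/nagios/nagios.cfg
```

If a line mentioning npcdmod.o shows up, commenting it out with a leading `#` and restarting would be one way to test with the module disabled.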

Re: Error restarting nagios after pnp update

Posted: Thu Dec 14, 2017 7:40 am
by majed
When I restart Nagios, the log shows:

Code: Select all

Dec 14 15:35:26 nagios /etc/init.d/nagios[3628]: ERROR: nagios failed to stop
I don't know which pnp version I was using before.
I tried uninstalling pnp, but that didn't help!
Is anything else needed?

Re: Error restarting nagios after pnp update

Posted: Thu Dec 14, 2017 5:06 pm
by tgriep
Can you run the following command to verify the nagios configuration files do not have any errors in them?

Code: Select all

/usr/sbin/nagios -v /etc/nagios/nagios.cfg
If they do, that would keep nagios from starting.
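
If it is easier to script, the exit status of that command can be checked directly. This is a rough sketch that assumes `nagios -v` exits non-zero when it finds errors; the wrapper name `verify_config` and its default paths are just for illustration.

```shell
#!/bin/sh
# Run the pre-flight check and report the result based on the exit status.
# $1: path to the nagios binary, $2: path to the main config file.
verify_config() {
    nagios_bin=${1:-/usr/sbin/nagios}
    cfg=${2:-/etc/nagios/nagios.cfg}
    if "$nagios_bin" -v "$cfg" >/dev/null 2>&1; then
        echo "config OK"
    else
        echo "config has errors"
        return 1
    fi
}

# Example (uses the default paths above):
#   verify_config
```

Running the plain command is still better when you need to read the actual warnings and errors, since this wrapper hides the output.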

Also, can you post your commands.cfg file?

Re: Error restarting nagios after pnp update

Posted: Fri Dec 15, 2017 1:54 am
by majed
The preflight check produced no serious errors. There were duplicate definitions, which I removed, but as expected that didn't help.

Re: Error restarting nagios after pnp update

Posted: Fri Dec 15, 2017 9:43 am
by dwhitfield
Can you post or PM your commands.cfg? If you don't literally have a file called commands.cfg, send whichever config file defines your commands.

Re: Error restarting nagios after pnp update

Posted: Fri Dec 15, 2017 9:49 am
by tgriep
I received the commands.cfg file and shared it with the other techs.

When you try to start the daemon, do you see any errors in the nagios.log file or the messages file?

Code: Select all

/var/nagios/nagios.log
/var/log/messages
Try starting nagios from the command line by running the following as root. Post any errors.

Code: Select all

/usr/sbin/nagios --daemon /etc/nagios/nagios.cfg

Re: Error restarting nagios after pnp update

Posted: Mon Dec 18, 2017 3:43 am
by majed

Code: Select all

 ~ # /etc/init.d/nagios restart
 * Verifying config files ...                                                                                                  [ ok ]
 * Stopping nagios ...
 * Failed to stop nagios                                                                                                       [ !! ]
 * ERROR: nagios failed to stop

Code: Select all

tail -f /var/nagios/nagios.log
[1513586156] wproc:   host=hidden; service=(null);
[1513586156] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1513586156] Warning: Check of host 'hidden' timed out after 30.01 seconds
[1513586156] wproc: Core Worker 3885: job 207346 (pid=11563): Dormant child reaped
[1513586160] wproc: Core Worker 3885: job 207350 (pid=11588) timed out. Killing it
[1513586160] wproc: CHECK job 207350 from worker Core Worker 3885 timed out after 30.01s
[1513586160] wproc:   host=hidden; service=(null);
[1513586160] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1513586160] Warning: Check of host 'hidden' timed out after 30.01 seconds
[1513586160] wproc: Core Worker 3885: job 207350 (pid=11588): Dormant child reaped

Code: Select all

~ # tail -f /var/log/messages

Dec 18 11:37:02 nagios /etc/init.d/nagios[12243]: ERROR: nagios failed to stop
Dec 18 11:37:07 nagios sudo[12080]: pam_unix(sudo:session): session closed for user root
Dec 18 11:37:12 nagios check_nrpe[12363]: Remote 10.1.1.13 does not support Version 3 Packets
Dec 18 11:37:12 nagios check_nrpe[12363]: Remote 10.1.1.13 accepted a Version 2 Packet

Code: Select all

/usr/sbin/nagios --daemon /etc/nagios/nagios.cfg
does not produce any output.

Re: Error restarting nagios after pnp update

Posted: Mon Dec 18, 2017 1:29 pm
by tgriep
I think the error is caused by a mismatch between the two locations configured for the nagios.lock file.

In the nagios.cfg file, the lock_file option has the lock file in the following location.

Code: Select all

lock_file=/var/nagios/nagios.lock
In the /etc/init.d/nagios script, it is looking for the nagios.lock file in this location.

Code: Select all

pidfile="/run/nagios.lock"
I would make the two paths the same and verify that the nagios user account has permission to create the lock file in that folder. Then see whether the init script can restart the nagios daemon now that it can find the lock file.
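
A quick way to compare the two settings might look like the sketch below. It assumes the file locations quoted in this thread (/etc/nagios/nagios.cfg and /etc/init.d/nagios) and that the init script sets `pidfile=` on a line of its own; the function names are hypothetical helpers, not part of Nagios.

```shell
#!/bin/sh
# Print the lock_file value from a nagios.cfg-style file.
cfg_lock_file() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Print the pidfile value from an init script, with surrounding quotes removed.
init_pidfile() {
    sed -n 's/^[[:space:]]*pidfile=//p' "$1" | tail -n 1 | tr -d "\"'"
}

# Compare the two paths; returns non-zero when they differ.
compare_lock_paths() {
    cfg_path=$(cfg_lock_file "$1")
    init_path=$(init_pidfile "$2")
    if [ "$cfg_path" = "$init_path" ]; then
        echo "match: $cfg_path"
    else
        echo "mismatch: nagios.cfg=$cfg_path init=$init_path"
        return 1
    fi
}

# Example (paths are assumptions taken from this thread):
#   compare_lock_paths /etc/nagios/nagios.cfg /etc/init.d/nagios
```

With the values quoted above, this would report a mismatch between /var/nagios/nagios.lock and /run/nagios.lock.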