Error restarting nagios after pnp update

majed
Posts: 98
Joined: Mon Mar 17, 2014 5:29 am

Error restarting nagios after pnp update

Post by majed »

Peace. My Nagios version is 4.3.3, and PNP was updated to 0.6.26-r9, emerged on Gentoo. I changed some settings and wanted to restart Nagios, but got:

Code: Select all

 # /etc/init.d/nagios status
 * status: crashed
nagios ~ # /etc/init.d/nagios restart
 * Verifying config files ...                                                                                                  [ ok ]
 * Stopping nagios ...
 * Failed to stop nagios                                                                                                       [ !! ]
 * ERROR: nagios failed to stop
I tried uninstalling Nagios and re-emerging it, but that didn't help. Now I have to reboot to apply any Nagios config change. Nagios works nevertheless. I'd rather not upgrade from source.
What should I do?
Seek and you shall find, knock and it shall be opened, cry and you shall find comfort
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Error restarting nagios after pnp update

Post by dwhitfield »

Please post (or PM) your /etc/init.d/nagios, your nagios.cfg, and your npcd.cfg.

It may also be useful to get a tail of the nagios.log and npcd.log (please put these tails in code blocks).

Are you using the PNP4Nagios Broker Module? According to the PNP4Nagios documentation:
"PNP4Nagios Broker Module npcdmod.o is not compatible with Nagios Core 4.x"
- https://docs.pnp4nagios.org/start

What version of PNP4Nagios were you using previously?

You may also want to contact PNP4Nagios: https://sourceforge.net/projects/pnp4nagios/support

UPDATE: init and two .cfg files shared with techs
Last edited by dwhitfield on Thu Dec 14, 2017 10:31 am, edited 1 time in total.
Reason: pm received
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Error restarting nagios after pnp update

Post by tmcdonald »

While researching the issue I only came upon one other post, and that is the one you made on the Gentoo forums - https://forums.gentoo.org/viewtopic-p-8148290.html

We haven't had any reports of this behavior aside from yours, which leads me to believe that if an answer is found it will likely be from the Gentoo forum members as this does not appear to strictly be an issue with the Nagios codebase, but rather the Gentoo package/atom/whatever the term is. We're also more of a CentOS/RHEL and Debian/Ubuntu forum generally, so our Gentoo-specific knowledge is not as great as theirs will be. As such, if the file paths or commands we post are inaccurate for a Gentoo system we apologize in advance.

To add on to what @dwhitfield posted: if you are using PNP as a module, does the issue still occur when you disable it? That would help narrow down where the problem stems from.
Former Nagios employee
majed
Posts: 98
Joined: Mon Mar 17, 2014 5:29 am

Re: Error restarting nagios after pnp update

Post by majed »

When restarting Nagios, the log shows:

Code: Select all

Dec 14 15:35:26 nagios /etc/init.d/nagios[3628]: ERROR: nagios failed to stop
I can't tell which PNP version I was using before.
I tried uninstalling PNP, but that didn't help!
Is anything else needed?
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Error restarting nagios after pnp update

Post by tgriep »

Can you run the following command to verify the nagios configuration files do not have any errors in them?

Code: Select all

/usr/sbin/nagios -v /etc/nagios/nagios.cfg
If they do, that would keep nagios from starting.

Also, can you post your commands.cfg file?
Be sure to check out our Knowledgebase for helpful articles and solutions!
majed
Posts: 98
Joined: Mon Mar 17, 2014 5:29 am

Re: Error restarting nagios after pnp update

Post by majed »

The preflight check produced no serious errors. There were duplicate definitions; I removed them but, as expected, that didn't help.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Error restarting nagios after pnp update

Post by dwhitfield »

Can you post or PM your commands.cfg? If you don't literally have a file called commands.cfg, send whichever config file defines your commands.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Error restarting nagios after pnp update

Post by tgriep »

I received the commands.cfg file and shared it with the other techs.

When you try to start the daemon, do you see any errors in the nagios.log file or the messages file?

Code: Select all

/var/nagios/nagios.log
/var/log/messages
Try starting nagios from the command line by running the following as root. Post any errors.

Code: Select all

/usr/sbin/nagios --daemon /etc/nagios/nagios.cfg
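A daemonized start normally prints nothing when it succeeds, so silence alone does not say much. One way to confirm the daemon is actually up is to read the PID that Nagios writes to its lock file and probe that process. The following is only a sketch: lock_pid_alive is a hypothetical helper name, and the lock file path in the usage line is an assumption that has to match the lock_file setting in your nagios.cfg.

```shell
# Hypothetical helper: succeed if the PID stored in the given lock file
# belongs to a live process. Run as root, because kill -0 can also fail
# with a permission error for processes owned by another user.
lock_pid_alive() {
    # The lock file must exist and be readable.
    [ -r "$1" ] || return 1
    # Take the first line and strip whitespace to get the PID.
    pid=$(head -n 1 "$1" | tr -d '[:space:]')
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only checks that the process exists.
    kill -0 "$pid" 2>/dev/null
}
```

Usage, with an assumed lock file path: lock_pid_alive /var/nagios/nagios.lock && echo "nagios is running" || echo "no live nagios process for that lock file"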
majed
Posts: 98
Joined: Mon Mar 17, 2014 5:29 am

Re: Error restarting nagios after pnp update

Post by majed »

Code: Select all

 ~ # /etc/init.d/nagios restart
 * Verifying config files ...                                                                                                  [ ok ]
 * Stopping nagios ...
 * Failed to stop nagios                                                                                                       [ !! ]
 * ERROR: nagios failed to stop

Code: Select all

tail -f /var/nagios/nagios.log
[1513586156] wproc:   host=hidden; service=(null);
[1513586156] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1513586156] Warning: Check of host 'hidden' timed out after 30.01 seconds
[1513586156] wproc: Core Worker 3885: job 207346 (pid=11563): Dormant child reaped
[1513586160] wproc: Core Worker 3885: job 207350 (pid=11588) timed out. Killing it
[1513586160] wproc: CHECK job 207350 from worker Core Worker 3885 timed out after 30.01s
[1513586160] wproc:   host=hidden; service=(null);
[1513586160] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1513586160] Warning: Check of host 'hidden' timed out after 30.01 seconds
[1513586160] wproc: Core Worker 3885: job 207350 (pid=11588): Dormant child reaped

Code: Select all

~ # tail -f /var/log/messages

Dec 18 11:37:02 nagios /etc/init.d/nagios[12243]: ERROR: nagios failed to stop
Dec 18 11:37:07 nagios sudo[12080]: pam_unix(sudo:session): session closed for user root
Dec 18 11:37:12 nagios check_nrpe[12363]: Remote 10.1.1.13 does not support Version 3 Packets
Dec 18 11:37:12 nagios check_nrpe[12363]: Remote 10.1.1.13 accepted a Version 2 Packet

Code: Select all

/usr/sbin/nagios --daemon /etc/nagios/nagios.cfg
This command does not produce any output.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Error restarting nagios after pnp update

Post by tgriep »

I think the error is caused by a mismatch in the location of the nagios.lock file.

In the nagios.cfg file, the lock_file option puts the lock file in the following location:

Code: Select all

lock_file=/var/nagios/nagios.lock
The /etc/init.d/nagios script, however, looks for the nagios.lock file in this location:

Code: Select all

pidfile="/run/nagios.lock"
I would make the two paths the same and verify that the nagios user account has permission to create the lock file in that folder. Then see whether the init script can restart the nagios daemon now that it can find the lock file.
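The comparison can also be scripted. This is a minimal sketch, not part of the Nagios or Gentoo tooling: the three function names are hypothetical, and it assumes the init script sets pidfile as a plain literal assignment (as quoted above) rather than building it from other variables.

```shell
# Hypothetical helper: print the lock_file value from a nagios.cfg-style file.
cfg_lock_file() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$1" \
        | sed 's/[[:space:]]*$//' | tail -n 1
}

# Hypothetical helper: print the pidfile value from an init script,
# with surrounding quotes removed.
init_pidfile() {
    sed -n 's/^[[:space:]]*pidfile[[:space:]]*=[[:space:]]*//p' "$1" \
        | sed 's/[[:space:]]*$//' | tail -n 1 | tr -d "\"'"
}

# Print both paths; return 0 if they match, 1 if they differ or are missing.
compare_lock_paths() {
    cfg_path=$(cfg_lock_file "$1")
    init_path=$(init_pidfile "$2")
    echo "nagios.cfg lock_file: $cfg_path"
    echo "init script pidfile:  $init_path"
    [ -n "$cfg_path" ] && [ "$cfg_path" = "$init_path" ]
}
```

Usage: compare_lock_paths /etc/nagios/nagios.cfg /etc/init.d/nagios || echo "paths differ". With the two values quoted above it should report a mismatch.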