Re: Unable to ACK (again)
Posted: Thu Mar 17, 2016 3:30 pm
by highness
tmcdonald wrote:
Do you have passive checks enabled? Anything on a remote machine sending in to NRDP/NSCA? Either one could potentially cause nagios.cmd to be overwritten as a normal file.
Update: I am almost certain this is the cause - notice that the file size is 520 bytes when it should be 0 for a named pipe. Can you cat the contents of the nagios.cmd file?
We do have passive checks enabled.
It happened again today, so here are the contents of the nagios.cmd file:
Code:
[1458238885] PROCESS_SERVICE_CHECK_RESULT;XXXXXXXXXXXXXXXXXXXXXXXXXX;SNMP Traps;0;This trap/inform is sent to the manager whenever 501 1065 0 up 10 1129239315 1129239456 2 2 / enterprises.9.10.25.1.4.3.1.5.230055 ():501 enterprises.9.10.25.1.4.3.1.6.230055 ():1065 enterprises.9.10.25.1.4.3.1.7.230055 ():0 ifOperStatus.1065 (INTEGER):up enterprises.9.10.25.1.4.3.1.3.230055 (): enterprises.9.10.25.1.4.3.1.4.230055 (): enterprises.9.10.25.1.4.3.1.8.230055 ():10 enterprises.9.10.25.1.4.3.1.10.230055 ():1129239315 enterprises.9.10.25.1.4.3.1.11.230055 ():1129239456 enterprises.9.10.25.1.4.3.1.14.230055 ():2 enterprises.9.10.25.1.4.3.1.12.230055 ():2
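For anyone reading along: each line in nagios.cmd is an external command of the form `[timestamp] COMMAND;arguments`, and for `PROCESS_SERVICE_CHECK_RESULT` the arguments are host name, service description, return code and plugin output, in that order. That is why the file above holds readable text instead of being an empty pipe. A minimal sketch that splits such a line, assuming the plugin output is everything after the third argument (so any semicolons in the trap text are kept); `parse_command_line` and `ServiceCheckResult` are hypothetical names, not part of Nagios:

```python
import re
from typing import NamedTuple

# "[<unix timestamp>] <COMMAND_NAME>;<remaining arguments>"
_LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z_]+);(.*)$", re.DOTALL)


class ServiceCheckResult(NamedTuple):
    timestamp: int
    host: str
    service: str
    return_code: int
    output: str


def parse_command_line(line: str) -> ServiceCheckResult:
    """Parse one PROCESS_SERVICE_CHECK_RESULT line (hypothetical helper)."""
    m = _LINE_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError("not an external command line")
    timestamp, command, args = m.groups()
    if command != "PROCESS_SERVICE_CHECK_RESULT":
        raise ValueError(f"unexpected command {command!r}")
    # Split at most three times so ';' inside the plugin output survives.
    host, service, rc, output = args.split(";", 3)
    return ServiceCheckResult(int(timestamp), host, service, int(rc), output)


if __name__ == "__main__":
    sample = ("[1458238885] PROCESS_SERVICE_CHECK_RESULT;XXXX;SNMP Traps;0;"
              "This trap/inform is sent to the manager whenever ...")
    print(parse_command_line(sample))
```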
Re: Unable to ACK (again)
Posted: Thu Mar 17, 2016 3:37 pm
by tmcdonald
That's definitely what is happening then: you have a passive check that comes in between restarts and creates nagios.cmd as a normal file instead of a named pipe. Unfortunately I don't have a fix for you off-hand, since nagios will refuse to start if that file exists. I'll need to have a dev provide a proper fix, but I might be able to hack something up quickly until then. Are you using NRDP for passive results, or NSCA?
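A quick way to confirm which state the command file is in: a healthy named pipe shows a leading `p` in `ls -l` and a size of 0, while the broken case is a regular file with data in it. The sketch below reports which one is at the path; `command_file_kind` is a hypothetical helper, and the path is the default one used in this thread (adjust for your install):

```python
import os
import stat


def command_file_kind(path: str) -> str:
    """Return 'fifo', 'regular', 'missing' or 'other' for the given path."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if stat.S_ISFIFO(st.st_mode):
        return "fifo"      # what Nagios expects: an empty named pipe
    if stat.S_ISREG(st.st_mode):
        return "regular"   # the broken case: a plain file holding commands
    return "other"


if __name__ == "__main__":
    # Default path used in this thread (assumption: adjust for your install).
    path = "/usr/local/nagios/var/rw/nagios.cmd"
    print(path, "->", command_file_kind(path))
```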
Re: Unable to ACK (again)
Posted: Wed May 18, 2016 9:12 am
by highness
Not positive, but I believe that we are using NRDP for passive alerts.
Re: Unable to ACK (again)
Posted: Wed May 18, 2016 10:58 am
by tmcdonald
Could you double-check? There seems to be some logic in NRDP to prevent this sort of thing:
Code:
// make sure we can write to external command file
if(!isset($cfg["command_file"]))
    handle_api_error(ERROR_NO_COMMAND_FILE);
if(!file_exists($cfg["command_file"]))
    handle_api_error(ERROR_BAD_COMMAND_FILE);
if(!is_writeable($cfg["command_file"]))
    handle_api_error(ERROR_COMMAND_FILE_OPEN_WRITE);
// open external command file
if(($handle=@fopen($cfg["command_file"],"w+"))===false)
    handle_api_error(ERROR_COMMAND_FILE_OPEN);
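The `file_exists()` check matters because PHP's `fopen()` in `w+` mode creates the file when it is missing, which is exactly how a regular file could end up where the pipe should be. Checking first and opening second still leaves a short window in which the pipe can disappear. As an illustration only (not NRDP or NSCA code), one way to close that window is to open without `O_CREAT`, so a missing pipe is an error rather than a new file, and then check the opened descriptor with `fstat()`. `write_external_command` is a hypothetical name:

```python
import errno
import os
import stat


def write_external_command(command_file: str, line: str) -> None:
    """Write one command line to a named pipe without ever creating the file.

    O_CREAT is deliberately left out, so a missing pipe raises
    FileNotFoundError instead of leaving a regular file behind. O_NONBLOCK
    makes the open fail with ENXIO when no process has the pipe open for
    reading, rather than hanging until one does.
    """
    fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    try:
        # Inspect the descriptor we actually opened, not the path, so the
        # file cannot be swapped out between the check and the write.
        if not stat.S_ISFIFO(os.fstat(fd).st_mode):
            raise OSError(errno.EINVAL, "not a named pipe", command_file)
        if not line.endswith("\n"):
            line += "\n"
        os.write(fd, line.encode())
    finally:
        os.close(fd)
```

With `O_NONBLOCK`, a pipe with no reader (Nagios stopped) produces an error the caller can handle, instead of a write that blocks forever.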
NSCA uses port 5667 by default, so you could check whether that port is in use.
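If `netstat` or `ss` aren't handy, a small check like this does the same job. It only shows that something accepts TCP connections on the port, not that the listener is actually NSCA; `port_in_use` is a hypothetical helper:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    print("port 5667 in use:", port_in_use(5667))
```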
Re: Unable to ACK (again)
Posted: Wed May 18, 2016 12:58 pm
by highness
Looks like we're using NSCA.
Re: Unable to ACK (again)
Posted: Wed May 18, 2016 4:23 pm
by tmcdonald
Probably going to need to go to a dev then. I have some C experience, but they will be able to produce a much better patch, and faster.
I'll be filing this on GitHub soon, and will update this thread when that is done.
Re: Unable to ACK (again)
Posted: Wed May 25, 2016 10:33 am
by tmcdonald
I've been trying to replicate this internally by shutting down the nagios process and sending a passive result via NSCA:
Code:
[root@localhost libexec]# service nagios status
nagios (pid 27145) is running...
[root@localhost libexec]# ls -l /usr/local/nagios/var/rw/nagios.cmd
prw-rw---- 1 nagios nagcmd 0 May 24 14:35 /usr/local/nagios/var/rw/nagios.cmd
[root@localhost libexec]# echo -e "somehost\t0\ttesting\n" | ./send_nsca -H localhost -p 5667 -to 10 -c /usr/local/nagios/etc/send_nsca.cfg
1 data packet(s) sent to host successfully.
[root@localhost libexec]# service nagios stop
Stopping nagios: done.
[root@localhost libexec]# ls -l /usr/local/nagios/var/rw/nagios.cmd
ls: cannot access /usr/local/nagios/var/rw/nagios.cmd: No such file or directory
[root@localhost libexec]# echo -e "somehost\t0\ttesting\n" | ./send_nsca -H localhost -p 5667 -to 10 -c /usr/local/nagios/etc/send_nsca.cfg
1 data packet(s) sent to host successfully.
[root@localhost libexec]# ls -l /usr/local/nagios/var/rw/nagios.cmd
ls: cannot access /usr/local/nagios/var/rw/nagios.cmd: No such file or directory
I cannot for the life of me get that nagios.cmd file to be written as a normal file.
In the NSCA configs there is a reference to /usr/local/nagios/var/rw/nsca.dump for exactly this scenario, so the contents should be dumped there instead. On my system, though, that file was never written during testing, even when I created it manually and gave it full write permissions globally. Possibly a bug, but can you confirm whether that file exists, and if so post the contents?
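The fallback idea described above (send results to a dump file when the command pipe is not there) can be sketched as follows. This is an illustration under stated assumptions, not NSCA's implementation: `deliver` is hypothetical, and it treats "no pipe", "not a pipe" and "pipe with no reader" all as reasons to dump:

```python
import os
import stat


def deliver(command_file: str, dump_file: str, line: str) -> str:
    """Send `line` to the command pipe if possible, else append it to dump_file.

    Returns "pipe" or "dump" so the caller can log what happened.
    """
    if not line.endswith("\n"):
        line += "\n"
    try:
        # No O_CREAT: a missing pipe must never turn into a regular file.
        fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    except OSError:
        fd = None  # no pipe, or a pipe with no reader (daemon stopped)
    if fd is not None:
        try:
            if stat.S_ISFIFO(os.fstat(fd).st_mode):
                os.write(fd, line.encode())
                return "pipe"
        finally:
            os.close(fd)
    # No live pipe at command_file: keep the result in the dump file
    # instead of writing it into the wrong place.
    with open(dump_file, "a") as f:
        f.write(line)
    return "dump"
```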
Also, what NSCA and Core versions are you on? I am on NSCA 2.9.1 and Core 4.1.1 for reference.
For reference, the binaries in question are:
/usr/local/nagios/bin/nsca
/usr/local/nagios/bin/nagios
Re: Unable to ACK (again)
Posted: Thu May 26, 2016 12:08 pm
by highness
I don't have this file: /usr/local/nagios/var/rw/nsca.dump
The versions I'm running are listed below:
Code:
NSCA - Nagios Service Check Acceptor
Copyright (c) 2000-2007 Ethan Galstad (www.nagios.org)
Version: 2.7.2
Last Modified: 07-03-2007
License: GPL v2
Encryption Routines: AVAILABLE
and
Code:
Nagios Core 4.1.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-19-2015
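To check a server against the 2.9.1 target without reading banners by eye, the `Version:` line from the NSCA output above can be compared numerically. A sketch, assuming the banner format shown above (the Core banner puts its version on the first line instead, so this only handles NSCA); `parse_version` is hypothetical:

```python
import re


def parse_version(banner: str) -> tuple:
    """Return the 'Version: X.Y.Z' value from an NSCA banner as a tuple of ints."""
    m = re.search(r"Version:\s*(\d+(?:\.\d+)*)", banner)
    if m is None:
        raise ValueError("no 'Version:' line found")
    return tuple(int(part) for part in m.group(1).split("."))


if __name__ == "__main__":
    banner = "NSCA - Nagios Service Check Acceptor\nVersion: 2.7.2\n"
    installed = parse_version(banner)
    # Tuples compare element by element, so (2, 7, 2) < (2, 9, 1).
    print("needs upgrade:", installed < (2, 9, 1))
```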
Re: Unable to ACK (again)
Posted: Thu May 26, 2016 5:22 pm
by tmcdonald
Would you be willing and able to update to NSCA 2.9.1? We can help with the "able" part if you need it, but if you have restrictions or red tape to go through for an upgrade we'll need to get that approval from your end.
Re: Unable to ACK (again)
Posted: Fri May 27, 2016 4:12 pm
by highness
Absolutely able to update to NSCA 2.9.1. I'll have to get approval to update the box, but that shouldn't be a problem; I can submit a ticket for approval on Tuesday.
How do I get the 2.9.1 version?