Unable to ACK (again)

highness
Posts: 192
Joined: Thu May 01, 2014 4:25 pm

tmcdonald wrote: Do you have passive checks enabled? Anything on a remote machine sending in to NRDP/NSCA? Those would both potentially cause the nagios.cmd to be overwritten as a normal file.

Update: I am almost certain this is the cause - notice that the file size is 520 bytes when it should be 0 for a normal pipe file. Can you cat the contents of the nagios.cmd file?

We do have passive checks enabled.

It happened again today, so here are the contents of the nagios.cmd file:

Code: Select all

[1458238885] PROCESS_SERVICE_CHECK_RESULT;XXXXXXXXXXXXXXXXXXXXXXXXXX;SNMP Traps;0;This trap/inform is sent to the manager whenever 501 1065 0 up   10   1129239315 1129239456 2 2 / enterprises.9.10.25.1.4.3.1.5.230055 ():501 enterprises.9.10.25.1.4.3.1.6.230055 ():1065 enterprises.9.10.25.1.4.3.1.7.230055 ():0 ifOperStatus.1065 (INTEGER):up enterprises.9.10.25.1.4.3.1.3.230055 (): enterprises.9.10.25.1.4.3.1.4.230055 (): enterprises.9.10.25.1.4.3.1.8.230055 ():10   enterprises.9.10.25.1.4.3.1.10.230055 ():1129239315 enterprises.9.10.25.1.4.3.1.11.230055 ():1129239456 enterprises.9.10.25.1.4.3.1.14.230055 ():2 enterprises.9.10.25.1.4.3.1.12.230055 ():2
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

That's definitely what is happening, then: a passive check result arrives between the stop and start of a restart and creates nagios.cmd as a regular file instead of a named pipe. Unfortunately I don't have a fix for you off-hand, since Nagios will refuse to start if that file exists. I'll need to have a dev provide a proper fix, but I might be able to hack something up quickly until then. Are you using NRDP or NSCA for passive results?
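
For anyone wanting to confirm this state on their own box, a quick check of the file type should be enough. The following is only a minimal sketch: `cmd_file_kind` is a hypothetical helper name, and it relies on nothing beyond the standard shell file tests.

```shell
#!/bin/sh
# Hypothetical helper: print what kind of file sits at the given path.
# A healthy external command file is a named pipe (FIFO); a regular
# file at that path matches the symptom described in this thread.
cmd_file_kind() {
    if [ -p "$1" ]; then
        echo "fifo"
    elif [ -f "$1" ]; then
        echo "regular"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

Called with the path to nagios.cmd, it should print `fifo` on a working system. `regular` would point to the problem discussed here, and `missing` is expected only while Nagios is stopped.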
Former Nagios employee
highness

Not positive, but I believe that we are using NRDP for passive alerts.
tmcdonald

Could you double-check? There seems to be some logic in NRDP to prevent this sort of thing:

Code: Select all

        // make sure we can write to external command file
        if(!isset($cfg["command_file"]))
                handle_api_error(ERROR_NO_COMMAND_FILE);
        if(!file_exists($cfg["command_file"]))
                handle_api_error(ERROR_BAD_COMMAND_FILE);
        if(!is_writeable($cfg["command_file"]))
                handle_api_error(ERROR_COMMAND_FILE_OPEN_WRITE);

        // open external command file
        if(($handle=@fopen($cfg["command_file"],"w+"))===false)
                handle_api_error(ERROR_COMMAND_FILE_OPEN);

NSCA uses port 5667 by default, so maybe check whether that port is in use.
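
One way to check for a listener on that port is to filter the output of `ss -ltn`. This is a rough sketch under assumptions: `listens_on` is a hypothetical helper name, and it assumes the usual `ss` column layout where the local address and port are in the fourth column.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds
# (exit status 0) if any listening socket's local address ends in the
# given port number.
listens_on() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # Column 4 is the local address; the port is its last ":" field.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Example, run on the Nagios server:
#   ss -ltn | listens_on 5667 && echo "something is listening on 5667"
```

A listener on 5667 only suggests that NSCA is running. It does not prove that passive results are actually arriving through it.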
highness

Looks like we're using NSCA.
tmcdonald

Probably going to need to go to a dev then. I have some C experience, but they will be able to produce a much better patch, and faster.

I'll be filing this on GitHub soon, and will update this thread when that is done.
tmcdonald

I've been trying to replicate this internally by shutting down the nagios process and sending a passive result via NSCA:

Code: Select all

[root@localhost libexec]# service nagios status
nagios (pid 27145) is running...
[root@localhost libexec]# ls -l /usr/local/nagios/var/rw/nagios.cmd
prw-rw---- 1 nagios nagcmd 0 May 24 14:35 /usr/local/nagios/var/rw/nagios.cmd
[root@localhost libexec]# echo -e "somehost\t0\ttesting\n" | ./send_nsca -H localhost -p 5667 -to 10 -c /usr/local/nagios/etc/send_nsca.cfg
1 data packet(s) sent to host successfully.
[root@localhost libexec]# service nagios stop
Stopping nagios: done.
[root@localhost libexec]# ls -l /usr/local/nagios/var/rw/nagios.cmd
ls: cannot access /usr/local/nagios/var/rw/nagios.cmd: No such file or directory
[root@localhost libexec]# echo -e "somehost\t0\ttesting\n" | ./send_nsca -H localhost -p 5667 -to 10 -c /usr/local/nagios/etc/send_nsca.cfg
1 data packet(s) sent to host successfully.
[root@localhost libexec]# ls -l /usr/local/nagios/var/rw/nagios.cmd
ls: cannot access /usr/local/nagios/var/rw/nagios.cmd: No such file or directory

I cannot for the life of me get that nagios.cmd file to be written as a normal file.

In the NSCA configs there is a reference to /usr/local/nagios/var/rw/nsca.dump for exactly this scenario, so the contents should be dumped there instead. On my system, though, NSCA never created that file in testing, and nothing was written to it even when I created it manually and made it world-writable. Possibly a bug, but can you confirm whether that file exists, and if so post the contents?
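
To illustrate the idea behind a dump file, here is a minimal sketch of a writer that only writes to the command file when it really is a named pipe, and otherwise appends to an alternate file. This is not NSCA's actual implementation; `submit_command` and its three arguments are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
#   $1 = external command file (expected to be a named pipe)
#   $2 = alternate dump file used when the pipe is not there
#   $3 = one external command line to submit
submit_command() {
    if [ -p "$1" ]; then
        # The path is a FIFO, so ">" opens it without creating anything.
        # Note: this open blocks until a reader (Nagios) has the pipe open.
        printf '%s\n' "$3" > "$1"
    else
        # Missing or not a FIFO: never use ">" on the command file path
        # here, because that would create or truncate a regular file.
        printf '%s\n' "$3" >> "$2"
    fi
}
```

There is still a small window between the `-p` test and the write, so a sketch like this reduces the chance of a regular nagios.cmd being created but does not eliminate it.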

Also, what NSCA and Core versions are you on? I am on NSCA 2.9.1 and Core 4.1.1 for reference.

Running each of these binaries with no arguments should print its version:

/usr/local/nagios/bin/nsca
/usr/local/nagios/bin/nagios
highness

I don't have this file: /usr/local/nagios/var/rw/nsca.dump

The versions I'm running are listed below:

Code: Select all

NSCA - Nagios Service Check Acceptor
Copyright (c) 2000-2007 Ethan Galstad (www.nagios.org)
Version: 2.7.2
Last Modified: 07-03-2007
License: GPL v2
Encryption Routines: AVAILABLE

and

Code: Select all

Nagios Core 4.1.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-19-2015
tmcdonald

Would you be willing and able to update to NSCA 2.9.1? We can help with the "able" part if you need it, but if you have restrictions or red tape to go through for an upgrade we'll need to get that approval from your end.
highness

tmcdonald wrote: Would you be willing and able to update to NSCA 2.9.1? We can help with the "able" part if you need it, but if you have restrictions or red tape to go through for an upgrade we'll need to get that approval from your end.

Absolutely able to update to NSCA 2.9.1. I'll have to get approval to update the box, but that shouldn't be a problem. I can submit a ticket for approval on Tuesday.

How do I get the 2.9.1 version?