Commands stop working

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Commands stop working

Post by BanditBBS »

So I noticed today that my commands stop working every once in a while, and I have to restart Nagios or apply config to get them working again. I just tried to enable notifications (I had them disabled for a major outage); the command was written to the file, but it was never processed. I had to apply config, wait, and then try again.

File before applying config, with its contents:

Code:

-rw-r--r-- 1 nagios nagcmd  36 May 14 01:50 nagios.cmd
[1463208613] ENABLE_NOTIFICATIONS;0

File after I applied config:

Code:

prw-rw---- 1 nagios nagcmd   0 May 14 01:55 nagios.cmd
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Commands stop working

Post by Box293 »

Stop the nagios service.

Then check to see if the cmd pipe exists (it shouldn't).

Wait a couple of minutes with nagios still stopped.

Does the command pipe get re-created without nagios started?
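The stop-and-watch check above can be scripted. This is a minimal Python sketch, not an official Nagios tool: `describe` and `watch` are hypothetical helper names, and the poll count and interval are assumptions chosen to cover "a couple of minutes".

```python
import os
import stat
import time


def describe(path):
    """Classify path as 'missing', 'fifo', 'regular file', or 'other'."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


def watch(path, polls=24, interval=5.0, sleep=time.sleep):
    """Poll path `polls` times and return the sequence of distinct states seen.

    With Nagios stopped, the expected result is ['missing']. Anything else
    means some other process re-created the file.
    """
    changes = []
    for i in range(polls):
        state = describe(path)
        if not changes or changes[-1] != state:
            changes.append(state)
        if i < polls - 1:
            sleep(interval)
    return changes
```

With Nagios stopped, calling `watch("/usr/local/nagios/var/rw/nagios.cmd")` polls for roughly two minutes. A result containing `regular file` would point at some other writer creating the file while the pipe is absent.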

Do you use snmptraps?

If so, what is the output of this command:

Code:

md5sum /usr/local/bin/snmptraphandling.py

I get:

0639919b86c9e659ed04b1c63052bbbc /usr/local/bin/snmptraphandling.py
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Re: Commands stop working

Post by BanditBBS »

Hey Troy....

md5sum output is identical.

As for all your other questions:
1.) If I stop the process, the pipe does vanish.
2.) It never reappears until the process is started back up (I only waited a few minutes; I can't wait too long).
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Commands stop working

Post by ssax »

Do you have any other checks, event handlers, cron jobs, or anything that would be writing to the command file?

Just so we know which file your SNMPTT is using, please post the output of one of your /etc/snmp/snmptt.conf EXEC lines from one of your working traps and the contents of your /etc/snmp/snmptrapd.conf file.


Thank you

Re: Commands stop working

Post by BanditBBS »

None of the EXECs do anything yet; we aren't using them yet. But...

Code:

[root@iss-chi-nag05 xicore]# cat /etc/snmp/snmptrapd.conf
disableAuthorization yes
traphandle default /usr/sbin/snmptthandler

Yes, I have a custom component that writes to the nagios.cmd file. It's for entering downtimes: it adds options for servicegroup and hostgroup downtimes and lets me set some defaults to help eliminate human error when creating downtime.

Re: Commands stop working

Post by ssax »

That's likely the cause. The component needs to check that the file type is fifo before writing:

http://php.net/manual/en/function.filetype.php
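For illustration, here is a minimal sketch of that check in Python (the language of snmptraphandling.py) rather than PHP. The function name `write_external_command` is hypothetical. It opens the command file without a create flag, so it can never leave a regular file behind, and verifies the open descriptor is a FIFO before writing. In PHP the corresponding test is whether `filetype()` returns `"fifo"`.

```python
import errno
import os
import stat


def write_external_command(cmd_line, cmd_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command line to the pipe; return False if it is not a usable FIFO."""
    try:
        # No O_CREAT: a missing pipe is never replaced by a regular file.
        # O_NONBLOCK: fail with ENXIO instead of hanging when nothing is reading.
        fd = os.open(cmd_file, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno in (errno.ENOENT, errno.ENXIO):
            return False
        raise
    try:
        # Check the descriptor we actually opened, not just the path.
        if not stat.S_ISFIFO(os.fstat(fd).st_mode):
            return False
        os.write(fd, (cmd_line.rstrip("\n") + "\n").encode())
        return True
    finally:
        os.close(fd)
```

A call such as `write_external_command("[1463208613] ENABLE_NOTIFICATIONS;0")` returns `False` while Nagios is down or restarting, so the caller can retry or report an error. A full pipe raises `BlockingIOError`, which this sketch deliberately leaves to the caller.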

Re: Commands stop working

Post by BanditBBS »

Very very interesting....so perhaps someone scheduled a downtime while a restart was in progress and my component then created the file?

I'm basically following the command examples here: http://old.nagios.org/developerinfo/ext ... and_id=118

Just in PHP instead. Here is my crappy code:

Code:

    $fh = fopen('/usr/local/nagios/var/rw/nagios.cmd', "a") or die(gettext("Error: Could not open downtime config file for writing."));
    fwrite($fh, $cfg_str);
    fclose($fh);

So I need to follow your lead and verify that it's a FIFO, and if it isn't, wait a bit and retry a few times, then fail if it still isn't?
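The check-wait-retry idea can be sketched as follows, again in Python as an illustration and not as the component's actual code. `wait_for_fifo` is a hypothetical name, and the attempt count and delay are assumed defaults to tune for how long a restart usually takes.

```python
import os
import stat
import time


def wait_for_fifo(path, attempts=5, delay=2.0, sleep=time.sleep):
    """Return True as soon as path is a FIFO, checking up to `attempts` times."""
    for attempt in range(attempts):
        try:
            if stat.S_ISFIFO(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # pipe not there yet, e.g. Nagios is mid-restart
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

If this returns `False`, the component should report an error and not write anything. There is still a small window between the check and the write, so the write itself should also avoid creating the file when the pipe is missing.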

Re: Commands stop working

Post by ssax »

Yes, that is correct. That's why the snmptraphandling.py script was updated.

Re: Commands stop working

Post by BanditBBS »

Cool, thanks for the information, I'll make the code change now....feel free to close this thread!
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Commands stop working

Post by mcapra »

Closing it up!
Former Nagios employee
https://www.mcapra.com/