Commands stop working
Posted: Sat May 14, 2016 1:58 am
by BanditBBS
So I noticed today that my commands stop working every once in a while, and I have to restart Nagios or apply the config to get them working again. I just tried to enable notifications (I had them disabled for a major outage); the command was written to the file, but it was never processed. I had to apply the config, wait, and then try again.
File before applying the config, and its contents:
-rw-r--r-- 1 nagios nagcmd 36 May 14 01:50 nagios.cmd
[1463208613] ENABLE_NOTIFICATIONS;0
File after I applied the config:
prw-rw---- 1 nagios nagcmd 0 May 14 01:55 nagios.cmd
Re: Commands stop working
Posted: Mon May 16, 2016 1:33 am
by Box293
Stop the nagios service.
Then check to see if the cmd pipe exists (it shouldn't).
Wait a couple of minutes with nagios still stopped.
Does the command pipe get re-created without nagios started?
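For reference, the first character of an `ls -l` listing shows the file type: `p` is a named pipe (FIFO) and `-` is a regular file. The check in the steps above could also be scripted. This is only a minimal sketch in Python; the function name `describe_command_file` is made up for illustration, and the path is the one quoted later in this thread, so adjust it for your install.

```python
import os
import stat


def describe_command_file(path):
    """Return 'missing', 'fifo', 'regular file', or 'other' for the given path."""
    try:
        mode = os.stat(path).st_mode  # stat() does not block on a FIFO
    except FileNotFoundError:
        return "missing"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"
```

With Nagios stopped, `describe_command_file("/usr/local/nagios/var/rw/nagios.cmd")` would be expected to return `missing`. A result of `regular file` would suggest that something other than Nagios created the file.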
Do you use snmptraps?
If so, what is the output of this command:
md5sum /usr/local/bin/snmptraphandling.py
I get:
0639919b86c9e659ed04b1c63052bbbc /usr/local/bin/snmptraphandling.py
Re: Commands stop working
Posted: Mon May 16, 2016 8:10 am
by BanditBBS
Hey Troy....
md5sum output is identical.
As for all your other questions:
1.) If I stop the process, the pipe does vanish.
2.) It never reappears until the process is started back up (I only waited a few minutes; I can't wait too long).
Re: Commands stop working
Posted: Mon May 16, 2016 12:00 pm
by ssax
Do you have any other checks, event handlers, cron jobs, or anything that would be writing to the command file?
Just so we know which file your SNMPTT is using, please post the output of one of your /etc/snmp/snmptt.conf EXEC lines from one of your working traps and the contents of your /etc/snmp/snmptrapd.conf file.
Thank you
Re: Commands stop working
Posted: Mon May 16, 2016 12:37 pm
by BanditBBS
None of the EXECs do anything yet; we aren't using them yet. But...
[root@iss-chi-nag05 xicore]# cat /etc/snmp/snmptrapd.conf
disableAuthorization yes
traphandle default /usr/sbin/snmptthandler
Yes, I have a custom component that writes to the nagios.cmd file. It's for entering downtimes: it adds options for servicegroup and hostgroup downtimes, and it lets me set some defaults to help eliminate human error when creating downtime.
Re: Commands stop working
Posted: Mon May 16, 2016 12:44 pm
by ssax
That's likely the cause. You need to have the component check that the file type is fifo before writing:
http://php.net/manual/en/function.filetype.php
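As a rough illustration of that check, here is a sketch in Python rather than PHP. The function name `write_external_command` is hypothetical and not part of Nagios or any component mentioned here. It opens the path without any create flag, confirms the opened file really is a FIFO, and only then writes.

```python
import os
import stat


def write_external_command(path, command_line):
    """Write one command line to the command pipe; never create the file.

    Returns True on success, False if the path is missing, is not a FIFO,
    or has no process reading from it.
    """
    try:
        # No O_CREAT, so a missing pipe is never replaced by a regular file.
        # O_NONBLOCK makes open() fail instead of hanging when nothing has
        # the pipe open for reading (e.g. the daemon is not running).
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError:
        return False
    try:
        # Check the descriptor we actually opened, not the path, so the
        # file cannot be swapped between the check and the write.
        if not stat.S_ISFIFO(os.fstat(fd).st_mode):
            return False
        os.write(fd, command_line.encode("utf-8"))
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

A call might look like `write_external_command("/usr/local/nagios/var/rw/nagios.cmd", "[%d] ENABLE_NOTIFICATIONS;0\n" % time.time())` after `import time`, mirroring the command line shown in the first post. In PHP the equivalent test would compare `filetype($path)` against `'fifo'`.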
Re: Commands stop working
Posted: Mon May 16, 2016 1:08 pm
by BanditBBS
Very very interesting....so perhaps someone scheduled a downtime while a restart was in progress and my component then created the file?
I'm basically following the command examples here:
http://old.nagios.org/developerinfo/ext ... and_id=118
Just in PHP instead. Here is my crappy code:
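// Note: mode "a" creates a regular file when the path does not exist,
// e.g. while the command pipe is absent during a Nagios restart.
$fh = fopen('/usr/local/nagios/var/rw/nagios.cmd', "a") or die(gettext("Error: Could not open downtime config file for writing."));
fwrite($fh, $cfg_str);
fclose($fh);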
So I need to follow your lead: verify it's a fifo and, if it isn't, wait a bit and retry a few times, then fail if it still isn't one?
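The check-wait-retry idea described here could look roughly like the following. It is a sketch in Python, not the component's actual code; `wait_for_fifo`, the attempt count, and the delay are all assumptions picked for illustration.

```python
import os
import stat
import time


def wait_for_fifo(path, attempts=5, delay=2.0, sleep=time.sleep):
    """Return True as soon as `path` is a FIFO, checking up to `attempts` times.

    Sleeps `delay` seconds between checks and returns False if the path is
    still missing, or still not a FIFO, after the last check.
    """
    for attempt in range(attempts):
        try:
            if stat.S_ISFIFO(os.stat(path).st_mode):
                return True
        except OSError:
            pass  # pipe not there (yet), e.g. the daemon is restarting
        if attempt < attempts - 1:
            sleep(delay)  # no point sleeping after the final check
    return False
```

The caller would report an error to the user when this returns False instead of writing. Even after a successful check, the write itself should still open the path in a way that cannot create a new file, since the pipe could disappear again in between.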
Re: Commands stop working
Posted: Mon May 16, 2016 3:01 pm
by ssax
Yes, that is correct. That's why the snmptraphandling.py script was updated.
Re: Commands stop working
Posted: Mon May 16, 2016 3:09 pm
by BanditBBS
Cool, thanks for the information. I'll make the code change now. Feel free to close this thread!
Re: Commands stop working
Posted: Mon May 16, 2016 4:44 pm
by mcapra
Closing it up!