Re: Unable to ACK (again)

Posted: Tue Jul 05, 2016 9:35 am
by tgriep
Earlier in this thread, you said that the server is not receiving SNMP traps, but the output of the ps command shows that the snmptt daemon is running.
The snmptt daemon writes its information to the nagios.cmd file, so if you are not using it, you should disable it and see if the problem goes away.
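To illustrate why a second writer can matter here: a line sent to nagios.cmd follows the Nagios external command format, `[epoch] COMMAND;arg;arg`. The sketch below assumes the trap handler submits passive service results with PROCESS_SERVICE_CHECK_RESULT (not confirmed in this thread); the function names are hypothetical. It refuses to write unless the target is a named pipe, because a plain redirect to a missing path would create a regular file instead.

```shell
#!/bin/sh
# Sketch only: format_passive_result and submit_passive_result are
# hypothetical helper names, not part of Nagios or snmptt.

format_passive_result() {
    # $1 epoch seconds, $2 host, $3 service, $4 return code, $5 plugin output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

submit_passive_result() {
    cmd_file=$1
    shift
    # Only write when the command file is a FIFO. Redirecting to a path
    # that does not exist would silently create a regular file there.
    if [ ! -p "$cmd_file" ]; then
        echo "refusing to write: $cmd_file is not a named pipe" >&2
        return 1
    fi
    format_passive_result "$(date +%s)" "$@" > "$cmd_file"
}
```

A call might look like `submit_passive_result /usr/local/nagios/var/rw/nagios.cmd web01 SNMP_Trap 2 'link down'`, where the host and service names are made up. There is still a small window between the check and the write, so this narrows the problem rather than eliminating it.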

Re: Unable to ACK (again)

Posted: Tue Jul 05, 2016 10:07 am
by lmiltchev
The permissions on the pipe are not correct. You have:

Code:

-rw-rw-r-- 1 root   nagcmd  500 Jul  1 10:39 nagios.cmd
but you should have:

Code:

prw-rw----  1 nagios nagcmd    0 Jun 30 13:43 nagios.cmd
Try deleting the file and restarting Nagios:

Code:

rm -f /usr/local/nagios/var/rw/nagios.cmd
service nagios restart
The file should get recreated with the correct permissions. Let us know if this fixed your issue.
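The comparison of the two listings can be scripted. This is a minimal sketch, assuming GNU stat (for `-c`) and the nagios/nagcmd ownership and rw-rw---- mode shown in the expected listing; `check_cmd_pipe` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check_cmd_pipe is a hypothetical helper, not a Nagios tool.
# Usage: check_cmd_pipe PATH [OWNER] [GROUP]
check_cmd_pipe() {
    cmd_file=$1
    want_owner=${2:-nagios}
    want_group=${3:-nagcmd}

    if [ ! -e "$cmd_file" ]; then
        echo "missing: $cmd_file"
        return 2
    fi
    # A leading "p" in ls -l output corresponds to this test succeeding.
    if [ ! -p "$cmd_file" ]; then
        echo "not a pipe: $cmd_file"
        return 1
    fi
    # GNU stat: %U owner name, %G group name, %a octal permissions.
    actual=$(stat -c '%U:%G:%a' "$cmd_file")
    if [ "$actual" != "$want_owner:$want_group:660" ]; then
        echo "unexpected owner/mode: $actual"
        return 1
    fi
    echo "ok: $cmd_file"
    return 0
}
```

Running `check_cmd_pipe /usr/local/nagios/var/rw/nagios.cmd` against the first listing would report "not a pipe", since that entry is a regular file owned by root.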

Re: Unable to ACK (again)

Posted: Tue Jul 05, 2016 11:43 am
by highness
lmiltchev wrote: The permissions on the pipe are not correct. You have:

Code:

-rw-rw-r-- 1 root   nagcmd  500 Jul  1 10:39 nagios.cmd
but you should have:

Code:

prw-rw----  1 nagios nagcmd    0 Jun 30 13:43 nagios.cmd
Try deleting the file and restarting Nagios:

Code:

rm -f /usr/local/nagios/var/rw/nagios.cmd
service nagios restart
The file should get recreated with the correct permissions. Let us know if this fixed your issue.
I've got a script that I run when acks quit working, and the script already has that in it. It *SOMETIMES* fixes it, but not always. I had two incidents today where I had to re-run that script twice to get it to take.
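One possible refinement to such a script, offered as a sketch rather than a known fix: the stray nagios.cmd in the listing is 500 bytes, so it holds whatever was written to it, and moving it aside instead of deleting it keeps that evidence of which process created it. `preserve_stray_cmd_file` and the `/tmp` default are hypothetical choices.

```shell
#!/bin/sh
# Sketch only: preserve_stray_cmd_file is a hypothetical helper.
# Usage: preserve_stray_cmd_file PATH [KEEP_DIR]
# Moves a regular file found at PATH into KEEP_DIR and prints the new
# location. Returns 1 and does nothing if PATH is a pipe or is missing.
preserve_stray_cmd_file() {
    cmd_file=$1
    keep_dir=${2:-/tmp}

    # -f is true only for regular files, so a real FIFO is left alone.
    [ -f "$cmd_file" ] || return 1

    saved="$keep_dir/nagios.cmd.stray.$(date +%Y%m%d%H%M%S)"
    mv "$cmd_file" "$saved" && echo "$saved"
}
```

After `preserve_stray_cmd_file /usr/local/nagios/var/rw/nagios.cmd`, the `service nagios restart` step from the quoted post recreates the pipe as before, and the saved copy can be inspected to see which commands were written.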

Re: Unable to ACK (again)

Posted: Tue Jul 05, 2016 12:26 pm
by lmiltchev
The snmptt daemon writes its information to the nagios.cmd file, so if you are not using it, you should disable it and see if the problem goes away.
Have you tried what tgriep suggested?

Code:

service snmptt stop
chkconfig snmptt off
If you are still having issues after running the above commands, we may need to move this to our email ticketing system, and possibly schedule a remote session.

Re: Unable to ACK (again)

Posted: Tue Jul 05, 2016 12:37 pm
by highness
Can't really do that, as we use SNMPTT.

Re: Unable to ACK (again)

Posted: Tue Jul 05, 2016 1:17 pm
by tgriep
Can you post the config files from the /etc/snmp folder so we can view them?
Can you post your nagios.cfg file as well?

Re: Unable to ACK (again)

Posted: Wed Jul 27, 2016 2:36 pm
by highness
Was working on other projects since the last post from tgriep and just saw this.

Out of frustration/desperation, I shut down the snmptt daemon and killed all the Python scripts that were referencing it. That *SEEMS* to have fixed it. No loss of our acking super powers for the past couple of weeks.

I'd like to think that we could safely close this thread, but I'm curious whether something could be done to prevent this going forward. For me, it's not THAT big of an issue, but I've gotta believe that someone is going to run into this down the road.

Re: Unable to ACK (again)

Posted: Wed Jul 27, 2016 4:37 pm
by tgriep
One thing we have had people try is to restart the snmptt daemon when the nagios process is restarted.
You could try that and see if it helps.
Add this

Code:

/etc/init.d/snmptt restart
to the /etc/init.d/nagios init script around line 196 so the 2 processes will get restarted at the same time.

Re: Unable to ACK (again)

Posted: Wed Jul 27, 2016 4:52 pm
by highness
tgriep wrote: One thing we have had people try is to restart the snmptt daemon when the nagios process is restarted.
You could try that and see if it helps.
Add this

Code:

/etc/init.d/snmptt restart
to the /etc/init.d/nagios init script around line 196 so the 2 processes will get restarted at the same time.
I added that earlier in my trials - with no joy. The box would still lose the ability to ack.

Re: Unable to ACK (again)

Posted: Thu Jul 28, 2016 9:00 am
by tgriep
The last thing to check is the /usr/local/bin/snmptraphandling.py script. Something in that file may be causing the issue.
Could you post it?