Re: SNMPTT service locked up
Posted: Tue Aug 02, 2016 5:52 am
by vijilants
I'm still having issues with this. Today we noticed there had been no alarms since 10th July, so we had to restart SNMPTT. All the queued alarms then flooded the system, and emails went out all over the place.
When you run service snmptt restart, it reports a failure on stopping the service and OK on starting it.
Is there any way to check the status of this service, or to restart it, say, daily?
Re: SNMPTT service locked up
Posted: Tue Aug 02, 2016 11:48 am
by gormank
See my suggestion in post 2 of this thread on May 3 to monitor the number of files in /var/spool/snmptt...
Have you updated Nagios since the change to the init script? Maybe it was overwritten.
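A minimal sketch of that spool check, written as a Nagios-style plugin. The function name, the script name in the comment, and the thresholds are illustrative assumptions, not something taken from the KB article or from Nagios XI itself.

```shell
#!/bin/bash
# Sketch: alert when trap files pile up in the snmptt spool directory.
# check_spool_count and the thresholds are hypothetical names/values.

check_spool_count() {
    local dir="$1" warn="$2" crit="$3" count

    if [ ! -d "$dir" ]; then
        echo "UNKNOWN - $dir is not a directory"
        return 3
    fi

    # Count regular files directly inside the spool directory.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    count=$((count))   # normalise any padding added by wc

    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count files in $dir | files=$count;$warn;$crit"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count files in $dir | files=$count;$warn;$crit"
        return 1
    fi
    echo "OK - $count files in $dir | files=$count;$warn;$crit"
    return 0
}

# Run as a plugin only when arguments are supplied, e.g.
#   check_snmptt_spool.sh /var/spool/snmptt 50 200
if [ "$#" -ge 3 ]; then
    check_spool_count "$1" "$2" "$3"
    exit $?
fi
```

The idea is that a healthy snmptt keeps the spool nearly empty, so a growing file count suggests it has stopped processing traps even while the service still reports as running.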
Re: SNMPTT service locked up
Posted: Tue Aug 02, 2016 2:52 pm
by ssax
Good point gormank! vijilants, please validate that it's still in the init script.
You could set up a service check that monitors the directory, etc., and then use an event handler to restart it, OR you could set up a cron job and just restart it every day, but first check the init script.
Thank you
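A rough sketch of the event handler option, assuming it is attached to a service check on the spool directory. The script name, the function name, and the RESTART_CMD variable are hypothetical; $SERVICESTATE$ and $SERVICESTATETYPE$ are standard Nagios macros.

```shell
#!/bin/bash
# Sketch of a Nagios event handler that restarts snmptt.
# Assumed command definition (hypothetical script name):
#   command_line $USER1$/restart_snmptt.sh $SERVICESTATE$ $SERVICESTATETYPE$
# Assumption: the user running Nagios is allowed to restart the service,
# which usually needs a sudo rule; adjust RESTART_CMD accordingly.
RESTART_CMD="${RESTART_CMD:-service snmptt restart}"

handle_event() {
    local state="$1" state_type="$2"

    # Act only once the problem is confirmed (HARD) and CRITICAL.
    if [ "$state" = "CRITICAL" ] && [ "$state_type" = "HARD" ]; then
        echo "Restarting snmptt"
        $RESTART_CMD   # intentionally unquoted so the command splits into words
        return $?
    fi

    echo "No action for $state/$state_type"
    return 0
}

# Nagios passes the macros as arguments; do nothing when called without them.
if [ "$#" -ge 2 ]; then
    handle_event "$1" "$2"
    exit $?
fi
```

For the simpler cron option, a line such as `0 3 * * * /sbin/service snmptt restart` in root's crontab would restart it daily at 03:00. The time and the path to `service` are assumptions for a RHEL-style system.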
Re: SNMPTT service locked up
Posted: Tue Aug 02, 2016 4:59 pm
by Box293
gormank wrote: See my suggestion in post 2 of this thread on May 3 to monitor the number of files in /var/spool/snmptt...
Have you updated Nagios since the change to the init script? Maybe it was overwritten.
Here are the steps to do this:
https://support.nagios.com/kb/article.php?id=502
Re: SNMPTT service locked up
Posted: Tue Aug 02, 2016 8:14 pm
by gormank
service snmptt status shows running, while a restart shows failed and then started, so monitoring the service will likely give a false sense of security. I monitor snmptt.
Re: SNMPTT service locked up
Posted: Wed Aug 03, 2016 2:23 pm
by ssax
vijilants, was it still in the init script?
Re: SNMPTT service locked up
Posted: Thu Aug 04, 2016 1:52 pm
by gormank
Could someone add a feature request to, say, check for the snmptt PID file (/var/run/snmptt.pid on RHEL) and restart if it exists?
This is a known issue that has existed for years. Why continue to pretend it doesn't exist?
I'm also curious as to why no one seems to want to monitor the number of files spooled but I guess I'm just dense...
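A sketch of what a PID file check could look like, in case it helps the feature request. The function name and the four states are hypothetical. As noted earlier in the thread, a live process does not prove snmptt is still processing traps, so this would only catch a dead or stale daemon, not a hang.

```shell
#!/bin/bash
# Sketch: classify the state of a daemon from its PID file.
# pidfile_state is a hypothetical helper, not part of snmptt or Nagios.

pidfile_state() {
    local file="$1" pid

    if [ ! -f "$file" ]; then
        echo "missing"
        return 0
    fi

    # The first line of the file is expected to hold the PID.
    pid=$(head -n 1 "$file" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*)
            echo "invalid"
            return 0
            ;;
    esac

    # kill -0 sends no signal; it only tests whether the process can be signalled.
    # Run as root: for an unprivileged user it fails on other users' processes.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Example: pidfile_state /var/run/snmptt.pid
if [ "$#" -ge 1 ]; then
    pidfile_state "$1"
fi
```

A wrapper could restart snmptt when the result is "stale" or "missing", and fall back on the spool file count to detect the hung-but-running case.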
Re: SNMPTT service locked up
Posted: Thu Aug 04, 2016 5:04 pm
by ssax
I've created a bug report for this with a link back to this thread with a bunch of notes on it:
NEW TASK ID 9234 created - Nagios XI Bug Report: SNMPTT locks up after applying config
What I personally think is happening: snmptt runs the EXEC statement to put the trap into XI, tries to write to the command file, and can't because the file has been deleted as a result of the nagios restart (from the apply config). That then causes a hang/issue with SNMPTT. It's the only thing I can think of.
Re: SNMPTT service locked up
Posted: Thu Aug 04, 2016 5:05 pm
by Box293
gormank wrote: I'm also curious as to why no one seems to want to monitor the number of files spooled but I guess I'm just dense...
This KB article I provided a link for on Wednesday has steps for doing that:
https://support.nagios.com/kb/article.php?id=502
Re: SNMPTT service locked up
Posted: Fri Sep 02, 2016 4:54 am
by vijilants
ssax wrote: vijilants, was it still in the init script?
Yes, it was still in there:
# See how we were called.
case "$1" in
    start)
        echo -n "Starting nagios:"
        if test "$checkconfig" = "true"; then
            check_config
            # check_config exits on configuration errors.
        fi
        if test -f $NagiosRunFile; then
            NagiosPID=`head -n 1 $NagiosRunFile`
            if status_nagios; then
                echo " another instance of nagios is already running."
                exit 0
            fi
        fi
        touch $NagiosVarDir/nagios.log $NagiosRetentionFile
        rm -f $NagiosCommandFile
        touch $NagiosRunFile
        chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
        USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
        echo " done."
        /etc/init.d/snmptt restart
        ;;
    stop)
        echo -n "Stopping nagios:"
Can someone please advise, in detail, on a quick fix to check this service and restart it if it fails? We simply do not have the time to monitor the box manually.
This has become a real pain, and it is highly annoying to people on the mailing list to be flooded with historic alarms.
Thanks
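One possible stopgap is a small watchdog run from root's crontab that restarts snmptt when a spooled trap has been waiting too long. This is only a sketch: the function names, the variables, and the 15 minute threshold are assumptions, and it presumes a healthy snmptt drains /var/spool/snmptt within minutes.

```shell
#!/bin/bash
# Sketch of a cron watchdog for snmptt (all names and defaults are hypothetical).
SPOOL_DIR="${SPOOL_DIR:-/var/spool/snmptt}"
MAX_AGE_MIN="${MAX_AGE_MIN:-15}"
RESTART_CMD="${RESTART_CMD:-service snmptt restart}"

# Succeeds if at least one regular file in the spool directory was last
# modified more than max_age_min minutes ago.
spool_is_stuck() {
    local dir="$1" max_age_min="$2"
    [ -n "$(find "$dir" -maxdepth 1 -type f -mmin +"$max_age_min" 2>/dev/null | head -n 1)" ]
}

watchdog() {
    if spool_is_stuck "$SPOOL_DIR" "$MAX_AGE_MIN"; then
        echo "$(date) snmptt spool looks stuck, restarting"
        $RESTART_CMD   # intentionally unquoted so the command splits into words
    fi
}

# Only act when called explicitly, e.g. from cron:
#   */10 * * * * /usr/local/bin/snmptt_watchdog.sh run >> /var/log/snmptt_watchdog.log 2>&1
if [ "${1:-}" = "run" ]; then
    watchdog
fi
```

Restarting sooner should also keep the backlog small, so a restart releases a handful of traps rather than weeks of historic alarms. The script path and log file in the cron line are placeholders.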