SNMPTT service locked up
Posted: Tue May 03, 2016 6:08 am
by vijilants
System:
Nagios XI Version : 5.2.3
CentOS release 6.5 (Final)
Hi,
Can you please advise?
Today we found that one of our Nagios systems had not been reporting SNMP traps for the past month. On investigating further, we found that the snmptt.log file was over a month old.
We then restarted the SNMPTT service and everything burst back to life, but we were flooded with a month's worth of alarms.
This is not the first time the SNMPTT service has locked up.
Is there any way to set something up so that the service is restarted every couple of days, or every week, to ensure that this doesn't happen again?
I'm not sure why it locks up, but we are losing critical alarms as a result.
Many Thanks
Re: SNMPTT service locked up
Posted: Tue May 03, 2016 9:49 am
by gormank
Have a look at the change to the nagios init script in the 4th post down in the link. I haven't seen the problem since making this change.
https://support.nagios.com/forum/viewto ... 8&start=30
A monitor on the number of files in (I think) /var/spool/snmptt will let you know when it's locked up.
My experience was that a restart of the running process showed the stop part as failed, and then a success on start. I think a status check showed it as running.
You may want to wait for the Nagios folks to comment.
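For anyone who wants to try the spool monitor suggested above, here is a minimal sketch of a Nagios-style check, not a tested plugin. The spool path is the one guessed at in this post and may differ on your system; the function names and the warning/critical thresholds (50 and 200) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical check: alert when trap files pile up in the SNMPTT spool,
# which suggests SNMPTT has stopped processing them.
# The default path and thresholds below are assumptions; adjust as needed.

count_spool_files() {
    # Count regular files directly inside the given directory.
    find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

check_snmptt_spool() {
    spool_dir=${1:-/var/spool/snmptt}
    warn=${2:-50}
    crit=${3:-200}

    if [ ! -d "$spool_dir" ]; then
        echo "UNKNOWN - $spool_dir not found"
        return 3
    fi

    count=$(count_spool_files "$spool_dir")

    # Return codes follow the usual Nagios plugin convention:
    # 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count files waiting in $spool_dir"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count files waiting in $spool_dir"
        return 1
    fi
    echo "OK - $count files waiting in $spool_dir"
    return 0
}
```

To use it as a standalone plugin script, you would add a final line such as `check_snmptt_spool "$@"; exit $?` and point a Nagios command definition at the script.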
Re: SNMPTT service locked up
Posted: Tue May 03, 2016 1:12 pm
by tmcdonald
That should work, but the line numbers are not matching what I have on a 5.2.7 system. Here is the section you would need to edit:
Code:
    start)
        echo -n "Starting nagios:"
        if test "$checkconfig" = "true"; then
            check_config
            # check_config exits on configuration errors.
        fi
        if test -f $NagiosRunFile; then
            NagiosPID=`head -n 1 $NagiosRunFile`
            if status_nagios; then
                echo " another instance of nagios is already running."
                exit 0
            fi
        fi
        touch $NagiosVarDir/nagios.log $NagiosRetentionFile
        rm -f $NagiosCommandFile
        touch $NagiosRunFile
        chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
        USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
        echo " done."
        ;;
Change it to:
Code:
    start)
        echo -n "Starting nagios:"
        if test "$checkconfig" = "true"; then
            check_config
            # check_config exits on configuration errors.
        fi
        if test -f $NagiosRunFile; then
            NagiosPID=`head -n 1 $NagiosRunFile`
            if status_nagios; then
                echo " another instance of nagios is already running."
                exit 0
            fi
        fi
        touch $NagiosVarDir/nagios.log $NagiosRetentionFile
        rm -f $NagiosCommandFile
        touch $NagiosRunFile
        chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
        USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
        echo " done."
        /etc/init.d/snmptt restart
        ;;
Basically just adding the /etc/init.d/snmptt restart line at the end. I would save a copy of the init script just to be safe in case you need to revert.
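One way to keep a copy of the init script before editing it is sketched below. This is only an illustration: the `backup_file` helper, the `/root` default destination, and the timestamped file name are assumptions, not anything from the Nagios XI documentation.

```shell
#!/bin/sh
# Hypothetical helper: copy a file to a backup directory with a timestamp
# in the name, preserving mode and ownership, and print the backup path.
# The default destination (/root) is an assumption; pass your own directory.

backup_file() {
    src=$1
    dest_dir=${2:-/root}

    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi

    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dest" && echo "$dest"
}

# Example (run as root before editing the init script):
#   backup_file /etc/init.d/nagios
# To revert, copy the printed backup path back over /etc/init.d/nagios.
```

Keeping the copy outside /etc/init.d avoids leaving a second nagios-like script in the init directory.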
Re: SNMPTT service locked up
Posted: Wed May 04, 2016 3:50 am
by vijilants
Thank you,
Do I also need to replace snmptraphandling.py with the file given in that thread?
Thanks
Re: SNMPTT service locked up
Posted: Wed May 04, 2016 9:07 am
by gormank
No
Re: SNMPTT service locked up
Posted: Wed May 04, 2016 1:33 pm
by rkennedy
Thanks @gormank!
@vijilants - let us know if you have any further questions.
Re: SNMPTT service locked up
Posted: Mon May 09, 2016 4:07 am
by vijilants
tmcdonald wrote:That should work, but the line numbers are not matching what I have on a 5.2.7 system. Here is the section you would need to edit:
Code:
    start)
        echo -n "Starting nagios:"
        if test "$checkconfig" = "true"; then
            check_config
            # check_config exits on configuration errors.
        fi
        if test -f $NagiosRunFile; then
            NagiosPID=`head -n 1 $NagiosRunFile`
            if status_nagios; then
                echo " another instance of nagios is already running."
                exit 0
            fi
        fi
        touch $NagiosVarDir/nagios.log $NagiosRetentionFile
        rm -f $NagiosCommandFile
        touch $NagiosRunFile
        chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
        USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
        echo " done."
        /etc/init.d/snmptt restart
        ;;
Basically just adding the /etc/init.d/snmptt restart line at the end. I would save a copy of the init script just to be safe in case you need to revert.
Thank you. OK, I've added /etc/init.d/snmptt restart to the /etc/init.d/nagios file at the point shown in the quote.
Do I need to restart any processes after this change?
Many Thanks
Re: SNMPTT service locked up
Posted: Mon May 09, 2016 9:36 am
by tmcdonald
Nope, but the restart of snmptt should now occur with a (re)start of the nagios process.
Re: SNMPTT service locked up
Posted: Thu May 12, 2016 2:05 am
by vijilants
tmcdonald wrote:Nope, but the restart of snmptt should now occur with a (re)start of the nagios process.
Is there any way for me to test this, e.g. by running "service nagios restart" and watching a log to see whether snmptt restarts?
Many Thanks
Re: SNMPTT service locked up
Posted: Thu May 12, 2016 2:53 pm
by tmcdonald
You can look in /var/log/snmptt/snmpttsystem.log for the following on a restart of SNMPTT:
Code:
Thu May 12 14:41:20 2016 SNMPTT v1.4beta2 started
Thu May 12 14:41:20 2016 Loading /etc/snmp/snmptt.conf
Thu May 12 14:41:20 2016 Finished loading 966 lines from /etc/snmp/snmptt.conf
Thu May 12 14:41:20 2016 Changing to UID: snmptt (497)
Credit to @tgriep for testing this!
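The log check described above can be scripted roughly as follows. This is a sketch only: the helper names are invented here, and the pattern assumes start lines look like the "SNMPTT v1.4beta2 started" entry shown in this post.

```shell
#!/bin/sh
# Hypothetical helpers for confirming that SNMPTT logged a fresh start.
# The default log path is the one quoted in this thread.

count_snmptt_starts() {
    # Print how many "SNMPTT ... started" lines the log contains
    # (0 if the log is missing or unreadable).
    log=${1:-/var/log/snmptt/snmpttsystem.log}
    [ -r "$log" ] || { echo 0; return 0; }
    grep -c 'SNMPTT .* started' "$log" || true
}

last_snmptt_start() {
    # Print the most recent "SNMPTT ... started" line, if any.
    log=${1:-/var/log/snmptt/snmpttsystem.log}
    grep 'SNMPTT .* started' "$log" 2>/dev/null | tail -n 1
}

# Example (as root):
#   before=$(count_snmptt_starts)
#   service nagios restart
#   after=$(count_snmptt_starts)
#   [ "$after" -gt "$before" ] && echo "SNMPTT restarted: $(last_snmptt_start)"
```

If the count does not increase after restarting nagios, the added line in the init script is probably not being reached.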