
SNMPTT service locked up

Posted: Tue May 03, 2016 6:08 am
by vijilants
System:

Nagios XI Version : 5.2.3
CentOS release 6.5 (Final)

Hi,

Can you please advise?

Today we found that one of our Nagios systems had not been reporting on SNMP traps for the past month. On investigating further, we found that snmptt.log was over a month old.

We then restarted the SNMPTT service and everything burst back to life, but we were flooded with a month's worth of alarms.

This is not the first time the SNMPTT service has locked up.

Is there any way of setting something up so that the service is restarted every couple of days or every week, to ensure that this doesn't happen again?

I'm not sure why it locks up, but we are losing critical alarms as a result.

Many Thanks
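
For the periodic-restart idea above, one option is a small watchdog run from cron that restarts SNMPTT only when its log has gone quiet, rather than restarting blindly. This is a minimal sketch, not a tested fix: the function name log_is_stale, the log path /var/log/snmptt/snmptt.log and the 24-hour threshold are all assumptions, and a site that genuinely receives no traps for a day would trip it too.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (returns 0) when the file is missing or was
# last modified more than max_age seconds ago.
log_is_stale() {
    local file="$1" max_age="$2"
    [ -f "$file" ] || return 0
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat, as shipped with CentOS
    [ $((now - mtime)) -gt "$max_age" ]
}

# Assumed usage from an hourly cron job (path and threshold are illustrative):
# if log_is_stale /var/log/snmptt/snmptt.log 86400; then
#     /etc/init.d/snmptt restart
# fi
```

Restarting only on a stale log avoids bouncing a healthy service, at the cost of picking a threshold that suits how often traps normally arrive.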

Re: SNMPTT service locked up

Posted: Tue May 03, 2016 9:49 am
by gormank
Have a look at the change to the nagios init script in the fourth post down in the linked thread. I haven't seen the problem since making this change.

https://support.nagios.com/forum/viewto ... 8&start=30

A monitor on the number of files in (I think) /var/spool/snmptt will let you know when it's locked up.
My experience was that a restart of the running process showed the stop part as failed, and then a success on start. I think a status check showed it as running.
You may want to wait for the Nagios folks to comment.
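
The spool monitor mentioned above could look something like the sketch below: a Nagios-style check that counts files in a directory and maps the count to the usual plugin exit codes. The function name check_spool and the thresholds are made up for illustration, and /var/spool/snmptt is only the path guessed at above, so confirm it on your system first.

```shell
#!/bin/bash
# Hypothetical plugin-style check: count regular files in a spool directory.
# Exit codes follow the Nagios plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_spool() {
    local dir="$1" warn="$2" crit="$3"
    if [ ! -d "$dir" ]; then
        echo "UNKNOWN - $dir is not a directory"
        return 3
    fi
    local count
    # Only files directly inside the directory are counted.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    count=$((count))   # normalise any padding added by wc
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count files in $dir"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count files in $dir"
        return 1
    fi
    echo "OK - $count files in $dir"
    return 0
}

# Assumed usage (thresholds are illustrative):
# check_spool /var/spool/snmptt 50 200; exit $?
```

A steadily growing file count suggests traps are arriving but not being processed, which is the lock-up symptom described in this thread.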

Re: SNMPTT service locked up

Posted: Tue May 03, 2016 1:12 pm
by tmcdonald
That should work, but the line numbers are not matching what I have on a 5.2.7 system. Here is the section you would need to edit:

Code:

        start)
                echo -n "Starting nagios:"

                if test "$checkconfig" = "true"; then
                        check_config
                        # check_config exits on configuration errors.
                fi

                if test -f $NagiosRunFile; then
                        NagiosPID=`head -n 1 $NagiosRunFile`
                        if status_nagios; then
                                echo " another instance of nagios is already running."
                                exit 0
                        fi
                fi

                touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                rm -f $NagiosCommandFile
                touch $NagiosRunFile
                chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
                USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
                if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

                echo " done."
                ;;
Change it to:

Code:

        start)
                echo -n "Starting nagios:"

                if test "$checkconfig" = "true"; then
                        check_config
                        # check_config exits on configuration errors.
                fi

                if test -f $NagiosRunFile; then
                        NagiosPID=`head -n 1 $NagiosRunFile`
                        if status_nagios; then
                                echo " another instance of nagios is already running."
                                exit 0
                        fi
                fi

                touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                rm -f $NagiosCommandFile
                touch $NagiosRunFile
                chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
                USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
                if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

                echo " done."
                /etc/init.d/snmptt restart
                ;;
Basically, just add the /etc/init.d/snmptt restart line at the end. I would save a copy of the original init script first, in case you need to revert.
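
For keeping a copy of the init script before editing it, a sketch along these lines would do. The function name backup_file and the /root destination in the usage comment are assumptions. Copying to a directory outside /etc/init.d keeps the backup from being mistaken for a second service script.

```shell
#!/bin/bash
# Hypothetical helper: copy a file into destdir with a timestamp suffix
# and print the path of the copy.
backup_file() {
    local src="$1" destdir="$2"
    local dest
    dest="$destdir/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    # -p preserves mode, ownership and timestamps so the copy can be restored as-is.
    cp -p "$src" "$dest" && echo "$dest"
}

# Assumed usage before editing the init script:
# backup_file /etc/init.d/nagios /root
```

To revert, copy the printed file back over /etc/init.d/nagios.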

Re: SNMPTT service locked up

Posted: Wed May 04, 2016 3:50 am
by vijilants
Thank you,

Do I also need to replace snmptraphandling.py with the file given in that thread?

Thanks

Re: SNMPTT service locked up

Posted: Wed May 04, 2016 9:07 am
by gormank
No

Re: SNMPTT service locked up

Posted: Wed May 04, 2016 1:33 pm
by rkennedy
Thanks @gormank!

@vijilants - let us know if you have any further questions.

Re: SNMPTT service locked up

Posted: Mon May 09, 2016 4:07 am
by vijilants
tmcdonald wrote:That should work, but the line numbers are not matching what I have on a 5.2.7 system. Here is the section you would need to edit:

Code:

        start)
                echo -n "Starting nagios:"

                if test "$checkconfig" = "true"; then
                        check_config
                        # check_config exits on configuration errors.
                fi

                if test -f $NagiosRunFile; then
                        NagiosPID=`head -n 1 $NagiosRunFile`
                        if status_nagios; then
                                echo " another instance of nagios is already running."
                                exit 0
                        fi
                fi

                touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                rm -f $NagiosCommandFile
                touch $NagiosRunFile
                chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
                USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
                if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

                echo " done."
                /etc/init.d/snmptt restart
                ;;
Basically, just add the /etc/init.d/snmptt restart line at the end. I would save a copy of the original init script first, in case you need to revert.

Thank you. OK, I've added /etc/init.d/snmptt restart to the /etc/init.d/nagios file at the point shown in the quote.

Do I need to restart any processes after this change?

Many Thanks

Re: SNMPTT service locked up

Posted: Mon May 09, 2016 9:36 am
by tmcdonald
Nope, but the restart of snmptt should now occur with a (re)start of the nagios process.

Re: SNMPTT service locked up

Posted: Thu May 12, 2016 2:05 am
by vijilants
tmcdonald wrote:Nope, but the restart of snmptt should now occur with a (re)start of the nagios process.

Is there any way for me to test this, e.g. by running "service nagios restart" and watching a log to see whether snmptt restarts?

Many Thanks

Re: SNMPTT service locked up

Posted: Thu May 12, 2016 2:53 pm
by tmcdonald
You can look in /var/log/snmptt/snmpttsystem.log for the following on a restart of SNMPTT:

Code:

Thu May 12 14:41:20 2016 SNMPTT v1.4beta2 started
Thu May 12 14:41:20 2016 Loading /etc/snmp/snmptt.conf
Thu May 12 14:41:20 2016 Finished loading 966 lines from /etc/snmp/snmptt.conf
Thu May 12 14:41:20 2016 Changing to UID: snmptt (497)
Credit to @tgriep for testing this!
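
If you would rather not eyeball the log, a rough way to confirm the restart is to count the "started" lines before and after restarting nagios. This is only a sketch: the function name count_snmptt_starts is made up, and the pattern assumes the log lines look like the sample above.

```shell
#!/bin/bash
# Hypothetical helper: print how many "SNMPTT v... started" lines a log contains.
count_snmptt_starts() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo 0
        return 0
    fi
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    grep -c 'SNMPTT v.* started' "$log" || true
}

# Assumed usage:
# before=$(count_snmptt_starts /var/log/snmptt/snmpttsystem.log)
# service nagios restart
# after=$(count_snmptt_starts /var/log/snmptt/snmpttsystem.log)
# [ "$after" -gt "$before" ] && echo "snmptt was restarted"
```

If the count does not go up, check that the restart line was added inside the start) section of /etc/init.d/nagios.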