SNMPTT service locked up

vijilants
Posts: 215
Joined: Wed Jun 12, 2013 2:50 pm

SNMPTT service locked up

Post by vijilants »

System:

Nagios XI Version : 5.2.3
CentOS release 6.5 (Final)

Hi,

Can you please advise.

Today we found that one of our Nagios systems had not been reporting on SNMP traps for the past month. Upon investigating further, the snmptt.log was over a month old.

We then restarted the SNMPTT service and everything burst back to life, but we were flooded with a month's worth of alarms.

This is not the first time the SNMPTT service has locked up.

Is there any way of setting something up so that the service is restarted every couple of days or every week, to ensure that this doesn't happen again?

I'm not sure why it locks up, but we are losing critical alarms as a result.

Many Thanks
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: SNMPTT service locked up

Post by gormank »

Have a look at the change to the nagios init script in the 4th post down in the link. I haven't seen the problem since making this change.

https://support.nagios.com/forum/viewto ... 8&start=30

A monitor on the number of files in (I think) /var/spool/snmptt will let you know when it's locked up.
My experience was that a restart of the running process showed the stop part as failed, and then a success on start. I think a status check showed it as running.
You may want to wait for the Nagios folks to comment.
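
The spool-directory monitor suggested above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function names and the thresholds are hypothetical, and the spool path is the one the poster was unsure about, so confirm it on your own system first. The return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch of a Nagios-style check that counts files waiting in the SNMPTT
# spool directory. A steadily growing count suggests SNMPTT has stopped
# processing traps. Names and thresholds here are assumptions.

count_spool_files() {
    # Print the number of regular files directly under the given directory.
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

check_spool() {
    dir=$1
    warn=$2
    crit=$3
    if [ ! -d "$dir" ]; then
        echo "UNKNOWN - $dir is not a directory"
        return 3
    fi
    n=$(count_spool_files "$dir")
    if [ "$n" -ge "$crit" ]; then
        echo "CRITICAL - $n files in $dir"
        return 2
    elif [ "$n" -ge "$warn" ]; then
        echo "WARNING - $n files in $dir"
        return 1
    fi
    echo "OK - $n files in $dir"
    return 0
}

# Example call (path as guessed in the post, thresholds hypothetical):
# check_spool /var/spool/snmptt 50 200
```

To use it as a plugin, the script would need to end by calling check_spool with your chosen arguments and exiting with its return code. Suitable thresholds depend on how many traps the system normally receives.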
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: SNMPTT service locked up

Post by tmcdonald »

That should work, but the line numbers are not matching what I have on a 5.2.7 system. Here is the section you would need to edit:

        start)
                echo -n "Starting nagios:"

                if test "$checkconfig" = "true"; then
                        check_config
                        # check_config exits on configuration errors.
                fi

                if test -f $NagiosRunFile; then
                        NagiosPID=`head -n 1 $NagiosRunFile`
                        if status_nagios; then
                                echo " another instance of nagios is already running."
                                exit 0
                        fi
                fi

                touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                rm -f $NagiosCommandFile
                touch $NagiosRunFile
                chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
                USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
                if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

                echo " done."
                ;;
Change it to:

        start)
                echo -n "Starting nagios:"

                if test "$checkconfig" = "true"; then
                        check_config
                        # check_config exits on configuration errors.
                fi

                if test -f $NagiosRunFile; then
                        NagiosPID=`head -n 1 $NagiosRunFile`
                        if status_nagios; then
                                echo " another instance of nagios is already running."
                                exit 0
                        fi
                fi

                touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                rm -f $NagiosCommandFile
                touch $NagiosRunFile
                chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
                USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
                if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

                echo " done."
                /etc/init.d/snmptt restart
                ;;
Basically just adding the /etc/init.d/snmptt restart line at the end. I would save the init script just to be safe in case you need to revert.
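
One way to save the init script before editing it is a timestamped copy, sketched below. The helper name backup_file and the backup directory in the example are hypothetical; only /etc/init.d/nagios comes from this thread.

```shell
#!/bin/sh
# Sketch: keep a timestamped copy of a file before editing it.
# backup_file is a hypothetical helper, not part of Nagios XI.

backup_file() {
    src=$1
    dest_dir=$2
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"
    # -p preserves mode, ownership and timestamps so the copy can be
    # moved straight back into place if a revert is needed.
    cp -p "$src" "$dest" && echo "$dest"
}

# Example call (backup directory is an assumption and must already exist):
# backup_file /etc/init.d/nagios /root/init-backups
```

The function prints the path of the copy, so reverting is a matter of copying that file back over /etc/init.d/nagios.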
Former Nagios employee
vijilants
Posts: 215
Joined: Wed Jun 12, 2013 2:50 pm

Re: SNMPTT service locked up

Post by vijilants »

Thank you,

Do I also need to replace snmptraphandling.py with the file given in that thread?

Thanks
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: SNMPTT service locked up

Post by gormank »

No
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: SNMPTT service locked up

Post by rkennedy »

Thanks @gormank!

@vijilants - let us know if you have any further questions.
Former Nagios Employee
vijilants
Posts: 215
Joined: Wed Jun 12, 2013 2:50 pm

Re: SNMPTT service locked up

Post by vijilants »

tmcdonald wrote:That should work, but the line numbers are not matching what I have on a 5.2.7 system. Here is the section you would need to edit:

        start)
                echo -n "Starting nagios:"

                if test "$checkconfig" = "true"; then
                        check_config
                        # check_config exits on configuration errors.
                fi

                if test -f $NagiosRunFile; then
                        NagiosPID=`head -n 1 $NagiosRunFile`
                        if status_nagios; then
                                echo " another instance of nagios is already running."
                                exit 0
                        fi
                fi

                touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                rm -f $NagiosCommandFile
                touch $NagiosRunFile
                chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
                USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
                if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

                echo " done."
                /etc/init.d/snmptt restart
                ;;
Basically just adding the /etc/init.d/snmptt restart line at the end. I would save the init script just to be safe in case you need to revert.
Thank you. OK I've added /etc/init.d/snmptt restart to the /etc/init.d/nagios file at the point in the quote.

Do I need to restart any processes after this change?

Many Thanks
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: SNMPTT service locked up

Post by tmcdonald »

Nope, but the restart of snmptt should now occur with a (re)start of the nagios process.
Former Nagios employee
vijilants
Posts: 215
Joined: Wed Jun 12, 2013 2:50 pm

Re: SNMPTT service locked up

Post by vijilants »

tmcdonald wrote: Nope, but the restart of snmptt should now occur with a (re)start of the nagios process.
Is there any way for me to test this, e.g. by doing a "service nagios restart" and monitoring a log to see if snmptt restarts?

Many Thanks
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: SNMPTT service locked up

Post by tmcdonald »

You can look in /var/log/snmptt/snmpttsystem.log for the following on a restart of SNMPTT:

Thu May 12 14:41:20 2016 SNMPTT v1.4beta2 started
Thu May 12 14:41:20 2016 Loading /etc/snmp/snmptt.conf
Thu May 12 14:41:20 2016 Finished loading 966 lines from /etc/snmp/snmptt.conf
Thu May 12 14:41:20 2016 Changing to UID: snmptt (497)
Credit to @tgriep for testing this!
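
A rough way to script that check is to note the most recent "started" line before restarting nagios and compare it afterwards. This is a sketch only: both function names are hypothetical, and the grep pattern simply assumes the log line format shown above.

```shell
#!/bin/sh
# Sketch: detect a new SNMPTT start entry in snmpttsystem.log.
# Function names are hypothetical; the pattern assumes the line format
# "<timestamp> SNMPTT <version> started" shown in the log excerpt.

last_snmptt_start() {
    # Print the most recent "SNMPTT ... started" line, or nothing if none.
    grep 'SNMPTT .* started' "$1" | tail -n 1
}

snmptt_restarted_since() {
    # Succeed if the latest start line differs from the one seen earlier.
    log=$1
    before=$2
    after=$(last_snmptt_start "$log")
    [ -n "$after" ] && [ "$after" != "$before" ]
}

# Example use:
# before=$(last_snmptt_start /var/log/snmptt/snmpttsystem.log)
# service nagios restart
# snmptt_restarted_since /var/log/snmptt/snmpttsystem.log "$before" \
#     && echo "snmptt restarted"
```

Because the timestamps have one-second resolution, two starts within the same second would produce identical lines and go undetected by this comparison.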
Former Nagios employee