SNMPTT service locked up

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
vijilants
Posts: 215
Joined: Wed Jun 12, 2013 2:50 pm

Re: SNMPTT service locked up

Post by vijilants »

I'm still having issues with this. Today we noticed there had been no alarms since 10th July, so we had to restart SNMPTT. The backlog then flooded the system, and emails went out all over the place.

When you run service snmptt restart, it reports a failure to stop the service and then OK on service start.

Is there any way to check the status of this service, or to restart it automatically, say daily?
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: SNMPTT service locked up

Post by gormank »

See my suggestion in post 2 of this thread on May 3 to monitor the number of files in /var/spool/snmptt...
Have you updated Nagios since the change to the init script? Maybe it was overwritten.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: SNMPTT service locked up

Post by ssax »

Good point gormank! vijilants, please validate that it's still in the init script.

You could set up a service check that monitors the directory, etc., and then use an event handler to restart snmptt, or you could set up a cron job that simply restarts it every day. But first, check the init script.
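The service-check idea could look roughly like the sketch below. It counts the files waiting in the spool directory and returns the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name check_spool_count and the thresholds are made up for illustration and are not taken from the KB article; the use of find -maxdepth assumes GNU or BSD find.

```shell
#!/bin/sh
# Sketch only: check_spool_count and its thresholds are hypothetical names/values.

# check_spool_count DIR WARN CRIT
# Prints a one-line status and returns 0 (OK), 1 (WARNING),
# 2 (CRITICAL) or 3 (UNKNOWN), following plugin exit-code conventions.
check_spool_count() {
    dir=$1
    warn=$2
    crit=$3

    if [ ! -d "$dir" ]; then
        echo "UNKNOWN - $dir is not a directory"
        return 3
    fi

    # Count regular files sitting directly in the spool directory.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    count=$((count + 0))   # normalise any padding that wc adds

    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count files in $dir"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count files in $dir"
        return 1
    fi
    echo "OK - $count files in $dir"
    return 0
}
```

A wrapper script would call something like check_spool_count /var/spool/snmptt 50 200 and exit with the function's return code. The numbers 50 and 200 are placeholders and would need tuning to the normal trap volume on the box.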


Thank you
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: SNMPTT service locked up

Post by Box293 »

gormank wrote: See my suggestion in post 2 of this thread on May 3 to monitor the number of files in /var/spool/snmptt...
Have you updated Nagios since the change to the init script? Maybe it was overwritten.

Here are the steps to do this:

https://support.nagios.com/kb/article.php?id=502
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: SNMPTT service locked up

Post by gormank »

service snmptt status shows running, while a restart shows failed and then started, so monitoring the service will likely give a false sense of security. I monitor snmptt.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: SNMPTT service locked up

Post by ssax »

vijilants, was it still in the init script?
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: SNMPTT service locked up

Post by gormank »

Could someone add a feature request to, say, check for the snmptt PID file (/var/run/snmptt.pid on RHEL) and restart if it exists?
This is a known issue that has existed for years. Why continue to pretend it doesn't exist?

I'm also curious as to why no one seems to want to monitor the number of files spooled but I guess I'm just dense...
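For what it's worth, a PID-file check might be sketched as below. It only reports whether the PID file is missing, stale (no live process behind it), or points at a running process. The function name pidfile_state is hypothetical. As noted earlier in the thread, a process that is running is not necessarily processing traps, so this would catch a dead snmptt but not a hung one.

```shell
#!/bin/sh
# Sketch only: pidfile_state is a hypothetical helper, not part of snmptt or Nagios XI.

# pidfile_state PIDFILE
# Prints "missing", "stale" or "running".
pidfile_state() {
    pidfile=$1

    if [ ! -f "$pidfile" ]; then
        echo missing
        return 0
    fi

    # Take the first line and strip whitespace.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')

    # An empty or non-numeric PID cannot refer to a live process.
    case $pid in
        ''|*[!0-9]*) echo stale; return 0 ;;
    esac

    # kill -0 sends no signal; it only tests whether the process can be signalled.
    # Run as root, otherwise a process owned by another user looks dead.
    if kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}
```

Called as pidfile_state /var/run/snmptt.pid, the result could feed a restart decision, for example restarting on "stale" or "missing".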
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: SNMPTT service locked up

Post by ssax »

I've created a bug report for this, with a link back to this thread and a bunch of notes on it:

NEW TASK ID 9234 created - Nagios XI Bug Report: SNMPTT locks up after applying config
What I personally think is happening: snmptt tries to run the EXEC statement to put the trap into XI and tries to write to the command file, but can't, because the file has been deleted as a result of the nagios restart (from the apply config). That then causes a hang/issue with SNMPTT. It's the only thing I can think of.
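One way to probe that theory would be a small guard that checks the command file before anything writes to it. The sketch below is only an illustration: cmdfile_ready is a hypothetical helper, and the path to the command file is passed in rather than assumed.

```shell
#!/bin/sh
# Sketch only: cmdfile_ready is a hypothetical helper for testing the theory above.

# cmdfile_ready PATH
# Returns 0 only if PATH exists, is a named pipe (FIFO) and is writable.
# A missing file or a regular file in its place returns non-zero.
cmdfile_ready() {
    cmdfile=$1
    [ -p "$cmdfile" ] && [ -w "$cmdfile" ]
}
```

Logging the result of this check around an apply config would show whether the pipe really disappears at the moment the traps stop. Note that opening a FIFO for writing normally blocks until a reader has it open, so a pipe that exists while nagios is not yet reading from it could also stall a writer.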
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: SNMPTT service locked up

Post by Box293 »

gormank wrote: I'm also curious as to why no one seems to want to monitor the number of files spooled but I guess I'm just dense...

The KB article I linked on Wednesday has steps for doing that:

https://support.nagios.com/kb/article.php?id=502
vijilants
Posts: 215
Joined: Wed Jun 12, 2013 2:50 pm

Re: SNMPTT service locked up

Post by vijilants »

ssax wrote: vijilants, was it still in the init script?

Yes, it was still in there:

# See how we were called.
case "$1" in

        start)
                echo -n "Starting nagios:"

                if test "$checkconfig" = "true"; then
                        check_config
                        # check_config exits on configuration errors.
                fi

                if test -f $NagiosRunFile; then
                        NagiosPID=`head -n 1 $NagiosRunFile`
                        if status_nagios; then
                                echo " another instance of nagios is already running."
                                exit 0
                        fi
                fi

                touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                rm -f $NagiosCommandFile
                touch $NagiosRunFile
                chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
                USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
                if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

                echo " done."
                /etc/init.d/snmptt restart
                ;;

        stop)
                echo -n "Stopping nagios:"

Can someone please advise in detail on a quick fix for this: how to get the service checked and restarted if it fails? We simply do not have the time to monitor the box manually.

This has become a real pain, and it is highly annoying for people on the mailing list to be flooded with historic alarms.

Thanks
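A rough sketch of an automated check-and-restart, along the lines suggested earlier in the thread: treat snmptt as stuck when the spool directory holds trap files that have been waiting longer than some number of minutes, and run a restart command in that case. The function names and the 30-minute figure are assumptions for illustration, and find -mmin assumes GNU or BSD find.

```shell
#!/bin/sh
# Sketch only: spool_is_stuck and restart_if_stuck are hypothetical helpers.

# spool_is_stuck DIR MINUTES
# Returns 0 if DIR holds at least one regular file last modified
# more than MINUTES minutes ago, non-zero otherwise.
spool_is_stuck() {
    dir=$1
    minutes=$2
    [ -d "$dir" ] || return 1
    old=$(find "$dir" -maxdepth 1 -type f -mmin +"$minutes" | head -n 1)
    [ -n "$old" ]
}

# restart_if_stuck DIR MINUTES RESTART_CMD [ARGS...]
# Runs RESTART_CMD only when the spool looks stuck.
restart_if_stuck() {
    dir=$1
    minutes=$2
    shift 2
    if spool_is_stuck "$dir" "$minutes"; then
        echo "snmptt spool has files older than $minutes minutes, restarting"
        "$@"
    fi
}
```

A script run from cron every few minutes could end with restart_if_stuck /var/spool/snmptt 30 /etc/init.d/snmptt restart. Because the restart happens soon after traps stop flowing, the backlog released on restart should be far smaller than one that has built up over weeks. Given that the stop step reportedly fails on this system, it would be worth confirming by hand that the restart actually clears the spool before relying on it.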