Nagios XI 2014R1.1 No more SNMP Traps

mlopez
Posts: 62
Joined: Fri Oct 19, 2012 11:35 am

Nagios XI 2014R1.1 No more SNMP Traps

Post by mlopez »

Hi All,
I just upgraded my Nagios XI installation to 2014R1.1. The upgrade went smoothly, but I noticed I am no longer receiving SNMP traps in /usr/local/nagios/var/nagios.log

I can see the traps in NSTI 1.4, in /var/log/snmptt/snmptt.log and in /var/log/messages, but nothing in /usr/local/nagios/var/nagios.log

I re-ran the following:
Integrating_SNMP_Traps_With_Nagios_XI.pdf

Code: Select all

cd /tmp
wget http://assets.nagios.com/downloads/nagiosxi/scripts/NagiosXI-SNMPTrap-setup.sh
sh ./NagiosXI-SNMPTrap-setup.sh
/etc/snmp/snmptt.ini (FINE)
/etc/snmp/snmptrapd.conf (FINE)
/etc/sysconfig/snmptrapd (FINE)
/etc/init.d/snmptrapd (FINE)
/etc/init.d/snmptt (FINE)


Example of Messages:
/var/log/messages
Jun 4 18:33:15 NagiosXI snmptt[20009]: .1.3.6.1.4.1.x.x.3.19 Warning "Status Events" x.x.x.11 - 21581 GPUTemp2=83, Out of range: min=0,max=75.

Example of SNMPTT:
/var/log/snmptt/snmptt.log
Wed Jun 4 18:33:11 2014 .1.3.6.1.4.1.x.x.3.19 Warning "Status Events" x.x.x.11 - 21581 GPUTemp2=83, Out of range: min=0,max=75.

Any help would be greatly appreciated.
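
One way to narrow down where the traps stop is to count how often the trap OID appears in each log. The following is a minimal sketch, not something from the original post: the function name count_in_log is hypothetical, and the paths in the usage note are simply the ones mentioned above.

```shell
# count_in_log PATTERN FILE
# Prints how many lines of FILE contain the fixed string PATTERN.
# Prints 0 when FILE is missing or unreadable instead of failing.
count_in_log() {
    if [ -r "$2" ]; then
        # -F: treat the pattern as a literal string, so the dots in an
        #     OID are not interpreted as regex wildcards.
        # -c: print only the number of matching lines.
        # grep exits non-zero when the count is 0, hence "|| true".
        grep -c -F -- "$1" "$2" || true
    else
        echo 0
    fi
}
```

For example, comparing count_in_log '.1.3.6.1.4.1' /var/log/snmptt/snmptt.log with count_in_log '.1.3.6.1.4.1' /usr/local/nagios/var/nagios.log would show whether traps reach snmptt but not Nagios.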
mlopez
Posts: 62
Joined: Fri Oct 19, 2012 11:35 am

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by mlopez »

More troubleshooting:
Changed the following:
/usr/local/nagios/etc/nagios.cfg
Changed log_event_handlers=0 to log_event_handlers=1
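
To confirm that a directive such as log_event_handlers really has the expected value in the file Nagios reads, the active (uncommented) lines can be printed. This is a small sketch under the assumption that directives are written as key=value at the start of a line; the function name cfg_value is hypothetical.

```shell
# cfg_value KEY FILE
# Prints the value of every uncommented "KEY=value" line in FILE,
# one per line. KEY is used as a regular expression, which is fine
# for plain directive names such as log_event_handlers.
cfg_value() {
    # Lines starting with "#" do not match "^KEY=", so commented-out
    # copies of the directive are skipped.
    grep -E "^$1=" "$2" | cut -d= -f2-
}
```

Usage: cfg_value log_event_handlers /usr/local/nagios/etc/nagios.cfg. More than one line of output would mean the directive is set twice.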

I ran a check of the config and everything was fine, as shown below:

Code: Select all

[root@NagiosXI etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.0.6
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 04-29-2014
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 50838 services.
        Checked 687 hosts.
        Checked 2 host groups.
        Checked 75 service groups.
        Checked 9 contacts.
        Checked 4 contact groups.
        Checked 115 commands.
        Checked 15 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 687 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 15 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
[root@NagiosXI etc]#
========================================================


Found this error when bringing up /usr/local/nagios/var/nagios.log:

Code: Select all

Jun  4 22:58:36 NagiosXI nagios: Error: Could not create external command file '/usr/local/nagios/var/rw/nagios.cmd' as named pipe: (17) -> File exists.  If this file already exists and you are sure that another copy of Nagios is not running, you should delete this file.
I did the following to try to repair it:

Code: Select all

[root@NagiosXI etc]# ll /usr/local/nagios/var/rw/nagios.cmd
-rw-rw-r-- 1 root nagcmd 726 Jun  4 23:03 /usr/local/nagios/var/rw/nagios.cmd
[root@NagiosXI etc]#
- Changed the ownership of /usr/local/nagios/var/rw/nagios.cmd to nagios:nagcmd, but after a nagios service restart it reverts to root:nagcmd


Did this to fix it:

Code: Select all

/etc/init.d/nagios stop
cd /usr/local/nagios/var/rw/
ls
rm -rf nagios.cmd
mkfifo nagios.cmd
chown nagios:nagcmd nagios.cmd
ll
chmod 660 nagios.cmd
service nagios start
The permissions seem to be fixed, as shown below:

Code: Select all

prw-rw---- 1 nagios nagcmd 0 Jun  4 23:22 nagios.cmd
I restarted snmptt and SNMP traps are now flowing :), I am a happy camper.
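
The two listings above differ in the first character of the mode string: "-" is a regular file, "p" is a named pipe, and the error message says Nagios wants to create the command file as a named pipe. A minimal sketch for checking this after each restart follows; the function name describe_cmd_file is hypothetical, and it only reports what ls shows.

```shell
# describe_cmd_file PATH
# Prints "missing", or the file kind followed by the mode string and
# owner:group reported by ls, e.g. "fifo prw-rw---- nagios:nagcmd".
describe_cmd_file() {
    if [ ! -e "$1" ]; then
        echo "missing"
        return 0
    fi
    if [ -p "$1" ]; then
        dcf_kind="fifo"          # a named pipe
    else
        dcf_kind="not-a-fifo"    # e.g. a regular file (-rw-rw-r--)
    fi
    # Fields 1, 3 and 4 of "ls -ld" are mode string, owner and group.
    ls -ld "$1" | awk -v kind="$dcf_kind" '{ print kind, $1, $3 ":" $4 }'
}
```

Usage: describe_cmd_file /usr/local/nagios/var/rw/nagios.cmd. A result starting with "not-a-fifo" would match the broken state shown earlier.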



I still have an underlying issue: when I apply my config changes and nagios restarts, nagios.cmd reverts to root:nagcmd.


Temporarily I made this script to help me out.

Code: Select all

#!/bin/bash
/etc/init.d/nagios stop;
rm -rf /usr/local/nagios/var/rw/nagios.cmd;
mkfifo /usr/local/nagios/var/rw/nagios.cmd;
chown nagios:nagcmd /usr/local/nagios/var/rw/nagios.cmd;
chmod 660 /usr/local/nagios/var/rw/nagios.cmd;
service nagios start;
service snmptt restart;

I am also seeing the following process:

Code: Select all

[root@NagiosXI rw]# ps -aux |grep '<defunct>'
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
nagios    7699  0.1  0.0      0     0 ?        Z    00:28   0:00 [nagios] <defunct>
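
The warning in that output is procps complaining about the leading dash; "ps aux" (BSD syntax, no dash) avoids it. As an alternative to grepping for '<defunct>', zombies can be picked out by process state. This is a small sketch, with the hypothetical function name zombie_pids, that filters "PID STAT COMMAND" lines:

```shell
# zombie_pids
# Reads "PID STAT COMMAND" lines on stdin and prints the PID of every
# process whose state starts with Z (zombie, shown as <defunct> by ps).
zombie_pids() {
    awk '$2 ~ /^Z/ { print $1 }'
}
```

Usage: ps -eo pid=,stat=,comm= | zombie_pids (the trailing "=" on each column suppresses the header line).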
BTW, when I installed the new Nagios 2014R1.1 yesterday, the only thing I did was download the tarball and run ./upgrade, that's all.

Should I have done this before? ./configure --with-nagios-user=nagios --with-nagios-group=nagios
If that is the case, what can I do now that it's already installed? :/

If anyone could give me a permanent fix it would be greatly appreciated.

Michael
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by sreinhardt »

You shouldn't ever have to modify the core compilation from an XI fullinstall or upgrade. I do agree it's a bit strange that the cmd file is being created as root:nagcmd. It would seem to me that it is created before permissions have been dropped to nagios:nagios or nagios:nagcmd, which I think could be argued to be incorrect. We will have to look into this, though, as I think this is the first report of it.
Regarding your traps, the snmptt user should actually be added to the nagios and nagcmd groups, so the permissions shouldn't have caused an issue unless the group did not have write access (I might have missed that if you mentioned it). Could you check your /etc/group file to be sure:

Code: Select all

grep 'snmptt' /etc/group
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
mlopez
Posts: 62
Joined: Fri Oct 19, 2012 11:35 am

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by mlopez »

Hi Spenser,
Below are the results:

Code: Select all

[root@NagiosXI ~]# grep 'snmptt' /etc/group
nagios:x:501:nagios,apache,snmptt
nagcmd:x:502:nagios,apache,snmptt
snmptt:x:492:

- Michael
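
The grep output above can also be turned into a yes/no check per group. The sketch below parses the standard name:password:gid:members format; the function name user_in_group is hypothetical, and it only looks at supplementary members listed in the file, not at a user's primary group from /etc/passwd.

```shell
# user_in_group USER GROUP [FILE]
# Succeeds (exit 0) when USER is listed as a member of GROUP in FILE.
# FILE defaults to /etc/group.
user_in_group() {
    awk -F: -v user="$1" -v group="$2" '
        $1 == group {
            # Field 4 is the comma-separated member list.
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == user) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${3:-/etc/group}"
}
```

Usage: user_in_group snmptt nagcmd && echo "snmptt is in nagcmd". With the output posted above, both the nagios and nagcmd checks would succeed.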
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by scottwilkerson »

Could you show the output of the following? I think you may have a directory permission problem.

Code: Select all

ls -ld /usr/local/nagios/var/rw /usr/local/nagios/var /usr/local/nagios
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
mlopez
Posts: 62
Joined: Fri Oct 19, 2012 11:35 am

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by mlopez »

Hi Scott,
Here are the results:

Code: Select all

[root@NagiosXI ~]# ls -ld /usr/local/nagios/var/rw /usr/local/nagios/var /usr/local/nagios
drwxr-xr-x. 9 nagios nagios 4096 Jun  4 17:23 /usr/local/nagios
drwxrwxr-x. 6 nagios nagios 4096 Jun  5 17:11 /usr/local/nagios/var
drwxrwsr-x. 2 nagios nagcmd 4096 Jun  5 03:16 /usr/local/nagios/var/rw
[root@NagiosXI ~]#
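
A process needs search (x) permission on every directory leading to the pipe, so it can help to list the whole chain rather than three hand-picked paths. This is a minimal sketch with a hypothetical helper name, path_chain; it only builds the list of ancestors, and the ls call in the usage note does the actual inspection.

```shell
# path_chain ABSOLUTE_PATH
# Prints every ancestor directory of the path, shortest first,
# followed by the path itself. The root directory is not printed.
path_chain() {
    pc_path=$1
    pc_chain=
    # Walk upwards with dirname until "/" (or "." for a relative path).
    while [ -n "$pc_path" ] && [ "$pc_path" != "/" ] && [ "$pc_path" != "." ]; do
        pc_chain="$pc_path
$pc_chain"
        pc_path=$(dirname "$pc_path")
    done
    printf '%s' "$pc_chain"
}
```

Usage: path_chain /usr/local/nagios/var/rw | while IFS= read -r d; do ls -ld "$d"; done. In the listing above, the "s" in drwxrwsr-x on the rw directory is the setgid bit, which makes new files in it inherit the nagcmd group.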
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by scottwilkerson »

Looking at this post:
http://support.nagios.com/forum/viewtop ... 13#p100839

You are making the nagios.cmd yourself instead of letting the nagios process create it on startup, as it should.

If you stop nagios

Code: Select all

service nagios stop
make sure the pipe doesn't exist

Code: Select all

rm -rf /usr/local/nagios/var/rw/nagios.cmd
and start nagios

Code: Select all

service nagios start
It should create the pipe with the appropriate permissions, and when nagios is stopped it will tear down the pipe again, as it should.
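
The three steps above can be wrapped in one function so they always run in the same order. This is only a sketch under stated assumptions: the function name clean_restart and its RUNNER parameter are hypothetical, and RUNNER exists solely so the sequence can be dry-run with a stub instead of the real service command.

```shell
# clean_restart PIPE [RUNNER]
# Stops nagios, removes a leftover command file if one is still there,
# then starts nagios so that it can create the pipe itself.
# RUNNER is the command used to control the service (default: service).
clean_restart() {
    cr_pipe=$1
    cr_runner=${2:-service}
    # A failed stop (e.g. nagios was not running) is reported but does
    # not abort the sequence.
    "$cr_runner" nagios stop || echo "stop reported a failure, continuing" >&2
    # -f: no error when the file is already gone.
    rm -f -- "$cr_pipe"
    if [ -e "$cr_pipe" ]; then
        echo "could not remove $cr_pipe" >&2
        return 1
    fi
    "$cr_runner" nagios start
}
```

Usage: clean_restart /usr/local/nagios/var/rw/nagios.cmd. Note that it never runs mkfifo; creating the pipe is left to nagios.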

One other thing I noted is that you have a . after your permissions, which usually means SELinux is enabled, and this can definitely cause issues with these programs talking to one another.

Check with:

Code: Select all

getenforce
mlopez
Posts: 62
Joined: Fri Oct 19, 2012 11:35 am

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by mlopez »

Hi Scott,
Below are my results for "getenforce"

Code: Select all

[root@NagiosXI ~]# getenforce
Disabled
[root@NagiosXI ~]#
I also checked with this command:

Code: Select all

[root@NagiosXI var]# sestatus
SELinux status:                 disabled
[root@NagiosXI var]#

As per your first question:
Scott: <quote>
If you stop nagios
service nagios stop
make sure the pipe doesn't exist
rm -rf /usr/local/nagios/var/rw/nagios.cmd
and start nagios
service nagios start
</quote>

What I found, Scott, is that when I stop nagios with "service nagios stop", nagios.cmd is not deleted. The other weird thing I noticed is:

Code: Select all

nagios   26486  0.0  0.0  10012  1036 ?        S    Jun05   0:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   26487  0.0  0.0  10012  1036 ?        S    Jun05   0:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   26488  0.0  0.0  10012  1040 ?        S    Jun05   0:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   26489  0.0  0.0  10012  1036 ?        S    Jun05   0:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   26490  0.0  0.0  10012  1036 ?        S    Jun05   0:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   26491  0.0  0.0  10012  1036 ?        S    Jun05   0:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
The workers never die and nagios.cmd never gets deleted on its own. I thought this was permission based, so I ran "su nagios" and touched a test file under /usr/local/nagios/var/rw, and that was created with no issues.

I'm at a loss now, as every time I reboot or apply a change I will need to create the file manually, because every time the file gets created it has the wrong permissions "nagios nagcmd".
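
To see which worker processes survive a stop, the ps output can be filtered for the --worker command line. A minimal sketch follows, with the hypothetical function name worker_pids; it only lists PIDs and does not kill anything.

```shell
# worker_pids
# Reads "ps axuw"-style lines on stdin and prints the PID (column 2)
# of every line whose command is "bin/nagios --worker ...".
worker_pids() {
    # The [n] bracket keeps this awk process, whose own command line
    # contains the pattern text, from matching itself in the ps output.
    awk '/bin\/[n]agios --worker/ { print $2 }'
}
```

Usage: run service nagios stop, then ps axuw | worker_pids. Any PIDs printed belong to workers that outlived the main daemon.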

- Michael
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by sreinhardt »

This is very strange. Could you post your /etc/init.d/nagios file please?
mlopez
Posts: 62
Joined: Fri Oct 19, 2012 11:35 am

Re: Nagios XI 2014R1.1 No more SNMP Traps

Post by mlopez »

Hi Spenser,
Here is my /etc/init.d/nagios file

Code: Select all

[root@NagiosXI nsti]# cat /etc/init.d/nagios
#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
#
# File : nagios
#
# Author : Jorge Sanchez Aymar ([email protected])
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <[email protected]>
#  - setup for autoconf
#  - add reload function
# 1999-08-06 Ethan Galstad <[email protected]>
#  - Added configuration info for use with RedHat's chkconfig tool
#    per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <[email protected]>
#  - added variable for nagios/var directory
#  - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <[email protected]>
#  - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <[email protected]>
#  - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <[email protected]>
#  - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
#              used to provide network services status.
#
#RAMDISK ADDITION MIKE
mkdir -p -m 775 /var/nagiosramdisk
mkdir -p -m 775 /var/nagiosramdisk/tmp
mkdir -p -m 775 /var/nagiosramdisk/spool
mkdir -p -m 775 /var/nagiosramdisk/spool/checkresults
chown -R nagios.nagios /var/nagiosramdisk
#RAMDISK ADDITION END


status_nagios ()
{

        if test -x $NagiosCGI/daemonchk.cgi; then
                if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile; then
                        return 0
                else
                        return 1
                fi
        else
                if ps -p $NagiosPID > /dev/null 2>&1; then
                        return 0
                else
                        return 1
                fi
        fi

        return 1
}


printstatus_nagios()
{

        if status_nagios $1 $2; then
                echo "nagios (pid $NagiosPID) is running..."
        else
                echo "nagios is not running"
        fi
}


killproc_nagios ()
{

        kill $2 $NagiosPID

}


pid_nagios ()
{

        if test ! -f $NagiosRunFile; then
                echo "No lock file found in $NagiosRunFile"
                exit 1
        fi

        NagiosPID=`head -n 1 $NagiosRunFile`
}


# Source function library
# Solaris doesn't have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
        . /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
        . /etc/init.d/functions
fi

prefix=/usr/local/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=${prefix}/var/nagios.lock
#NagiosLockDir=/var/lock/subsys
NagiosLockDir=/usr/local/nagiosxi/var/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios


# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
    echo "Executable file $NagiosBin not found.  Exiting."
    exit 1
fi

# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
    echo "Configuration file $NagiosCfgFile not found.  Exiting."
    exit 1
fi

# See how we were called.
case "$1" in

        start)
                echo -n "Starting nagios:"
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
# THESE TWO LINES WERE ADDED TO WORK WITH SUDO
                        touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                        chown $NagiosUser $NagiosVarDir/nagios.log $NagiosRetentionFile
#                       su - $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
                        rm -f $NagiosCommandFile
                        touch $NagiosRunFile
                        chown $NagiosUser:$NagiosGroup $NagiosRunFile
                        $NagiosBin -d $NagiosCfgFile
                        if [ -d $NagiosLockDir ]; then
                            touch $NagiosLockDir/$NagiosLockFile;
                            chown $NagiosUser:$NagiosGroup $NagiosLockDir/$NagiosLockFile;
                        fi
                        echo " done."
                        exit 0
                else
                        echo "CONFIG ERROR!  Start aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        stop)
                echo -n "Stopping nagios: "

                pid_nagios
                killproc_nagios nagios

                # now we have to wait for nagios to exit and remove its
                # own NagiosRunFile, otherwise a following "start" could
                # happen, and then the exiting nagios will remove the
                # new NagiosRunFile, allowing multiple nagios daemons
                # to (sooner or later) run - John Sellens
                #echo -n 'Waiting for nagios to exit .'
                for i in 1 2 3 4 5 6 7 8 9 10 ; do
                    if status_nagios > /dev/null; then
                        echo -n '.'
                        sleep 1
                    else
                        break
                    fi
                done

                if status_nagios > /dev/null; then
                    echo ''
                    echo 'Warning - nagios did not exit in a timely manner'
                else

                    # Forcefully kill all other nagios processes that might be running, so we don't end up with a wierd setup

                    # Get a list of PIDs for all running Nagios daemons
                    plist=`ps axuw | grep "/usr/local/nagios/bin/nagios -d"  | awk '{print $2}'`
                    #echo "PIDS"
                    #echo $plist
                    for pid in $plist; do
                        #echo "KILL $pid"
                        kill -9 $pid > /dev/null 2>&1
                    done

                    echo 'done.'
                fi

                rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
                ;;

        status)
                pid_nagios
                printstatus_nagios nagios
                ;;

        checkconfig)
                printf "Running configuration check..."
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        echo " OK."
                else
                        echo " CONFIG ERROR!  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        restart)
                printf "Running configuration check..."
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        echo "done."
                        $0 stop
                        $0 start
                else
                        echo " CONFIG ERROR!  Restart aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        reload|force-reload)
                printf "Running configuration check..."
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        echo "done."
                        if test ! -f $NagiosRunFile; then
                                $0 start
                        else
                                pid_nagios
                                if status_nagios > /dev/null; then
                                        printf "Reloading nagios configuration..."
                                        killproc_nagios nagios -HUP
                                        echo "done"
                                else
                                        $0 stop
                                        $0 start
                                fi
                        fi
                else
                        echo " CONFIG ERROR!  Reload aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        *)
                echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig}"
                exit 1
                ;;

esac

# End of this script
[root@NagiosXI nsti]#
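
Reading the stop branch of this script, two details may be relevant to the leftover workers, offered as an observation rather than a diagnosis: the kill loop sits in the else branch, so it only runs once the main process has exited within the ten one-second checks, and its grep pattern matches "nagios -d" only. The sketch below applies the same filter to ps lines supplied on stdin so its selection can be inspected; the function name stop_branch_pids is hypothetical.

```shell
# stop_branch_pids
# Applies the same filter as the stop branch of the init script to
# ps output read from stdin and prints the PIDs it would select.
stop_branch_pids() {
    grep "/usr/local/nagios/bin/nagios -d" | awk '{print $2}'
}
```

Usage: ps axuw | stop_branch_pids. Lines whose command is "/usr/local/nagios/bin/nagios --worker ..." do not contain the string "nagios -d", so worker PIDs are not in the result.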
