
Nagios Core Startup Issue

Posted: Tue Mar 11, 2014 11:42 am
by xcomm
So I am trying to migrate our Nagios Core system from a FreeBSD box to a CentOS system.
It is a version upgrade from 3.5.0 to 3.5.1.
I installed the CentOS Nagios version from the EPEL repository.
As a default install it worked, showing the local host.
I then copied over the config files and check scripts, and updated anything that appeared to have a bad or invalid path.
The nagios -v (config file path) check passes, but the service will not start.
___________________________________
Running pre-flight check on configuration data...

Checking services...
Checked 3391 services.
Checking hosts...
Checked 753 hosts.
Checking host groups...
Checked 391 host groups.
Checking service groups...
Checked 23 service groups.
Checking contacts...
Checked 8 contacts.
Checking contact groups...
Checked 6 contact groups.
Checking service escalations...
Checked 10173 service escalations.
Checking service dependencies...
Checked 1655 service dependencies.
Checking host escalations...
Checked 2259 host escalations.
Checking host dependencies...
Checked 0 host dependencies.
Checking commands...
Checked 53 commands.
Checking time periods...
Checked 4 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
[hostname]# service nagios start
Starting nagios:CONFIG ERROR! Start aborted. Check your Nagios configuration.
_____________________________________________________________
No Nagios or system logs are being generated, so I can't find the issue. I think it has something to do with my config files because, if I remove them and replace them with the samples, the service starts and runs.

Any help would be great thanks.

Re: Nagios Core Startup Issue

Posted: Tue Mar 11, 2014 12:24 pm
by technick
Usually that means there's an error in one of your Nagios config files. Personally, I'd look directly at nagios.cfg...

Also, you can run strace on the binary and see if that tells you more about what's going on.

Re: Nagios Core Startup Issue

Posted: Tue Mar 11, 2014 2:05 pm
by xcomm
OK, running the binary directly and pointing it at the config file, it starts, although it is only pulling host ICMP checks and not services (maybe a broker config issue?).
Not sure why the init file is not working correctly.
What's the best way to get the init file working again, or to find out why it's not working?

Re: Nagios Core Startup Issue

Posted: Tue Mar 11, 2014 4:55 pm
by sreinhardt
You probably want to, as you thought, look at the init file in /etc/init.d/nagios. You could also post it here for us to look at. Specifically I would say you need to look at what prefix and NagiosCfgFile are equal to, and make sure they are pointing at the right places.
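A quick way to see those assignments without reading the whole script is to filter for them. This is only a sketch: show_init_vars is a hypothetical helper name, and the list of variables is an assumption about which ones matter most here.

```shell
# Hypothetical helper: print the path-related assignments from an init script.
# $1 is the path to the script, e.g. /etc/init.d/nagios
show_init_vars() {
    grep -E '^(prefix|exec_prefix|NagiosBin|NagiosCfgFile)=' "$1"
}

# Example call (path taken from this thread):
# show_init_vars /etc/init.d/nagios
```

Each printed path can then be compared against what actually exists on the CentOS box.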

Re: Nagios Core Startup Issue

Posted: Tue Mar 11, 2014 8:22 pm
by xcomm
I have it running now without the init script.
Looking at the paths, they all seem correct. Not sure why this keeps failing.

Code: Select all

#!/bin/sh
#
# chkconfig: - 99 01
# description: Nagios network monitor
#
# File : nagios
#
# Author : Jorge Sanchez Aymar ([email protected])
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <[email protected]>
#  - setup for autoconf
#  - add reload function
# 1999-08-06 Ethan Galstad <[email protected]>
#  - Added configuration info for use with RedHat's chkconfig tool
#    per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <[email protected]>
#  - added variable for nagios/var directory
#  - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <[email protected]>
#  - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <[email protected]>
#  - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <[email protected]>
#  - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
#              used to provide network services status.
#

# Load any extra environment variables for Nagios and its plugins
if test -f /etc/sysconfig/nagios; then
        . /etc/sysconfig/nagios
fi

status_nagios ()
{

        if test -x $NagiosCGI/daemonchk.cgi; then
                if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile; then
                        return 0
                else
                        return 1
                fi
        else
                if ps -p $NagiosPID > /dev/null 2>&1; then
                        return 0
                else
                        return 1
                fi
        fi

        return 1
}


printstatus_nagios()
{
        status_nagios $1 $2
        RETVAL=$?
        if [ $RETVAL = 0 ]; then
                echo "nagios (pid $NagiosPID) is running..."
        else
                echo "nagios is not running"
        fi
        return $RETVAL
}


killproc_nagios ()
{

        kill $2 $NagiosPID

}


pid_nagios ()
{

        if test ! -f $NagiosRunFile; then
                echo "No lock file found in $NagiosRunFile"
                exit 1
        fi

        NagiosPID=`head -n 1 $NagiosRunFile`
}


# Source function library
# Solaris doesn't have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
        . /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
        . /etc/init.d/functions
fi

prefix=/usr/share/nagios
exec_prefix=/var/lib/nagios
NagiosBin=/usr/sbin/nagios
NagiosCfgFile=/etc/nagios/nagios.cfg
NagiosStatusFile=/var/log/nagios/status.dat
NagiosRetentionFile=/var/log/nagios/retention.dat
NagiosCommandFile=/var/log/nagios/rw/nagios.cmd
NagiosVarDir=/var/log/nagios
NagiosRunFile=/var/run/nagios.pid
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=/usr/sbin
NagiosUser=nagios
NagiosGroup=nagios


# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
    echo "Executable file $NagiosBin not found.  Exiting."
    exit 1
fi

# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
    echo "Configuration file $NagiosCfgFile not found.  Exiting."
    exit 1
fi

# See how we were called.
case "$1" in

        start)
                echo -n "Starting nagios:"
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                        chown $NagiosUser:$NagiosGroup $NagiosVarDir/nagios.log $NagiosRetentionFile
                        rm -f $NagiosCommandFile
                        touch $NagiosRunFile
                        chown $NagiosUser:$NagiosGroup $NagiosRunFile
                        [ -x /sbin/restorecon ] && /sbin/restorecon $NagiosRunFile
                        $NagiosBin -d $NagiosCfgFile
                        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
                        echo " done."
                        exit 0
                else
                        echo "CONFIG ERROR!  Start aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        stop)
                echo -n "Stopping nagios: "

                pid_nagios
                killproc_nagios nagios

                # now we have to wait for nagios to exit and remove its
                # own NagiosRunFile, otherwise a following "start" could
                # happen, and then the exiting nagios will remove the
                # new NagiosRunFile, allowing multiple nagios daemons
                # to (sooner or later) run - John Sellens
                #echo -n 'Waiting for nagios to exit .'
                for i in 1 2 3 4 5 6 7 8 9 10 ; do
                    if status_nagios > /dev/null; then
                        echo -n '.'
                        sleep 1
                    else
                        break
                    fi
                done
                if status_nagios > /dev/null; then
                    echo ''
                    echo 'Warning - nagios did not exit in a timely manner'
                else
                    echo 'done.'
                fi

                rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
                ;;

        status)
                pid_nagios
                printstatus_nagios nagios
                exit $?
                ;;

        checkconfig)
                printf "Running configuration check..."
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        echo " OK."
                else
                        echo " CONFIG ERROR!  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        restart)
                printf "Running configuration check..."
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        echo "done."
                        $0 stop
                        $0 start
                else
                        echo " CONFIG ERROR!  Restart aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        reload|force-reload)
                printf "Running configuration check..."
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        echo "done."
                        if test ! -f $NagiosRunFile; then
                                $0 start
                        else
                                pid_nagios
                                if status_nagios > /dev/null; then
                                        printf "Reloading nagios configuration..."
                                        killproc_nagios nagios -HUP
                                        echo "done"
                                else
                                        $0 stop
                                        $0 start
                                fi
                        fi
                else
                        echo " CONFIG ERROR!  Reload aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;

        *)
                echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig}"
                exit 2
                ;;

esac

# End of this script

Re: Nagios Core Startup Issue

Posted: Wed Mar 12, 2014 12:48 pm
by sreinhardt
Looks like you are probably having an issue here:

Code: Select all

                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
Could you run a verification and then run echo $? like so:

Code: Select all

/usr/sbin/nagios -v /etc/nagios/nagios.cfg
echo $?

Re: Nagios Core Startup Issue

Posted: Wed Mar 12, 2014 4:16 pm
by xcomm
That's where I thought it was dying, as I get this output:
Starting nagios:CONFIG ERROR! Start aborted. Check your Nagios configuration

But after running nagios -v (config path),
echo $? returns 0.

So I wonder where it's going wrong.

Re: Nagios Core Startup Issue

Posted: Wed Mar 12, 2014 4:27 pm
by xcomm
Adding the following from the init script:
NagiosBin=/usr/sbin/nagios
NagiosCfgFile=/etc/nagios/nagios.cfg

NagiosBin appears correct and everyone should be able to execute it:
-rwxr-xr-x. 1 root root 652384 Aug 30 2013 nagios

The config file is at /etc/nagios/nagios.cfg

It even passes the first two checks on start, where it looks for those files. Yet the returned string is
Starting nagios:CONFIG ERROR! Start aborted. Check your Nagios configuration.
suggesting that $? is something other than 0.
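
One way to see which branch the init script takes is to run it under shell tracing (sh -x), which prints each command before it executes. The wrapper below is a sketch; trace_script is a hypothetical name. Note that tracing shows the commands and the branch taken, but the output of nagios -v stays hidden as long as the script redirects it to /dev/null.

```shell
# Hypothetical helper: run a script under "sh -x" and keep only the trace
# lines, which the shell prefixes with "+".
trace_script() {
    sh -x "$@" 2>&1 | grep '^+'
}

# Example call (path taken from this thread):
# trace_script /etc/init.d/nagios start
```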

Re: Nagios Core Startup Issue

Posted: Wed Mar 12, 2014 4:54 pm
by sreinhardt
Well, let's verify exactly what we are getting then. Try altering the init script to have this portion instead for start():

Code: Select all

$NagiosBin -v $NagiosCfgFile ; # removed redirection
                if [ $? -eq 0 ]; then
                        echo "Passed test - exit $?" # Added this
                        touch $NagiosVarDir/nagios.log $NagiosRetentionFile
                        chown $NagiosUser:$NagiosGroup $NagiosVarDir/nagios.log $NagiosRetentionFile
                        rm -f $NagiosCommandFile
                        touch $NagiosRunFile
                        chown $NagiosUser:$NagiosGroup $NagiosRunFile
                        [ -x /sbin/restorecon ] && /sbin/restorecon $NagiosRunFile
                        $NagiosBin -d $NagiosCfgFile
                        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
                        echo " done."
                        exit 0
                else
                        echo "Failed test - exit $?" # Added this
                        echo "CONFIG ERROR!  Start aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;
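
A note on the debug lines above: by the time echo runs, $? holds the exit status of the [ ... ] test, not of nagios -v, so the else branch prints 1 for any non-zero status from Nagios. If the real exit code matters, it has to be saved immediately after the command. A minimal sketch, with check_cmd as a hypothetical helper that is not part of the stock init script:

```shell
# Hypothetical helper: run a command, save its exit status right away,
# and report that saved status instead of a later value of $?.
check_cmd() {
    "$@"
    rc=$?    # save immediately; every later command overwrites $?
    if [ "$rc" -eq 0 ]; then
        echo "Passed test - exit $rc"
    else
        echo "Failed test - exit $rc"
    fi
    return "$rc"
}

# Example call (paths taken from this thread):
# check_cmd /usr/sbin/nagios -v /etc/nagios/nagios.cfg
```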

Re: Nagios Core Startup Issue

Posted: Wed Mar 12, 2014 5:11 pm
by xcomm
OK, getting somewhere.
Error: Cannot open resource file '/etc/nagios/private/resource.cfg' for reading!
Read main config file okay...
Processing object config directory '/etc/nagios/objects'...
Error: Could not open config directory '/etc/nagios/objects' for reading.
Error processing object config files!


***> One or more problems was encountered while processing the config files...

Check your configuration file(s) to ensure that they contain valid
directives and data defintions. If you are upgrading from a previous
version of Nagios, you should be aware that some variables/definitions
may have been removed or modified in this version. Make sure to read
the HTML documentation regarding the config files, as well as the
'Whats New' section to find out what has changed.

Failed test - exit 1
CONFIG ERROR! Start aborted. Check your Nagios configuration.
---
The files and folders belong to nagios and the nagios group (apache is a member of the group). The owner has rw on everything and the group has r; some files also have the default r.

What should these be? Do they all need x?
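
On the general permissions question: a regular file only needs r to be read, but every directory on the way to it needs x (search) for the process to get through, and a directory that Nagios scans, such as /etc/nagios/objects, needs r as well. The sketch below checks this for the current user, so it would have to be run as the user Nagios runs as to be meaningful; check_path is a hypothetical helper name, and it only tests ordinary file permissions, nothing else that might block access.

```shell
# Hypothetical helper: walk up from a file and report the first directory
# the current user cannot enter, then check that the file itself is readable.
check_path() {
    target=$1
    dir=$(dirname "$target")
    while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        [ -x "$dir" ] || { echo "cannot enter $dir (missing x?)"; return 1; }
        dir=$(dirname "$dir")
    done
    [ -r "$target" ] || { echo "cannot read $target"; return 1; }
    echo "ok: $target"
}

# Example call (path taken from the error output above):
# check_path /etc/nagios/private/resource.cfg
```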