Cannot start Nagios 4.0.2 on openSUSE 13.1

mistie710
Posts: 4
Joined: Sat Dec 28, 2013 9:04 am

Post by mistie710 »

This follows on from another thread that, unfortunately, I didn't keep a record of, so I can't find what happened to it. :oops:

I am testing a new build that will use Nagios 4.0.2 and NagiosQL 3.2 on openSUSE 13.1. So far everything works as it should except for one thing: starting Nagios. As you may know if you have used openSUSE recently, it switched to systemd as of version 12.1, and the sysvinit startup script used by Nagios no longer seems to work. As mentioned, I did find a thread on here suggesting an alternative script that does appear to work on 13.1, as long as you make a couple of adjustments.

/etc/init.d/nagios

Code:

# Nagios   Startup script for the Nagios monitoring daemon
#
# chkconfig:   - 85 15
# description:   Nagios is a service monitoring system
# processname: nagios
# config: /etc/nagios/nagios.cfg
# pidfile: /var/nagios/nagios.pid
#
### BEGIN INIT INFO
# Provides:      nagios
# Required-Start:   $local_fs $syslog $network
# Required-Stop:   $local_fs $syslog $network
# Short-Description:   start and stop Nagios monitoring server
# Description:      Nagios is a service monitoring system
#                   This is a patched version of the startup as the
#                   core has been ******* by the developers.
### END INIT INFO

# Source function library.
# . /etc/rc.d/init.d/functions
. /lib/lsb/init-functions

prefix="/usr/local/nagios"
exec_prefix="${prefix}"
exec="${exec_prefix}/bin/nagios"
prog="nagios"
config="${prefix}/etc/nagios.cfg"
pidfile="${prefix}/var/nagios.lock"
user="nagios"
group="nagios"
checkconfig="false"
ramdiskdir="/var/nagios/ramcache"

test -e /etc/sysconfig/$prog && . /etc/sysconfig/$prog

lockfile=/var/lock/$prog
USE_RAMDISK=${USE_RAMDISK:-0}

if test "$USE_RAMDISK" -ne 0 && test "$RAMDISK_SIZE"X != "X"; then
   ramdisk=`mount |grep "$ramdiskdir type tmpfs"`
   if [ "$ramdisk"X = "X" ]; then
      mkdir -p -m 0755 $ramdiskdir
      mount -t tmpfs -o size=${RAMDISK_SIZE}m tmpfs $ramdiskdir
      mkdir -p -m 0755 $ramdiskdir/checkresults
      chown -R $user:$group $ramdiskdir
   fi
fi

check_config() {
   TMPFILE="/tmp/.configtest.$$"
   /usr/sbin/service nagios configtest > "$TMPFILE"
   WARN=`grep ^"Total Warnings:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
   ERR=`grep ^"Total Errors:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`

   if test "$WARN" = "0" && test "${ERR}" = "0"; then
      echo "OK - Configuration check verified" > /var/run/nagios.configtest
      chmod 0644 /var/run/nagios.configtest
      /bin/rm "$TMPFILE"
      return 0
   else
      # We'll write out the errors to a file we can have a
      # script watching for
      echo "WARNING: Errors in config files - see log for details: $TMPFILE" > /var/run/nagios.configtest
      egrep -i "(^warning|^error)" "$TMPFILE" >> /var/run/nagios.configtest
      chmod 0644 /var/run/nagios.configtest
      cat "$TMPFILE"
      exit 8
   fi
}

start() {
   echo "Start option selected"
   echo "prog var = "$prog
   test -x $exec || exit 5
   test -f $config || exit 6
   if test "$checkconfig" = "false"; then
      check_config
   fi
   echo -n $"Starting $prog: "
   # We need to _make sure_ the precache is there and verified
   # Raise priority to make it run better
   startproc -u $user -- $exec -d $config
   #touch $lockfile
   retval=$?
   echo
   test $retval -eq 0 && touch $lockfile
   return $retval
}

stop() {
   echo -n $"Stopping $prog: "
   killproc -p ${pidfile}  $exec
   retval=$?
   echo
   test $retval -eq 0 && rm -f $lockfile
   return $retval
}


restart() {
   check_config
   checkconfig="true"
   stop
   start
}

reload() {
   echo -n $"Reloading $prog: "
   killproc -p ${pidfile} $exec -HUP
   RETVAL=$?
   echo
   # Pass the killproc result back to the caller instead of echo's status
   return $RETVAL
}

force_reload() {
   restart
}

case "$1" in
   start)
      checkproc $prog && exit 0
      $1
      ;;
   stop)
      checkproc $prog || exit 0
      $1
      ;;
   restart)
      $1
      ;;
   reload)
      checkproc $prog || exit 7
      $1
      ;;
   force-reload)
      force_reload
      ;;
   status)
      checkproc $prog
      ;;
   condrestart|try-restart)
      checkproc $prog || exit 0
      restart
      ;;
   configtest)
      $nice su -s /bin/bash - nagios -c "$corelimit >/dev/null 2>&1 ; $exec -vp $config"
      RETVAL=$?
      ;;
   *)
      echo $"Usage: $0 {start|stop|status|restart|condrestart|try-restart|reload|force-reload|configtest}"
      exit 2
esac
exit $?

The only differences are that I replaced the "daemon" call at line 82 with the equivalent "startproc" call, and replaced the "status_of_proc" call in each case from line 120 down with "checkproc". The problem now appears to be that systemd gets in the way: the script is no longer called directly, and systemctl takes control instead. When that happens it becomes impossible to start or stop Nagios (or do anything else with it), because systemctl insists that the service is down, irrespective of the actual state of the daemon.

Is there another script that needs to be found to amend this? Or is there a way to stop systemctl bullying the init.d script out of the way? Or am I missing something here? That is besides the post I originally tried to post to, of course! :oops:
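
For anyone who would rather bypass the init.d script altogether, one possible direction is a native systemd unit. The following is only a rough sketch, not something shipped with Nagios 4.0.2: the unit contents are my assumption, the paths are the source-install defaults used in the script above, and it assumes the lock_file setting in nagios.cfg points at /usr/local/nagios/var/nagios.lock. It writes the unit to a scratch file for review and does not install anything.

```shell
#!/bin/sh
# Sketch only: write a candidate nagios.service to a scratch file for review.
# The unit contents are an assumption, not an official Nagios unit file.
unit_file="${UNIT_FILE:-${TMPDIR:-/tmp}/nagios.service}"

# Quoted 'EOF' keeps $MAINPID literal so systemd expands it, not the shell.
cat > "$unit_file" <<'EOF'
[Unit]
Description=Nagios monitoring daemon
After=network.target

[Service]
# "nagios -d" forks into the background and records its PID in lock_file
Type=forking
User=nagios
Group=nagios
PIDFile=/usr/local/nagios/var/nagios.lock
ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
EOF

echo "Wrote $unit_file"
```

If the result looks right for your system, it would be copied to /etc/systemd/system/ as root, followed by "systemctl daemon-reload" so that systemd picks it up.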

Ah, found the edit option!!!

Anyway, I've been working away on this and seem to have found a solution of sorts. In an earlier version of openSUSE, there were a couple of templates in the /etc/init.d directory named "skeleton" and "skeleton.compat". I took the latter of these across to openSUSE 13.1 and edited it to give the code below:

Code:

#!/bin/bash
#
#     LSB system startup script for Nagios 4.0.2
#     Based on LSB template Copyright (C) 1995--2005  Kurt Garloff,
#     SUSE / Novell Inc., set up by Chris Johnson (Chika)
#
#     This library is free software; you can redistribute it and/or modify it
#     under the terms of the GNU Lesser General Public License as published by
#     the Free Software Foundation; either version 2.1 of the License, or (at
#     your option) any later version.
#
#     This library is distributed in the hope that it will be useful, but
#     WITHOUT ANY WARRANTY; without even the implied warranty of
#     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
#     Lesser General Public License for more details.
#
#     You should have received a copy of the GNU Lesser General Public
#     License along with this library; if not, write to the Free Software
#     Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
#
# /etc/init.d/nagios
# LSB compatible service control script; see http://www.linuxbase.org/spec/
# Please send feedback to http://www.suse.de/feedback/
#
# Note: This template uses functions rc_XXX defined in /etc/rc.status on
# UnitedLinux/SUSE/Novell based Linux distributions. However, it will work
# on other distributions as well, by using the LSB (Linux Standard Base)
# or RH functions or by open coding the needed functions.
# Read http://www.tldp.org/HOWTO/HighQuality-Apps-HOWTO/ if you prefer not
# to use this template.
#
# chkconfig: 345 99 00
# description: Nagios daemon providing system monitoring
#
### BEGIN INIT INFO
# Provides:          Nagios
# Required-Start:    $syslog $remote_fs $time
# Should-Start:      $time ypbind smtp
# Required-Stop:     $syslog $remote_fs
# Should-Stop:       ypbind smtp
# Default-Start:     3 5
# Default-Stop:      0 1 2 6
# Short-Description: Nagios daemon providing system and net monitoring
# Description:       Start Nagios to allow scanning of network and the systems
#       that depend on it, providing a cgi website that is available
#       to users and provide alerts via email and SMS of problems
### END INIT INFO
#
# Any extensions to the keywords given above should be preceded by
# X-VendorTag- according to LSB.
#
# Some of the comments have been removed for space. Should you wish to see
# them, please refer to the original "/etc/init.d/skeleton" scripts.

# Check for missing binaries (stale symlinks should not happen)
# Note: Special treatment of stop for LSB conformance
FOO_BIN=/usr/local/nagios/bin/nagios
test -x $FOO_BIN || { echo "$FOO_BIN not installed";
        if [ "$1" = "stop" ]; then exit 0;
        else exit 5; fi; }

# Check for existence of needed config file and read it
FOO_CONFIG=/etc/sysconfig/nagios
test -r $FOO_CONFIG || { echo "$FOO_CONFIG not existing";
        if [ "$1" = "stop" ]; then exit 0;
        else exit 6; fi; }

# Read config
. $FOO_CONFIG

# Some variables
nagios_config=/usr/local/nagios/etc/nagios.cfg
FOO_PIDFILE=/var/nagios/nagios.pid

# Source LSB init functions
# providing start_daemon, killproc, pidofproc,
# log_success_msg, log_failure_msg and log_warning_msg.
# This is currently not used by UnitedLinux based distributions and
# not needed for init scripts for UnitedLinux only. If it is used,
# the functions from rc.status should not be sourced or used.
#. /lib/lsb/init-functions

# Shell functions sourced from /etc/rc.status:
#      rc_check         check and set local and overall rc status
#      rc_status        check and set local and overall rc status
#      rc_status -v     be verbose in local rc status and clear it afterwards
#      rc_status -v -r  ditto and clear both the local and overall rc status
#      rc_status -s     display "skipped" and exit with status 3
#      rc_status -u     display "unused" and exit with status 3
#      rc_failed        set local and overall rc status to failed
#      rc_failed <num>  set local and overall rc status to <num>
#      rc_reset         clear both the local and overall rc status
#      rc_exit          exit appropriate to overall rc status
#      rc_active        checks whether a service is activated by symlinks

# Use the SUSE rc_ init script functions;
# emulate them on LSB, RH and other systems

# Default: Assume sysvinit binaries exist
start_daemon() { /sbin/start_daemon ${1+"$@"}; }
killproc()     { /sbin/killproc     ${1+"$@"}; }
pidofproc()    { /sbin/pidofproc    ${1+"$@"}; }
checkproc()    { /sbin/checkproc    ${1+"$@"}; }
if test -e /etc/rc.status; then
    # SUSE rc script library
    . /etc/rc.status
else
    export LC_ALL=POSIX
    _cmd=$1
    declare -a _SMSG
    if test "${_cmd}" = "status"; then
        _SMSG=(running dead dead unused unknown reserved)
        _RC_UNUSED=3
    else
        _SMSG=(done failed failed missed failed skipped unused failed failed reserved)
        _RC_UNUSED=6
    fi
    if test -e /lib/lsb/init-functions; then
        # LSB
        . /lib/lsb/init-functions
        echo_rc()
        {
            if test ${_RC_RV} = 0; then
                log_success_msg "  [${_SMSG[${_RC_RV}]}] "
            else
                log_failure_msg "  [${_SMSG[${_RC_RV}]}] "
            fi
        }
        # TODO: Add checking for lockfiles
        checkproc() { pidofproc ${1+"$@"} >/dev/null 2>&1; return $?; }
    elif test -e /etc/init.d/functions; then
        # RHAT
        . /etc/init.d/functions
        echo_rc()
        {
            #echo -n "  [${_SMSG[${_RC_RV}]}] "
            if test ${_RC_RV} = 0; then
                success "  [${_SMSG[${_RC_RV}]}] "
            else
                failure "  [${_SMSG[${_RC_RV}]}] "
            fi
        }
        checkproc() { status ${1+"$@"}; return $?; }
        start_daemon() { daemon ${1+"$@"}; return $?; }
    else
        # emulate it
        echo_rc() { echo "  [${_SMSG[${_RC_RV}]}] "; }
    fi
    rc_reset() { _RC_RV=0; }
    rc_failed()
    {
        if test -z "$1"; then
            _RC_RV=1;
        elif test "$1" != "0"; then
            _RC_RV=$1;
        fi
        return ${_RC_RV}
    }
    rc_check()
    {
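        # "return" only accepts a number, so call rc_failed and let its
        # status become the function's return value.
        rc_failed $?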
    }
    rc_status()
    {
        rc_failed $?
        if test "$1" = "-r"; then _RC_RV=0; shift; fi
        if test "$1" = "-s"; then rc_failed 5; echo_rc; rc_failed 3; shift; fi
        if test "$1" = "-u"; then rc_failed ${_RC_UNUSED}; echo_rc; rc_failed 3; shift; fi
        if test "$1" = "-v"; then echo_rc; shift; fi
        if test "$1" = "-r"; then _RC_RV=0; shift; fi
        return ${_RC_RV}
    }
    rc_exit() { exit ${_RC_RV}; }
    rc_active()
    {
        if test -z "$RUNLEVEL"; then read RUNLEVEL REST < <(/sbin/runlevel); fi
        if test -e /etc/init.d/S[0-9][0-9]${1}; then return 0; fi
        return 1
    }
fi

# Reset status of this service
rc_reset

# Return values acc. to LSB for all commands but status:
# 0       - success
# 1       - generic or unspecified error
# 2       - invalid or excess argument(s)
# 3       - unimplemented feature (e.g. "reload")
# 4       - user had insufficient privileges
# 5       - program is not installed
# 6       - program is not configured
# 7       - program is not running
# 8--199  - reserved (8--99 LSB, 100--149 distrib, 150--199 appl)
#
# Note that starting an already running service, stopping
# or restarting a not-running service as well as the restart
# with force-reload (in case signaling is not supported) are
# considered a success.

case "$1" in
    start)
        echo -n "Starting Nagios "
        ## Start daemon with startproc(8). If this fails
        ## the return value is set appropriately by startproc.
        start_daemon $FOO_BIN -d $nagios_config

        ## Remember status and be verbose
        rc_status -v

        ## Update PIDFILE if not automatically written
        pidofproc $FOO_BIN > $FOO_PIDFILE
        ;;
    stop)
        echo -n "Shutting down Nagios "
        ## Stop daemon with killproc(8) and if this fails
        ## killproc sets the return value according to LSB.
        ## Usage on RH: killproc [-p pidfile] [-d delay] {program} [-signal]

        ## This one signal TERM followed by e.g. signal KILL
        killproc $FOO_BIN
        ## or only with signal TERM
        # killproc $FOO_BIN -TERM

        ## Remember status and be verbose
        rc_status -v

        ## Remove PIDFILE if not automatically removed
        if test -e $FOO_PIDFILE ; then
             rm -f $FOO_PIDFILE
        fi
        ;;
    try-restart|condrestart)
        ## Do a restart only if the service was active before.
        ## Note: try-restart is now part of LSB (as of 1.9).
        ## RH has a similar command named condrestart.
        if test "$1" = "condrestart"; then
                echo "${attn} Use try-restart ${done}(LSB)${attn} rather than condrestart ${warn}(RH)${norm}"
        fi
        $0 status
        if test $? = 0; then
                $0 restart
        else
                rc_reset        # Not running is not a failure.
        fi
        ## Remember status and be quiet
        rc_status
        ;;
    restart)
        ## Stop the service and regardless of whether it was
        ## running or not, start it again.
        $0 stop
        $0 start

        ## Remember status and be quiet
        rc_status
        ;;
    force-reload)
        ## Signal the daemon to reload its config. Most daemons
        ## do this on signal 1 (SIGHUP).
        ## If it does not support it, restart the service if it
        ## is running.

        echo -n "Reload service Nagios "
        ## if it supports it:
        killproc -HUP $FOO_BIN
        #touch /var/run/FOO.pid
        rc_status -v

        ## Otherwise:
        #$0 try-restart
        #rc_status
        ;;
    reload)
        ## Like force-reload, but if daemon does not support
        ## signaling, do nothing (!)

        ## If it supports signaling:
        echo -n "Reload service Nagios "
        killproc -HUP $FOO_BIN
        #touch /var/run/FOO.pid
        rc_status -v

        ## Otherwise if it does not support reload:
        #rc_failed 3
        #rc_status -v
        ;;
    status)
        echo -n "Checking for service Nagios "
        ## Check status with checkproc(8), if process is running
        ## checkproc will return with exit status 0.

        ## Return value is slightly different for the status command:
        ## 0 - service up and running
        ## 1 - service dead, but /var/run/  pid  file exists
        ## 2 - service dead, but /var/lock/ lock file exists
        ## 3 - service not running (unused)
        ## 4 - service status unknown :-(
        ## 5--199 reserved (5--99 LSB, 100--149 distro, 150--199 appl.)

        ## NOTE: checkproc returns LSB compliant status values.
        checkproc $FOO_BIN
        ## NOTE: rc_status knows that we called this init script with
        ## "status" option and adapts its messages accordingly.
        rc_status -v
        ;;
    probe)
        ## Optional: Probe for the necessity of a reload, print out the
        ## argument to this init script which is required for a reload.
        ## Note: probe is not (yet) part of LSB (as of 1.9)

        test "$nagios_config" -nt "$FOO_PIDFILE" && echo reload
        ;;
    *)
        echo "Usage: $0 {start|stop|status|try-restart|restart|force-reload|reload|probe}"
        exit 1
        ;;
esac
# Exit with the overall rc status recorded by rc_status/rc_failed
rc_exit

One restart later and all appears to be well, except that NagiosQL seemed reluctant to restart Nagios when asked. This may be related to the lock file, which the above script doesn't create (it does leave a PID file behind, though). I used the .compat version because its notes indicate that it is more likely to be cross-compatible with other systems. Goodness knows I'm angry enough with openSUSE for making it necessary for me to go this far in the first place! :x
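
Since the lock file versus PID file question comes up here, this is a minimal sketch of how one might check whether such a file still points at a live process. The function name pidfile_alive is made up for illustration and is not part of Nagios, NagiosQL or the SUSE init tools. It takes the first field of the first line as the PID, since pidofproc can print more than one.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if FILE holds the PID of a live process.
pidfile_alive() {
    file="$1"
    [ -r "$file" ] || return 1
    pid=""
    # First whitespace-separated field of the first line is taken as the PID;
    # read may fail on a missing trailing newline yet still set $pid.
    read -r pid rest < "$file" || [ -n "$pid" ] || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely numeric
    esac
    # Signal 0 sends nothing; it only tests that the process can be signalled.
    # Run as root or as the owning user, otherwise a live process may be
    # reported as dead because of permissions.
    kill -0 "$pid" 2>/dev/null
}
```

Usage would be along the lines of: pidfile_alive /var/nagios/nagios.pid && echo "Nagios is running"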

Hopefully the above will be of use to somebody.
Last edited by mistie710 on Mon Dec 30, 2013 10:55 am, edited 1 time in total.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Cannot start Nagios 4.0.2 on openSUSE 13.1

Post by sreinhardt »

Nice work so far. As for a script presently available for systemd: no, not yet. You are also likely correct about the lock file issue when restarting via QL; I believe it uses that file to find the proper PID to kill before allowing a restart.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
mistie710
Posts: 4
Joined: Sat Dec 28, 2013 9:04 am

Re: Cannot start Nagios 4.0.2 on openSUSE 13.1

Post by mistie710 »

Thanks for that.

It appears that there was a change made to Nagios 4 which has caused a problem with NagiosQL when it issues a restart to Nagios. There is a thread about this on the NagiosQL support forum which suggests the following:

Source: http://www.nagiosql.org/forum8/installa ... mitstart=0
I have checked that. Nagios 4.0.0 has a command queue - this is located by default at:

/usr/local/nagios/var/rw/nagios.cmd

But the old NagiosQL command does not work with Nagios 4.0.0. So edit verify.php and modify line 345 from:

$strCommandString = "[".mktime()."] RESTART_PROGRAM;".mktime();

to:

$strCommandString = "[".mktime()."] RESTART_PROGRAM\n";
The only other change I made was to the local configuration setting on QL to use the PID file rather than a lock file. And yes, it now all works! The above change, btw, refers to /srv/www/htdocs/nagiosql32/admin/verify.php if anyone asks (the /srv/www bit should be substituted with whatever your system uses for its apache/httpd root). :D
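
To try the same restart by hand, without going through NagiosQL, something like the sketch below should do. It assumes the default command file path quoted above and that the command file is a named pipe writable by the user running the script; the function name restart_command and the NAGIOS_CMD_FILE override are my own inventions for illustration. The line it sends has the same shape as the corrected verify.php string: a timestamp in square brackets, then RESTART_PROGRAM and a newline.

```shell
#!/bin/sh
# Build a restart request in the external command format: "[epoch] COMMAND".
restart_command() {
    printf '[%s] RESTART_PROGRAM\n' "$(date +%s)"
}

# Default command file location from the NagiosQL thread; override if needed.
cmd_file="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

if [ -p "$cmd_file" ] && [ -w "$cmd_file" ]; then
    # The command file is a named pipe read by the running daemon.
    restart_command > "$cmd_file"
    echo "Restart requested via $cmd_file"
else
    echo "No writable command pipe at $cmd_file; would have sent:"
    restart_command
fi
```

If Nagios is not running, the pipe will usually be absent and the script only prints the line it would have sent.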
Last edited by mistie710 on Mon Dec 30, 2013 12:41 pm, edited 1 time in total.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Cannot start Nagios 4.0.2 on openSUSE 13.1

Post by abrist »

Great. You may want to open a ticket at: http://tracker.nagios.org with your changes so that the init script can be updated to support suse.
Former Nagios employee
mistie710
Posts: 4
Joined: Sat Dec 28, 2013 9:04 am

Re: Cannot start Nagios 4.0.2 on openSUSE 13.1

Post by mistie710 »

abrist wrote: Great. You may want to open a ticket at: http://tracker.nagios.org with your changes so that the init script can be updated to support suse.
Is done. It's tracker number 0000553 if anyone wants to see it, though it mostly points back here.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Cannot start Nagios 4.0.2 on openSUSE 13.1

Post by slansing »

Just a heads up, we're going to close this thread as you will be able to see replies on the tracker as changes are made. Thanks for reporting this!