
I am testing a new build that will run Nagios 4.0.2 and NagiosQL 3.2 on openSUSE 13.1. So far everything works as it should except for one thing: starting Nagios. As you may know if you have used openSUSE recently, it switched to systemd from version 12.1 onwards, and the sysvinit startup script used by Nagios no longer seems to work. As I mentioned before, I did find a thread on here suggesting an alternative that does appear to work on 13.1, as long as you make a couple of adjustments.
/etc/init.d/nagios
Code:
#!/bin/bash
# Nagios Startup script for the Nagios monitoring daemon
#
# chkconfig: - 85 15
# description: Nagios is a service monitoring system
# processname: nagios
# config: /etc/nagios/nagios.cfg
# pidfile: /var/nagios/nagios.pid
#
### BEGIN INIT INFO
# Provides: nagios
# Required-Start: $local_fs $syslog $network
# Required-Stop: $local_fs $syslog $network
# Short-Description: start and stop Nagios monitoring server
# Description: Nagios is a service monitoring system
# This is a patched version of the startup as the
# core has been ******* by the developers.
### END INIT INFO
# Source function library.
# . /etc/rc.d/init.d/functions
. /lib/lsb/init-functions
prefix="/usr/local/nagios"
exec_prefix="${prefix}"
exec="${exec_prefix}/bin/nagios"
prog="nagios"
config="${prefix}/etc/nagios.cfg"
pidfile="${prefix}/var/nagios.lock"
user="nagios"
group="nagios"
checkconfig="false"
ramdiskdir="/var/nagios/ramcache"
test -e /etc/sysconfig/$prog && . /etc/sysconfig/$prog
lockfile=/var/lock/$prog
USE_RAMDISK=${USE_RAMDISK:-0}
if test "$USE_RAMDISK" -ne 0 && test -n "$RAMDISK_SIZE"; then
    # Mount the tmpfs ramdisk only if it is not already mounted
    ramdisk=$(mount | grep "$ramdiskdir type tmpfs")
    if [ -z "$ramdisk" ]; then
        mkdir -p -m 0755 "$ramdiskdir"
        mount -t tmpfs -o size="${RAMDISK_SIZE}m" tmpfs "$ramdiskdir"
        mkdir -p -m 0755 "$ramdiskdir/checkresults"
        chown -R "$user:$group" "$ramdiskdir"
    fi
fi
check_config() {
    TMPFILE="/tmp/.configtest.$$"
    /usr/sbin/service nagios configtest > "$TMPFILE"
    # Pull the warning and error counts out of the configtest summary
    WARN=$(grep "^Total Warnings:" "$TMPFILE" | awk -F: '{print $2}' | sed 's/ //g')
    ERR=$(grep "^Total Errors:" "$TMPFILE" | awk -F: '{print $2}' | sed 's/ //g')
    if test "$WARN" = "0" && test "$ERR" = "0"; then
        echo "OK - Configuration check verified" > /var/run/nagios.configtest
        chmod 0644 /var/run/nagios.configtest
        /bin/rm "$TMPFILE"
        return 0
    else
        # Write the errors to a file that a separate script can watch
        echo "WARNING: Errors in config files - see log for details: $TMPFILE" > /var/run/nagios.configtest
        grep -E -i "(^warning|^error)" "$TMPFILE" >> /var/run/nagios.configtest
        chmod 0644 /var/run/nagios.configtest
        cat "$TMPFILE"
        exit 8
    fi
}
start() {
    echo "Start option selected"
    echo "prog var = $prog"
    test -x "$exec" || exit 5
    test -f "$config" || exit 6
    # restart() runs check_config itself and then sets checkconfig="true",
    # so the check is not repeated here
    if test "$checkconfig" = "false"; then
        check_config
    fi
    echo -n $"Starting $prog: "
    # Launch the daemon as the nagios user
    startproc -u "$user" -- "$exec" -d "$config"
    retval=$?
    echo
    test $retval -eq 0 && touch "$lockfile"
    return $retval
}
stop() {
    echo -n $"Stopping $prog: "
    killproc -p ${pidfile} $exec
    retval=$?
    echo
    test $retval -eq 0 && rm -f $lockfile
    return $retval
}
restart() {
    check_config
    checkconfig="true"
    stop
    start
}
reload() {
    echo -n $"Reloading $prog: "
    killproc -p ${pidfile} $exec -HUP
    retval=$?
    echo
    # Return killproc's status rather than the status of the echo above
    return $retval
}
force_reload() {
    restart
}
case "$1" in
    start)
        checkproc $prog && exit 0
        $1
        ;;
    stop)
        checkproc $prog || exit 0
        $1
        ;;
    restart)
        $1
        ;;
    reload)
        checkproc $prog || exit 7
        $1
        ;;
    force-reload)
        force_reload
        ;;
    status)
        checkproc $prog
        ;;
    condrestart|try-restart)
        checkproc $prog || exit 0
        restart
        ;;
    configtest)
        # $nice and $corelimit are not set in this script; if they are needed
        # they have to come from /etc/sysconfig/$prog.
        # Nothing follows this command, so the final "exit $?" reports the
        # result of the config check itself.
        $nice su -s /bin/bash - nagios -c "$corelimit >/dev/null 2>&1 ; $exec -vp $config"
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|condrestart|try-restart|reload|force-reload|configtest}"
        exit 2
        ;;
esac
exit $?
The only differences are that I replaced the "daemon" call at line 82 with the equivalent "startproc" call, and replaced the "status_of_proc" call in each case from line 120 down with "checkproc". The problem now appears to be that systemd gets in the way: the script is no longer called at all, and systemctl takes control instead. When that happens it becomes impossible to start, stop or do anything else with Nagios, because systemctl insists that the service is down irrespective of the actual state of the daemon.
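One way to see what the daemon is really doing, independently of what systemctl reports, is to read the PID from the lock file and probe that process directly. This is only a rough sketch: `nagios_is_alive` is a made-up helper name, and the lock file path is the one assumed by the script above.

```shell
#!/bin/sh
# Sketch only: nagios_is_alive is a hypothetical helper, not part of Nagios.
# Returns 0 if the PID stored in the given file belongs to a process that
# exists and that the current user is allowed to signal.
nagios_is_alive() {
    pidfile="$1"
    # Without a readable PID file we cannot claim the daemon is up
    [ -r "$pidfile" ] || return 1
    # Use the first line only and strip any whitespace
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Reject empty values and anything that is not purely digits
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 delivers no signal; it only checks that the PID can be signalled
    kill -0 "$pid" 2>/dev/null
}

# Assumed path, taken from pidfile= in the script above
if nagios_is_alive /usr/local/nagios/var/nagios.lock; then
    echo "nagios is running"
else
    echo "nagios is not running"
fi
```

Run it as root or as the nagios user; for anyone else `kill -0` is refused for a process they do not own, and the helper would wrongly report the daemon as down.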
Is there another script that needs amending to fix this? Or is there a way to stop systemctl bullying the init.d script out of the way? Or am I missing something here? Besides the post I originally tried to reply to, of course!
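A possible alternative to fighting systemctl is to give systemd a native unit so that it tracks the daemon itself. The sketch below only writes a candidate nagios.service into a scratch directory for review; nothing is installed. The paths are assumptions taken from the script above (prefix /usr/local/nagios), `write_nagios_unit` is a made-up name, and the unit should be treated as a starting point rather than a tested configuration.

```shell
#!/bin/sh
# Sketch only: generate a candidate systemd unit for review.
# Assumption: Nagios lives under /usr/local/nagios, as in the init script.
prefix="/usr/local/nagios"

# write_nagios_unit is a hypothetical helper; $1 is the file to write
write_nagios_unit() {
    cat > "$1" <<EOF
[Unit]
Description=Nagios monitoring daemon
After=network.target

[Service]
Type=forking
PIDFile=${prefix}/var/nagios.lock
ExecStartPre=${prefix}/bin/nagios -v ${prefix}/etc/nagios.cfg
ExecStart=${prefix}/bin/nagios -d ${prefix}/etc/nagios.cfg
ExecReload=/bin/kill -HUP \$MAINPID

[Install]
WantedBy=multi-user.target
EOF
}

# Write to a temporary directory and show the result
outdir=$(mktemp -d)
write_nagios_unit "$outdir/nagios.service"
cat "$outdir/nagios.service"
```

If the result looks right, the usual route would be to copy it to /etc/systemd/system/nagios.service and run `systemctl daemon-reload` followed by `systemctl start nagios.service`. PIDFile has to match the lock_file setting in nagios.cfg, otherwise systemd cannot find the main process.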

Ah, found the edit option!!!
Anyway, I've been working away on this and seem to have found a solution of sorts. An earlier version of openSUSE shipped a couple of templates in the /etc/init.d directory, named "skeleton" and "skeleton.compat". I copied the latter across to openSUSE 13.1 and edited it to give the code below:
Code:
#!/bin/bash
#
# LSB system startup script for Nagios 4.0.2
# Based on LSB template Copyright (C) 1995--2005 Kurt Garloff,
# SUSE / Novell Inc., set up by Chris Johnson (Chika)
#
# This library is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or (at
# your option) any later version.
#
# This library is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
#
# /etc/init.d/nagios
# LSB compatible service control script; see http://www.linuxbase.org/spec/
# Please send feedback to http://www.suse.de/feedback/
#
# Note: This template uses functions rc_XXX defined in /etc/rc.status on
# UnitedLinux/SUSE/Novell based Linux distributions. However, it will work
# on other distributions as well, by using the LSB (Linux Standard Base)
# or RH functions or by open coding the needed functions.
# Read http://www.tldp.org/HOWTO/HighQuality-Apps-HOWTO/ if you prefer not
# to use this template.
#
# chkconfig: 345 99 00
# description: Nagios daemon providing system monitoring
#
### BEGIN INIT INFO
# Provides: nagios
# Required-Start: $syslog $remote_fs $time
# Should-Start: $time ypbind smtp
# Required-Stop: $syslog $remote_fs
# Should-Stop: ypbind smtp
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Short-Description: Nagios daemon providing system and net monitoring
# Description: Start Nagios to allow scanning of network and the systems
# that depend on it, providing a cgi website that is available
# to users and provide alerts via email and SMS of problems
### END INIT INFO
#
# Any extensions to the keywords given above should be preceded by
# X-VendorTag- according to LSB.
#
# Some of the comments have been removed for space. Should you wish to see
# them, please refer to the original "/etc/init.d/skeleton" scripts.
# Check for missing binaries (stale symlinks should not happen)
# Note: Special treatment of stop for LSB conformance
FOO_BIN=/usr/local/nagios/bin/nagios
test -x $FOO_BIN || { echo "$FOO_BIN not installed";
    if [ "$1" = "stop" ]; then exit 0;
    else exit 5; fi; }
# Check for existence of needed config file and read it
FOO_CONFIG=/etc/sysconfig/nagios
test -r $FOO_CONFIG || { echo "$FOO_CONFIG not existing";
    if [ "$1" = "stop" ]; then exit 0;
    else exit 6; fi; }
# Read config
. $FOO_CONFIG
# Some variables
nagios_config=/usr/local/nagios/etc/nagios.cfg
FOO_PIDFILE=/var/nagios/nagios.pid
# Source LSB init functions
# providing start_daemon, killproc, pidofproc,
# log_success_msg, log_failure_msg and log_warning_msg.
# This is currently not used by UnitedLinux based distributions and
# not needed for init scripts for UnitedLinux only. If it is used,
# the functions from rc.status should not be sourced or used.
#. /lib/lsb/init-functions
# Shell functions sourced from /etc/rc.status:
# rc_check check and set local and overall rc status
# rc_status check and set local and overall rc status
# rc_status -v be verbose in local rc status and clear it afterwards
# rc_status -v -r ditto and clear both the local and overall rc status
# rc_status -s display "skipped" and exit with status 3
# rc_status -u display "unused" and exit with status 3
# rc_failed set local and overall rc status to failed
# rc_failed <num> set local and overall rc status to <num>
# rc_reset clear both the local and overall rc status
# rc_exit exit appropriate to overall rc status
# rc_active checks whether a service is activated by symlinks
# Use the SUSE rc_ init script functions;
# emulate them on LSB, RH and other systems
# Default: Assume sysvinit binaries exist
start_daemon() { /sbin/start_daemon ${1+"$@"}; }
killproc() { /sbin/killproc ${1+"$@"}; }
pidofproc() { /sbin/pidofproc ${1+"$@"}; }
checkproc() { /sbin/checkproc ${1+"$@"}; }
if test -e /etc/rc.status; then
    # SUSE rc script library
    . /etc/rc.status
else
    export LC_ALL=POSIX
    _cmd=$1
    declare -a _SMSG
    if test "${_cmd}" = "status"; then
        _SMSG=(running dead dead unused unknown reserved)
        _RC_UNUSED=3
    else
        _SMSG=(done failed failed missed failed skipped unused failed failed reserved)
        _RC_UNUSED=6
    fi
    if test -e /lib/lsb/init-functions; then
        # LSB
        . /lib/lsb/init-functions
        echo_rc()
        {
            if test ${_RC_RV} = 0; then
                log_success_msg " [${_SMSG[${_RC_RV}]}] "
            else
                log_failure_msg " [${_SMSG[${_RC_RV}]}] "
            fi
        }
        # TODO: Add checking for lockfiles
        checkproc() { pidofproc ${1+"$@"} >/dev/null 2>&1; return $?; }
    elif test -e /etc/init.d/functions; then
        # RHAT
        . /etc/init.d/functions
        echo_rc()
        {
            #echo -n " [${_SMSG[${_RC_RV}]}] "
            if test ${_RC_RV} = 0; then
                success " [${_SMSG[${_RC_RV}]}] "
            else
                failure " [${_SMSG[${_RC_RV}]}] "
            fi
        }
        checkproc() { status ${1+"$@"}; return $?; }
        start_daemon() { daemon ${1+"$@"}; return $?; }
    else
        # emulate it
        echo_rc() { echo " [${_SMSG[${_RC_RV}]}] "; }
    fi
    rc_reset() { _RC_RV=0; }
    rc_failed()
    {
        if test -z "$1"; then
            _RC_RV=1;
        elif test "$1" != "0"; then
            _RC_RV=$1;
        fi
        return ${_RC_RV}
    }
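    rc_check()
    {
        # "return rc_failed $?" is not valid shell (return needs a number);
        # calling rc_failed last makes its status the status of rc_check
        rc_failed $?
    }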
    rc_status()
    {
        rc_failed $?
        if test "$1" = "-r"; then _RC_RV=0; shift; fi
        if test "$1" = "-s"; then rc_failed 5; echo_rc; rc_failed 3; shift; fi
        if test "$1" = "-u"; then rc_failed ${_RC_UNUSED}; echo_rc; rc_failed 3; shift; fi
        if test "$1" = "-v"; then echo_rc; shift; fi
        if test "$1" = "-r"; then _RC_RV=0; shift; fi
        return ${_RC_RV}
    }
    rc_exit() { exit ${_RC_RV}; }
    rc_active()
    {
        if test -z "$RUNLEVEL"; then read RUNLEVEL REST < <(/sbin/runlevel); fi
        if test -e /etc/init.d/S[0-9][0-9]${1}; then return 0; fi
        return 1
    }
fi
# Reset status of this service
rc_reset
# Return values acc. to LSB for all commands but status:
# 0 - success
# 1 - generic or unspecified error
# 2 - invalid or excess argument(s)
# 3 - unimplemented feature (e.g. "reload")
# 4 - user had insufficient privileges
# 5 - program is not installed
# 6 - program is not configured
# 7 - program is not running
# 8--199 - reserved (8--99 LSB, 100--149 distrib, 150--199 appl)
#
# Note that starting an already running service, stopping
# or restarting a not-running service as well as the restart
# with force-reload (in case signaling is not supported) are
# considered a success.
case "$1" in
    start)
        echo -n "Starting Nagios "
        ## Start daemon with start_daemon. If this fails
        ## the return value is set appropriately by start_daemon.
        start_daemon $FOO_BIN -d $nagios_config
        ## Remember status and be verbose
        rc_status -v
        ## Update PIDFILE if not automatically written
        pidofproc $FOO_BIN > $FOO_PIDFILE
        ;;
    stop)
        echo -n "Shutting down Nagios "
        ## Stop daemon with killproc(8) and if this fails
        ## killproc sets the return value according to LSB.
        ## Usage on RH: killproc [-p pidfile] [-d delay] {program} [-signal]
        ## This form sends signal TERM, followed by e.g. signal KILL
        killproc $FOO_BIN
        ## or only with signal TERM
        # killproc $FOO_BIN -TERM
        ## Remember status and be verbose
        rc_status -v
        ## Remove PIDFILE if not automatically removed
        if test -e $FOO_PIDFILE ; then
            rm -f $FOO_PIDFILE
        fi
        ;;
    try-restart|condrestart)
        ## Do a restart only if the service was active before.
        ## Note: try-restart is now part of LSB (as of 1.9).
        ## RH has a similar command named condrestart.
        if test "$1" = "condrestart"; then
            echo "${attn} Use try-restart ${done}(LSB)${attn} rather than condrestart ${warn}(RH)${norm}"
        fi
        $0 status
        if test $? = 0; then
            $0 restart
        else
            rc_reset # Not running is not a failure.
        fi
        ## Remember status and be quiet
        rc_status
        ;;
    restart)
        ## Stop the service and regardless of whether it was
        ## running or not, start it again.
        $0 stop
        $0 start
        ## Remember status and be quiet
        rc_status
        ;;
    force-reload)
        ## Signal the daemon to reload its config. Most daemons
        ## do this on signal 1 (SIGHUP).
        ## If it does not support it, restart the service if it
        ## is running.
        echo -n "Reload service Nagios "
        ## if it supports it:
        killproc -HUP $FOO_BIN
        #touch /var/run/FOO.pid
        rc_status -v
        ## Otherwise:
        #$0 try-restart
        #rc_status
        ;;
    reload)
        ## Like force-reload, but if daemon does not support
        ## signaling, do nothing (!)
        ## If it supports signaling:
        echo -n "Reload service Nagios "
        killproc -HUP $FOO_BIN
        #touch /var/run/FOO.pid
        rc_status -v
        ## Otherwise if it does not support reload:
        #rc_failed 3
        #rc_status -v
        ;;
    status)
        echo -n "Checking for service Nagios "
        ## Check status with checkproc(8), if process is running
        ## checkproc will return with exit status 0.
        ## Return value is slightly different for the status command:
        ## 0 - service up and running
        ## 1 - service dead, but /var/run/ pid file exists
        ## 2 - service dead, but /var/lock/ lock file exists
        ## 3 - service not running (unused)
        ## 4 - service status unknown :-(
        ## 5--199 reserved (5--99 LSB, 100--149 distro, 150--199 appl.)
        ## NOTE: checkproc returns LSB compliant status values.
        checkproc $FOO_BIN
        ## NOTE: rc_status knows that we called this init script with
        ## "status" option and adapts its messages accordingly.
        rc_status -v
        ;;
    probe)
        ## Optional: Probe for the necessity of a reload, print out the
        ## argument to this init script which is required for a reload.
        ## Note: probe is not (yet) part of LSB (as of 1.9)
        ## The template's FOO placeholder paths are replaced with the
        ## Nagios config and PID file defined above.
        test $nagios_config -nt $FOO_PIDFILE && echo reload
        ;;
    *)
        echo "Usage: $0 {start|stop|status|try-restart|restart|force-reload|reload|probe}"
        exit 1
        ;;
esac
# Exit with the overall status recorded by rc_status
rc_exit
One restart later and all appears to be well, except that NagiosQL seemed reluctant to restart Nagios when asked. This may be related to the lock file, which the above script doesn't create (it does leave a PID file behind, though). I used the .compat version because its notes indicate that it is more likely to be compatible with other systems. Goodness knows I'm angry enough with openSUSE for making it necessary for me to go this far in the first place!
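On the lock file: when Nagios runs as a daemon it writes its own PID to whatever `lock_file` points at in nagios.cfg, so a first check is whether that path matches the PID file the script uses (and the one NagiosQL is configured to look at). A rough sketch follows; `cfg_lock_file` is a made-up helper name and both paths are the ones assumed in the script above.

```shell
#!/bin/sh
# Sketch only: cfg_lock_file is a hypothetical helper.
# Prints the value of the lock_file directive from a nagios.cfg-style file.
cfg_lock_file() {
    # Match "lock_file=<path>" at the start of a line (commented lines do
    # not match) and keep the last occurrence if there are several
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

cfg="/usr/local/nagios/etc/nagios.cfg"     # assumed path (nagios_config above)
script_pidfile="/var/nagios/nagios.pid"    # FOO_PIDFILE in the script above

if [ -r "$cfg" ]; then
    actual=$(cfg_lock_file "$cfg")
    if [ "$actual" = "$script_pidfile" ]; then
        echo "lock_file matches the script: $actual"
    else
        echo "mismatch: nagios.cfg says '$actual', script uses '$script_pidfile'"
    fi
else
    echo "cannot read $cfg"
fi
```

If the two differ, pointing FOO_PIDFILE (and the corresponding NagiosQL setting) at the lock_file path is probably simpler than having the script write a second PID file of its own.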

Hopefully the above will be of use to somebody.