service start nagios: the control process exited with error code

misja
Posts: 6
Joined: Tue Aug 22, 2017 9:37 am

Post by misja »

Hello,
I don't have much Nagios or Debian experience yet, so I'm quite helpless.

Nagios core won't start anymore after update from 4.3.2 to 4.3.4:
Warning: nagios.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Job for nagios.service failed because the control process exited with error code.
See "systemctl status nagios.service" and "journalctl -xe" for details.

Running 'systemctl daemon-reload' does not solve the issue; I keep getting the error code message.

Nagios runs on Debian. I installed 4.3.2 from source following: http://www.miloszengel.com/nagios-core- ... -x-jessie/

I had it working for a while and became aware of the update, so I downloaded the new source and compiled it following: https://assets.nagios.com/downloads/nag ... ading.html

I did not modify the config files other than changing the location of the lock file from /usr/local/nagios/var/nagios.lock to /run/nagios.lock, as suggested in the link above.

Running /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg reports that everything is fine.

Might this be an issue with permissions? If I remember correctly, I installed 4.3.2 as root, not as user nagios. I don't know if that was right, but that version ran without issues until the update.

Any suggestions?
dwasswa

Re: service start nagios: he control process exited with err

Post by dwasswa »

Can you please post your error logs for

Code: Select all

journalctl -xe
so I can track what's failing.
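
Since journalctl -xe prints everything that was logged recently, it can help to narrow the output to the Nagios unit before posting it. The following is a minimal sketch, assuming the unit is called nagios.service as in the error message above. The function names nagios_journal and filter_nagios_lines are made up for this illustration.

```shell
# Show only messages for the nagios unit from the current boot.
# -u selects a unit, -b limits output to this boot, --no-pager prints directly.
nagios_journal() {
    journalctl -u nagios.service -b --no-pager
}

# Alternative for output that was already saved or pasted:
# keep only lines that mention nagios (case-insensitive), reading from stdin.
filter_nagios_lines() {
    grep -i 'nagios'
}
```

For example, `journalctl -xe --no-pager | filter_nagios_lines` keeps the journal's explanatory lines that mention the unit and drops unrelated services.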
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: service start nagios: he control process exited with err

Post by tgriep »

Thanks @Derick Wasswa for the help. We would have to see the output of that command; you can also check the /var/log/messages file for errors explaining why nagios is not starting.
Be sure to check out our Knowledgebase for helpful articles and solutions!
misja
Posts: 6
Joined: Tue Aug 22, 2017 9:37 am

Re: service start nagios: he control process exited with err

Post by misja »

Hello Derrick and tgriep,

thanks for the help.

The output of journalctl -xe contains a lot of information, but I think the part that relates to the nagios service is the following. I left some extra lines at the beginning because this is the end of a cold start, and I assume nagios is the last service to load:

Sep 01 09:43:47 debian-d510 systemd[580]: Startup finished in 593ms.
-- Subject: System start-up is now complete
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- All system services necessary queued for starting at boot have been
-- successfully started. Note that this does not mean that the machine is
-- now idle as services might still be busy with completing start-up.
--
-- Kernel start-up required KERNEL_USEC microseconds.
--
-- Initial RAM disk start-up required INITRD_USEC microseconds.
--
-- Userspace start-up required 593248 microseconds.
Sep 01 09:43:47 debian-d510 systemd[1]: Started User Manager for UID 113.
-- Subject: Unit user@113.service has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit user@113.service has finished starting up.
--
-- The start-up result is done.
Sep 01 09:43:47 debian-d510 su[561]: pam_unix(su:session): session closed for user nagios
Sep 01 09:43:47 debian-d510 nagios[552]: Starting nagios:ERROR: Could not create or update '/usr/local/nagios/var/nagios.configtest'
Sep 01 09:43:47 debian-d510 systemd[1]: nagios.service: Control process exited, code=exited status=8
Sep 01 09:43:47 debian-d510 systemd[1]: Failed to start LSB: Starts and stops the Nagios monitoring server.
-- Subject: Unit nagios.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit nagios.service has failed.
--
-- The result is failed.
Sep 01 09:43:47 debian-d510 systemd[1]: nagios.service: Unit entered failed state.


There seems to be no reference to nagios in /var/log/messages.

Is this enough info or should I include the full output of journalctl -xe?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: service start nagios: he control process exited with err

Post by scottwilkerson »

This shouldn't be happening, but let's run the following

Code: Select all

touch /usr/local/nagios/var/nagios.configtest
chown nagios:nagios /usr/local/nagios/var/nagios.configtest
chmod ug+rw /usr/local/nagios/var/nagios.configtest
and try starting again
Former Nagios employee
bheden
Product Development Manager
Posts: 179
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Re: service start nagios: he control process exited with err

Post by bheden »

Did you update the location of the lock file in both the nagios.cfg file AND the init script?

You should be able to just update the lock_file= value in the cfg and then run make install-init from the directory where you configured 4.3.4, and it should work.

If that doesn't fix it, send over your current nagios.cfg file and your init.d/nagios script please.
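
The check described here comes down to comparing two settings: lock_file in nagios.cfg and NagiosRunFile in the init script. Below is a rough sketch of that comparison. The function names are hypothetical, and the location /etc/init.d/nagios used in the usage note is an assumption (the thread only says init.d/nagios). Shell variables such as ${prefix} inside the init script value are not expanded, so a value written that way will show up as a mismatch and needs to be compared by eye.

```shell
# Print the lock_file value from a nagios.cfg-style file ($1).
cfg_lock_file() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Print the NagiosRunFile value from an init script ($1), taken literally.
init_run_file() {
    sed -n 's/^NagiosRunFile=//p' "$1" | tail -n 1
}

# Compare both values and report whether they agree.
compare_lock_paths() {
    cfg=$(cfg_lock_file "$1")
    init=$(init_run_file "$2")
    if [ -n "$cfg" ] && [ "$cfg" = "$init" ]; then
        echo "match: $cfg"
    else
        echo "mismatch: nagios.cfg has '$cfg', init script has '$init'"
        return 1
    fi
}
```

Usage, with the assumed paths: `compare_lock_paths /usr/local/nagios/etc/nagios.cfg /etc/init.d/nagios`.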
misja
Posts: 6
Joined: Tue Aug 22, 2017 9:37 am

Re: service start nagios: he control process exited with err

Post by misja »

I ran the commands for nagios.configtest (before I did, the file did not exist).

Then I tried "service nagios start" and got the same error; the file nagios.configtest is gone again.

Is the init script you mention init.d/nagios? I only edited the nagios.cfg file for the location of the lock file, and I might have done that after running install-init, so I ran install-init again, but without success. When I look in init.d/nagios, I don't think it has the right location and name for the lock file, so maybe install-init went wrong?

I have included the files you asked for.
Attachments
nagios.cfg
(43.8 KiB) Downloaded 1229 times
misja
Posts: 6
Joined: Tue Aug 22, 2017 9:37 am

Re: service start nagios: he control process exited with err

Post by misja »

I'll try again.
Somehow one file is missing (or is it not possible to upload two files?). Here is the next one inline.

Code: Select all

#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
# processname: nagios
# File : nagios
#
# Author : Jorge Sanchez Aymar (jsanchez@lanchile.cl)
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <kdebisschop@infoplease.com>
#  - setup for autoconf
#  - add reload function
# 1999-08-06 Ethan Galstad <egalstad@nagios.org>
#  - Added configuration info for use with RedHat's chkconfig tool
#    per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <jimpop@rocketship.com>
#  - added variable for nagios/var directory
#  - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <egalstad@nagios.org>
#  - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <kdebisschop@users.sourceforge.net>
#  - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <egalstad@nagios.org>
#  - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
#              used to provide network services status.
#
### BEGIN INIT INFO
# Provides:		nagios
# Required-Start:	$local_fs $syslog $network
# Required-Stop:	$local_fs $syslog $network
# Default-Start:	2 3 4 5
# Default-Stop:		0 1 6
# Short-Description:	Starts and stops the Nagios monitoring server
# Description:		Starts and stops the Nagios monitoring server
### END INIT INFO

# Our install-time configuration.
prefix=/usr/local/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosCfgtestFile=${prefix}/var/nagios.configtest
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=/run/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
checkconfig="true"

# Source function library
# Some *nix do not have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
	. /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
	. /etc/init.d/functions
elif [ -f /lib/lsb/init-functions ]; then
	. /lib/lsb/init-functions
fi

# Load any extra environment variables for Nagios and its plugins.
if test -f /etc/sysconfig/nagios; then
	. /etc/sysconfig/nagios
fi

# Automate addition of RAMDISK based on environment variables
USE_RAMDISK=${USE_RAMDISK:-0}
if test "$USE_RAMDISK" -ne 0 && test "$RAMDISK_SIZE"X != "X"; then
	ramdisk=`mount |grep "${RAMDISK_DIR} type tmpfs"`
	if [ "$ramdisk"X == "X" ]; then
		mkdir -p -m 0755 ${RAMDISK_DIR}
		mount -t tmpfs -o size=${RAMDISK_SIZE}m tmpfs ${RAMDISK_DIR}
		mkdir -p -m 0755 ${RAMDISK_DIR}/checkresults
		chown -h -R $NagiosUser:$NagiosGroup ${RAMDISK_DIR}
	fi
fi


check_config ()
{
	rm -f "$NagiosCfgtestFile";
	if test -e "$NagiosCfgtestFile"; then
		echo "ERROR: Could not delete '$NagiosCfgtestFile'"
		exit 8
	fi
	if ! su $NagiosUser -c "touch $NagiosCfgtestFile"; then
		echo "ERROR: Could not create or update '$NagiosCfgtestFile'"
		exit 8
	fi

	TMPFILE=$(mktemp /tmp/.configtest.XXXXXXXX)
	$NagiosBin -vp $NagiosCfgFile > "$TMPFILE"
	WARN=`grep ^"Total Warnings:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
	ERR=`grep ^"Total Errors:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`

	if test "$WARN" = "0" && test "${ERR}" = "0"; then
		echo "OK - Configuration check verified" > $NagiosCfgtestFile
		/bin/rm "$TMPFILE"
		return 0
	elif test "${ERR}" = "0"; then
		# Write the errors to a file we can have a script watching for.
		echo "WARNING: Warnings in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
		egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
		/bin/rm "$TMPFILE"
		return 0
	else
		# Write the errors to a file we can have a script watching for.
		echo "ERROR: Errors in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
		egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
		cat "$TMPFILE"
		exit 8
	fi
}


status_nagios ()
{
	if test -x $NagiosCGI/daemonchk.cgi; then
		if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile > /dev/null 2>&1; then return 0; fi
	else
		if ps -p $NagiosPID > /dev/null 2>&1; then return 0; fi
	fi

	return 1
}

printstatus_nagios ()
{
	if status_nagios; then
		echo "nagios (pid $NagiosPID) is running..."
	else
		echo "nagios is not running"
	fi
}

killproc_nagios ()
{
	kill -s "$1" $NagiosPID
}

pid_nagios ()
{
	if test ! -f $NagiosRunFile; then
		echo "No lock file found in $NagiosRunFile"
		exit 1
	fi

	NagiosPID=`head -n 1 $NagiosRunFile`
}

remove_commandfile ()
{
	# Removing a stalled command file, while there are processes trying/waiting to write into it,
	# will deadlock those processes in a blocking open() system call. To allow such processes to
	# die on a broken pipe, the pipe must be opened for reading without actually reading from it,
	# which is what dd does here. To avoid a chicken-egg problem, the pipe is renamed beforehand.
	# In order for the dd to not deadlock when there is no writing process, it is executed in the
	# background in a subshell together with an empty echo to have at least one writing process.
	
	# see http://unix.stackexchange.com/questions/335406/opening-named-pipe-blocks-forever-if-pipe-is-deleted-without-being-connected
	
	if [ -p $NagiosCommandFile ]; then
		mv -f $NagiosCommandFile $NagiosCommandFile~
		(dd if=$NagiosCommandFile~ count=0 2>/dev/null & echo -n "" >$NagiosCommandFile~)
	fi
	
	rm -f $NagiosCommandFile $NagiosCommandFile~
}


# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
    echo "Executable file $NagiosBin not found. Exiting."
    exit 1
fi

# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
    echo "Configuration file $NagiosCfgFile not found. Exiting."
    exit 1
fi

# See how we were called.
case "$1" in

	start)
		echo -n "Starting nagios:"

		if test "$checkconfig" = "true"; then
			check_config
			# check_config exits on configuration errors.
		fi

		if test -f $NagiosRunFile; then
			NagiosPID=`head -n 1 $NagiosRunFile`
			if status_nagios; then
				echo " another instance of nagios is already running."
				exit 0
			fi
		fi

		su $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
		remove_commandfile
		touch $NagiosRunFile
		$NagiosBin -d $NagiosCfgFile
		if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

		echo " done."
		;;

	stop)
		echo -n "Stopping nagios:"

		pid_nagios
		killproc_nagios TERM

		# now we have to wait for nagios to exit and remove its
		# own NagiosRunFile, otherwise a following "start" could
		# happen, and then the exiting nagios will remove the
		# new NagiosRunFile, allowing multiple nagios daemons
		# to (sooner or later) run - John Sellens
		#echo -n 'Waiting for nagios to exit .'
		for i in {1..90}; do
			if status_nagios > /dev/null; then
				echo -n '.'
				sleep 1
			else
				break
			fi
		done
		if status_nagios > /dev/null; then
			echo ''
			echo 'Warning - nagios did not exit in a timely manner - Killing it!'
			killproc_nagios KILL
		else
			echo ' done.'
		fi

		remove_commandfile
		rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile
		;;

	status)
		pid_nagios
		printstatus_nagios
		;;

	checkconfig)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		if [ $? -eq 0 ]; then
			echo " OK."
		else
			echo " CONFIG ERROR!  Check your Nagios configuration."
			exit 1
		fi
		;;

	restart)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		$0 stop
		$0 start
		;;

	reload|force-reload)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		if test ! -f $NagiosRunFile; then
			$0 start
		else
			pid_nagios
			if status_nagios > /dev/null; then
				printf "Reloading nagios configuration...\n"
				killproc_nagios HUP
				echo "done"
			else
				$0 stop
				$0 start
			fi
		fi
		;;

	configtest)
		$NagiosBin -vp $NagiosCfgFile
		;;

	*)
		echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig|configtest}"
		exit 1
		;;

esac

# End of this script
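
In the script above, check_config first deletes $NagiosCfgtestFile with rm -f and then recreates it with su $NagiosUser -c "touch ...". A file created by hand beforehand is therefore removed again on every start, and the "Could not create or update" message comes from the su/touch step. The sketch below repeats just those two steps so that the underlying error from su or touch becomes visible. The names run_as and try_configtest are hypothetical, and run_as skips su when the target user is already the current user.

```shell
# Run a command string as the given user; skip su when we already are that user.
run_as() {
    user=$1
    cmd=$2
    if [ "$user" = "$(id -un)" ]; then
        sh -c "$cmd"
    else
        su "$user" -c "$cmd"
    fi
}

# Repeat the first two steps of check_config for file $1 as user $2 (default: nagios).
try_configtest() {
    file=$1
    user=${2:-nagios}
    rm -f "$file"
    if [ -e "$file" ]; then
        echo "ERROR: could not delete '$file'"
        return 8
    fi
    if ! run_as "$user" "touch '$file'"; then
        echo "ERROR: '$user' could not create '$file'"
        return 8
    fi
    echo "OK: '$user' can create '$file'"
}
```

Run as root, `try_configtest /usr/local/nagios/var/nagios.configtest nagios` should print either the OK line or whatever su or touch complained about, which the init script's own message does not show.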
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: service start nagios: he control process exited with err

Post by tgriep »

It may be a permission problem with the files or folders. Can you run the following as root and post the output so we can check the permissions?

Code: Select all

ls -l /usr/local/nagios/
ls -l /usr/local/nagios/var/
ls -l /
ls -al /tmp
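
These listings check each directory on the way to the files the init script touches. As a possible shortcut, the sketch below prints ls -ld for every level of one absolute path, similar to what namei -l from util-linux shows. The function name list_path_components is made up for this example, and it assumes the path contains no wildcard characters.

```shell
# Print "ls -ld" for every directory level of an absolute path, from / downwards.
list_path_components() {
    path=$1
    current=
    ls -ld /
    # Split the path on "/" and rebuild it one component at a time.
    old_ifs=$IFS
    IFS=/
    set -- $path
    IFS=$old_ifs
    for part in "$@"; do
        [ -n "$part" ] || continue
        current="$current/$part"
        ls -ld "$current" || return 1
    done
}
```

For example, `list_path_components /usr/local/nagios/var` lists /, /usr, /usr/local, /usr/local/nagios and /usr/local/nagios/var in order.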
misja
Posts: 6
Joined: Tue Aug 22, 2017 9:37 am

Re: service start nagios: he control process exited with err

Post by misja »

Hi, thanks for your help so far. Here are the results:

Code: Select all

root@debian-d510:~# ls -l /usr/local/nagios/
total 28
drwxrwsr-x  2 nagios nagios 4096 Aug 29 22:44 bin
drwxrwsr-x  3 nagios nagios 4096 Aug 29 23:31 etc
drwxr-sr-x  2 root   staff  4096 Aug 13 23:16 include
drwxrwsr-x  3 nagios nagios 4096 Aug 13 23:16 libexec
drwxrwsr-x  2 nagios nagios 4096 Aug 29 22:44 sbin
drwxrwsr-x 15 nagios nagios 4096 Aug 29 22:44 share
drwxrwsr-x  5 nagios nagios 4096 Sep  1 23:31 var

Code: Select all

root@debian-d510:~# ls -l /usr/local/nagios/var/
total 104
drwxrwsr-x 2 nagios nagios    4096 Aug 25 00:00 archives
-rw-r--r-- 1 nagios nagios   13415 Aug 29 22:36 nagios.log
-rw-r--r-- 1 nagios nagios   20674 Aug 29 22:28 objects.cache
-rw-r--r-- 1 nagios nagios   20674 Aug 29 22:28 objects.precache
-rw------- 1 nagios nagios   26071 Aug 29 22:36 retention.dat
drwxrwsr-x 2 nagios www-data  4096 Aug 29 22:36 rw
drwxr-sr-x 3 root   nagios    4096 Aug 13 23:08 spool

Code: Select all

root@debian-d510:~# ls -l /
total 72
drwxr-xr-x   2 root root  4096 Aug 13 22:55 bin
drwxr-xr-x   3 root root  4096 Aug 13 22:49 boot
drwxr-xr-x  17 root root  3120 Sep  6 22:33 dev
drwxr-xr-x 106 root root  4096 Aug 25 17:20 etc
drwxr-xr-x   3 root root  4096 Aug 10 21:46 home
lrwxrwxrwx   1 root root    31 Aug 10 21:28 initrd.img -> boot/initrd.img-4.9.0-3-686-pae
lrwxrwxrwx   1 root root    31 Aug 10 21:28 initrd.img.old -> boot/initrd.img-4.9.0-3-686-pae
drwxr-xr-x  16 root root  4096 Aug 13 22:49 lib
drwx------   2 root root 16384 Aug 10 21:23 lost+found
drwxr-xr-x   4 root root  4096 Aug 11 17:14 media
drwxr-xr-x   3 root root  4096 Sep  1 11:11 mnt
drwxr-xr-x   2 root root  4096 Aug 10 21:24 opt
dr-xr-xr-x  86 root root     0 Sep  6 22:33 proc
drwx------   4 root root  4096 Sep  2 00:42 root
drwxr-xr-x  20 root root   640 Sep  6 22:42 run
drwxr-xr-x   2 root root  4096 Aug 13 22:49 sbin
drwxr-xr-x   2 root root  4096 Aug 10 21:24 srv
dr-xr-xr-x  13 root root     0 Sep  6 22:45 sys
drwxrwxrwt   8 root root  4096 Sep  6 22:43 tmp
drwxr-xr-x  10 root root  4096 Aug 10 21:24 usr
drwxr-xr-x  12 root root  4096 Aug 10 21:33 var
lrwxrwxrwx   1 root root    28 Aug 10 21:28 vmlinuz -> boot/vmlinuz-4.9.0-3-686-pae
lrwxrwxrwx   1 root root    28 Aug 10 21:28 vmlinuz.old -> boot/vmlinuz-4.9.0-3-686-pae

Code: Select all

root@debian-d510:~# ls -al /tmp
total 32
drwxrwxrwt  8 root root 4096 Sep  6 22:43 .
drwxr-xr-x 21 root root 4096 Aug 28 17:57 ..
drwxrwxrwt  2 root root 4096 Sep  6 22:33 .font-unix
drwxrwxrwt  2 root root 4096 Sep  6 22:33 .ICE-unix
drwx------  3 root root 4096 Sep  6 22:34 systemd-private-8fd4a4c2e68841799c8d6d28cd263583-apache2.service-sYvbkS
drwxrwxrwt  2 root root 4096 Sep  6 22:33 .Test-unix
drwxrwxrwt  2 root root 4096 Sep  6 22:33 .X11-unix
drwxrwxrwt  2 root root 4096 Sep  6 22:33 .XIM-unix
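
In the listings above, /usr/local/nagios/var is owned by nagios:nagios with mode drwxrwsr-x, so directory permissions alone do not obviously explain why the touch fails. The su $NagiosUser -c "touch ..." step in the init script also depends on the nagios account itself, for example on its login shell. That is only an assumption and nothing in this thread confirms it. A small sketch to look at that field, using getent from glibc (the function names are hypothetical):

```shell
# Extract the login shell (7th colon-separated field) from a passwd-format line on stdin.
shell_field() {
    cut -d: -f7
}

# Look up an account with getent and print its login shell.
login_shell_of() {
    getent passwd "$1" | shell_field
}
```

`login_shell_of nagios` prints the shell that su would use for the -c command. If it turns out to be something like /bin/false or /usr/sbin/nologin, su cannot run the touch command with it.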