
nagios not working properly

Posted: Sun Nov 26, 2017 3:14 am
by prasa880
Hi,

I upgraded Nagios from 4.3.2 to 4.3.4, and since then Nagios has not been working properly. When I check the status, it reports "nagios is not running", but it is still monitoring.
The logs show errors like "prd-nagios nagios: job 1143 (pid=18860): read() returned error 11". I am also not able to stop Nagios. Stopping it prints "Stopping nagios:kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec] done." and the process is not killed.

Thanks in advance

regards
Prasanth

Re: nagios not working properly

Posted: Mon Nov 27, 2017 4:36 pm
by dwhitfield
What's the output of ps -aef | grep nagios.cfg? It sounds to me like you have a couple of different nagios processes running.

That error 11 is usually not actually an issue. Are there any other errors?

Re: nagios not working properly

Posted: Tue Nov 28, 2017 10:59 pm
by prasa880
output:
[root@bbnlnagios1 ~]# ps -aef | grep nagios.cfg
root 22945 22893 0 09:29 pts/0 00:00:00 grep nagios.cfg
nagios 24907 1 0 Nov26 ? 00:24:49 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24913 24907 0 Nov26 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
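
For reading output like the above: in `ps -ef` style listings the second column is the PID and the third is the parent PID. A daemonized main process normally has PID 1 as its parent, and any process it forks has the main process as its parent. Here is a minimal sketch for picking out only the main processes, assuming that column layout. The function name `nagios_masters` is made up for illustration.

```shell
#!/bin/sh
# nagios_masters (hypothetical helper): read `ps -ef`-style lines on stdin
# and print the PID of every "nagios -d" process whose parent is PID 1.
nagios_masters() {
    awk '$3 == 1 && /bin\/nagios -d/ { print $2 }'
}

# Example use against the live process table:
#   ps -ef | nagios_masters
```

Fed the three lines posted above, this prints only 24907, and 24913 has 24907 as its parent. That would point to a single daemon with one forked child rather than two independently started instances.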

Re: nagios not working properly

Posted: Wed Nov 29, 2017 3:14 pm
by kyang
Is SELinux still disabled?

Code: Select all

sestatus
Could you also run this and show us the output?

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Is it the same error when you try restarting nagios?

Code: Select all

service nagios restart

Re: nagios not working properly

Posted: Sun Dec 03, 2017 7:40 am
by prasa880
1. SELinux status: disabled

2.
[root@bbnlnagios1 ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.3.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-08-24
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 1632 services.
Checked 610 hosts.
Checked 37 host groups.
Checked 5 service groups.
Checked 217 contacts.
Checked 22 contact groups.
Checked 63 commands.
Checked 5 time periods.
Checked 498 host escalations.
Checked 2 service escalations.
Checking for circular paths...
Checked 610 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0
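
If the verification has to be checked from a script rather than by eye, the two totals at the end of the output are the lines to look at. The init script's `check_config` function greps for the same two lines. Here is a minimal sketch, assuming the output format shown above. `config_totals` is a made-up helper name.

```shell
#!/bin/sh
# config_totals (hypothetical helper): read `nagios -v` output on stdin and
# print "<warnings> <errors>" from the "Total Warnings:" / "Total Errors:"
# lines. Exits non-zero if either line is missing.
config_totals() {
    awk -F': *' '
        /^Total Warnings:/ { warn = $2 + 0; have_warn = 1 }
        /^Total Errors:/   { err  = $2 + 0; have_err  = 1 }
        END {
            if (!have_warn || !have_err) exit 1
            print warn, err
        }
    '
}

# Example use:
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_totals
```

For the output above this prints `0 0`, so the configuration itself does not look like the cause.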

Re: nagios not working properly

Posted: Sun Dec 03, 2017 7:46 am
by prasa880
I am still seeing this error while restarting the nagios service:

[root@bbnlnagios1 ~]# service nagios restart
Running configuration check...
Stopping nagios:kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
done.
Starting nagios: done.

AND

Error: Could not stat() command file '/usr/local/nagios/var/rw/nagios.cmd'!

The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands.

An error occurred while attempting to commit your command for processing.

Return from whence you came

------------------------
Whenever I restart Nagios from the web frontend, I get the error above.

Thanks in advance
Regards
prasanth
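
The "Could not stat()" message means the web interface could not find the external command file at that path at that moment. The file is a named pipe that the running daemon creates, so a quick check is whether it exists and really is a pipe. Here is a minimal sketch. `cmdfile_state` is a made-up helper name, and the path in the usage comment is the one from the error message.

```shell
#!/bin/sh
# cmdfile_state (hypothetical helper): classify the external command file
# at the given path. Prints "missing", "not-a-fifo" or "fifo".
cmdfile_state() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -p "$1" ]; then
        echo "not-a-fifo"
    else
        echo "fifo"
    fi
}

# Example use:
#   cmdfile_state /usr/local/nagios/var/rw/nagios.cmd
```

A result of "missing" right after a failed restart would fit a daemon that did not come back up cleanly, but that is an assumption and would need to be checked against the process list.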

Re: nagios not working properly

Posted: Mon Dec 04, 2017 11:54 am
by kyang
Could you run these commands and post the output?

Code: Select all

service nagios status

Code: Select all

ls -l /usr/local/nagios/var/rw

Code: Select all

cat /etc/group|grep nag

Re: nagios not working properly

Posted: Thu Dec 07, 2017 12:38 am
by prasa880
Hi, please find the output below:
service nagios status

nagios is not running


ls -l /usr/local/nagios/var/rw
total 4
drwxr-sr-x 2 nagios nagcmd 4096 Nov 7 13:01 cmd
prw-rw---- 1 nagios nagcmd 0 Dec 5 12:42 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Dec 5 12:41 nagios.qh


~]# cat /etc/group|grep nag
nagios:x:500:
nagcmd:x:501:nagios,apache


Regards
Prasanth
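
A note on reading the group output: the last field of each line lists only the supplementary members of the group. A user's primary group is recorded in /etc/passwd instead, so the empty list on the `nagios` line does not by itself mean the nagios user is outside that group (`id nagios` shows both kinds of membership). Here is a minimal sketch for checking the member list, with `group_has_member` as a made-up helper name.

```shell
#!/bin/sh
# group_has_member GROUP USER (hypothetical helper): read group(5)-format
# lines on stdin and succeed if USER is in GROUP's member list (field 4).
# Primary-group membership is NOT visible here; it lives in /etc/passwd.
group_has_member() {
    awk -F: -v g="$1" -v u="$2" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit !found }
    '
}

# Example use:
#   group_has_member nagcmd apache < /etc/group && echo "apache is in nagcmd"
```

With the two lines posted above, both nagios and apache are members of nagcmd, which matches the ownership shown on nagios.cmd.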

Re: nagios not working properly

Posted: Thu Dec 07, 2017 1:39 pm
by lmiltchev
Can you run the following commands and show the output?

Code: Select all

uname -a
cat /etc/*release
What do you have in the "cmd" directory? Normally, you should have only the nagios.cmd and nagios.qh files in the "/usr/local/nagios/var/rw" directory...
drwxr-sr-x 2 nagios nagcmd 4096 Nov 7 13:01 cmd
prw-rw---- 1 nagios nagcmd 0 Dec 5 12:42 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Dec 5 12:41 nagios.qh

Code: Select all

ls -la /usr/local/nagios/var/rw/cmd
Add the nagios user to the nagios group:

Code: Select all

useradd -g nagios nagios
Please post the /etc/init.d/nagios file for a review.
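
One thing that can be compared once both files are at hand: the daemon writes its PID to the path given by the `lock_file` directive in nagios.cfg, while the init script reads the PID from the path in its own `NagiosRunFile` variable. Here is a minimal sketch for pulling out both values. The helper names are made up, and whether the two paths actually differ on this system is not known from the thread.

```shell
#!/bin/sh
# cfg_lock_file (hypothetical helper): print the value of the lock_file
# directive from a nagios.cfg read on stdin. Commented-out lines are
# ignored; if the directive appears more than once, the last one is
# printed (an assumption made for this sketch).
cfg_lock_file() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' | tail -n 1
}

# init_run_file (hypothetical helper): print the literal value assigned to
# NagiosRunFile in an init script read on stdin.
init_run_file() {
    sed -n 's/^NagiosRunFile=//p' | tail -n 1
}

# Example use:
#   cfg_lock_file < /usr/local/nagios/etc/nagios.cfg
#   init_run_file < /etc/init.d/nagios
```

If the two printed paths are different, "service nagios status" and "service nagios stop" would be looking at a file the daemon never writes to.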

Re: nagios not working properly

Posted: Tue Dec 12, 2017 12:32 am
by prasa880
Hi, please find the output below:
1. uname -a

Linux hostname.test.in 2.6.32-696.6.3.el6.x86_64 #1 SMP Wed Jul 12 14:17:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

2. cat /etc/*release

CentOS release 6.9 (Final)
CentOS release 6.9 (Final)
CentOS release 6.9 (Final)

3. The cmd directory is not related to Nagios. I created it for testing purposes.

4. useradd -g nagios nagios
useradd: user 'nagios' already exists

5.
[root@bbnlnagios1 cmd]# cat /etc/init.d/nagios

Code: Select all

#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
# processname: nagios
# File : nagios
#
# Author : Jorge Sanchez Aymar (jsanchez@lanchile.cl)
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <kdebisschop@infoplease.com>
#  - setup for autoconf
#  - add reload function
# 1999-08-06 Ethan Galstad <egalstad@nagios.org>
#  - Added configuration info for use with RedHat's chkconfig tool
#    per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <jimpop@rocketship.com>
#  - added variable for nagios/var directory
#  - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <egalstad@nagios.org>
#  - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <kdebisschop@users.sourceforge.net>
#  - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <egalstad@nagios.org>
#  - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
#              used to provide network services status.
#
### BEGIN INIT INFO
# Provides:		nagios
# Required-Start:	$local_fs $syslog $network
# Required-Stop:	$local_fs $syslog $network
# Default-Start:	2 3 4 5
# Default-Stop:		0 1 6
# Short-Description:	Starts and stops the Nagios monitoring server
# Description:		Starts and stops the Nagios monitoring server
### END INIT INFO

# Our install-time configuration.
prefix=/usr/local/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosCfgtestFile=${prefix}/var/nagios.configtest
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=/var/run/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
checkconfig="true"

# Source function library
# Some *nix do not have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
	. /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
	. /etc/init.d/functions
elif [ -f /lib/lsb/init-functions ]; then
	. /lib/lsb/init-functions
fi

# Load any extra environment variables for Nagios and its plugins.
if test -f /etc/sysconfig/nagios; then
	. /etc/sysconfig/nagios
fi

# Automate addition of RAMDISK based on environment variables
USE_RAMDISK=${USE_RAMDISK:-0}
if test "$USE_RAMDISK" -ne 0 && test "$RAMDISK_SIZE"X != "X"; then
	ramdisk=`mount |grep "${RAMDISK_DIR} type tmpfs"`
	if [ "$ramdisk"X == "X" ]; then
		mkdir -p -m 0755 ${RAMDISK_DIR}
		mount -t tmpfs -o size=${RAMDISK_SIZE}m tmpfs ${RAMDISK_DIR}
		mkdir -p -m 0755 ${RAMDISK_DIR}/checkresults
		chown -h -R $NagiosUser:$NagiosGroup ${RAMDISK_DIR}
	fi
fi


check_config ()
{
	rm -f "$NagiosCfgtestFile";
	if test -e "$NagiosCfgtestFile"; then
		echo "ERROR: Could not delete '$NagiosCfgtestFile'"
		exit 8
	fi
	if ! su $NagiosUser -c "touch $NagiosCfgtestFile"; then
		echo "ERROR: Could not create or update '$NagiosCfgtestFile'"
		exit 8
	fi

	TMPFILE=$(mktemp /tmp/.configtest.XXXXXXXX)
	$NagiosBin -vp $NagiosCfgFile > "$TMPFILE"
	WARN=`grep ^"Total Warnings:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
	ERR=`grep ^"Total Errors:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`

	if test "$WARN" = "0" && test "${ERR}" = "0"; then
		echo "OK - Configuration check verified" > $NagiosCfgtestFile
		/bin/rm "$TMPFILE"
		return 0
	elif test "${ERR}" = "0"; then
		# Write the errors to a file we can have a script watching for.
		echo "WARNING: Warnings in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
		egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
		/bin/rm "$TMPFILE"
		return 0
	else
		# Write the errors to a file we can have a script watching for.
		echo "ERROR: Errors in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
		egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
		cat "$TMPFILE"
		exit 8
	fi
}


status_nagios ()
{
	if test -x $NagiosCGI/daemonchk.cgi; then
		if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile > /dev/null 2>&1; then return 0; fi
	else
		if ps -p $NagiosPID > /dev/null 2>&1; then return 0; fi
	fi

	return 1
}

printstatus_nagios ()
{
	if status_nagios; then
		echo "nagios (pid $NagiosPID) is running..."
	else
		echo "nagios is not running"
	fi
}

killproc_nagios ()
{
	kill -s "$1" $NagiosPID
}

pid_nagios ()
{
	if test ! -f $NagiosRunFile; then
		echo "No lock file found in $NagiosRunFile"
		exit 1
	fi

	NagiosPID=`head -n 1 $NagiosRunFile`
}

remove_commandfile ()
{
	# Removing a stalled command file, while there are processes trying/waiting to write into it,
	# will deadlock those processes in a blocking open() system call. To allow such processes to
	# die on a broken pipe, the pipe must be opened for reading without actually reading from it,
	# which is what dd does here. To avoid a chicken-egg problem, the pipe is renamed beforehand.
	# In order for the dd to not deadlock when there is no writing process, it is executed in the
	# background in a subshell together with an empty echo to have at least one writing process.
	
	# see http://unix.stackexchange.com/questions/335406/opening-named-pipe-blocks-forever-if-pipe-is-deleted-without-being-connected
	
	if [ -p $NagiosCommandFile ]; then
		mv -f $NagiosCommandFile $NagiosCommandFile~
		(dd if=$NagiosCommandFile~ count=0 2>/dev/null & echo -n "" >$NagiosCommandFile~)
	fi
	
	rm -f $NagiosCommandFile $NagiosCommandFile~
}


# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
    echo "Executable file $NagiosBin not found. Exiting."
    exit 1
fi

# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
    echo "Configuration file $NagiosCfgFile not found. Exiting."
    exit 1
fi

# See how we were called.
case "$1" in

	start)
		echo -n "Starting nagios:"

		if test "$checkconfig" = "true"; then
			check_config
			# check_config exits on configuration errors.
		fi

		if test -f $NagiosRunFile; then
			NagiosPID=`head -n 1 $NagiosRunFile`
			if status_nagios; then
				echo " another instance of nagios is already running."
				exit 0
			fi
		fi

		su $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
		remove_commandfile
		touch $NagiosRunFile
		$NagiosBin -d $NagiosCfgFile
		if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

		echo " done."
		;;

	stop)
		echo -n "Stopping nagios:"

		pid_nagios
		killproc_nagios TERM

		# now we have to wait for nagios to exit and remove its
		# own NagiosRunFile, otherwise a following "start" could
		# happen, and then the exiting nagios will remove the
		# new NagiosRunFile, allowing multiple nagios daemons
		# to (sooner or later) run - John Sellens
		#echo -n 'Waiting for nagios to exit .'
		for i in {1..90}; do
			if status_nagios > /dev/null; then
				echo -n '.'
				sleep 1
			else
				break
			fi
		done
		if status_nagios > /dev/null; then
			echo ''
			echo 'Warning - nagios did not exit in a timely manner - Killing it!'
			killproc_nagios KILL
		else
			echo ' done.'
		fi

		remove_commandfile
		rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile
		;;

	status)
		pid_nagios
		printstatus_nagios
		;;

	checkconfig)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		if [ $? -eq 0 ]; then
			echo " OK."
		else
			echo " CONFIG ERROR!  Check your Nagios configuration."
			exit 1
		fi
		;;

	restart)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		$0 stop
		$0 start
		;;

	reload|force-reload)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		if test ! -f $NagiosRunFile; then
			$0 start
		else
			pid_nagios
			if status_nagios > /dev/null; then
				printf "Reloading nagios configuration...\n"
				killproc_nagios HUP
				echo "done"
			else
				$0 stop
				$0 start
			fi
		fi
		;;

	configtest)
		$NagiosBin -vp $NagiosCfgFile
		;;

	*)
		echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig|configtest}"
		exit 1
		;;

esac

# End of this script
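
For reference when reading the script: `stop` calls `pid_nagios`, which sets `NagiosPID` from the first line of `NagiosRunFile`, and `killproc_nagios` then runs `kill -s "$1" $NagiosPID`. If that file exists but is empty (the `start` branch touches it before launching the daemon), `NagiosPID` is empty and `kill` is called without a PID, which prints the usage message seen earlier in the thread. Whether that is what happens on this system is an assumption. Here is a minimal sketch of a stricter PID read, with `read_pid` as a made-up helper name.

```shell
#!/bin/sh
# read_pid FILE (hypothetical helper): print the PID stored on the first
# line of FILE. Returns non-zero and prints nothing if the file is missing,
# empty, or its first line is not purely numeric.
read_pid() {
    [ -f "$1" ] || return 1
    pid=$(head -n 1 "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$pid"
}

# Example use (path taken from NagiosRunFile in the script above):
#   if pid=$(read_pid /var/run/nagios.lock); then
#       kill -s TERM "$pid"
#   else
#       echo "no usable PID in lock file" >&2
#   fi
```

Running `read_pid` by hand against the lock file is a quick way to see whether the init script has anything to work with.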