nagios not working properly

prasa880
Posts: 12
Joined: Sun Sep 20, 2015 9:11 am

nagios not working properly

Post by prasa880 »

Hi,

I upgraded Nagios from 4.3.2 to 4.3.4, and since then Nagios has not been working properly. When I check the status it reports "nagios is not running", but monitoring is still happening.
The logs show errors such as "prd-nagios nagios: job 1143 (pid=18860): read() returned error 11". I am also unable to stop Nagios. Stopping it prints "Stopping nagios:kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec] done." and the process is not killed.

Thanks in advance.

Regards,
Prasanth
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: nagios not working properly

Post by dwhitfield »

What's the output of ps -aef | grep nagios.cfg? It sounds to me like you have a couple of different nagios processes running.

That error 11 is usually not actually an issue. Are there any other errors?
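
One way to read that ps output is to count only the Nagios daemons whose parent is not itself a Nagios daemon, since a child of the main daemon typically belongs to the same instance. This is a minimal sketch, not an official tool: count_nagios_instances is a hypothetical helper name, and it assumes input lines in the form "PID PPID COMMAND...".

```shell
# Hypothetical helper: count Nagios daemons that are not children of another
# Nagios daemon. Reads "PID PPID COMMAND..." lines on stdin.
count_nagios_instances() {
    awk '
        # Match lines whose command is ".../nagios -d ..."
        $3 ~ /(^|\/)nagios$/ && $4 == "-d" { pid[$1] = 1; ppid[$1] = $2 }
        END {
            n = 0
            # A daemon counts as a separate instance only if its parent
            # is not in the set of Nagios daemons.
            for (p in pid) if (!(ppid[p] in pid)) n++
            print n
        }'
}

# Usage on a live system:
#   ps -eo pid=,ppid=,args= | count_nagios_instances
```

A result above 1 would suggest more than one independent instance is running.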
prasa880

Re: nagios not working properly

Post by prasa880 »

output:
[root@bbnlnagios1 ~]# ps -aef | grep nagios.cfg
root 22945 22893 0 09:29 pts/0 00:00:00 grep nagios.cfg
nagios 24907 1 0 Nov26 ? 00:24:49 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24913 24907 0 Nov26 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
kyang

Re: nagios not working properly

Post by kyang »

Is SELinux still disabled?

sestatus
Could you also run this and show us the output?

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Is it the same error when you try restarting nagios?

service nagios restart
prasa880

Re: nagios not working properly

Post by prasa880 »

1. SELinux status: disabled

2.
[root@bbnlnagios1 ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.3.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-08-24
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 1632 services.
Checked 610 hosts.
Checked 37 host groups.
Checked 5 service groups.
Checked 217 contacts.
Checked 22 contact groups.
Checked 63 commands.
Checked 5 time periods.
Checked 498 host escalations.
Checked 2 service escalations.
Checking for circular paths...
Checked 610 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0
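
The two summary lines at the end of this output are the ones the init script's check_config function greps for. A minimal sketch of pulling them out in one step, assuming the output format shown above; config_totals is a hypothetical helper name:

```shell
# Hypothetical helper: read `nagios -v` output on stdin and print
# "<warnings> <errors>". Returns 1 if either summary line is missing.
config_totals() {
    awk -F': *' '
        /^Total Warnings:/ { w = $2; fw = 1 }
        /^Total Errors:/   { e = $2; fe = 1 }
        END {
            if (fw && fe) {
                print w + 0, e + 0
            } else {
                exit 1
            }
        }'
}

# Usage on a live system:
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_totals
```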
prasa880

Re: nagios not working properly

Post by prasa880 »

I am still seeing this error when restarting the nagios service:

[root@bbnlnagios1 ~]# service nagios restart
Running configuration check...
Stopping nagios:kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
done.
Starting nagios: done.

and, in the web frontend:

Error: Could not stat() command file '/usr/local/nagios/var/rw/nagios.cmd'!

The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands.

An error occurred while attempting to commit your command for processing.

Return from whence you came

------------------------
Whenever I restart Nagios from the frontend, I get the error above.

Thanks in advance.
Regards,
Prasanth
kyang

Re: nagios not working properly

Post by kyang »

Could you run these commands and post the output?

service nagios status

ls -l /usr/local/nagios/var/rw

cat /etc/group|grep nag
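
The "Could not stat() command file" message concerns the named pipe at /usr/local/nagios/var/rw/nagios.cmd, so it can help to confirm that the path exists and really is a pipe (a leading "p" in ls -l output). A minimal sketch of that check; command_file_state is a hypothetical helper name:

```shell
# Hypothetical helper: classify the path the CGIs write external commands to.
command_file_state() {
    path=$1
    if [ -p "$path" ]; then
        echo "fifo"          # a named pipe, as Nagios expects
    elif [ -e "$path" ]; then
        echo "not-a-fifo"    # something exists there, but it is not a pipe
    else
        echo "missing"       # nothing at that path
    fi
}

# Usage on a live system:
#   command_file_state /usr/local/nagios/var/rw/nagios.cmd
```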
prasa880

Re: nagios not working properly

Post by prasa880 »

Hi, here is the output:
service nagios status

nagios is not running


ls -l /usr/local/nagios/var/rw
total 4
drwxr-sr-x 2 nagios nagcmd 4096 Nov 7 13:01 cmd
prw-rw---- 1 nagios nagcmd 0 Dec 5 12:42 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Dec 5 12:41 nagios.qh


~]# cat /etc/group|grep nag
nagios:x:500:
nagcmd:x:501:nagios,apache


Regards
Prasanth
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: nagios not working properly

Post by lmiltchev »

Can you run the following commands and show the output?

uname -a
cat /etc/*release
What do you have in the "cmd" directory? Normally, you should have only the nagios.cmd and nagios.qh files in the "/usr/local/nagios/var/rw" directory...
drwxr-sr-x 2 nagios nagcmd 4096 Nov 7 13:01 cmd
prw-rw---- 1 nagios nagcmd 0 Dec 5 12:42 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Dec 5 12:41 nagios.qh

ls -la /usr/local/nagios/var/rw/cmd
Add the nagios user to the nagios group:

useradd -g nagios nagios
Please post the /etc/init.d/nagios file for a review.
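
On the group step: useradd creates accounts, so where the nagios user already exists it may be more useful to check the current membership first. A minimal sketch, assuming the standard group(5) file layout; in_group_file is a hypothetical helper name, and it does not look at the primary group recorded in /etc/passwd:

```shell
# Hypothetical helper: succeed if USER is listed as a member of GROUP in a
# group(5)-format file (defaults to /etc/group).
in_group_file() {
    user=$1
    group=$2
    file=${3:-/etc/group}
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            # Field 4 is the comma-separated member list.
            n = split($4, m, ",")
            for (i = 1; i <= n; i++) if (m[i] == u) found = 1
        }
        END { exit !found }' "$file"
}

# Usage on a live system:
#   in_group_file nagios nagcmd && echo "nagios is in nagcmd"
#   in_group_file apache nagcmd && echo "apache is in nagcmd"
```

If a membership turns out to be missing, usermod -a -G nagcmd nagios is the usual way to append a supplementary group to an existing account.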
prasa880

Re: nagios not working properly

Post by prasa880 »

Hi, here is the output:
1. uname -a

Linux hostname.test.in 2.6.32-696.6.3.el6.x86_64 #1 SMP Wed Jul 12 14:17:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

2. cat /etc/*release

CentOS release 6.9 (Final)
CentOS release 6.9 (Final)
CentOS release 6.9 (Final)

3. The cmd directory is not related to Nagios; I created it for testing purposes.

4. useradd -g nagios nagios
useradd: user 'nagios' already exists

5.
[root@bbnlnagios1 cmd]# cat /etc/init.d/nagios

#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
# processname: nagios
# File : nagios
#
# Author : Jorge Sanchez Aymar (jsanchez@lanchile.cl)
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <kdebisschop@infoplease.com>
#  - setup for autoconf
#  - add reload function
# 1999-08-06 Ethan Galstad <egalstad@nagios.org>
#  - Added configuration info for use with RedHat's chkconfig tool
#    per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <jimpop@rocketship.com>
#  - added variable for nagios/var directory
#  - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <egalstad@nagios.org>
#  - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <kdebisschop@users.sourceforge.net>
#  - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <egalstad@nagios.org>
#  - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
#              used to provide network services status.
#
### BEGIN INIT INFO
# Provides:		nagios
# Required-Start:	$local_fs $syslog $network
# Required-Stop:	$local_fs $syslog $network
# Default-Start:	2 3 4 5
# Default-Stop:		0 1 6
# Short-Description:	Starts and stops the Nagios monitoring server
# Description:		Starts and stops the Nagios monitoring server
### END INIT INFO

# Our install-time configuration.
prefix=/usr/local/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosCfgtestFile=${prefix}/var/nagios.configtest
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=/var/run/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
checkconfig="true"

# Source function library
# Some *nix do not have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
	. /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
	. /etc/init.d/functions
elif [ -f /lib/lsb/init-functions ]; then
	. /lib/lsb/init-functions
fi

# Load any extra environment variables for Nagios and its plugins.
if test -f /etc/sysconfig/nagios; then
	. /etc/sysconfig/nagios
fi

# Automate addition of RAMDISK based on environment variables
USE_RAMDISK=${USE_RAMDISK:-0}
if test "$USE_RAMDISK" -ne 0 && test "$RAMDISK_SIZE"X != "X"; then
	ramdisk=`mount |grep "${RAMDISK_DIR} type tmpfs"`
	if [ "$ramdisk"X == "X" ]; then
		mkdir -p -m 0755 ${RAMDISK_DIR}
		mount -t tmpfs -o size=${RAMDISK_SIZE}m tmpfs ${RAMDISK_DIR}
		mkdir -p -m 0755 ${RAMDISK_DIR}/checkresults
		chown -h -R $NagiosUser:$NagiosGroup ${RAMDISK_DIR}
	fi
fi


check_config ()
{
	rm -f "$NagiosCfgtestFile";
	if test -e "$NagiosCfgtestFile"; then
		echo "ERROR: Could not delete '$NagiosCfgtestFile'"
		exit 8
	fi
	if ! su $NagiosUser -c "touch $NagiosCfgtestFile"; then
		echo "ERROR: Could not create or update '$NagiosCfgtestFile'"
		exit 8
	fi

	TMPFILE=$(mktemp /tmp/.configtest.XXXXXXXX)
	$NagiosBin -vp $NagiosCfgFile > "$TMPFILE"
	WARN=`grep ^"Total Warnings:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
	ERR=`grep ^"Total Errors:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`

	if test "$WARN" = "0" && test "${ERR}" = "0"; then
		echo "OK - Configuration check verified" > $NagiosCfgtestFile
		/bin/rm "$TMPFILE"
		return 0
	elif test "${ERR}" = "0"; then
		# Write the errors to a file we can have a script watching for.
		echo "WARNING: Warnings in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
		egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
		/bin/rm "$TMPFILE"
		return 0
	else
		# Write the errors to a file we can have a script watching for.
		echo "ERROR: Errors in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
		egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
		cat "$TMPFILE"
		exit 8
	fi
}


status_nagios ()
{
	if test -x $NagiosCGI/daemonchk.cgi; then
		if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile > /dev/null 2>&1; then return 0; fi
	else
		if ps -p $NagiosPID > /dev/null 2>&1; then return 0; fi
	fi

	return 1
}

printstatus_nagios ()
{
	if status_nagios; then
		echo "nagios (pid $NagiosPID) is running..."
	else
		echo "nagios is not running"
	fi
}

killproc_nagios ()
{
	kill -s "$1" $NagiosPID
}

pid_nagios ()
{
	if test ! -f $NagiosRunFile; then
		echo "No lock file found in $NagiosRunFile"
		exit 1
	fi

	NagiosPID=`head -n 1 $NagiosRunFile`
}

remove_commandfile ()
{
	# Removing a stalled command file, while there are processes trying/waiting to write into it,
	# will deadlock those processes in a blocking open() system call. To allow such processes to
	# die on a broken pipe, the pipe must be opened for reading without actually reading from it,
	# which is what dd does here. To avoid a chicken-egg problem, the pipe is renamed beforehand.
	# In order for the dd to not deadlock when there is no writing process, it is executed in the
	# background in a subshell together with an empty echo to have at least one writing process.
	
	# see http://unix.stackexchange.com/questions/335406/opening-named-pipe-blocks-forever-if-pipe-is-deleted-without-being-connected
	
	if [ -p $NagiosCommandFile ]; then
		mv -f $NagiosCommandFile $NagiosCommandFile~
		(dd if=$NagiosCommandFile~ count=0 2>/dev/null & echo -n "" >$NagiosCommandFile~)
	fi
	
	rm -f $NagiosCommandFile $NagiosCommandFile~
}


# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
    echo "Executable file $NagiosBin not found. Exiting."
    exit 1
fi

# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
    echo "Configuration file $NagiosCfgFile not found. Exiting."
    exit 1
fi

# See how we were called.
case "$1" in

	start)
		echo -n "Starting nagios:"

		if test "$checkconfig" = "true"; then
			check_config
			# check_config exits on configuration errors.
		fi

		if test -f $NagiosRunFile; then
			NagiosPID=`head -n 1 $NagiosRunFile`
			if status_nagios; then
				echo " another instance of nagios is already running."
				exit 0
			fi
		fi

		su $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
		remove_commandfile
		touch $NagiosRunFile
		$NagiosBin -d $NagiosCfgFile
		if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi

		echo " done."
		;;

	stop)
		echo -n "Stopping nagios:"

		pid_nagios
		killproc_nagios TERM

		# now we have to wait for nagios to exit and remove its
		# own NagiosRunFile, otherwise a following "start" could
		# happen, and then the exiting nagios will remove the
		# new NagiosRunFile, allowing multiple nagios daemons
		# to (sooner or later) run - John Sellens
		#echo -n 'Waiting for nagios to exit .'
		for i in {1..90}; do
			if status_nagios > /dev/null; then
				echo -n '.'
				sleep 1
			else
				break
			fi
		done
		if status_nagios > /dev/null; then
			echo ''
			echo 'Warning - nagios did not exit in a timely manner - Killing it!'
			killproc_nagios KILL
		else
			echo ' done.'
		fi

		remove_commandfile
		rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile
		;;

	status)
		pid_nagios
		printstatus_nagios
		;;

	checkconfig)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		if [ $? -eq 0 ]; then
			echo " OK."
		else
			echo " CONFIG ERROR!  Check your Nagios configuration."
			exit 1
		fi
		;;

	restart)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		$0 stop
		$0 start
		;;

	reload|force-reload)
		if test "$checkconfig" = "true"; then
			printf "Running configuration check...\n"
			check_config
		fi

		if test ! -f $NagiosRunFile; then
			$0 start
		else
			pid_nagios
			if status_nagios > /dev/null; then
				printf "Reloading nagios configuration...\n"
				killproc_nagios HUP
				echo "done"
			else
				$0 stop
				$0 start
			fi
		fi
		;;

	configtest)
		$NagiosBin -vp $NagiosCfgFile
		;;

	*)
		echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig|configtest}"
		exit 1
		;;

esac

# End of this script
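
In the script above, stop and status call pid_nagios, which sets NagiosPID from the first line of $NagiosRunFile, and killproc_nagios then runs kill with that value unquoted. The start branch also touches $NagiosRunFile before launching the daemon. If the daemon writes its PID somewhere else (an assumption here, for example a lock_file setting in nagios.cfg that points to a different path), the file stays empty, NagiosPID expands to nothing, and kill prints its usage text, which would match the restart output earlier in the thread. A minimal sketch for checking this; read_lock_pid is a hypothetical helper, not part of the stock init script:

```shell
# Hypothetical helper: print the PID stored in a lock file, or explain on
# stderr why none can be read.
read_lock_pid() {
    lockfile=$1
    if [ ! -f "$lockfile" ]; then
        echo "no lock file: $lockfile" >&2
        return 1
    fi
    pid=$(head -n 1 "$lockfile")
    case $pid in
        '' | *[!0-9]*)
            # Empty or non-numeric first line: kill would get no usable PID.
            echo "lock file has no numeric PID: $lockfile" >&2
            return 2
            ;;
    esac
    echo "$pid"
}

# Usage on a live system (first path is NagiosRunFile from the script above):
#   read_lock_pid /var/run/nagios.lock
#   grep '^lock_file' /usr/local/nagios/etc/nagios.cfg
```

If the two paths differ, pointing both at the same file is one thing to try.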
Last edited by tmcdonald on Tue Dec 12, 2017 10:17 am, edited 1 time in total.
Reason: Please use [code][/code] tags around terminal output or file contents