nagios not working properly
Hi,
I upgraded Nagios from 4.3.2 to 4.3.4, and since then Nagios has not been working properly. When I check the status, it reports "nagios is not running", but it is still monitoring.
The logs show errors such as "prd-nagios nagios: job 1143 (pid=18860): read() returned error 11". I am also unable to stop Nagios. Stopping it prints "Stopping nagios:kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec] done." and the process is not killed.
Thanks in advance,
Regards
Prasanth
Re: nagios not working properly
What's the output of ps -aef | grep nagios.cfg? It sounds to me like you have a couple of different nagios processes running.
That error 11 is usually not actually an issue. Are there any other errors?
Re: nagios not working properly
output:
[root@bbnlnagios1 ~]# ps -aef | grep nagios.cfg
root 22945 22893 0 09:29 pts/0 00:00:00 grep nagios.cfg
nagios 24907 1 0 Nov26 ? 00:24:49 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 24913 24907 0 Nov26 ? 00:00:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
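One way to read a listing like this is to look only at daemons whose parent PID is 1. The helper below is a minimal sketch, not part of Nagios: the function name `list_nagios_masters` is made up for illustration, and it assumes the `ps -ef` column order shown above (UID, PID, PPID, ...).

```shell
# list_nagios_masters (hypothetical helper): read `ps -ef`-style lines on
# stdin and print the PID of every "nagios -d" process whose parent PID is 1.
# Assumed columns: UID PID PPID C STIME TTY TIME CMD...
list_nagios_masters() {
    awk '$3 == 1 && /bin\/nagios -d/ { print $2 }'
}
```

Used as `ps -ef | list_nagios_masters`, more than one PID in the result would suggest several independent daemons. In the listing above only 24907 has parent PID 1, and 24913 is its child, so this output on its own does not show two independent daemons.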
Re: nagios not working properly
Is SELinux still disabled?
Code:
sestatus
Could you also run this and show us the output?
Code:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Is it the same error when you try restarting nagios?
Code:
service nagios restart
Re: nagios not working properly
1. SELinux status: disabled
#####################################################
2.
[root@bbnlnagios1 ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.3.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2017-08-24
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 1632 services.
Checked 610 hosts.
Checked 37 host groups.
Checked 5 service groups.
Checked 217 contacts.
Checked 22 contact groups.
Checked 63 commands.
Checked 5 time periods.
Checked 498 host escalations.
Checked 2 service escalations.
Checking for circular paths...
Checked 610 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Re: nagios not working properly
I am still seeing this error while restarting the nagios service:
[root@bbnlnagios1 ~]# service nagios restart
Running configuration check...
Stopping nagios:kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
done.
Starting nagios: done.
AND
Error: Could not stat() command file '/usr/local/nagios/var/rw/nagios.cmd'!
The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands.
An error occurred while attempting to commit your command for processing.
Return from whence you came
------------------------
Whenever I restart Nagios from the web frontend, I get the error above.
Thanks in advance
Regards
prasanth
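The frontend message says the CGI could not stat() the command file, so a first check is whether that path exists and is a named pipe. The snippet below is a minimal sketch under that assumption; `describe_cmdfile` is a made-up name, and it only inspects the path as the user running it, so it does not prove that the web server user can write to the pipe.

```shell
# describe_cmdfile PATH (hypothetical helper): report whether PATH is a
# named pipe (FIFO), exists as some other kind of file, or is missing.
describe_cmdfile() {
    if [ -p "$1" ]; then
        echo "fifo"
    elif [ -e "$1" ]; then
        echo "not-a-fifo"
    else
        echo "missing"
    fi
}
```

For example, `describe_cmdfile /usr/local/nagios/var/rw/nagios.cmd` uses the path from the error message. A result of "missing" right after a restart would fit the first possibility the message itself lists.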
Re: nagios not working properly
Could you run these commands and post the output?
Code:
service nagios status
ls -l /usr/local/nagios/var/rw
cat /etc/group|grep nag
Re: nagios not working properly
Hi, here is the output:
service nagios status
nagios is not running
ls -l /usr/local/nagios/var/rw
total 4
drwxr-sr-x 2 nagios nagcmd 4096 Nov 7 13:01 cmd
prw-rw---- 1 nagios nagcmd 0 Dec 5 12:42 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Dec 5 12:41 nagios.qh
~]# cat /etc/group|grep nag
nagios:x:500:
nagcmd:x:501:nagios,apache
Regards
Prasanth
Re: nagios not working properly
Can you run the following commands and show the output?
Code:
uname -a
cat /etc/*release
What do you have in the "cmd" directory? Normally, you should have only the nagios.cmd and nagios.qh files in the "/usr/local/nagios/var/rw" directory...
drwxr-sr-x 2 nagios nagcmd 4096 Nov 7 13:01 cmd
prw-rw---- 1 nagios nagcmd 0 Dec 5 12:42 nagios.cmd
srw-rw---- 1 nagios nagcmd 0 Dec 5 12:41 nagios.qh
Code:
ls -la /usr/local/nagios/var/rw/cmd
Add the nagios user to the nagios group:
Code:
useradd -g nagios nagios
Please post the /etc/init.d/nagios file for a review.
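A note on the useradd command above: `useradd` creates a new account, so it only helps if the nagios user does not exist yet. To see which groups an existing account already belongs to, something like the sketch below may help. `has_group` is a made-up helper name, and it assumes the space-separated group list that `id -nG USER` prints.

```shell
# has_group GROUP_LIST GROUP (hypothetical helper): succeed if GROUP appears
# as a whole word in the space-separated GROUP_LIST, e.g. "$(id -nG nagios)".
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_group "$(id -nG nagios)" nagcmd && echo "nagios is in nagcmd"`. If a membership turned out to be missing, `usermod -a -G GROUP USER` is the usual way to append a supplementary group to an existing account. Whether that applies here is an assumption, not something shown in the thread.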
Re: nagios not working properly
Hi, here is the output:
1.uname -a
Linux hostname.test.in 2.6.32-696.6.3.el6.x86_64 #1 SMP Wed Jul 12 14:17:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
2.cat /etc/*release
CentOS release 6.9 (Final)
CentOS release 6.9 (Final)
CentOS release 6.9 (Final)
3. The cmd directory is not related to Nagios; I created it for testing purposes.
4. useradd -g nagios nagios
useradd: user 'nagios' already exists
5.
[root@bbnlnagios1 cmd]# cat /etc/init.d/nagios
Code:
#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
# processname: nagios
# File : nagios
#
# Author : Jorge Sanchez Aymar (jsanchez@lanchile.cl)
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <kdebisschop@infoplease.com>
# - setup for autoconf
# - add reload function
# 1999-08-06 Ethan Galstad <egalstad@nagios.org>
# - Added configuration info for use with RedHat's chkconfig tool
# per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <jimpop@rocketship.com>
# - added variable for nagios/var directory
# - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <egalstad@nagios.org>
# - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <kdebisschop@users.sourceforge.net>
# - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <egalstad@nagios.org>
# - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
# used to provide network services status.
#
### BEGIN INIT INFO
# Provides: nagios
# Required-Start: $local_fs $syslog $network
# Required-Stop: $local_fs $syslog $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Starts and stops the Nagios monitoring server
# Description: Starts and stops the Nagios monitoring server
### END INIT INFO
# Our install-time configuration.
prefix=/usr/local/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosCfgtestFile=${prefix}/var/nagios.configtest
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=/var/run/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
checkconfig="true"
# Source function library
# Some *nix do not have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
    . /etc/init.d/functions
elif [ -f /lib/lsb/init-functions ]; then
    . /lib/lsb/init-functions
fi
# Load any extra environment variables for Nagios and its plugins.
if test -f /etc/sysconfig/nagios; then
    . /etc/sysconfig/nagios
fi
# Automate addition of RAMDISK based on environment variables
USE_RAMDISK=${USE_RAMDISK:-0}
if test "$USE_RAMDISK" -ne 0 && test "$RAMDISK_SIZE"X != "X"; then
    ramdisk=`mount |grep "${RAMDISK_DIR} type tmpfs"`
    if [ "$ramdisk"X == "X" ]; then
        mkdir -p -m 0755 ${RAMDISK_DIR}
        mount -t tmpfs -o size=${RAMDISK_SIZE}m tmpfs ${RAMDISK_DIR}
        mkdir -p -m 0755 ${RAMDISK_DIR}/checkresults
        chown -h -R $NagiosUser:$NagiosGroup ${RAMDISK_DIR}
    fi
fi
check_config ()
{
    rm -f "$NagiosCfgtestFile";
    if test -e "$NagiosCfgtestFile"; then
        echo "ERROR: Could not delete '$NagiosCfgtestFile'"
        exit 8
    fi
    if ! su $NagiosUser -c "touch $NagiosCfgtestFile"; then
        echo "ERROR: Could not create or update '$NagiosCfgtestFile'"
        exit 8
    fi
    TMPFILE=$(mktemp /tmp/.configtest.XXXXXXXX)
    $NagiosBin -vp $NagiosCfgFile > "$TMPFILE"
    WARN=`grep ^"Total Warnings:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
    ERR=`grep ^"Total Errors:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
    if test "$WARN" = "0" && test "${ERR}" = "0"; then
        echo "OK - Configuration check verified" > $NagiosCfgtestFile
        /bin/rm "$TMPFILE"
        return 0
    elif test "${ERR}" = "0"; then
        # Write the errors to a file we can have a script watching for.
        echo "WARNING: Warnings in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
        egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
        /bin/rm "$TMPFILE"
        return 0
    else
        # Write the errors to a file we can have a script watching for.
        echo "ERROR: Errors in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
        egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
        cat "$TMPFILE"
        exit 8
    fi
}
status_nagios ()
{
    if test -x $NagiosCGI/daemonchk.cgi; then
        if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile > /dev/null 2>&1; then return 0; fi
    else
        if ps -p $NagiosPID > /dev/null 2>&1; then return 0; fi
    fi
    return 1
}
printstatus_nagios ()
{
    if status_nagios; then
        echo "nagios (pid $NagiosPID) is running..."
    else
        echo "nagios is not running"
    fi
}
killproc_nagios ()
{
    kill -s "$1" $NagiosPID
}
pid_nagios ()
{
    if test ! -f $NagiosRunFile; then
        echo "No lock file found in $NagiosRunFile"
        exit 1
    fi
    NagiosPID=`head -n 1 $NagiosRunFile`
}
remove_commandfile ()
{
    # Removing a stalled command file, while there are processes trying/waiting to write into it,
    # will deadlock those processes in a blocking open() system call. To allow such processes to
    # die on a broken pipe, the pipe must be opened for reading without actually reading from it,
    # which is what dd does here. To avoid a chicken-egg problem, the pipe is renamed beforehand.
    # In order for the dd to not deadlock when there is no writing process, it is executed in the
    # background in a subshell together with an empty echo to have at least one writing process.
    # see http://unix.stackexchange.com/questions/335406/opening-named-pipe-blocks-forever-if-pipe-is-deleted-without-being-connected
    if [ -p $NagiosCommandFile ]; then
        mv -f $NagiosCommandFile $NagiosCommandFile~
        (dd if=$NagiosCommandFile~ count=0 2>/dev/null & echo -n "" >$NagiosCommandFile~)
    fi
    rm -f $NagiosCommandFile $NagiosCommandFile~
}
# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
    echo "Executable file $NagiosBin not found. Exiting."
    exit 1
fi
# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
    echo "Configuration file $NagiosCfgFile not found. Exiting."
    exit 1
fi
# See how we were called.
case "$1" in
    start)
        echo -n "Starting nagios:"
        if test "$checkconfig" = "true"; then
            check_config
            # check_config exits on configuration errors.
        fi
        if test -f $NagiosRunFile; then
            NagiosPID=`head -n 1 $NagiosRunFile`
            if status_nagios; then
                echo " another instance of nagios is already running."
                exit 0
            fi
        fi
        su $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
        remove_commandfile
        touch $NagiosRunFile
        $NagiosBin -d $NagiosCfgFile
        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
        echo " done."
        ;;
    stop)
        echo -n "Stopping nagios:"
        pid_nagios
        killproc_nagios TERM
        # now we have to wait for nagios to exit and remove its
        # own NagiosRunFile, otherwise a following "start" could
        # happen, and then the exiting nagios will remove the
        # new NagiosRunFile, allowing multiple nagios daemons
        # to (sooner or later) run - John Sellens
        #echo -n 'Waiting for nagios to exit .'
        for i in {1..90}; do
            if status_nagios > /dev/null; then
                echo -n '.'
                sleep 1
            else
                break
            fi
        done
        if status_nagios > /dev/null; then
            echo ''
            echo 'Warning - nagios did not exit in a timely manner - Killing it!'
            killproc_nagios KILL
        else
            echo ' done.'
        fi
        remove_commandfile
        rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile
        ;;
    status)
        pid_nagios
        printstatus_nagios
        ;;
    checkconfig)
        if test "$checkconfig" = "true"; then
            printf "Running configuration check...\n"
            check_config
        fi
        if [ $? -eq 0 ]; then
            echo " OK."
        else
            echo " CONFIG ERROR! Check your Nagios configuration."
            exit 1
        fi
        ;;
    restart)
        if test "$checkconfig" = "true"; then
            printf "Running configuration check...\n"
            check_config
        fi
        $0 stop
        $0 start
        ;;
    reload|force-reload)
        if test "$checkconfig" = "true"; then
            printf "Running configuration check...\n"
            check_config
        fi
        if test ! -f $NagiosRunFile; then
            $0 start
        else
            pid_nagios
            if status_nagios > /dev/null; then
                printf "Reloading nagios configuration...\n"
                killproc_nagios HUP
                echo "done"
            else
                $0 stop
                $0 start
            fi
        fi
        ;;
    configtest)
        $NagiosBin -vp $NagiosCfgFile
        ;;
    *)
        echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig|configtest}"
        exit 1
        ;;
esac
# End of this script
Last edited by tmcdonald on Tue Dec 12, 2017 10:17 am, edited 1 time in total.
Reason: Please use [code][/code] tags around terminal output or file contents
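Reading the posted script, the stop and status actions take the PID from the first line of NagiosRunFile (/var/run/nagios.lock) and pass it unquoted to kill and ps. If that file is empty, kill receives no PID and prints its usage text, which would match the "kill: usage" message reported earlier. The start action touches the file before launching the daemon, so it would stay empty if the daemon wrote its PID to a different path. Whether that is what happened here is not established in the thread. The sketch below shows one way to compare the two: it assumes the daemon's PID file is set by the lock_file directive in nagios.cfg, and the helper names `cfg_lock_file` and `pid_from_runfile` are made up for illustration.

```shell
# cfg_lock_file CFG (hypothetical helper): print the value of the lock_file
# directive in a nagios.cfg-style file; prints nothing if it is not set.
cfg_lock_file() {
    sed -n 's/^[[:space:]]*lock_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# pid_from_runfile FILE (hypothetical helper): print the first line of FILE,
# the same way the init script does with `head -n 1 $NagiosRunFile`.
# The result is empty if the file is empty or missing.
pid_from_runfile() {
    head -n 1 "$1" 2>/dev/null
}
```

For example, compare `cfg_lock_file /usr/local/nagios/etc/nagios.cfg` with the script's NagiosRunFile value, and check whether `pid_from_runfile /var/run/nagios.lock` prints anything. An empty result while the daemon is running would explain both "nagios is not running" and the failed stop.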