Nagios XI Services not working
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Nagios XI Services not working
Hi,
For some strange reason, starting the nagios service through systemctl appears to have broken...
I applied a config earlier and Nagios just stopped working.
I checked the config; all was OK.
If I start the process via systemctl, I can see it start, all the workers start, and then they all exit again.
If I start it manually using '/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg', it works perfectly
until I make another config commit, and then it dies again.
If I run systemctl status nagios when it's broken, it says it's running, but /etc/init.d/nagios status says it is not running.
Does anyone have any idea what could cause that?
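One way to look at the two disagreeing status checks: the SysV init script's status action (posted later in this thread) reads the PID from the first line of /usr/local/nagios/var/nagios.lock and checks whether that process exists, while systemctl reports the state of the unit systemd started, which may stay "active" after the daemon itself has exited. The sketch below repeats only the lock-file half of that check. It assumes the lock file path from the init script in this thread; nagios_pid_alive is a hypothetical helper, and it checks /proc/PID instead of ps -p, so it is Linux-specific.

```shell
#!/bin/sh
# Sketch: is the PID recorded in the Nagios lock file still alive?
# Assumption: lock file path taken from NagiosRunFile in the init script.
# Return codes: 0 running, 1 not running, 3 no lock file (like the init script).

nagios_pid_alive() {
    lockfile=${1:-/usr/local/nagios/var/nagios.lock}

    # Same condition the init script's pid_nagios uses to give up.
    if [ ! -f "$lockfile" ]; then
        echo "no lock file at $lockfile"
        return 3
    fi

    # The first line of the lock file holds the daemon's PID.
    pid=$(head -n 1 "$lockfile")

    case "$pid" in
        ''|*[!0-9]*)
            echo "lock file does not contain a numeric PID"
            return 1
            ;;
    esac

    # /proc/PID exists only while that process is alive (Linux-specific).
    # This does not prove the process is nagios; the PID could be reused.
    if [ -d "/proc/$pid" ]; then
        echo "nagios (pid $pid) is running"
        return 0
    fi

    echo "nagios is not running (stale pid $pid)"
    return 1
}
```

Running nagios_pid_alive right after systemctl status nagios shows whether the unit state and the actual daemon have diverged.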
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Re: Nagios XI Services not working
Hmm, after looking at the debug log I could see it was dying at the downtime part of loading. I found I had 200 downtimes queued up from a script I was working on earlier, so I have cleaned those up and now it seems to start OK.
Seems kind of bad if it crashes just because of that. Hopefully it doesn't reoccur.
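The diagnosis above (read the last entry in the debug log, then count the queued downtimes) can be repeated with two small shell functions. This is a sketch under assumptions: the default paths are the stock /usr/local/nagios/var locations seen elsewhere in this thread, nagios.debug only has content when debugging is enabled in nagios.cfg, and the "hostdowntime {" / "servicedowntime {" block names reflect how Nagios Core writes downtimes into retention.dat. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: two quick checks for a Nagios that dies while loading downtimes.
# Assumption: default paths are the Nagios XI defaults seen in this thread.

# Print the function name from the last line of the debug log.
# Lines look like: [1485197203.364135] [001.0] [pid=47827] sort_downtime()
last_debug_call() {
    debugfile=${1:-/usr/local/nagios/var/nagios.debug}
    tail -n 1 "$debugfile" | awk '{ print $NF }'
}

# Count host and service downtime blocks stored in the retention file.
# grep -c prints 0 (and exits 1) when nothing matches, hence "|| true".
count_downtimes() {
    retention=${1:-/usr/local/nagios/var/retention.dat}
    hosts=$(grep -c '^[[:space:]]*hostdowntime {' "$retention" || true)
    services=$(grep -c '^[[:space:]]*servicedowntime {' "$retention" || true)
    echo "host downtimes: ${hosts:-0}, service downtimes: ${services:-0}"
}
```

For example, if last_debug_call prints sort_downtime() and count_downtimes reports a few hundred entries, that points at the same queued-downtime problem described above.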
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
Re: Nagios XI Services not working
We can leave this open in case it reoccurs.
If it does reoccur, we're probably going to want to look at your profile. *If it does reoccur*, can you PM me your profile? You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button towards the top. If for whatever reason you *cannot* download the profile, please put the output of View System Info (5.3.4+; Show Profile if older) in the thread (that will at least get us some info).
After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.
UPDATE: Profile received and shared with techs.
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Re: Nagios XI Services not working
Hi, this appears to be happening again. Every time I try to start it manually I get a segmentation fault.
Logs below:
/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
Nagios 4.2.4 starting... (PID=45615)
Local time is Tue Jan 24 07:40:37 NZDT 2017
nerd: Channel hostchecks registered successfully
nerd: Channel servicechecks registered successfully
nerd: Channel opathchecks registered successfully
nerd: Fully initialized and ready to rock!
wproc: Successfully registered manager as @wproc with query handler
wproc: Registry request: name=Core Worker 45616;pid=45616
wproc: Registry request: name=Core Worker 45617;pid=45617
wproc: Registry request: name=Core Worker 45618;pid=45618
wproc: Registry request: name=Core Worker 45619;pid=45619
wproc: Registry request: name=Core Worker 45620;pid=45620
wproc: Registry request: name=Core Worker 45621;pid=45621
wproc: Registry request: name=Core Worker 45624;pid=45624
wproc: Registry request: name=Core Worker 45623;pid=45623
wproc: Registry request: name=Core Worker 45622;pid=45622
wproc: Registry request: name=Core Worker 45626;pid=45626
wproc: Registry request: name=Core Worker 45625;pid=45625
wproc: Registry request: name=Core Worker 45627;pid=45627
Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Segmentation fault
/var/log/messages
Jan 24 07:40:38 dnzbsglnx10 kernel: nagios[45615]: segfault at 201e000 ip 00007f93059f3f8f sp 00007ffe421c0d48 error 6 in libc-2.17.so[7f930596f000+1b6000]
nagios.debug
[1485197203.173397] [001.0] [pid=47827] get_next_host_notification_time()
[1485197203.180189] [001.0] [pid=47827] get_next_host_notification_time()
[1485197203.181502] [001.0] [pid=47827] get_next_host_notification_time()
[1485197203.181936] [001.0] [pid=47827] get_next_host_notification_time()
[1485197203.236109] [001.0] [pid=47827] get_next_service_notification_time()
[1485197203.249485] [001.0] [pid=47827] get_next_service_notification_time()
[1485197203.351490] [001.0] [pid=47827] get_next_service_notification_time()
[1485197203.356700] [001.0] [pid=47827] get_next_service_notification_time()
[1485197203.360962] [001.0] [pid=47827] get_next_service_notification_time()
[1485197203.364135] [001.0] [pid=47827] sort_downtime()
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Re: Nagios XI Services not working
Renaming retention.dat has allowed it to start manually using /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg, but systemctl start nagios doesn't work at all.
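Rather than deleting or overwriting retention.dat, moving it aside under a timestamped name keeps the bad copy available for inspection or for sending to support. A minimal sketch, assuming Nagios has already been stopped (it writes retention data when it shuts down) and the default NagiosRetentionFile path from the init script in this thread; quarantine_retention is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: move a suspect retention.dat aside instead of deleting it.
# Assumption: Nagios is stopped and the path is the init script's default.
# Prints the new file name on success.

quarantine_retention() {
    retention=${1:-/usr/local/nagios/var/retention.dat}
    if [ ! -f "$retention" ]; then
        echo "nothing to move: $retention not found" >&2
        return 1
    fi
    # A timestamp suffix keeps earlier quarantined copies from being
    # overwritten (unless two moves happen within the same second).
    backup="$retention.bad.$(date +%Y%m%d-%H%M%S)"
    mv "$retention" "$backup" || return 1
    echo "$backup"
}
```

For example: stop Nagios, run quarantine_retention, then start Nagios again. Nagios then starts without its retained state, which is why acknowledgements disappear, but the old file is still on disk.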
Re: Nagios XI Services not working
Could you post this file so we can check whether it is corrupted in any way?
/etc/init.d/nagios
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Re: Nagios XI Services not working
tgriep wrote:
Could you post this file so we can check whether it is corrupted in any way?
/etc/init.d/nagios
Below is the file. Usually the service starts fine; it just randomly stops working.
#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
# processname: nagios
# File : nagios
#
# Author : Jorge Sanchez Aymar
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop
# - setup for autoconf
# - add reload function
# 1999-08-06 Ethan Galstad
# - Added configuration info for use with RedHat's chkconfig tool
# per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch
# - added variable for nagios/var directory
# - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad
# - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop
# - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad
# - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
# used to provide network services status.
#
### BEGIN INIT INFO
# Provides: nagios
# Required-Start: $local_fs $syslog $network
# Required-Stop: $local_fs $syslog $network
# Short-Description: Starts and stops the Nagios monitoring server
# Description: Starts and stops the Nagios monitoring server
### END INIT INFO
# Our install-time configuration.
prefix=/usr/local/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosCfgtestFile=${prefix}/var/nagios.configtest
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=${prefix}/var/nagios.lock
NagiosLockDir=/usr/local/nagiosxi/var/subsys
#NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
checkconfig="true"
# Source function library
# Some *nix do not have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
. /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
. /etc/init.d/functions
elif [ -f /lib/lsb/init-functions ]; then
. /lib/lsb/init-functions
fi
# Load any extra environment variables for Nagios and its plugins.
if test -f /etc/sysconfig/nagios; then
. /etc/sysconfig/nagios
fi
# Automate addition of RAMDISK based on environment variables
USE_RAMDISK=${USE_RAMDISK:-0}
if test "$USE_RAMDISK" -ne 0 && test "$RAMDISK_SIZE"X != "X"; then
ramdisk=`mount |grep "${RAMDISK_DIR} type tmpfs"`
if [ "$ramdisk"X = "X" ]; then
mkdir -p -m 0755 ${RAMDISK_DIR}
mount -t tmpfs -o size=${RAMDISK_SIZE}m tmpfs ${RAMDISK_DIR}
mkdir -p -m 0755 ${RAMDISK_DIR}/checkresults
chown -R $NagiosUser:$NagiosGroup ${RAMDISK_DIR}
fi
fi
check_config ()
{
TMPFILE=$(mktemp /tmp/.configtest.XXXXXXXX)
$NagiosBin -vp $NagiosCfgFile > "$TMPFILE"
WARN=`grep ^"Total Warnings:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
ERR=`grep ^"Total Errors:" "$TMPFILE" |awk -F: '{print \$2}' |sed s/' '//g`
if test "$WARN" = "0" && test "${ERR}" = "0"; then
echo "OK - Configuration check verified" > $NagiosCfgtestFile
chmod 0644 $NagiosCfgtestFile
chown $NagiosUser:$NagiosGroup $NagiosCfgtestFile
/bin/rm "$TMPFILE"
return 0
elif test "${ERR}" = "0"; then
# Write the errors to a file we can have a script watching for.
echo "WARNING: Warnings in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
chmod 0644 $NagiosCfgtestFile
chown $NagiosUser:$NagiosGroup $NagiosCfgtestFile
/bin/rm "$TMPFILE"
return 0
else
# Write the errors to a file we can have a script watching for.
echo "ERROR: Errors in config files - see log for details: $NagiosCfgtestFile" > $NagiosCfgtestFile
egrep -i "(^warning|^error)" "$TMPFILE" >> $NagiosCfgtestFile
chmod 0644 $NagiosCfgtestFile
chown $NagiosUser:$NagiosGroup $NagiosCfgtestFile
cat "$TMPFILE"
exit 8
fi
}
status_nagios ()
{
if test -x $NagiosCGI/daemonchk.cgi; then
if $NagiosCGI/daemonchk.cgi -l $NagiosRunFile > /dev/null 2>&1; then return 0; fi
else
if ps -p $NagiosPID > /dev/null 2>&1; then return 0; fi
fi
return 1
}
printstatus_nagios ()
{
if status_nagios; then
echo "nagios (pid $NagiosPID) is running..."
else
echo "nagios is not running"
exit 3
fi
}
killproc_nagios ()
{
kill -s "$1" $NagiosPID
}
pid_nagios ()
{
if test ! -f $NagiosRunFile; then
echo "No lock file found in $NagiosRunFile"
exit 3
fi
NagiosPID=`head -n 1 $NagiosRunFile`
}
# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
echo "Executable file $NagiosBin not found. Exiting."
exit 1
fi
# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
echo "Configuration file $NagiosCfgFile not found. Exiting."
exit 1
fi
# See how we were called.
case "$1" in
start)
echo -n "Starting nagios:"
if test "$checkconfig" = "true"; then
check_config
# check_config exits on configuration errors.
fi
if test -f $NagiosRunFile; then
NagiosPID=`head -n 1 $NagiosRunFile`
if status_nagios; then
echo " another instance of nagios is already running."
exit 0
fi
fi
touch $NagiosVarDir/nagios.log $NagiosRetentionFile
rm -f $NagiosCommandFile
touch $NagiosRunFile
chown $NagiosUser:$NagiosGroup $NagiosRunFile $NagiosVarDir/nagios.log $NagiosRetentionFile
USER=$NagiosUser G_BROKEN_FILENAMES=1 SSH_TTY=/dev/pts/0 $NagiosBin -d $NagiosCfgFile
if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
service snmptt restart >/dev/null 2>&1 ||:
echo " done."
;;
stop)
echo -n "Stopping nagios:"
pid_nagios
killproc_nagios TERM
# now we have to wait for nagios to exit and remove its
# own NagiosRunFile, otherwise a following "start" could
# happen, and then the exiting nagios will remove the
# new NagiosRunFile, allowing multiple nagios daemons
# to (sooner or later) run - John Sellens
#echo -n 'Waiting for nagios to exit .'
for i in 1 2 3 4 5 6 7 8 9 10 ; do
if status_nagios > /dev/null; then
echo -n '.'
sleep 1
else
break
fi
done
if status_nagios > /dev/null; then
echo ''
echo 'Warning - nagios did not exit in a timely manner'
else
echo ' done.'
fi
rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
;;
status)
pid_nagios
printstatus_nagios
;;
checkconfig)
if test "$checkconfig" = "true"; then
printf "Running configuration check...\n"
check_config
fi
if [ $? -eq 0 ]; then
echo " OK."
else
echo " CONFIG ERROR! Check your Nagios configuration."
exit 1
fi
;;
restart)
if test "$checkconfig" = "true"; then
printf "Running configuration check...\n"
check_config
fi
$0 stop
$0 start
;;
reload|force-reload)
if test "$checkconfig" = "true"; then
printf "Running configuration check...\n"
check_config
fi
if test ! -f $NagiosRunFile; then
$0 start
else
pid_nagios
if status_nagios > /dev/null; then
printf "Reloading nagios configuration...\n"
killproc_nagios HUP
echo "done"
else
$0 stop
$0 start
fi
fi
;;
configtest)
$NagiosBin -vp $NagiosCfgFile
;;
*)
echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig|configtest}"
exit 1
;;
esac
# End of this script
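The check_config function in this script accepts the configuration when Total Errors is 0 (warnings are written to nagios.configtest but do not stop the start) and exits with status 8 otherwise. The sketch below reproduces that parsing as a standalone filter over the output of nagios -v, which can be handy when scripting config commits. Its exit codes (0 clean, 1 warnings only, 2 errors or missing totals) are an illustrative choice, not the init script's, and classify_config_check is a hypothetical name. This verification is assumed to cover only the object configuration and not retention.dat, which would be consistent with the config checking OK in this thread while the daemon still crashed on startup.

```shell
#!/bin/sh
# Sketch: classify "nagios -v" output the way check_config does.
# Reads the verifier output on stdin. Exit codes are illustrative:
#   0 = no warnings or errors, 1 = warnings only, 2 = errors or totals missing.
# (The init script itself returns 0 for warnings and exits 8 for errors.)

classify_config_check() {
    awk -F: '
        # Strip blanks from the count, as the init script does with sed.
        /^Total Warnings:/ { gsub(/[ \t]/, "", $2); warn = $2 }
        /^Total Errors:/   { gsub(/[ \t]/, "", $2); err = $2 }
        END {
            if (warn == "" || err == "") { print "UNKNOWN - totals not found"; exit 2 }
            if (err != "0")  { print "ERROR - " err " error(s)"; exit 2 }
            if (warn != "0") { print "WARNING - " warn " warning(s)"; exit 1 }
            print "OK - configuration check verified"
            exit 0
        }'
}
```

Usage: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | classify_config_check. The pipeline's exit status is that of classify_config_check, so it can gate a commit script directly.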
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Re: Nagios XI Services not working
It just happened again on a config commit, and the only way to get it to start again was by renaming retention.dat,
but sadly that gets rid of any services we have acknowledged, and of course loses all history etc.
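Since renaming retention.dat drops acknowledgements, the renamed copy can at least be read to list what was acknowledged, so those problems can be re-acknowledged by hand. A sketch, assuming the retention file uses "host {" and "service {" blocks containing host_name=, service_description= and problem_has_been_acknowledged= lines (how Nagios Core writes them, to my knowledge); list_acknowledged is a hypothetical helper and only reads the file. If the file is corrupted, the output may be incomplete.

```shell
#!/bin/sh
# Sketch: list acknowledged hosts ("host") and services ("host;service")
# from a retention file. Assumption: Nagios Core retention.dat block layout.

list_acknowledged() {
    if [ -z "$1" ]; then
        echo "usage: list_acknowledged RETENTION_FILE" >&2
        return 2
    fi
    awk '
        # Start of a host or service block: reset the collected fields.
        /^[ \t]*host [{]/    { type = "host";    host = ""; svc = ""; ack = 0; next }
        /^[ \t]*service [{]/ { type = "service"; host = ""; svc = ""; ack = 0; next }
        # Keep everything after the first "=" as the value.
        type != "" && /^[ \t]*host_name=/           { host = substr($0, index($0, "=") + 1) }
        type != "" && /^[ \t]*service_description=/ { svc = substr($0, index($0, "=") + 1) }
        type != "" && /^[ \t]*problem_has_been_acknowledged=1/ { ack = 1 }
        # End of block: print it if it was acknowledged.
        type != "" && /^[ \t]*[}]/ {
            if (ack) {
                if (type == "host") print host
                else print host ";" svc
            }
            type = ""
        }
    ' "$1"
}
```

Pass the path of the renamed retention file; other block types (comments, downtimes, contacts) are skipped.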
Re: Nagios XI Services not working
The Nagios init script looks fine, thanks for posting it.
Can you post or PM me a bad retention.dat file from your server that causes the segmentation fault so we can view it?
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Re: Nagios XI Services not working
Good timing, it just did it again.
I have PM'd you the retention.dat file.