Bug Found!!

Posted: Thu Jan 24, 2019 5:21 am
by danniiffxi
Hi Guys

OK, so a brief overview: I run Nagios XI 5.5.9 with Core 4.2.4 and a Gearman setup on CentOS 6.10.

In the 5.5.8 update you addressed the nagios.lock file issue where it was changing the location in the nagios config.

For the most part the bug was fixed, but I have found that when an error is made and a config fails, rolling back the config still changes the nagios.cfg file, which prevents the Nagios service from starting.

Running the following command:

Code: Select all

less /var/log/messages | grep .lock 
I see permission-denied errors for the nagios.lock file when Nagios tries to write to /var/run/.
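A small aside on the search itself: in `grep .lock` the dot matches any character, and `less` is not needed in the pipeline. The following is a minimal sketch of a stricter filter. The function name `lock_errors` is made up for illustration, and /var/log/messages is assumed as the default log location.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): print log lines that mention
# nagios.lock. -F makes grep treat the pattern as a fixed string, so the dot
# is matched literally. The default log path is an assumption for CentOS 6.
lock_errors() {
    grep -F 'nagios.lock' "${1:-/var/log/messages}"
}

# Usage:
#   lock_errors                      # reads /var/log/messages
#   lock_errors /path/to/other.log   # reads another log file
```

The function exits non-zero when nothing matches, which makes it usable in a simple `if lock_errors >/dev/null; then ...` check.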

In the nagios.cfg file the location is listed like so.

Code: Select all

lock_file=/var/run/nagios.lock 
It should be listed like this:

Code: Select all

lock_file=/usr/local/nagios/var/nagios.lock 
When I manually change the path back to the second location, everything works until the next config failure, when it reverts to /var/run/ once again. Can this please be addressed? It's an easy fix, but it is very annoying because it causes downtime and gaps in our graphing.

If you need more info please let me know.

Re: Bug Found!!

Posted: Thu Jan 24, 2019 12:07 pm
by lmiltchev
The new location of the nagios lock from now on will be:

Code: Select all

lock_file=/var/run/nagios.lock
So, you could try to revert to the "old" location by modifying the path in several files, e.g. /etc/init.d/nagios, /usr/local/nagios/etc/nagios.cfg, /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh, and /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh. However, I would recommend that you use /var/run/nagios.lock. The issue with the nagios service not being able to start should be resolved in 5.5.9. Perhaps, for some reason, your /etc/init.d/nagios file didn't get updated.

Try adding the following code to the /etc/init.d/nagios (around line 198):

Code: Select all

# See how we were called.
case "$1" in

    start)
        echo -n "Starting nagios: "

        check_config

        if test -f $NagiosRunFile; then
            NagiosPID=`head -n 1 $NagiosRunFile`
            if status_nagios; then
                echo "another instance of nagios is already running."
                exit 0
            fi
        fi

        su $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
        rm -f $NagiosCommandFile
        touch $NagiosRunFile

        # We need to set permissions on old Core 4.2.4 (which is what we
        # have people use to use mod_gearman for now)
        if $NagiosBin | grep -q "Nagios Core 4.2.4" ; then
            chown $NagiosUser:$NagiosGroup $NagiosRunFile
        fi

        $NagiosBin -d $NagiosCfgFile

        echo "done."
        ;;
The path in all of the above-mentioned files should be the same (/var/run/nagios.lock). After stopping the nagios service, delete any leftover lock files before starting it again:

Code: Select all

rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
Let us know if this helped.
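The cleanup above could be wrapped in a small guard so that a lock belonging to a running process is not deleted by accident. This is only a sketch: `remove_if_stale` is a hypothetical helper, and it assumes the first line of the lock file holds the PID, as the `head -n 1 $NagiosRunFile` line in the init script excerpt suggests. It is meant to be run as root, so that `kill -0` can see the process.

```shell
#!/bin/sh
# Hypothetical helper: delete a lock file only if no live process owns it.
# Assumption: the first line of the lock file is the PID of the daemon.
remove_if_stale() {
    f=$1
    [ -e "$f" ] || return 0          # nothing to clean up
    pid=$(head -n 1 "$f")
    case "$pid" in
        ''|*[!0-9]*)
            ;;                       # empty or not a number: treat as stale
        *)
            # kill -0 sends no signal; it only checks that the PID exists.
            if kill -0 "$pid" 2>/dev/null; then
                echo "kept $f (PID $pid is still running)"
                return 1
            fi
            ;;
    esac
    rm -f -- "$f" && echo "removed stale $f"
}

# Possible usage on the XI server (as root):
#   service nagios stop
#   remove_if_stale /usr/local/nagios/var/nagios.lock
#   remove_if_stale /var/run/nagios.lock
#   service nagios start
```

A lock file created by `touch` is empty, so it is treated as stale and removed.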

Re: Bug Found!!

Posted: Thu Jan 24, 2019 12:14 pm
by danniiffxi
Thank you lmiltchev

Typo on my part, sorry. I am running 2 instances at the moment, our new one is on 5.5.9. This thread relates to the older instance which is on 5.5.8. I will update it tomorrow.

Many thanks

Re: Bug Found!!

Posted: Thu Jan 24, 2019 12:33 pm
by lmiltchev
Sure, let us know if you have any further questions. Thank you!

Re: Bug Found!!

Posted: Thu Jan 31, 2019 9:13 am
by danniiffxi
Hi lmiltchev

So I upgraded to XI 5.5.9 the other day. Today, however, I had a config fail when removing an old item and had to roll back to the last known good configuration. This once again caused the lock file location to move to /var/run/, but as you can see below, Nagios does not have permission to access that location by default.

Code: Select all

[root@nagiosp01 ~]# less /var/log/messages | grep .lock
Jan 31 13:19:42 nagiosp01 nagios: Failed to obtain lock on file /var/run/nagios.lock: Permission denied
Jan 31 13:23:25 nagiosp01 nagios: Failed to obtain lock on file /var/run/nagios.lock: Permission denied
Jan 31 13:27:12 nagiosp01 nagios: Failed to obtain lock on file /var/run/nagios.lock: Permission denied
I then had to vi into nagios.cfg and change the location once again to the following to get it working again.

Code: Select all

lock_file=/usr/local/nagios/var/nagios.lock
I guess a workaround would be to chmod the /var/run/ directory so that the Nagios account can access it. Would this be the recommended course of action?

Re: Bug Found!!

Posted: Thu Jan 31, 2019 10:38 am
by lmiltchev
"I guess a workaround would be to chmod the /var/run/ directory so that the Nagios account can access it. Would this be the recommended course of action?"
The "new" nagios init file should take care of the permissions issue, unless the code below hasn't been added to the file:

Code: Select all

# We need to set permissions on old Core 4.2.4 (which is what we
# have people use to use mod_gearman for now)
if $NagiosBin | grep -q "Nagios Core 4.2.4" ; then
chown $NagiosUser:$NagiosGroup $NagiosRunFile
fi
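One way to see whether that chown took effect is to look at the owner and mode of the lock file and of its directory after a start attempt. The following is a minimal sketch, assuming GNU `stat` (as shipped on CentOS). `lock_perms` is a hypothetical name, not something provided by Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper: show owner:group, octal mode and name for the
# directory holding the lock file and, if present, the lock file itself.
# Assumption: GNU stat, which understands the -c format option.
lock_perms() {
    f=$1
    stat -c '%U:%G %a %n' "$(dirname "$f")" || return 1
    if [ -e "$f" ]; then
        stat -c '%U:%G %a %n' "$f"
    else
        echo "(no lock file at $f)"
    fi
}

# Possible usage:
#   lock_perms /var/run/nagios.lock
```

If the lock file turns out not to be owned by the nagios user, that would be consistent with the "Permission denied" lines in /var/log/messages.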
Can you post the entire /etc/init.d/nagios file on the forum?

Also, run the following command, and show the output:

Code: Select all

grep 'lock_file' /usr/local/nagios/etc/nagios.cfg /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh

Re: Bug Found!!

Posted: Tue Feb 05, 2019 10:43 am
by danniiffxi
Hi lmiltchev

Sorry for the delay, I have had a few days off. Here is the file requested. I renamed it to nagios.txt as I was unable to upload it without an extension.

Re: Bug Found!!

Posted: Tue Feb 05, 2019 11:10 am
by lmiltchev
You have the piece of code that you need in the nagios init file (for setting the correct permissions to the lock file):

Code: Select all

    # We need to set permissions on old Core 4.2.4 (which is what we
    # have people use to use mod_gearman for now)
    if $NagiosBin | grep -q "Nagios Core 4.2.4" ; then
    chown $NagiosUser:$NagiosGroup $NagiosRunFile
    fi
which is good. However, you need to set the correct path... You still have the "old" path. Stop nagios and change this:

Code: Select all

#NagiosRunFile=/var/run/nagios.lock
NagiosRunFile=/usr/local/nagios/var/nagios.lock
to this:

Code: Select all

NagiosRunFile=/var/run/nagios.lock
Save, and exit.
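The same change could also be made without opening an editor. This is a sketch only: `set_run_file` is a hypothetical helper, the backup location is up to you, and it assumes the active assignment starts with `NagiosRunFile=` as in the excerpt above. Commented-out lines are left alone. Stop nagios before running it.

```shell
#!/bin/sh
# Hypothetical helper: point the uncommented NagiosRunFile= line of an init
# script at a new path, keeping a copy of the original file first.
# Arguments: <init file> <new lock path> <backup file>
# Assumption: the new path contains no '|' or '&' characters, which would
# be interpreted by sed.
set_run_file() {
    file=$1
    new=$2
    backup=$3
    cp -p "$file" "$backup" || return 1
    # Rewrite from the backup into the original file, so the original keeps
    # its owner and mode. Lines starting with '#' do not match the pattern.
    sed "s|^\([[:space:]]*\)NagiosRunFile=.*|\1NagiosRunFile=$new|" \
        "$backup" > "$file"
}

# Possible usage (as root, with nagios stopped):
#   set_run_file /etc/init.d/nagios /var/run/nagios.lock /root/nagios.init.orig
```

The backup is written outside /etc/init.d in the usage example so that no second copy of the script ends up in the init directory.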

Also, make sure that you have the SAME path in all of the files listed below.

- /usr/local/nagios/etc/nagios.cfg
- /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
- /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh

To check if you have the correct path, run the command below:

Code: Select all

grep 'lock_file' /usr/local/nagios/etc/nagios.cfg /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
After you fix all of the paths, remove any "left over" lock files that you see in /var/run or /usr/local/nagios/var and start nagios.
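The grep above leaves the comparison to the reader. For the two files whose format appears in this thread (the `NagiosRunFile=` line in the init script and the `lock_file=` line in nagios.cfg), the comparison could be scripted. This is a sketch under that assumption: `lock_path_of` and `same_lock_path` are hypothetical names, and the two restore scripts are left to the grep because their exact syntax is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first uncommented
# lock_file= or NagiosRunFile= line in a file, with trailing blanks removed.
lock_path_of() {
    awk -F= '/^[ \t]*(lock_file|NagiosRunFile)=/ {
        sub(/[ \t\r]+$/, "", $2)
        print $2
        exit
    }' "$1"
}

# Hypothetical helper: succeed only if both files name the same lock path.
same_lock_path() {
    a=$(lock_path_of "$1")
    b=$(lock_path_of "$2")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "OK: both use $a"
    else
        echo "MISMATCH: $1 -> ${a:-<none>}, $2 -> ${b:-<none>}"
        return 1
    fi
}

# Possible usage:
#   same_lock_path /etc/init.d/nagios /usr/local/nagios/etc/nagios.cfg
```

With the init file posted earlier in the thread (old path active, new path commented out) and a nagios.cfg pointing at /var/run/nagios.lock, this would report a mismatch.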