configure_nagios.sh seems to be stuck in a loop

Posted: Tue Apr 27, 2021 7:39 am
by mccrakem
Hi
Everything in /usr/local/nagiosxi/var/cmdsubsys.log looked fine. All I could see was:

...........................................................
PROCESSED 0 COMMANDS
............................................................
PROCESSED 0 COMMANDS
............................................................
PROCESSED 0 COMMANDS
...............................................................

I then upgraded from 5.7.1 to 5.8.3. When I check the same log after the upgrade, reconfigure_nagios.sh seems to be stuck in a loop.
I have attached a file showing the log.

In the log I noticed the following lines:
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 717
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>
(The same Database Error block repeats several more times in the log.)
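
For anyone reading along, one quick way to confirm which port the refused connections were aimed at is to pull it out of the log text. This is only a sketch: `refused_ports` is a made-up helper name (not part of Nagios XI), and it assumes the error wording in the log matches the lines quoted above.

```shell
# refused_ports is a hypothetical helper, not a Nagios XI script.
# It prints each distinct port named in "TCP/IP connections on port N?" lines
# of the log file given as the first argument.
refused_ports() {
    sed -n 's/.*TCP\/IP connections on port \([0-9][0-9]*\)?.*/\1/p' "$1" | sort -u
}
```

It could be run as `refused_ports /usr/local/nagiosxi/var/cmdsubsys.log`. Port 5432 is the PostgreSQL default, so seeing it here points at the postgres service rather than MySQL/MariaDB.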

So I ran the script:
/usr/local/nagiosxi/scripts/repair_databases.sh

For a while it reported that another process was already running.

After a while it went back into its loop.

I don't think this is normal, as none of our other systems show the same entries in the log file.

Thanks

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Tue Apr 27, 2021 7:49 am
by mccrakem
Just in case, I tried to repair the SQL database,
but now when I check the logs all I see is: Another reconfigure process is still running, sleeping....
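
To put a number on how often the reconfigure is waiting on itself, the message can be counted in the log. A minimal sketch, assuming the message text is exactly as quoted above; `count_reconfigure_waits` is a hypothetical name chosen for illustration.

```shell
# count_reconfigure_waits is a hypothetical helper, not a Nagios XI script.
# It counts the lines in the given log file that contain the
# "still running" message quoted above.
count_reconfigure_waits() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c -F 'Another reconfigure process is still running' "$1" || true
}
```

Running `count_reconfigure_waits /usr/local/nagiosxi/var/cmdsubsys.log` a few minutes apart shows whether the count is still climbing.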

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Tue Apr 27, 2021 3:39 pm
by benjaminsmith
Hi,

Please try running the following command to vacuum and restart postgres.

echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
Then restart the Nagios XI software stack. The commands below are for CentOS 7/8 systems. If they do not work, please let me know the operating system and I can adjust them.

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl restart httpd
systemctl start nagios 
systemctl start npcd
systemctl start crond
If that doesn't take care of it, please send us the system profile, and we'll review the logs. Thanks! Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
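
Before removing the lock files in the restart steps above, it can help to see which of them are actually present. A minimal sketch, assuming the three lock paths listed in this post; `list_xi_locks` and its optional prefix argument are made up for illustration and are not part of Nagios XI.

```shell
# list_xi_locks is a hypothetical helper, not a Nagios XI script.
# It prints each of the three lock files named above that currently exists.
# The optional first argument is a path prefix (an assumption added here so
# the function can be tried against a scratch directory instead of the live system).
list_xi_locks() {
    prefix="${1:-}"
    for lock in \
        /usr/local/nagiosxi/var/dbmaint.lock \
        /usr/local/nagiosxi/var/event_handler.lock \
        /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
    do
        if [ -e "${prefix}${lock}" ]; then
            echo "${prefix}${lock}"
        fi
    done
}
```

Calling `list_xi_locks` with no argument checks the real paths. If reconfigure_nagios.lock is listed while no reconfigure is running, the lock is probably stale.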

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Wed Apr 28, 2021 5:10 am
by mccrakem
Hi Ben

Sorry, that did not work. It sat for a long time with the message
Another reconfigure process is running.

After a while it finished what it was doing and then went back into its loop.

I tried to download the System Profile but it keeps failing,
so I viewed the profile and copied the text (see attached text document).

I have put the old 5.7.1 system back into production so we can keep using the application; the 5.8.3 system is just powered off at the moment.
However, I have noticed another issue on the 5.7.1 system.

For all the systems that were showing in the cmdsubsys.log file, I removed all the services and then the hosts.
I applied the configuration and all went OK.
I recreated the hosts and then used the Bulk Modification Tool to add all the services back to these systems.
They no longer show in the log file, and in the Nagios app the services show green.

But the hosts themselves are still showing as Pending in the Host Status Summary.

Thanks

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Wed Apr 28, 2021 10:11 am
by mccrakem
Hi, I got the other issue sorted. I needed to delete all the monitors for the specific servers and recreate them:
Delete the services
Delete the hosts
Delete the folders corresponding to these servers
Recreate the hosts
Create the services

That got everything working again, apart from the reconfiguration running constantly.

I just remembered we had to make other changes to get this site working before, as it was originally configured differently from the other systems.

Previously I built a new server and it worked fine, but after I did a restore the system would not work correctly.
We ended up having to make the following changes to get it to work, so I was wondering whether they could be causing the issue.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------

In order to get the system working we had to edit these 2 files

/usr/local/nagios/etc/ndo2db.cfg

and change this line:

db_user=nagios

To:

db_user=ndoutils



Then we restarted ndo2db and nagios:

systemctl restart ndo2db
systemctl restart nagios

Then we edited this file:

/usr/local/nagiosxi/html/config.inc.php

and changed this line:

"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'nagios',

To this:

"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'ndoutils',



Then we applied the configuration and everything started working properly.
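
Since the fix above depends on both files naming the same database user, a small check can compare them. This is a rough sketch, not an official tool: `ndo2db_user` and `xi_ndoutils_user` are hypothetical helper names, and the parsing assumes the files are laid out like the excerpts in this post (a `db_user=` line in ndo2db.cfg, and a `"user" => '...'` line following `"ndoutils" => array(` in config.inc.php).

```shell
# Both functions are hypothetical helpers, not part of Nagios XI.

# Print the value of the first db_user= line in an ndo2db.cfg-style file.
ndo2db_user() {
    sed -n 's/^[[:space:]]*db_user[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Print the "user" value from the "ndoutils" array in a config.inc.php-style file.
# Fields are split on single quotes, so in  "user" => 'ndoutils',  field 2 is the value.
xi_ndoutils_user() {
    awk -F"'" '
        /"ndoutils"/ && /array/ { in_block = 1; next }
        in_block && /"user"/    { print $2; exit }
    ' "$1"
}
```

On a live system the two values could be compared with something like `[ "$(ndo2db_user /usr/local/nagios/etc/ndo2db.cfg)" = "$(xi_ndoutils_user /usr/local/nagiosxi/html/config.inc.php)" ] && echo match`.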

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Wed Apr 28, 2021 1:06 pm
by benjaminsmith
Hi,

That's what my stock test system has: the user should be ndoutils and the database nagios.
"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'ndoutils',
"pwd" => 'n@gweb',
"db" => 'nagios',
Are check results coming in as expected?

Regards,
Benjamin

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Wed Apr 28, 2021 1:49 pm
by mccrakem
Hi
I will check swapping the users and see how it goes
But I think this system was configured differently at the time of deployment

When I upgraded this system to CentOS 7 with the standard install, everything worked OK.
But once I did a restore from a backup, the only way to get the system working at all was to swap the two users.

Everything works fine until I do the upgrade, and then the reconfigure script just runs again and again.

I'm not sure what you mean by
"Are check results coming in as expected?"
Can you explain more?

Thanks

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Thu Apr 29, 2021 7:14 am
by mccrakem
Hi

when I change the line
from
db_user=ndoutils
to
db_user=nagios

When I restart the server and check cmdsubsys.log, I still see that reconfigure_nagios.sh is running constantly, but now I also see the following line:
MSG: Could not get data for objects. NDO or Core may not be running

So I have put back the correct user, db_user=ndoutils. I no longer get that new message, but reconfigure_nagios.sh still runs constantly.

Thanks

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Thu Apr 29, 2021 9:05 am
by mccrakem
Hi
I did another test by just upgrading from 5.7.1 to 5.7.2.
Even with this small upgrade I am having the same issue.

Re: configure_nagios.sh seems to be stuck in a loop

Posted: Thu Apr 29, 2021 10:34 am
by benjaminsmith
Hi mccrakem,

I'd like to get a system profile to help troubleshoot the error. The fact that you cannot download the system profile suggests an issue with permissions on the system. This is usually an incorrect sudoers file. Please follow the steps in the article below to correct this and then try downloading the profile again.

Nagios XI - Profile Build Failed

If that fails, please run the following commands to generate this from the command line.

rm -rf /usr/local/nagiosxi/var/components/profile.zip
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send the resulting /usr/local/nagiosxi/var/components/profile.zip file.

Also, try to run the following tail command, then Apply Configuration, and post the full output. I'd like to see if it's still getting the database error with postgres.

# Make sure Core is running
systemctl status nagios
tail -f /usr/local/nagiosxi/var/cmdsubsys.log
Thanks, Benjamin