configure_nagios.sh seems to be stuck in a loop

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mccrakem
Posts: 129
Joined: Mon Jun 19, 2017 8:28 am

configure_nagios.sh seems to be stuck in a loop

Post by mccrakem »

Hi
Everything in /usr/local/nagiosxi/var/cmdsubsys.log looked fine. All I could see was:

...........................................................
PROCESSED 0 COMMANDS
............................................................
PROCESSED 0 COMMANDS
............................................................
PROCESSED 0 COMMANDS
...............................................................
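
For anyone reading along, here is a minimal sketch of filtering that idle output so other entries stand out. It assumes the idle lines are exactly runs of dots and `PROCESSED 0 COMMANDS`, as in the excerpt above. `filter_idle` is a hypothetical helper name, not something shipped with Nagios XI.

```shell
# Hypothetical helper: read a cmdsubsys-style log on stdin and print every
# line that is not idle output (a run of dots or "PROCESSED 0 COMMANDS").
filter_idle() {
    # grep exits non-zero when nothing is left to print; treat that as success.
    grep -v -E '^\.+$|^PROCESSED 0 COMMANDS$' || true
}

# Example, using the log path from this post:
# filter_idle < /usr/local/nagiosxi/var/cmdsubsys.log
```

If the command prints nothing, the log contained only idle output.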

I then upgraded from 5.7.1 to 5.8.3. When I check the same log after the upgrade, reconfigure_nagios.sh seems to be stuck in a loop.
I have attached a file showing the log.

In the log I noticed the following lines:
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 717
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>
(The same Database Error block repeats several more times in the log.)

So I ran the script:
/usr/local/nagiosxi/scripts/repair_databases.sh

For a while it reported that another process was already running.

But after a while it went back into its loop.

I don't think this is normal, as none of our other systems show the same entries in the log file.
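
To compare one system against another, a rough sketch like the following could count how often a given piece of text appears in each log. `count_occurrences` is a hypothetical helper name, and the search string in the example is just the error heading quoted above.

```shell
# Hypothetical helper: count how many times a fixed string occurs in a file,
# including several occurrences on the same line.
count_occurrences() {
    # $1 = fixed string to look for, $2 = file to search
    { grep -o -F -- "$1" "$2" || true; } | wc -l | tr -d ' '
}

# Example, using the log path and error heading from this post:
# count_occurrences 'Database Error' /usr/local/nagiosxi/var/cmdsubsys.log
```

A count of 0 on the other systems and a large count here would back up the impression that only this system is affected.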

Thanks
mccrakem
Posts: 129
Joined: Mon Jun 19, 2017 8:28 am

Re: configure_nagios.sh seems to be stuck in a loop

Post by mccrakem »

Just in case, I tried to repair the SQL database,
but now when I check the logs all I see is: Another reconfigure process is still running, sleeping....
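
One possible way to tell a stale lock from a reconfigure that is genuinely still running is to look at the age of the lock file. This is only a sketch: `lock_older_than` is a hypothetical helper, the 30-minute threshold in the example is an arbitrary assumption, and the lock path is the one that appears in the restart commands later in this thread. It relies on `find -mmin`, which is available in GNU findutils.

```shell
# Hypothetical helper: succeed if the lock file exists and was last modified
# more than the given number of minutes ago.
lock_older_than() {
    # $1 = lock file path, $2 = age threshold in minutes
    [ -f "$1" ] && [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}

# Example (the threshold is an assumption, adjust to taste):
# if lock_older_than /usr/local/nagiosxi/scripts/reconfigure_nagios.lock 30; then
#     echo "lock file looks stale"
# fi
```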
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: configure_nagios.sh seems to be stuck in a loop

Post by benjaminsmith »

Hi,

Please try running the following command to vacuum and restart postgres.

echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
Then restart the Nagios XI software stack. The commands below are for CentOS 7/8 systems. If they do not work, please let me know the operating system and I can adjust them.

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl restart httpd
systemctl start nagios 
systemctl start npcd
systemctl start crond
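
The lock-removal step in the sequence above could also be sketched as a small function that reports which lock files were actually present. `clear_xi_locks` is a hypothetical helper, not part of Nagios XI, and like the `rm` commands above it is meant to run only after crond, npcd and nagios have been stopped.

```shell
# Hypothetical helper: remove the three lock files named above, if present,
# and print the path of each one that was actually removed.
clear_xi_locks() {
    # $1 = Nagios XI base directory (defaults to the path used in this thread)
    local base="${1:-/usr/local/nagiosxi}"
    local lock
    for lock in var/dbmaint.lock var/event_handler.lock scripts/reconfigure_nagios.lock; do
        if [ -e "$base/$lock" ]; then
            rm -f -- "$base/$lock" && echo "removed $base/$lock"
        fi
    done
}

# Example, run as root between the stop and start commands:
# clear_xi_locks
```

Seeing `reconfigure_nagios.lock` in the output would suggest an earlier reconfigure left its lock behind.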
If that doesn't take care of it, please send us the system profile, and we'll review the logs. Thanks! Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
mccrakem
Posts: 129
Joined: Mon Jun 19, 2017 8:28 am

Re: configure_nagios.sh seems to be stuck in a loop

Post by mccrakem »

Hi Ben

Sorry, that did not work. It sat for a long time with the message
Another reconfigure process is running.

But after a while it finished what it was doing and then went back into its loop again.

I tried to download the System Profile but it keeps failing,
so I viewed the profile and copied the text (see attached text document).

I have put the old 5.7.1 system back into a live state so we can use the application. The 5.8.3 system is just powered off at the moment.
But I have noticed another issue on the 5.7.1 system.

For all the systems that were showing in the cmdsubsys.log file, I removed all the services and then the hosts.
I applied the configuration and all went OK.
I then recreated the hosts and used the Bulk Modification Tool to add all the services back to these systems.
They no longer show in the log file, and in the Nagios app the services show green.

But the hosts themselves are still showing under Pending on the Host Status Summary.

Thanks
mccrakem
Posts: 129
Joined: Mon Jun 19, 2017 8:28 am

Re: configure_nagios.sh seems to be stuck in a loop

Post by mccrakem »

Hi, I got the other issue sorted. I needed to delete all the monitors for the specific servers and recreate them:
Delete Services
Delete Hosts
Delete folders corresponding to these servers
recreate the host
create the services

That got everything working again, apart from the reconfiguration running constantly.

I just remembered we had to make other changes to get this site working before, as it was originally configured differently than the other systems

Previously I built a new server and it was working fine, but after I did a restore the system would not work correctly.
We ended up having to make the following changes to get it to work, so I was wondering if this could be causing the issue.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------

In order to get the system working we had to edit these 2 files

/usr/local/nagios/etc/ndo2db.cfg

and change this line:

db_user=nagios

To:

db_user=ndoutils



Then we restarted ndo2db and nagios:

systemctl restart ndo2db
systemctl restart nagios

Then we edited this file:

/usr/local/nagiosxi/html/config.inc.php

and changed this line:

"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'nagios',

To this:

"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'ndoutils',



Then we applied the configuration and everything started working properly.
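
For reference, the ndo2db.cfg change described above could be scripted roughly as follows. This is only a sketch: `set_ndo_db_user` is a hypothetical helper, it assumes the file has a single `db_user=` line starting in the first column, and it assumes the user name contains no `/` or `&` characters.

```shell
# Hypothetical helper: set db_user in an ndo2db.cfg-style file,
# keeping a copy of the original next to it with a .bak suffix.
set_ndo_db_user() {
    # $1 = config file, $2 = new database user name
    sed -i.bak "s/^db_user=.*/db_user=$2/" "$1"
}

# Example, using the path and value from this post:
# set_ndo_db_user /usr/local/nagios/etc/ndo2db.cfg ndoutils
# systemctl restart ndo2db
# systemctl restart nagios
```

Keeping the `.bak` copy makes it easy to switch back if the change does not help.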
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: configure_nagios.sh seems to be stuck in a loop

Post by benjaminsmith »

Hi,

That's what my stock test system has: the user should be ndoutils and the database nagios.
"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'ndoutils',
"pwd" => 'n@gweb',
"db" => 'nagios',
Are check results coming in as expected?
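
On the configuration side, a minimal sketch for checking that ndo2db.cfg and config.inc.php name the same user might look like this. Both function names are hypothetical, and the PHP parsing assumes the layout quoted in this thread (one key per line, single-quoted values), so treat it as a rough check rather than a real parser.

```shell
# Hypothetical helper: print the value of db_user from an ndo2db.cfg-style file.
ndo_cfg_user() {
    sed -n 's/^db_user=//p' "$1" | head -n 1
}

# Hypothetical helper: print the "user" value that follows the "ndoutils"
# array opener in a config.inc.php-style file.
xi_ndoutils_user() {
    awk '
        /"ndoutils"[ \t]*=>/ { in_block = 1; next }
        in_block && /"user"[ \t]*=>/ {
            split($0, parts, "\047")   # \047 is the octal code for a single quote
            print parts[2]
            exit
        }
    ' "$1"
}

# Example, using the paths mentioned earlier in this thread:
# a="$(ndo_cfg_user /usr/local/nagios/etc/ndo2db.cfg)"
# b="$(xi_ndoutils_user /usr/local/nagiosxi/html/config.inc.php)"
# [ "$a" = "$b" ] && echo "users match: $a" || echo "mismatch: $a vs $b"
```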

Regards,
Benjamin
mccrakem
Posts: 129
Joined: Mon Jun 19, 2017 8:28 am

Re: configure_nagios.sh seems to be stuck in a loop

Post by mccrakem »

Hi
I will check swapping the users and see how it goes.
But I think this system was configured differently at the time of deployment.

When I upgraded these systems to CentOS 7 with the standard install, everything worked OK.
But once I did a restore from a backup, the only way to get the system working at all was to swap the two users.

Everything works fine until I do the upgrade, and then the reconfigure script just runs again and again.

I'm not sure what you mean by
"Are check results coming in as expected?"
Can you explain more?

Thanks
mccrakem
Posts: 129
Joined: Mon Jun 19, 2017 8:28 am

Re: configure_nagios.sh seems to be stuck in a loop

Post by mccrakem »

Hi

when I change the line
from
db_user=ndoutils
to
db_user=nagios

When I restart the server and check the cmdsubsys.log, I still see that reconfigure_nagios.sh is running constantly, but now I see the following line as well:
MSG: Could not get data for objects. NDO or Core may not be running

So I have put back the correct user, db_user=ndoutils. I no longer get that new message, but reconfigure_nagios.sh still runs constantly.

Thanks
mccrakem
Posts: 129
Joined: Mon Jun 19, 2017 8:28 am

Re: configure_nagios.sh seems to be stuck in a loop

Post by mccrakem »

Hi
I did another test by just upgrading from 5.7.1 to 7.5.2
Even with this small upgrade I am having the same issue
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: configure_nagios.sh seems to be stuck in a loop

Post by benjaminsmith »

Hi mccrakem,

I'd like to get a system profile to help troubleshoot the error. The fact that you cannot download the system profile suggests an issue with permissions on the system. This is usually an incorrect sudoers file. Please follow the steps in the article below to correct this and then try downloading the profile again.

Nagios XI - Profile Build Failed

If that fails, please run the following commands to generate this from the command line.

rm -rf /usr/local/nagiosxi/var/components/profile.zip
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send the resulting /usr/local/nagiosxi/var/components/profile.zip file.

Also, run the following tail command, then run Apply Configuration, and post the full output. I'd like to see if it's still getting the database error with postgres.

# Make sure Core is running
systemctl status nagios
tail -f /usr/local/nagiosxi/var/cmdsubsys.log
Thanks, Benjamin