configure_nagios.sh seems to be stuck in a loop
Hi
Everything in /usr/local/nagiosxi/var/cmdsubsys.log looked fine. All I could see was:
...........................................................
PROCESSED 0 COMMANDS
............................................................
PROCESSED 0 COMMANDS
............................................................
PROCESSED 0 COMMANDS
...............................................................
So I did an upgrade from 5.7.1 to 5.8.3, but when I check the same log after the upgrade, reconfigure_nagios.sh seems to be in a loop.
I have attached a file showing the log.
In the log I noticed the following lines:
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 717
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>
(the same Database Error block is repeated several times in the log)
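To see whether the PostgreSQL connection error keeps recurring, rather than being a one-off, its occurrences in the log can be counted. The following is only a rough sketch: `count_db_errors` is a made-up helper name, not part of Nagios XI, and the log path is passed in as an argument.

```shell
# Count how many lines of a log contain the PostgreSQL "could not connect" error.
# count_db_errors is a hypothetical helper name, not something shipped with Nagios XI.
count_db_errors() {
    logfile=$1
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # grep -c prints the number of matching lines. It exits with status 1 when
    # the count is 0, so "|| true" keeps the function successful in that case.
    grep -c 'could not connect to server' "$logfile" || true
}
```

On this system it would be called as `count_db_errors /usr/local/nagiosxi/var/cmdsubsys.log`. Running it twice a few minutes apart would show whether the number is still growing.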
So I ran the script
/usr/local/nagiosxi/scripts/repair_databases.sh
For a while it reported that another process was already running,
but after a while it went back into its loop.
I don't think this is normal, as none of our other systems are showing the same entries in the log file.
Thanks
Re: configure_nagios.sh seems to be stuck in a loop
Just in case, I tried to repair the SQL database,
but now when I check the logs all I see is: Another reconfigure process is still running, sleeping....
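One way to judge whether a reconfigure is really still running or has left a stale lock behind is to look at how old the lock file is. The sketch below assumes the wait message is tied to /usr/local/nagiosxi/scripts/reconfigure_nagios.lock, which is an assumption and not confirmed in this thread. `lock_age_seconds` is a hypothetical helper name, the one-hour threshold is arbitrary, and `stat -c %Y` assumes GNU coreutils.

```shell
# Print the age of a lock file in seconds, or return 1 if it does not exist.
# lock_age_seconds is a hypothetical helper; "stat -c %Y" (modification time as
# seconds since the epoch) assumes GNU coreutils, as found on CentOS.
lock_age_seconds() {
    lockfile=$1
    [ -e "$lockfile" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$lockfile")
    echo $((now - mtime))
}

# Example: report a lock that has been held for more than an hour.
# The path is an assumption about which lock the wait message refers to.
LOCK=/usr/local/nagiosxi/scripts/reconfigure_nagios.lock
if age=$(lock_age_seconds "$LOCK") && [ "$age" -gt 3600 ]; then
    echo "$LOCK is $age seconds old; a reconfigure may be stuck"
fi
```

If the lock is only a few seconds old each time it is checked, the script is probably being started over and over rather than hanging.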
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: configure_nagios.sh seems to be stuck in a loop
Hi,
Please try running the following command to vacuum and restart postgres.
Code: Select all
echo "vacuum;vacuum analyze;vacuum full;"|psql nagiosxi postgres
Then restart the Nagios XI software stack. The commands below are for CentOS 7/8 systems. If they do not work, please let me know the operating system and I can adjust them.
Code: Select all
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl restart httpd
systemctl start nagios
systemctl start npcd
systemctl start crond
If that doesn't take care of it, please send us the system profile, and we'll review the logs.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Thanks! Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: configure_nagios.sh seems to be stuck in a loop
Hi Ben
Sorry, that did not work. It sat for a long time with the message
Another reconfigure process is running.
But after a while it finished what it was doing and then went back into its loop.
I tried to download the System Profile but it keeps failing,
so I viewed the profile and copied the text (see attached text doc).
I have put the old 5.7.1 system back into a live state so we can use the application; the 5.8.3 system is just powered off at the moment.
But I have noticed another issue on the 5.7.1 system.
For all the systems that were showing in the cmdsubsys.log file, I removed all the services and then the hosts.
I applied the configuration and all went OK.
I recreated the hosts and then used the Bulk Modification Tool to add all the services back to these systems.
They do not show in the log file any more, and in the Nagios app the services show green,
but the hosts themselves are still showing under Pending on the Host Status Summary.
Thanks
Re: configure_nagios.sh seems to be stuck in a loop
Hi, I got the other issue sorted. I needed to delete all the monitors for the specific servers and recreate them:
Delete services
Delete hosts
Delete folders corresponding to these servers
Recreate the hosts
Create the services
That got everything working again, apart from the reconfiguration running constantly.
I just remembered we had to make other changes to get this site working before, as it was originally configured differently from the other systems.
Previously I built a new server and it was working fine, but after I did a restore the system would not work correctly.
We ended up having to make the following changes to get it to work, so I was wondering if this could be what is causing the issue.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
In order to get the system working we had to edit these 2 files
/usr/local/nagios/etc/ndo2db.cfg
and change this line:
db_user=nagios
To:
db_user=ndoutils
Then we restarted ndo2db and nagios:
systemctl restart ndo2db
systemctl restart nagios
Then we edited this file:
/usr/local/nagiosxi/html/config.inc.php
and changed this line:
"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'nagios',
To this:
"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'ndoutils',
Then we applied the configuration and everything started working properly.
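To double-check which user ndo2db is actually configured with, the setting can be read straight from the file. This is a minimal sketch: `cfg_value` is a hypothetical helper name, and it only handles simple `key=value` lines like the `db_user` line shown above.

```shell
# Print the value of a key=value setting from a config file such as ndo2db.cfg.
# cfg_value is a hypothetical helper name used only for this illustration.
cfg_value() {
    file=$1
    key=$2
    # Match "key=value" at the start of a line (commented-out lines beginning
    # with "#" do not match), strip everything up to and including the "=",
    # and keep the last match if the key appears more than once.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}
```

For example, `cfg_value /usr/local/nagios/etc/ndo2db.cfg db_user` should print ndoutils after the change described above. That value can then be compared by eye with the "user" entry of the "ndoutils" array in config.inc.php.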
-
benjaminsmith
Re: configure_nagios.sh seems to be stuck in a loop
Hi,
That's what my stock test system has; the user should be ndoutils and the database nagios.
Code: Select all
"ndoutils" => array(
"dbtype" => 'mysql',
"dbserver" => 'localhost',
"user" => 'ndoutils',
"pwd" => 'n@gweb',
"db" => 'nagios',
Are check results coming in as expected?
Regards,
Benjamin
Re: configure_nagios.sh seems to be stuck in a loop
Hi
I will try swapping the users and see how it goes.
But I think this system was configured differently at the time of deployment.
When I upgraded these systems to CentOS 7 with the standard install, everything worked OK.
But once I did a restore from a backup, the only way to get the system working at all was to swap the two users.
Everything works fine until I do the upgrade, and then the reconfigure script just runs again and again.
I am not sure what you mean by
Are check results coming in as expected?
Can you explain more?
Thanks
Re: configure_nagios.sh seems to be stuck in a loop
Hi
when I change the line
from
db_user=ndoutils
to
db_user=nagios
When I restart the server and check cmdsubsys.log, I still see that reconfigure_nagios.sh is running constantly, but now I see the following line as well:
MSG: Could not get data for objects. NDO or Core may not be running
So I have put back the correct user, db_user=ndoutils. Now I do not get that new message, but reconfigure_nagios.sh still runs constantly.
Thanks
Re: configure_nagios.sh seems to be stuck in a loop
Hi
I did another test by just upgrading from 5.7.1 to 5.7.2.
Even with this small upgrade I am having the same issue.
-
benjaminsmith
Re: configure_nagios.sh seems to be stuck in a loop
Hi mccrakem,
I'd like to get a system profile to help troubleshoot the error. The fact that you cannot download the system profile suggests an issue with permissions on the system. This is usually caused by an incorrect sudoers file. Please follow the steps in the article below to correct this and then try downloading the profile again.
Nagios XI - Profile Build Failed
If that fails, please run the following commands to generate the profile from the command line.
Code: Select all
rm -rf /usr/local/nagiosxi/var/components/profile.zip
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send the resulting /usr/local/nagiosxi/var/components/profile.zip file.
Also, try running the following tail command, then run Apply Configuration and post the full output. I'd like to see if it's still getting the database error with postgres.
Code: Select all
# Make sure Core is running
systemctl status nagios
tail -f /usr/local/nagiosxi/var/cmdsubsys.log
Thanks, Benjamin