Monitoring Engine will not start after upgrading to 5.8
Re: Monitoring Engine will not start after upgrading to 5.8
Hi,
We're heading into a long weekend and monitoring is down: I have 300+ people counting on me / Nagios XI to monitor ~2600 hosts.
We need to decide soon whether this can be fixed or whether we need to roll back.
My recovery plan is to wipe our VM, reinstall CentOS 7, reinstall XI 5.5, and restore from the last backup. If anyone has a better idea, please let me know.
Please let me know if you need anything from the downed 5.8 instance before I have to wipe it clean.
Rob
Re: Monitoring Engine will not start after upgrading to 5.8
Hi,
How are you doing?
Please follow this KB and see if you need to increase the "Max" connections:
https://support.nagios.com/kb/article/n ... s-513.html
I noticed this warning:
Code: Select all
WARNING: RLIMIT_NPROC is 64090, total max estimated processes is 71016! You should increase your limits (ulimit -u, or limits.conf)
I found this page on ulimit settings:
https://serverfault.com/questions/62861 ... n-centos-7
Also, please follow this KB for the message queue:
https://support.nagios.com/kb/article.php?id=139
Please also make sure "ndo2db" is NOT running, since Nagios XI 5.8.3 uses NDO3:
Code: Select all
systemctl stop ndo2db
Please run the command below to list your database tables by size:
Code: Select all
echo "SELECT table_schema as 'Database', table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES ORDER BY (data_length + index_length) DESC;" | mysql -t -u root -pnagiosxi
If your "nagios_logentries" table is large, please run the command below to truncate it:
Code: Select all
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
Please run the following to restart all your services:
Code: Select all
# Stop the scheduler and the Nagios services
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
# Kill any processes still running as the nagios user
pkill -9 -u nagios
# Remove leftover nagios message queues
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q $i; done
# Clear stale lock files
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
# Restart the database, then bring everything back up
systemctl restart mariadb
systemctl start nagios
systemctl start npcd
systemctl start crond
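After the cleanup loop in the restart sequence, it may be worth confirming that no nagios-owned message queues are left behind. The following is only a sketch: `count_nagios_queues` is a hypothetical helper name, and it assumes the usual Linux `ipcs -q` layout in which the owner is the third column. Unlike the `grep nagios` in the loop, it matches the owner column only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ipcs -q` output on stdin and prints how many
# message queues are owned by the nagios user (owner assumed to be column 3).
count_nagios_queues() {
    awk '$3 == "nagios" { n++ } END { print n + 0 }'
}

# Assumed usage, after the ipcrm loop and before starting the services:
#   ipcs -q | count_nagios_queues
```

A result of 0 would suggest the old queues were removed; anything else means the `ipcrm` loop may need to be run again.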
Best Regards,
Vinh
Re: Monitoring Engine will not start after upgrading to 5.8
Hi Vinh,
Message received, I'll start working on this.
Rob
Re: Monitoring Engine will not start after upgrading to 5.8
Hi,
Also, were you able to do "Apply Configuration"?
Regards,
Vinh
Re: Monitoring Engine will not start after upgrading to 5.8
Hi Vinh,
Yes, I did everything you suggested yesterday, including the "apply", and it says it worked, but the nagios service would terminate after about 1 minute (i.e., no change).
I'm working on today's suggestions from you now & will follow up.
Rob
Re: Monitoring Engine will not start after upgrading to 5.8
Hi Rob,
Great to hear that "Apply Config" worked.
So I suspect that the problem is either your database connections or some DB tables being too large.
Best Regards,
Vinh
Re: Monitoring Engine will not start after upgrading to 5.8
Vinh,
On the ulimit issue, regarding the serverfault article you referred to, systemd ignores anything/everything in /etc/security/limits*.
So in /usr/lib/systemd/system/*.service, for which service(s) do I need to create an override.conf to increase the user's nproc ("max user processes")?
Rob
PS, with regard to MariaDB, I increased max_connections from the default (151) to the maximum of 818 and restarted; nagios crashed again shortly after restarting.
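For reference, a systemd drop-in that raises the process limit could look roughly like the sketch below. Several things here are assumptions rather than anything confirmed in this thread: that `nagios.service` is the unit that needs the override, that 131072 is a suitable value (it is simply a number above the 71016 estimate in the warning), and the helper name `write_nproc_override`. `LimitNPROC=` is the systemd directive corresponding to `ulimit -u`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes an override.conf with a LimitNPROC setting
# into the given drop-in directory.
write_nproc_override() {
    local dropin_dir=$1 nproc=$2
    mkdir -p "$dropin_dir"
    printf '[Service]\nLimitNPROC=%s\n' "$nproc" > "$dropin_dir/override.conf"
}

# Assumed usage as root (unit name and value are illustrative assumptions):
#   write_nproc_override /etc/systemd/system/nagios.service.d 131072
#   systemctl daemon-reload
#   systemctl restart nagios
#   systemctl show nagios -p LimitNPROC    # check the value systemd applied
```

Placing the drop-in under /etc/systemd/system rather than editing files in /usr/lib/systemd/system keeps the change from being overwritten by package updates.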
Re: Monitoring Engine will not start after upgrading to 5.8
Hi Vinh,
I followed all of the suggestions you made and Nagios still crashes shortly after [re]starting.
This has been a P1 (Priority / Severity One) high-visibility catastrophic production failure for us.
At this point, we have completely run out of time; we simply cannot have production completely down for this long, now more than 24 hours.
I'll note also that I have not yet heard back from Sales on my questions / request for information.
I did not receive any request for any additional information from the 5.8 production instance; so here's the plan:
1. I need to recover / restore our old 5.5 instance so we have a working monitoring system in place for the holiday weekend.
2. Early next week I will clone our 5.5 instance, leaving our 5.5 production system running, thus creating a stand-alone, separate 5.5 instance where I will re-run the upgrade procedure and we can pick this issue back up.
Rob
Re: Monitoring Engine will not start after upgrading to 5.8
Hi Rob,
I talked to Sean, and he thinks it is an NDO3 issue.
Let's downgrade to the previous version (ndo2db); instructions below.
Code: Select all
systemctl stop nagios
cd /tmp
rm -rf /tmp/nagiosxi
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.6.14.tar.gz
tar zxf xi-5.6.14.tar.gz
cd /tmp/nagiosxi/subcomponents/ndoutils
./install
systemctl enable ndo2db
Then edit your /usr/local/nagios/etc/nagios.cfg and make sure this line is uncommented:
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
Make sure this line is commented:
#broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg
Then start the nagios service:
Code: Select all
systemctl start nagios
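Before starting nagios, it can help to double-check which broker module lines are actually active in nagios.cfg. This is a minimal sketch; `active_broker_modules` is a hypothetical helper name, and the default path is the one given in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the uncommented broker_module lines from a
# nagios.cfg (defaults to the path mentioned in this thread).
active_broker_modules() {
    local cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    grep -E '^[[:space:]]*broker_module=' "$cfg"
}

# Assumed usage:
#   active_broker_modules
# After the edit described here, only the ndomod.o line should be printed.
```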
Best Regards,
Vinh
Re: Monitoring Engine will not start after upgrading to 5.8
Hi Rob,
I'm very sorry for the long delay!
We have confirmed that it is an NDO3 issue, based on the debug log:
Code: Select all
[1622088804] NDO-3: Ended acknowledgement thread
[1622088804] NDO-3: Ended flapping thread
[1622088804] NDO-3: Ended statechange thread
[1622088804] NDO-3: Ended event_handler thread
[1622088804] NDO-3: Ended notification thread
[1622088804] NDO-3: Ended timed_event thread
[1622088805] NDO-3: Ended service_check thread
[1622088805] NDO-3: Ended downtime thread
[1622088806] Caught SIGTERM, shutting down...
So please downgrade to ndo2db, following the instructions I posted in my last reply.
Hoping for good news from you!
Best Regards,
Vinh
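The bracketed numbers in the debug log above are Unix epoch timestamps. To line them up with other logs, they can be converted to readable UTC times. The sketch below assumes bash and GNU `date` (as shipped with CentOS 7); `humanize_nagios_log` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrites a leading "[epoch]" on each stdin line as a
# UTC date and time; lines without that prefix pass through unchanged.
humanize_nagios_log() {
    local line re='^\[([0-9]+)\] (.*)$'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            printf '[%s] %s\n' \
                "$(date -u -d "@${BASH_REMATCH[1]}" '+%Y-%m-%d %H:%M:%S')" \
                "${BASH_REMATCH[2]}"
        else
            printf '%s\n' "$line"
        fi
    done
}

# Assumed usage (the log file path is a placeholder):
#   humanize_nagios_log < /path/to/nagios.log
```

For example, the final line of the excerpt becomes `[2021-05-27 04:13:26] Caught SIGTERM, shutting down...`.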