Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Mon Sep 28, 2020 12:07 pm
by meganwilliford
Hello,
Since upgrading from 5.6.6 to 5.7.2, we've been experiencing issues with the monitoring engine on several of our Nagios XI instances.
The monitoring engine has crashed a few times. In other cases, when we apply configuration, the monitoring engine reports problems, then after a while recovers on its own and reports OK.
Is there any way to troubleshoot this?
Thanks!
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Mon Sep 28, 2020 4:36 pm
by jdunitz
You can look at the following logs, which may provide clues:
Code:
/var/log/messages
/var/log/mariadb/mariadb.log
/usr/local/nagios/var/nagios.log (and other logs in that same directory)
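A quick way to scan those logs for obvious trouble is a case-insensitive grep that prefixes each hit with its file name. This is a minimal sketch: the scan_logs name and the keyword list (error, fail, crash, segfault, killed, warn) are assumptions for illustration, not anything Nagios XI ships.

```shell
# Hypothetical helper: print lines that look like errors from one or more
# log files, each prefixed with its file name (grep -H).
# ASSUMPTION: the keyword list below is a guess; adjust it to what you hunt for.
scan_logs() {
  local pattern='error|fail|crash|segfault|killed|warn'
  local f
  for f in "$@"; do
    # Skip files that are missing or unreadable (e.g. no MariaDB log on this box).
    [ -r "$f" ] || continue
    # -i: case-insensitive, -H: show file name, -E: extended regex.
    # "|| true" keeps a file with no matches from counting as a failure.
    grep -iHE "$pattern" -- "$f" || true
  done
  return 0
}

# Example:
#   scan_logs /var/log/messages /var/log/mariadb/mariadb.log /usr/local/nagios/var/*.log
```

Run it as root (or with sudo) since /var/log/messages is usually not world-readable.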
Also, the ipcs command will show you whether hundreds of messages are queued up, which is a sign of a problem:
Code:
------ Message Queues --------
key msqid owner perms used-bytes messages
0xef000040 34439168 nagios 600 705536 689
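If you want to check this from a script, a minimal sketch that reads ipcs -q output and prints only the queues above a message-count threshold could look like this. The busy_queues name and the default threshold of 100 are assumptions, and the column order (key, msqid, owner, perms, used-bytes, messages) assumes the util-linux layout shown above.

```shell
# Hypothetical helper: read `ipcs -q` output on stdin and report System V
# message queues holding more than THRESHOLD messages (default 100, an
# arbitrary cut-off for "hundreds of messages").
# ASSUMPTION: columns are key, msqid, owner, perms, used-bytes, messages.
busy_queues() {
  local threshold="${1:-100}"
  awk -v max="$threshold" '
    # Data rows start with the hex key; header rows do not.
    $1 ~ /^0x/ && ($6 + 0) > (max + 0) {
      printf "msqid=%s owner=%s messages=%s used-bytes=%s\n", $2, $3, $6, $5
    }'
}

# Example:
#   ipcs -q | busy_queues 100
```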
--Jeffrey
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Tue Sep 29, 2020 7:03 am
by meganwilliford
Nothing is sticking out in the logs, but it does look like there are thousands of messages queued up. What problem could that be a sign of, and do you know how we can prevent the messages from queuing up?
Code:
------ Message Queues --------
key msqid owner perms used-bytes messages
0xffffffff 0 root 600 0 0
0x000004d2 65537 root 666 0 0
0xdf000200 9371650 nagios 600 0 0
0x02000200 10256387 nagios 600 3590144 3506
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Tue Sep 29, 2020 9:55 am
by meganwilliford
Also, I wanted to mention that the message queue results I posted above are only from one of our Nagios XI instances. The others that are also having monitoring engine issues do not have any queued messages.
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Tue Sep 29, 2020 5:04 pm
by ssax
Please PM me a copy of your profile from each XI server. You can download it from Admin > System Profile by clicking the Download Profile button.
XI 5.7+ should not use the kernel message queue unless you downgraded NDO3 back to NDO2DB to resolve an issue.
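One rough way to tell which database broker an instance is loading is to look at the active broker_module lines in nagios.cfg. This sketch assumes the default path /usr/local/nagios/etc/nagios.cfg, and it assumes that an NDO2 module file name contains "ndomod" while NDO3 loads "ndo.so"; verify those names against your own install before relying on the result. The ndo_flavour name is hypothetical.

```shell
# Hypothetical helper: guess whether nagios.cfg loads an NDO2 or NDO3 broker.
# ASSUMPTIONS: default config path below; NDO2 module names contain "ndomod"
# (used together with the separate ndo2db daemon); NDO3 loads "ndo.so".
ndo_flavour() {
  local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
  local mods
  # Keep only active (uncommented) broker_module lines.
  mods=$(grep -E '^[[:space:]]*broker_module=' -- "$cfg" || true)
  if printf '%s\n' "$mods" | grep -q 'ndomod'; then
    echo "ndo2"
  elif printf '%s\n' "$mods" | grep -q 'ndo\.so'; then
    echo "ndo3"
  else
    echo "unknown"
  fi
}

# Example:
#   ndo_flavour
#   pgrep -x ndo2db   # prints PIDs if a process named ndo2db is running
```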
What is the output of this command?
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the command if your DB is offloaded to another server and/or you've changed the root MySQL password.
Code:
mysql -uroot -pnagiosxi -h 127.0.0.1 -P 3306 nagios -e "desc nagios_hoststatus;desc nagios_servicestatus;"
If you let this tail command run for a few minutes, do you see any errors appear? (PM me the output.)
Code:
tail -Fn0 /usr/local/nagios/var/nagios.log /usr/local/nagiosxi/var/cmdsubsys.log /usr/local/nagiosxi/var/eventman.log
Please PM your /usr/local/nagios/var/nagios.log as well.
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Wed Sep 30, 2020 3:39 pm
by meganwilliford
PM sent!
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Thu Oct 01, 2020 6:18 am
by meganwilliford
This morning I was able to watch the monitoring engine crash, and it happened at exactly the time our backups are scheduled (0400 PT).
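To line nagios.log up against that 0400 PT window, note that Nagios Core prefixes each nagios.log line with a Unix epoch timestamp in square brackets. A minimal sketch that prints only the lines inside a given epoch range (the log_window name is hypothetical; the example assumes GNU date for converting wall-clock times to epochs):

```shell
# Hypothetical helper: print nagios.log lines whose leading "[epoch]"
# timestamp lies in the inclusive range [start, end].
log_window() {
  local start="$1" end="$2" file="${3:-/usr/local/nagios/var/nagios.log}"
  awk -v s="$start" -v e="$end" '
    # Lines without a leading [digits] prefix are skipped.
    match($0, /^\[[0-9]+]/) {
      # Strip the brackets: skip "[" and drop the trailing "]".
      t = substr($0, 2, RLENGTH - 2) + 0
      if (t >= s + 0 && t <= e + 0) print
    }' "$file"
}

# Example (GNU date assumed):
#   start=$(TZ=America/Los_Angeles date -d '2020-10-01 03:50' +%s)
#   end=$(TZ=America/Los_Angeles date -d '2020-10-01 04:20' +%s)
#   log_window "$start" "$end"
```

Piping the result through perl -pe 's/^\[(\d+)\]/"[" . localtime($1) . "]"/e' turns the epochs back into readable local times.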
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Thu Oct 01, 2020 5:23 pm
by ssax
I was literally going to ask you that; I didn't see anything in your profiles.
Are they XI scheduled backups or 3rd party backups?
If it's an XI backup, please send these files:
Code:
/usr/local/nagiosxi/var/components/scheduledbackups.log
/etc/php.ini
Additionally, please send the output of this command so we can check your DB tables:
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the command if your DB is offloaded to another server and/or you've changed the root MySQL password.
Code:
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
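A variant of that size query which also shows the schema and lists the largest tables first may make outliers easier to spot. This is a sketch: the table_size_sql helper name and the default limit of 20 rows are arbitrary choices, and the same -h/-u/-p caveats from the note above apply to the mysql call.

```shell
# Hypothetical helper: emit a table-size query for the Nagios XI schemas,
# sorted largest first. LIMIT defaults to 20 rows (an arbitrary choice).
table_size_sql() {
  local limit="${1:-20}"
  # Refuse anything that is not a plain number, since it is pasted into SQL.
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be numeric" >&2; return 1 ;;
  esac
  cat <<SQL
SELECT table_schema AS 'Schema',
       table_name AS 'Table',
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size in MB'
FROM information_schema.TABLES
WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi')
ORDER BY (data_length + index_length) DESC
LIMIT ${limit};
SQL
}

# Example:
#   table_size_sql 20 | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
```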
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Fri Oct 02, 2020 8:21 am
by meganwilliford
PM sent!
Re: Monitoring Engine crashing (Nagios XI 5.7.2)
Posted: Fri Oct 02, 2020 10:44 am
by ssax
Reply sent.