Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Mon Sep 28, 2020 12:07 pm
by meganwilliford
Hello,

Since upgrading from 5.6.6 to 5.7.2, we've been experiencing issues with the monitoring engine on several of our Nagios XI instances.

The monitoring engine has crashed a few times. In other cases, when we apply a configuration, the monitoring engine reports problems, then after a while repairs itself and reports as OK.

Is there any way to troubleshoot this?

Thanks!

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Mon Sep 28, 2020 4:36 pm
by jdunitz
You can look at the following logs, which may provide clues:

Code:

/var/log/messages
/var/log/mariadb/mariadb.log
/usr/local/nagios/var/nagios.log (and other logs in that same directory)

Also, the ipcs command will show you whether you have hundreds of messages queued up, which is a sign of a problem:

Code:

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xef000040 34439168   nagios     600        705536       689

--Jeffrey
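
For readers who want to check this on several servers, here is a minimal sketch that filters ipcs output for queues holding more than a given number of messages. It assumes the six-column layout shown above (key, msqid, owner, perms, used-bytes, messages); the function name busy_queues and the threshold of 100 are hypothetical choices, not Nagios defaults.

```shell
#!/bin/sh
# busy_queues THRESHOLD: read `ipcs -q` output on stdin and print the key,
# msqid, and message count of every queue holding more than THRESHOLD messages.
busy_queues() {
    awk -v limit="$1" '
        # Data rows start with a hex key such as 0xef000040; header lines do not.
        $1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 6 && $6 + 0 > limit + 0 {
            print $1, $2, $6
        }
    '
}

# Usage on a live system (100 is an arbitrary example threshold):
#   ipcs -q | busy_queues 100
```

An empty result means no queue is over the threshold. What count is worth worrying about depends on the system, so treat the threshold as a starting point.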

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Tue Sep 29, 2020 7:03 am
by meganwilliford
Nothing is sticking out in the logs, but it does look like there are thousands of messages queued up. What problem could that be a sign of, and do you know how we can prevent the messages from queuing up?

Code:

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xffffffff 0          root       600        0            0
0x000004d2 65537      root       666        0            0
0xdf000200 9371650    nagios     600        0            0
0x02000200 10256387   nagios     600        3590144      3506

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Tue Sep 29, 2020 9:55 am
by meganwilliford
Also, I wanted to mention that the message queue results I posted above are only from one of our Nagios XI instances. The others that are also having monitoring engine issues do not have any queued-up messages.

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Tue Sep 29, 2020 5:04 pm
by ssax
Please PM me a copy of your profile from each XI server. You can download it from Admin > System Profile using the Download Profile button.

XI 5.7+ should not use the kernel message queue unless you downgraded NDO3 back to NDO2DB to resolve an issue.

What is the output of this command?
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the command if your DB is offloaded to another server and/or you've changed the root mysql password

Code:

mysql -uroot -pnagiosxi -h 127.0.0.1 -P 3306 nagios -e "desc nagios_hoststatus;desc nagios_servicestatus;"

If you let this tail command run for a few minutes, do you see any errors pop up? (PM me the output)

Code:

tail -Fn0 /usr/local/nagios/var/nagios.log /usr/local/nagiosxi/var/cmdsubsys.log /usr/local/nagiosxi/var/eventman.log

Please PM your /usr/local/nagios/var/nagios.log as well.

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Wed Sep 30, 2020 3:39 pm
by meganwilliford
PM sent!

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Thu Oct 01, 2020 6:18 am
by meganwilliford
This morning I was able to watch the monitoring engine crash and it was at the exact time our backups are scheduled for (0400 PT).
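
To correlate a crash with a scheduled job like this, it can help to pull only the nagios.log entries from around that time. The following is a minimal sketch, assuming the usual nagios.log layout in which each line begins with an epoch timestamp in square brackets. The function name log_window and the example times are hypothetical.

```shell
#!/bin/sh
# log_window START END: read nagios.log lines on stdin and print those whose
# leading [epoch] timestamp falls within START..END (inclusive, in seconds).
log_window() {
    awk -v t0="$1" -v t1="$2" '
        # Only consider lines that begin with "[digits]".
        match($0, /^\[[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2) + 0   # strip the brackets
            if (ts >= t0 + 0 && ts <= t1 + 0) print
        }
    '
}

# Usage (assumes GNU date, which interprets the time in the server's zone):
#   start=$(date -d '2020-10-01 03:55' +%s)
#   end=$(date -d '2020-10-01 04:15' +%s)
#   log_window "$start" "$end" < /usr/local/nagios/var/nagios.log
```

Lines without a leading timestamp are skipped, so multi-line messages may appear truncated in the filtered output.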

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Thu Oct 01, 2020 5:23 pm
by ssax
I was literally going to ask you that, I didn't see anything in your profiles.

Are they XI scheduled backups or 3rd party backups?

If it's an XI backup, please send these files:

Code:

/usr/local/nagiosxi/var/components/scheduledbackups.log
/etc/php.ini

Additionally, please send the output of this command so we can check your DB tables:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the command if your DB is offloaded to another server and/or you've changed the root mysql password

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
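
If the table listing is long, a small filter can pick out the larger tables. This is only a sketch, on the assumption that unusually large tables are the thing worth looking at. It expects the two-column --table output produced by the command above; the function name big_tables and the 100 MB cutoff are hypothetical.

```shell
#!/bin/sh
# big_tables MIN_MB: read `mysql --table` output on stdin (two columns:
# table name, size in MB) and print rows whose size is at least MIN_MB.
big_tables() {
    awk -F'|' -v min="$1" '
        # Data rows look like "| name | 12.34 |". Border lines have no "|",
        # and the header row has no digits in its size column.
        NF >= 4 && $3 ~ /[0-9]/ && $3 + 0 >= min + 0 {
            name = $2; size = $3
            gsub(/ /, "", name); gsub(/ /, "", size)
            print name, size
        }
    '
}

# Usage: pipe the output of the mysql command above into the filter, e.g.
#   ... | mysql -h 127.0.0.1 -uroot -pnagiosxi --table | big_tables 100
```

Rows whose size is reported as NULL are skipped because they contain no digits.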

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Fri Oct 02, 2020 8:21 am
by meganwilliford
PM sent!

Re: Monitoring Engine crashing (Nagios XI 5.7.2)

Posted: Fri Oct 02, 2020 10:44 am
by ssax
Reply sent.