NagiosXI query of death?

Posted: Fri Dec 04, 2020 12:59 pm
by jweijters
In our Nagios environment (Nagios XI 5.6.14), we had an issue today where a single query made the database deliver or process so much data that NDO2DB couldn't process any other events, and the IPC message queue went up to 512000 and stayed there.
The query looked like this:

Code:

SELECT
COUNT(*) as total
FROM nagios_servicestatus
LEFT JOIN nagios_objects as obj1 ON nagios_servicestatus.service_object_id=obj1.object_id
LEFT JOIN nagios_services ON nagios_servicestatus.service_object_id=nagios_services.service_object_id
LEFT JOIN nagios_hosts ON nagios_services.host_object_id=nagios_hosts.host_object_id
WHERE TRUE AND obj1.name1 = 'dmn-bm0212397495' AND nagios_servicestatus.instance_id = '1' AND nagios_servicestatus.service_object_id IN ( 
The opening parenthesis was followed by at least 64000 object_ids.
This query was started from the Nagios XI web interface.
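
For anyone trying to spot statements like this in a query log, a rough sketch is to measure the size of each IN list. The function name `count_in_list_items` is hypothetical and not part of Nagios XI. It assumes the logged statement is complete (the truncated query above has no closing parenthesis, so it would not match) and that the list holds plain literals without embedded commas or nested parentheses.

```python
import re


def count_in_list_items(sql: str) -> int:
    """Return the size of the largest IN (...) list of plain literals in sql."""
    largest = 0
    # Match IN ( ... ) where the list contains no nested parentheses,
    # which skips subqueries such as IN (SELECT ...(...)...).
    for match in re.finditer(r"\bIN\s*\(([^()]*)\)", sql, flags=re.IGNORECASE):
        items = [item for item in match.group(1).split(",") if item.strip()]
        largest = max(largest, len(items))
    return largest
```

A statement whose largest list runs into the tens of thousands of IDs would be a candidate for the kind of query described here.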

Because we use php-fpm, the PHP session wasn't removed, so after a reboot the session started the query again and my Nagios environment hung again.
When I stopped php-fpm, the query was stopped at the database and Nagios came back to life.
Removing all PHP session files resolved the problem.
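
A less drastic alternative to removing every session file might be to list only the stale ones first. The sketch below assumes PHP's default file-based session handler, which stores sessions as files named `sess_<id>` in the directory set by `session.save_path`. The function name `stale_session_files` and the age threshold are hypothetical, and the directory has to be supplied by the caller because it differs between systems.

```python
import time
from pathlib import Path


def stale_session_files(session_dir, max_age_seconds, now=None):
    """Return sess_* files in session_dir not modified within max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in Path(session_dir).glob("sess_*"):
        # Only regular files count; compare age against the modification time.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            stale.append(path)
    return sorted(stale)
```

The function only reports candidates; deleting them (for example with `path.unlink()`) is left as a separate, deliberate step, ideally with php-fpm stopped.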

Can you please help me work out which action in the Nagios XI web interface can create such a query?

Kind regards,

Joris Weijters

Re: NagiosXI query of death?

Posted: Fri Dec 04, 2020 5:37 pm
by dchurch
The Nagios XI web interface as a general rule doesn't insert data into the nagios database where the nagios_servicestatus table resides. The process responsible for inserting data into that database is the nagios service, i.e. the service that kicks off the checks and collects the data. This is sometimes called the Monitoring Engine.

Often there can be performance hits introduced by having a crashed table somewhere in the database. I'd suggest running the database repair routine to see if that helps.

Note about NDO2DB issues

Prior to Nagios XI 5.7.x, the Monitoring Engine used a process called NDO2DB to write its data to the database. The problem was that this process could get choked if it received too much data at once, which resulted in delayed updates in the Nagios XI web interface (e.g. a graph would be out of date no matter how often you refreshed the page). In extreme cases, it would just fail to write the data to the database. NDO2DB uses a socket and relies on IPC to control the flow over the socket.
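
To check whether a kernel message queue is backing up, one option is to parse the output of the Linux `ipcs -q` command. This is a minimal sketch, assuming the usual util-linux column layout (key, msqid, owner, perms, used-bytes, messages). The helper names `parse_ipcs_queues` and `backed_up` and the threshold are hypothetical and not part of Nagios XI or NDO2DB.

```python
def parse_ipcs_queues(output: str):
    """Parse `ipcs -q` text into a list of dicts, one per message queue."""
    queues = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with a hex key such as 0x4b010002;
        # the banner, header, and blank lines are skipped.
        if len(fields) != 6 or not fields[0].startswith("0x"):
            continue
        key, msqid, owner, perms, used_bytes, messages = fields
        queues.append({
            "key": key,
            "msqid": int(msqid),
            "owner": owner,
            "perms": perms,
            "used_bytes": int(used_bytes),
            "messages": int(messages),
        })
    return queues


def backed_up(queues, byte_threshold):
    """Return the queues whose used bytes meet or exceed byte_threshold."""
    return [q for q in queues if q["used_bytes"] >= byte_threshold]
```

The input text could come from `subprocess.run(["ipcs", "-q"], capture_output=True, text=True).stdout`. A sensible threshold depends on the kernel's `kernel.msgmnb` setting on the host, so treat any fixed number as an assumption.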

Nagios XI 5.7.x solves this issue by forgoing NDO2DB completely, instead using a pooled connection opened in a worker thread to perform the query. This is not only faster, but simpler and less error-prone as well.

If you've reached the point where you're monitoring enough hosts that you're experiencing IPC overflow, delays, and lags, I'd suggest you try upgrading to the latest version of Nagios XI.

Re: NagiosXI query of death?

Posted: Mon Dec 07, 2020 4:37 am
by jweijters
Hi dchurch,

As you can see, the query is not an INSERT query but a SELECT query,
so what causes it?

Your answer does not address my question. Of course we tried to repair the database; it was fine.

Nagios XI 5.7.x is not an option for our Nagios environment. Upgrading to 5.7.x kills our Nagios completely.
Our Nagios environment uses gearman, as described in one of my other topics:
https://support.nagios.com/forum/viewto ... 27#p319427


Kind regards,
Joris Weijters

Re: NagiosXI query of death?

Posted: Mon Dec 07, 2020 3:18 pm
by benjaminsmith
Hi Joris,

Let's move this over to a support ticket. Please attach a fresh system profile from your server, so we can start reviewing the logs right away.

To open a support ticket, please visit:
https://support.nagios.com/tickets/

Please reference this thread when opening the ticket. Also, let us know how often this is occurring and whether the system is having any other issues.

Thanks,
Benjamin

To download a system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.

Re: NagiosXI query of death?

Posted: Wed Dec 09, 2020 5:37 pm
by ssax
Locking thread, ticket received, we will continue support through the ticket.

Thank you!