ndo2db Hogging ALL the CPU

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: ndo2db Hogging ALL the CPU

Post by mrochelle »

I have a very similar problem and am curious what resolution was found for it.
Marcus
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: ndo2db Hogging ALL the CPU

Post by sreinhardt »

A couple of the causes that people have found are:

Giant tables for alerts or logs. Truncating the tables should resolve this.
Stalking might be turned on for hosts/services (fairly uncommon). This generates tons of logs and alerts, quickly causing issues with both Nagios and NDO. Removing stalking would resolve that.
Offloading the database is a fix on its own that often solves the issue entirely.
One final option would be to offload ndo2db as well. Nagios and NDO can talk just fine over TCP sockets, and this is fully configurable within the ndo2db.cfg and ndomod.cfg files as needed.

Unfortunately this is somewhat unique to each install.
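
One way to see whether any tables have grown unusually large is to ask MySQL's information_schema for per-table sizes. The sketch below only builds that query and ranks rows that have already been fetched. The schema name "nagios" and the helper names are assumptions, and no database driver is included, so the query would be run with whatever MySQL client is normally used.

```python
# Assumption: the NDOUtils tables live in a MySQL schema named "nagios".
SCHEMA = "nagios"

# Standard information_schema query. Note that table_rows is only an
# estimate for InnoDB tables. The %s placeholder style is the one used
# by common Python MySQL drivers; adjust it for your client.
SIZE_QUERY = (
    "SELECT table_name, table_rows, data_length + index_length AS size_bytes "
    "FROM information_schema.tables "
    "WHERE table_schema = %s"
)


def largest_tables(rows, top_n=5):
    """Rank (table_name, table_rows, size_bytes) tuples, largest first."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:top_n]


if __name__ == "__main__":
    # Made-up rows purely for illustration, not real measurements.
    example_rows = [
        ("table_a", 1000, 4096),
        ("table_b", 2000000, 900000000),
        ("table_c", 50, 16384),
    ]
    for name, row_count, size in largest_tables(example_rows, top_n=2):
        print(name, row_count, size)
```

Tables that dwarf the rest in this ranking would be the first candidates to look at before deciding whether truncating anything is appropriate.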
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: ndo2db Hogging ALL the CPU

Post by mikew »

Update: the problem still exists and is creating great concern over resource usage for the customer.

Just to verify, stalking is not on. Offloading is of course an option, but I wish we knew the cause, as the customer does not want to offload the database or ndo2db.

Question:
What would cause giant tables for alerts or logs and what would be the specific process to truncate the tables?

Ultimately, I really want to know what has caused this issue as it is not happening with many other installs that I have seen.
Mike Weber

Nagios Training/Consulting
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: ndo2db Hogging ALL the CPU

Post by mrochelle »

I have to agree with Mike: of the 28 Nagios production servers I'm managing, only 2 seem to have this problem. However, let me add that after the last update to 2014R1.4 the frequency of this issue dropped substantially (from 2 or 3 times a week per server to once or twice every 2 weeks). On one server I can see in the graph of scheduled events over time that the checks slowly drift toward all occurring at once, but are then automatically reset and spread evenly. However, if conditions are just right during the one or two times it occurs, the server is not able to recover, and a restart of Nagios with the option "use_retained_scheduling_info=0" will recover it. Based on this experience, I tend to believe there may be a glitch in the code that handles auto scheduling or that spreads the scheduling of checks evenly. (I welcome any recommended tweaks to test this hypothesis.)
I did open a support case on this problem previously, and it was resolved at the time by rolling back to an earlier backup archive.
I will make my system available anytime to any Nagios support personnel should they desire to investigate further.
Marcus
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: ndo2db Hogging ALL the CPU

Post by lmiltchev »

We've noticed scheduling issues with some customers who had the check interval set very low (1 or 2 min). Checks would get pushed forward (rescheduled), and the last check time would not update. This was not necessarily accompanied by high load, though. We were able to recreate the issue in house. Our developers are looking into this, but for now, here's what can be done as a workaround.

1. Make sure that the "auto_rescheduling_window" is set LOWER than the smallest check interval.

For example, if your check interval is 1 min, you can set "auto_rescheduling_window" in nagios.cfg to 45 sec.

Code:

auto_rescheduling_window=45

2. Make sure that "auto_rescheduling_interval" is lower than "auto_rescheduling_window". For example:

Code:

auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45

This may fix the rescheduling issues in Nagios Core 4 when check interval is set low.

These issues may or may not be related but I would appreciate any feedback from people who tried this.
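
As a rough illustration of the ordering described in the two steps above, here is a minimal Python sketch that reads the three settings from nagios.cfg-style text and checks that the rescheduling interval is below the window, and the window is below the smallest check interval. The function names are hypothetical, and the caller is assumed to supply the smallest check interval already converted to seconds.

```python
def parse_settings(cfg_text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_rescheduling(cfg_text, smallest_check_interval_s):
    """Return a list of warnings; an empty list means the ordering holds."""
    s = parse_settings(cfg_text)
    if s.get("auto_reschedule_checks") != "1":
        return ["auto_reschedule_checks is not enabled"]
    try:
        interval = int(s["auto_rescheduling_interval"])
        window = int(s["auto_rescheduling_window"])
    except (KeyError, ValueError):
        return ["auto_rescheduling_interval/window missing or not an integer"]

    warnings = []
    if interval >= window:
        warnings.append("auto_rescheduling_interval should be lower than "
                        "auto_rescheduling_window")
    if window >= smallest_check_interval_s:
        warnings.append("auto_rescheduling_window should be lower than the "
                        "smallest check interval")
    return warnings


sample = """
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
"""

if __name__ == "__main__":
    # Smallest check interval of 1 min, expressed in seconds.
    print(check_rescheduling(sample, 60))  # prints []
```

An empty list only means the values follow the suggested ordering; it says nothing about whether the workaround actually resolves the scheduling problem on a given install.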
Be sure to check out our Knowledgebase for helpful articles and solutions!
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: ndo2db Hogging ALL the CPU

Post by mrochelle »

Thanks for the feedback. I have made the changes and will keep you posted. Also, while load was not really a problem in my experience, I did limit the number of checks with intervals below 5 minutes to a very small percentage and saw a noticeable decrease in load.
Marcus
Last edited by mrochelle on Tue Sep 16, 2014 1:19 pm, edited 1 time in total.
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: ndo2db Hogging ALL the CPU

Post by mikew »

I reviewed the auto rescheduling settings and they are all well under the lowest check interval. The lowest is 5 min (300) and the auto_rescheduling_interval is 180.

So that does not look like a fit.
Mike Weber

Nagios Training/Consulting
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: ndo2db Hogging ALL the CPU

Post by abrist »

Mike, do you still need help truncating tables, or at least identifying whether they have grown too large?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: ndo2db Hogging ALL the CPU

Post by lmiltchev »

I guess this is a totally separate issue then. :(

Mike, you meant:

Code:

auto_rescheduling_window=180

not

Code:

auto_rescheduling_interval=180

correct?

Can you show us these three lines?

Code:

auto_reschedule_checks=
auto_rescheduling_interval=
auto_rescheduling_window=
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: ndo2db Hogging ALL the CPU

Post by mikew »

auto_reschedule_checks=1
auto_rescheduling_interval=45
auto_rescheduling_window=180

Yes, I would be interested to see if I can truncate the tables, or at least what I could test to see if that was an issue.
Mike Weber

Nagios Training/Consulting
Locked