ndo2db Hogging ALL the CPU
Re: ndo2db Hogging ALL the CPU
I have a very similar problem and am curious what resolution was found for this one.
Marcus
- -fno-stack-protector
Re: ndo2db Hogging ALL the CPU
A few of the causes people have found:
giant tables for alerts or logs; truncating those tables should resolve this.
stalking might be turned on for hosts/services (fairly uncommon). This generates tons of logs and alerts and quickly causes issues for both Nagios and NDO. Removing stalking would resolve that.
offloading the database is a fix in its own right and often solves the issue entirely.
One final option would be to offload ndo2db as well. Nagios and NDO can talk just fine over TCP sockets, and this is fully configurable in the ndo2db.cfg and ndomod.cfg files as needed.
Unfortunately, this is somewhat unique to each install.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: ndo2db Hogging ALL the CPU
Update: the problem still exists and is creating great concern over resource usage for the customer.
Just to confirm, stalking is not on. Offloading is of course an option, but I wish we knew the cause, as the customer does not want to offload the database or ndo2db.
Question:
What would cause giant tables for alerts or logs, and what would be the specific process to truncate the tables?
Ultimately, I really want to know what has caused this issue, as it is not happening with many other installs that I have seen.
Mike Weber
Nagios Training/Consulting
Re: ndo2db Hogging ALL the CPU
I have to agree with Mike: of the 28 Nagios production servers I'm managing, only 2 seem to have this problem. However, let me add that after the last update to 2014R1.4 the frequency of this issue dropped substantially (from 2 or 3 times a week per server to once or twice every 2 weeks). On one server I can see in the graph of scheduled events over time that the checks slowly drift toward all occurring at once, but are then automatically reset and spread evenly. However, if conditions are just right during the 1 or 2 times it occurs, it is not able to recover, and a restart of Nagios with the option "use_retained_scheduling_info=0" will recover the server. Based on this experience, I tend to believe there may be a glitch in the code that handles auto scheduling or that spreads the scheduling of checks evenly. (I welcome any recommended tweaks to test this hypothesis.)
I did open a support case on this problem previously, and it was resolved at the time by rolling back to an earlier backup archive.
I will make my system available anytime to any Nagios support personnel, should they wish to investigate further.
Marcus
Re: ndo2db Hogging ALL the CPU
We've noticed scheduling issues with some customers who had the check interval set very low (1 or 2 min). Checks would get pushed forward (rescheduled), and the last check would not update. This was not necessarily accompanied by high load, though. We were able to recreate the issue in house. Our developers are looking into this, but for now, here's what can be done as a workaround.
1. Make sure that "auto_rescheduling_window" is set LOWER than the smallest check interval.
For example, if your check interval is 1 min, you can set "auto_rescheduling_window" in nagios.cfg to 45 sec.
Code: Select all
auto_rescheduling_window=45
2. Make sure that "auto_rescheduling_interval" is lower than "auto_rescheduling_window". For example:
Code: Select all
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
This may fix the rescheduling issues in Nagios Core 4 when the check interval is set low.
These issues may or may not be related, but I would appreciate any feedback from people who try this.
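If it helps, the two rules in this workaround can be checked mechanically. Below is a minimal sketch in Python, not an official tool: the function names and the `smallest_check_interval_s` parameter are made up for illustration, and it assumes you already know your smallest check interval in seconds and that the two auto_rescheduling_* values are in seconds.

```python
def parse_cfg(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_rescheduling(settings, smallest_check_interval_s):
    """Return a list of problems; an empty list means both rules hold.

    smallest_check_interval_s is a hypothetical input: the smallest check
    interval in use, converted to seconds by the caller.
    """
    try:
        interval = int(settings["auto_rescheduling_interval"])
        window = int(settings["auto_rescheduling_window"])
    except (KeyError, ValueError) as exc:
        return [f"missing or non-numeric setting: {exc}"]

    problems = []
    # Rule 1: the window must be lower than the smallest check interval.
    if window >= smallest_check_interval_s:
        problems.append(
            f"auto_rescheduling_window={window} is not lower than the "
            f"smallest check interval ({smallest_check_interval_s}s)"
        )
    # Rule 2: the interval must be lower than the window.
    if interval >= window:
        problems.append(
            f"auto_rescheduling_interval={interval} is not lower than "
            f"auto_rescheduling_window={window}"
        )
    return problems


sample = """
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
"""
# Smallest check interval of 1 min = 60 s, as in the example above.
print(check_rescheduling(parse_cfg(sample), smallest_check_interval_s=60))  # prints []
```

To check a real install, pass the contents of your nagios.cfg to parse_cfg() instead of the sample string.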
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: ndo2db Hogging ALL the CPU
Thanks for the feedback. I have made the changes and will keep you posted. Also, while load was not really a problem in my experience, I did limit the number of checks with intervals below 5 minutes to a very small percentage, and I did see a noticeable decrease in load.
Marcus
Last edited by mrochelle on Tue Sep 16, 2014 1:19 pm, edited 1 time in total.
Re: ndo2db Hogging ALL the CPU
I reviewed the auto rescheduling settings and they are all well under the lowest check interval. The lowest is 5 min (300) and the auto_rescheduling_interval is 180.
So that does not look like a fit.
Mike Weber
Nagios Training/Consulting
Re: ndo2db Hogging ALL the CPU
Mike, do you still need help truncating tables or at least identifying if they have grown too large?
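For the identifying part, here is a minimal sketch in Python that only builds a read-only size query against MySQL's information_schema; it does not truncate anything. The schema name `nagios` and the helper name are assumptions for illustration, so substitute the database your ndo2db instance actually writes to.

```python
import re


def largest_tables_sql(schema="nagios", limit=10):
    """Build a read-only query listing the biggest tables in one schema.

    The default schema name is an assumption; pass your own database name.
    """
    # Only allow plain identifiers, since the name is placed into the SQL text.
    if not re.fullmatch(r"[A-Za-z0-9_]+", schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    if int(limit) < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT table_name, table_rows, "
        "ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb "
        "FROM information_schema.tables "
        f"WHERE table_schema = '{schema}' "
        "ORDER BY (data_length + index_length) DESC "
        f"LIMIT {int(limit)};"
    )


# Paste the printed statement into the mysql client on the database host.
print(largest_tables_sql())
```

Note that table_rows is only an estimate for some storage engines, so treat the output as a rough ranking rather than an exact count.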
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: ndo2db Hogging ALL the CPU
I guess this is a totally separate issue then.
Mike, you meant:
Code: Select all
auto_rescheduling_window=180
not
Code: Select all
auto_rescheduling_interval=180
correct?
Can you show us these three lines?
Code: Select all
auto_reschedule_checks=
auto_rescheduling_interval=
auto_rescheduling_window=
Re: ndo2db Hogging ALL the CPU
auto_reschedule_checks=1
auto_rescheduling_interval=45
auto_rescheduling_window=180
Yes, I would be interested to see whether I can truncate the tables, or at least what I could test to see if that is the issue.
Mike Weber
Nagios Training/Consulting