mrochelle wrote: I know how you feel. Fortunately for me, my problem server is not my primary nagios system but is critical for my network team. It only has 2500 monitors and plenty of hardware to spare. I'm considering rebuilding another image myself. Meanwhile, since the problem only impacts nagiosXI, I have the users working from the Thruk interface (which is basically nagios core) so the problem is transparent to them. While I'm not recommending this by any means, I have cronned a restart of the ndo2db process every 10 minutes to keep the nagiosXI interface updated. This is just a band-aid until I find a permanent fix or a patch for this problem is released.
Wow, every 10 minutes? You sound like you're having as bad a time as me! Unfortunately this is my main server. I think I may push up the offloading ndo and mysql from Wednesday night to NOW! lol
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Yeah, I've spent quite a few man-hours watching this problem, adjusting the scheduling, and running debug on ndo2db, all without success. Nagios appears to have some problem with moving the monitor results into the MySQL db. My problem system has plenty of resources available (4-core CPU, 8GB memory, 60GB drive) and no I/O constraints. The load average rarely exceeds 0.80. Log files are normal. This ndo2db is a paradox for me.
BanditBBS wrote: I think I may push up the offloading ndo and mysql from Wednesday night to NOW! lol
This has been reported to absolutely help. Eric[1] is hard at work on ndo, take a look at the commits (Showing with 2,161 additions and 2,795 deletions): https://github.com/NagiosEnterprises/nd ... its/master
I think a beta is on the way . . . .
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
BanditBBS wrote: I think I may push up the offloading ndo and mysql from Wednesday night to NOW! lol
This has been reported to absolutely help. Eric[1] is hard at work on ndo, take a look at the commits (Showing with 2,161 additions and 2,795 deletions): https://github.com/NagiosEnterprises/nd ... its/master
I think a beta is on the way . . . .
Geez, no need to brag, I believe he works
My DB and NDO are now offloaded. I will update in a couple days if my worries are all gone. I do have a few questions that I will ask in a new thread.
EDIT 1: My DB/NDO server has had a ZERO, and yes I mean ZERO, that's 0, load since it started running. My Nagios server has had a load of 2-4 since the offload, which I expect given the 1000-3500 checks per minute and the type of checks. It's way too early to call this resolved for me, but I had to report on my great first impression.
Good to hear, keep us posted. The offload has worked for others, but is not a true ndo fix.
The DB/NDO server is still at a 0.xx load and running great. The bad thing is, the Nagios server load is still spiking: at about 9am the load spiked, and it is now fluctuating between 5 and 200. This can no longer be blamed on ndo2db, which sucks. No clue what could be doing this, besides maybe 25+ people all keeping very busy dashboards up on screen. I haven't seen the scheduling stop yet, which is good.
Edit #1 - Just saw ndo2db spike to 100% CPU, and of course XI wasn't being updated for the approximately 2 minutes that lasted. It fell back down and everything continues working.
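For anyone who wants to line the load spikes up against the ndo2db CPU bursts, here is a rough sketch of a load logger. It assumes a Linux host where /proc/loadavg is readable; the function names and the default threshold of 5.0 are made up for illustration and are not part of Nagios or XI.

```python
from datetime import datetime


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def is_spike(load1, threshold):
    """True when the 1-minute load is at or above the (hypothetical) threshold."""
    return load1 >= threshold


def format_sample(stamp, loads, threshold):
    """Build one log line: timestamp, the three load averages, and a spike flag."""
    load1, load5, load15 = loads
    flag = "SPIKE" if is_spike(load1, threshold) else "ok"
    return f"{stamp} load1={load1:.2f} load5={load5:.2f} load15={load15:.2f} {flag}"


def sample(threshold=5.0, path="/proc/loadavg"):
    """Read the current load averages and return a single formatted log line."""
    with open(path) as handle:
        loads = parse_loadavg(handle.read())
    stamp = datetime.now().isoformat(timespec="seconds")
    return format_sample(stamp, loads, threshold)
```

Calling `print(sample())` once a minute (from cron, for example) and appending the output to a file gives a timeline to compare against the times XI stops updating.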
I already had the mysql one disabled. I just disabled the postgresql one for testing purposes. Let's see if that makes a difference. I also made a few other adjustments and even turned the experimental rescheduling back on... just about every time we apply changes I'm testing some other setting change, just to see.
So, my ndo2db is still going to 100% CPU every so often, but only for a minute or two. My bigger issue is the load spiking on my Nagios server ever since upgrading to 2014r2.0. I was initially blaming ndo but, as I stated, I can't do that anymore now that it is offloaded. Check out this graph (I installed 2.0 on the 4th):
Capture.PNG
Feel free to separate this post into another thread if you think it should be