Nagios stops checking!!!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mikew
Posts: 243
Joined: Sun Feb 05, 2012 7:05 pm

Re: Nagios stops checking!!!

Post by mikew »

I see you are considering caching objects with Squid. What objects specifically are you thinking of caching?

I may be able to help as I spent 10 years cutting my teeth on Squid.
Mike Weber

Nagios Training/Consulting
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios stops checking!!!

Post by tmcdonald »

I am going to defer to Mike on the Squid questions. I have played with it in a previous life but I would not call myself an expert by any means.
Former Nagios employee
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Nagios stops checking!!!

Post by BanditBBS »

mrochelle wrote: When you say you turned off the check leveling, do you mean you disabled the auto-rescheduling option in nagios.cfg?
Yep
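For anyone following along, one quick way to confirm what nagios.cfg actually says is to read the auto-rescheduling directives straight from the file. The Python below is a minimal sketch, not a Nagios XI tool. It assumes the file is at /usr/local/nagios/etc/nagios.cfg (the usual XI location), and it treats only `auto_reschedule_checks=1` as enabled, with a missing directive assumed to mean off. The function names are hypothetical.

```python
"""Report the auto-rescheduling directives set in a nagios.cfg file.

Sketch only. Assumes the usual Nagios XI path below; pass another path
as the first command-line argument if yours differs.
"""
import sys

DEFAULT_CFG = "/usr/local/nagios/etc/nagios.cfg"  # assumed location

# nagios.cfg directives that control check auto-rescheduling.
RESCHEDULE_KEYS = (
    "auto_reschedule_checks",
    "auto_rescheduling_interval",
    "auto_rescheduling_window",
)


def parse_directives(text):
    """Return {directive: value} for key=value lines, skipping comments.

    Repeated keys (e.g. cfg_file) keep only their last value, which is
    fine for the single-valued directives checked here.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        directives[key.strip()] = value.strip()
    return directives


def auto_rescheduling_enabled(directives):
    # Assumption: only "1" means on, and an absent directive means off.
    return directives.get("auto_reschedule_checks", "0") == "1"


def main(path=DEFAULT_CFG):
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            directives = parse_directives(fh.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    state = "ON" if auto_rescheduling_enabled(directives) else "OFF"
    print(f"auto-rescheduling is {state}")
    for key in RESCHEDULE_KEYS:
        print(f"  {key} = {directives.get(key, '(not set)')}")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

Keep in mind that edits to nagios.cfg only take effect after Nagios is restarted, so the file can say "off" while the running process still uses the old value.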
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios stops checking!!!

Post by abrist »

Well, it looks like auto-rescheduling needs to be reworked (again). Do you only see this behavior on large installs with auto-rescheduling enabled?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: Nagios stops checking!!!

Post by krobertson71 »

mrochelle wrote: I have not experienced the spiking issue described, but I'm joining the conversation because I logged in to post about Nagios stopping its checks. I've experienced three such incidents between this past weekend and this morning. As BanditBBS indicated, the load drops to a minimum and checks go down to zero. I can find no errors of any kind, and the logs appear normal. I'm attaching a screenshot of the last occurrence, at 05:31 AM this morning. A restart of Nagios gets everything back to normal.
Also, for the record, the ndo2db process is OK (under 30%) during these incidents.
(Attached image: NagiosSS1_12092014_0531am.PNG)
Nagios 2014R2.0
CentOS release 6.3
I have this exact same dashboard setup, just in a different order and a little neater. But you know I roll like that.

Good dashboard.
mrochelle
Posts: 238
Joined: Fri May 04, 2012 11:20 am
Location: Heart of America

Re: Nagios stops checking!!!

Post by mrochelle »

Thanks for the dashboard review. It makes it easy for my monitoring team to just send me a copy if Nagios has a problem.
Also, I've turned off auto-rescheduling and will follow up with the results after a few days of observation.
krobertson71

Re: Nagios stops checking!!!

Post by krobertson71 »

mrochelle wrote:Thanks for the dashboard review. It helps my monitoring team to just send me a copy if nagios has a problem.
Also, I've turned off the auto rescheduling and will follow up with the results after a few days of observation.
I was just looking at your dashboard again and I noticed you have a Max Service Check Execution time of 2199 seconds! That means you had a check take over 36 minutes to complete.

I would try to find which check that is, and when it started and then hung. This check could possibly be related to why all your other checks stopped.

Just a possibility, but that is an excessively long service execution time.
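To track down the long-running check without the dashboard, the per-service execution times that Nagios writes to status.dat can be sorted directly. This is a minimal sketch under stated assumptions: status.dat is at /usr/local/nagios/var/status.dat, and each `servicestatus { ... }` block carries `host_name`, `service_description` and `check_execution_time` fields. The function names are hypothetical.

```python
"""List the slowest service checks recorded in Nagios' status.dat.

Sketch only. Assumes the usual Nagios XI path below and servicestatus
blocks with host_name, service_description and check_execution_time.
"""
import sys

DEFAULT_STATUS = "/usr/local/nagios/var/status.dat"  # assumed location


def parse_blocks(text, block_type):
    """Yield one dict of key=value fields per `block_type {` ... `}` section."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line == block_type + " {":
                current = {}
        elif line == "}":
            yield current
            current = None
        elif "=" in line:
            # Split on the first "=" only; plugin output may contain more.
            key, _, value = line.partition("=")
            current[key] = value


def slowest_checks(text, limit=10):
    """Return (seconds, host, service) tuples, longest execution time first."""
    rows = []
    for svc in parse_blocks(text, "servicestatus"):
        try:
            seconds = float(svc.get("check_execution_time", "0"))
        except ValueError:
            continue  # skip malformed values rather than abort
        rows.append((seconds, svc.get("host_name", "?"),
                     svc.get("service_description", "?")))
    rows.sort(reverse=True)
    return rows[:limit]


def main(path=DEFAULT_STATUS):
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    for seconds, host, service in slowest_checks(text):
        print(f"{seconds:10.1f}s ({seconds / 60:5.1f} min)  {host} / {service}")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

For scale, 2199 seconds / 60 is about 36.7 minutes, which matches the "over 36 minutes" figure above.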
mrochelle

Re: Nagios stops checking!!!

Post by mrochelle »

Yes, that is an actual service monitor that can take up to 45 minutes. It is actually an auto-update procedure in which a particular Nagios server's host configuration is synchronized with the reference source database of active hosts. It's only 13 of the 11,037 monitors.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios stops checking!!!

Post by slansing »

Yeah, let us know how things look. I've got a couple checks that take a while to come through as well, one being Windows Updates... takes ages...
BanditBBS

Re: Nagios stops checking!!!

Post by BanditBBS »

OK, it must not be the auto-rescheduling. My schedule keeps emptying and no checks are being performed even with it off. I need help; this is very, very bad!

The worst part is that sometimes, when it says no checks are happening, I can see them happening when watching top. But other times there is nothing running, so I can't even rely on this:
(Attached image: Capture.PNG)
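Because the dashboard view depends on ndo2db feeding the database, one cross-check that bypasses it is to look at when Nagios last recorded a service result in status.dat, which the Nagios core writes itself. A minimal sketch, assuming the same /usr/local/nagios/var/status.dat path and epoch-seconds `last_check` fields in each `servicestatus` block. The 15-minute `STALL_SECONDS` threshold and the function names are hypothetical; tune the threshold to your shortest check interval.

```python
"""Flag a stalled Nagios scheduler by how fresh last_check values are.

Sketch only. Assumes servicestatus blocks in status.dat carry an
epoch-seconds last_check field (0 means the service was never checked).
"""
import sys
import time

DEFAULT_STATUS = "/usr/local/nagios/var/status.dat"  # assumed location
STALL_SECONDS = 15 * 60  # hypothetical threshold, not a Nagios setting


def last_check_times(text):
    """Return every non-zero servicestatus last_check value as an int."""
    times = []
    in_service = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "servicestatus {":
            in_service = True
        elif line == "}":
            in_service = False
        elif in_service and line.startswith("last_check="):
            value = line.partition("=")[2]
            if value.isdigit() and int(value) > 0:
                times.append(int(value))
    return times


def is_stalled(text, now, threshold=STALL_SECONDS):
    """True if no service result is newer than `threshold` seconds."""
    times = last_check_times(text)
    return not times or now - max(times) > threshold


def main(path=DEFAULT_STATUS):
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    if is_stalled(text, time.time()):
        print(f"WARNING: no service results in the last {STALL_SECONDS // 60} "
              "minutes; the scheduler may have stalled")
    else:
        print("OK: service checks are still completing")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

Run from cron, this gives a second opinion that does not depend on the NDO database being current.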
EDIT: Had to restart ndo2db to get that working again:


[root@iss-chi-nag05 ~]# service ndo2db restart
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting ndo2db: done.
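The `head: cannot open ... ndo2db.lock` message suggests the init script looks for ndo2db's PID in that lock file and found none. A small check like the following can tell a missing lock, a stale lock and a live daemon apart before restarting. It is a sketch under assumptions: the lock file's first line holds the PID (as that `head` call implies), the host is Linux/Unix (it relies on signal 0 via `os.kill`), and the function names are hypothetical.

```python
"""Check whether ndo2db's lock file points at a live process.

Sketch only, Unix-like systems. Assumes the lock file's first line is
the daemon's PID.
"""
import os
import sys

LOCK_FILE = "/usr/local/nagios/var/ndo2db.lock"  # path from the output above


def read_pid(text):
    """Return the positive PID on the first line, or None if unusable."""
    stripped = text.strip()
    first = stripped.splitlines()[0] if stripped else ""
    if first.isdigit() and int(first) > 0:
        return int(first)
    return None


def pid_alive(pid):
    """Signal 0 only tests whether the process exists; nothing is sent."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def check(path=LOCK_FILE):
    try:
        with open(path, encoding="utf-8") as fh:
            pid = read_pid(fh.read())
    except FileNotFoundError:
        return "lock file missing: the init script cannot find ndo2db's PID"
    except OSError as exc:
        return f"cannot read {path}: {exc}"
    if pid is None:
        return "lock file present but does not contain a PID"
    if pid_alive(pid):
        return f"lock file points at running PID {pid}"
    return f"stale lock: PID {pid} is not running"


if __name__ == "__main__":
    print(check(*sys.argv[1:2]))
```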
Edit #2 - This is the kind of weirdness that just freaks me out. After restarting NDO2DB, my server hasn't run this well in ages... even though it's been rebooted a couple of times very recently. I have even applied changes a few times:
(Attached image: Capture2.PNG)