Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 12:40 pm
by mikew
I see you are considering caching objects with Squid. What objects specifically are you thinking of caching?
I may be able to help as I spent 10 years cutting my teeth on Squid.
Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 12:46 pm
by tmcdonald
I am going to defer to Mike on the Squid questions. I have played with it in a previous life but I would not call myself an expert by any means.
Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 12:49 pm
by BanditBBS
mrochelle wrote: When you say you turned off the check leveling, do you mean you disabled the auto-rescheduling option in nagios.cfg?
Yep
Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 5:52 pm
by abrist
Well, it looks like auto-rescheduling needs to be reworked (again). Do you only see this behavior on large installs with auto-rescheduling enabled?
Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 8:31 pm
by krobertson71
mrochelle wrote: I have not experienced the spiking issue described, but I'm joining the conversation because I logged in to post about Nagios stopping its checks: I've experienced 3 such incidents from the past weekend through this morning. As BanditBBS indicated, the load drops to minimal and checks go down to zero. I can find no errors of any kind, and the logs appear normal. I'm attaching a screenshot from this morning at 05:31 AM, the last occurrence. A restart of Nagios gets everything back to normal.
Also, for the record, the ndo2db process is OK (under 30%) during these incidents.
NagiosSS1_12092014_0531am.PNG
Nagios 2014R2.0
CentOS release 6.3
I have this exact same Dashboard setup, just in a different order... a little neater. But you know I roll like that.
Good dashboard.
Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 9:18 pm
by mrochelle
Thanks for the dashboard review. It helps my monitoring team to just send me a copy if Nagios has a problem.
Also, I've turned off the auto rescheduling and will follow up with the results after a few days of observation.
Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 11:10 pm
by krobertson71
mrochelle wrote: Thanks for the dashboard review. It helps my monitoring team to just send me a copy if Nagios has a problem.
Also, I've turned off the auto rescheduling and will follow up with the results after a few days of observation.
I was just looking at your dashboard again and I noticed you have a Max Service Check Execution Time of 2199 seconds! That means you had a check take over 36 minutes to complete.
I would try to find which check that is and when it started and then hung. It could be related to why all your other checks stopped; possibly this check is the cause?
Just a possibility, as that is a very excessive Service Check Execution Time.
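For anyone wanting to hunt down a check like that, here is a rough Python sketch that ranks service checks by execution time. It assumes the status file uses "servicestatus { ... }" blocks of key=value lines with host_name, service_description and check_execution_time fields, which is how I understand status.dat to be laid out; please verify against your own file. The function names are made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_service_blocks(text: str) -> List[Dict[str, str]]:
    """Collect key=value pairs from each 'servicestatus { ... }' block.

    Assumption: one key=value pair per line, block closed by a lone '}'.
    Other block types (hoststatus, info, ...) are ignored.
    """
    blocks: List[Dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            current = {}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def slowest_checks(text: str, limit: int = 5) -> List[Tuple[float, str, str]]:
    """Return (seconds, host, service) tuples, slowest first."""
    rows = []
    for block in parse_service_blocks(text):
        try:
            seconds = float(block.get("check_execution_time", "0"))
        except ValueError:
            continue  # skip blocks with an unparsable execution time
        rows.append((
            seconds,
            block.get("host_name", "?"),
            block.get("service_description", "?"),
        ))
    rows.sort(reverse=True)
    return rows[:limit]
```

To use it, read your status file into a string (on a source install it is often /usr/local/nagios/var/status.dat, but check the status_file setting in nagios.cfg) and pass it to slowest_checks(). For scale, 2199 / 60 is about 36.6 minutes.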
Re: Nagios stops checking!!!
Posted: Tue Dec 09, 2014 11:38 pm
by mrochelle
Yes, that is an actual service monitor that can take up to 45 minutes. It is an auto-update procedure in which a particular Nagios server's host configurations are synchronized with the reference source database of active hosts. It's only 13 monitors of the 11037.
Re: Nagios stops checking!!!
Posted: Wed Dec 10, 2014 11:40 am
by slansing
Yeah, let us know how things look. I've got a couple checks that take a while to come through as well, one being Windows Updates... takes ages...
Re: Nagios stops checking!!!
Posted: Thu Dec 11, 2014 12:57 pm
by BanditBBS
OK, it must not be the auto-rescheduling. My schedule keeps emptying and no checks are being performed even with it off. I need help, this is very, very bad!
The worst part is, sometimes when it says no checks are happening, I can see them happening when watching top. But other times there is nothing running, so I can't even rely on this:
Capture.PNG
EDIT: Had to restart ndo2db to get that working again:
Code:
[root@iss-chi-nag05 ~]# service ndo2db restart
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting ndo2db: done.
Edit #2 - This is the kind of weirdness that just freaks me out. After restarting NDO2DB my server hasn't run this well in ages... even though it's been rebooted a couple of times very recently. I have even applied changes a few times:
Capture2.PNG