Nagios brings down host when checks are re-enabled

fastdude7
Posts: 3
Joined: Thu Jul 05, 2012 8:30 pm

Post by fastdude7 »

Hi

We require Nagios to run a lot of checks on each server; a single server might have 200+ checks. The problem I am facing is that when I need to re-enable checks for a server, Nagios sends so many checks at once that the load on that server becomes extremely high.

I am reading through the Nagios documentation, but I have not found anything that will actually limit the number of concurrent checks to a single host. Limiting concurrent checks globally is not useful, as I am happy for a high number of concurrent checks to be executed overall. It is only a problem when a single host gets hammered.

Any ideas on how to resolve this are welcome.

Thanks in advance.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Post by mguthrie »

There isn't a way to do this on a per-host basis. Your best bet would be to spread out the check_interval for all of the service checks on these particular hosts. If Nagios is running these checks as active checks it should space out the checks evenly over the "max_service_check_spread" in the main nagios.cfg file. 200 checks over 15 minutes shouldn't cause much of a CPU drag.
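
To put rough numbers on that last point, here is a minimal sketch of the arithmetic, assuming the checks are spaced perfectly evenly across the spread window. This is a simplified model, not Nagios's actual scheduling algorithm, and the function name `even_spacing_seconds` is made up for illustration.

```python
def even_spacing_seconds(num_checks: int, spread_minutes: float) -> float:
    """Seconds between consecutive checks if they are spaced evenly
    across a window of spread_minutes (simplified model)."""
    if num_checks <= 0:
        raise ValueError("num_checks must be positive")
    return spread_minutes * 60 / num_checks


# 200 checks spread over 15 minutes, the figures quoted above
print(even_spacing_seconds(200, 15))  # 4.5
```

Under that assumption the host would see roughly one check every 4.5 seconds, rather than hundreds arriving together.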
fastdude7
Posts: 3
Joined: Thu Jul 05, 2012 8:30 pm

Post by fastdude7 »

Thank you for the reply, mguthrie.

Thinking about the problem more, we have checks such as "is the host up" and "is the web server running"; these need to run frequently and use little CPU on the remote host. We have other service checks that verify the software behind the web server is also working. However, requests to that software are expensive: we are talking on the order of 100 or more milliseconds per request. With 300 or 400 requests going to a single server, it brings the server down. These checks can run less frequently, though.
I am guessing some solutions might be:

-Can Nagios allow some checks to be done at certain intervals, or have a longer max_service_check_spread? Pointers in the right direction would be very useful.

-Run two instances of Nagios: one that forces all checks to be done quickly, and another that allows the checks to be very spread out.
fastdude7
Posts: 3
Joined: Thu Jul 05, 2012 8:30 pm

Post by fastdude7 »

After further investigation I can see that when I stop all checks, both active and passive, and then turn them all back on, Nagios schedules all checks within a span of only a few minutes (on my test server I had about 100 checks, and these all got scheduled over a span of about 2 minutes and some seconds). On a config reload Nagios makes a nice schedule, but on disabling and then re-enabling checks Nagios doesn't spread them out at all; rather, all checks run one after the other. Have I set something up wrong?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Post by mguthrie »

You shouldn't have to disable / re-enable the checks for this host. I would just increase the check_interval for this host's service checks wherever you can afford to, and set your maximum check window in the main nagios.cfg.

Code:

max_service_check_spread=30
Note that even if you set this value to a high number, Nagios will still honor the check interval for each individual service, so most checks will still run at whatever interval you have defined for them, as long as it is inside the max spread. After restarting Nagios, your best bet is to leave the check scheduling alone so Nagios can handle the schedule on its own. When you disable and then re-enable the checks, you actually push all of those checks to the front of the queue.
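
If a batch of checks for one host does have to be brought back by hand, one possible alternative to re-enabling everything at once is to give each service its own start time. The sketch below only builds the text of Nagios external commands (`SCHEDULE_SVC_CHECK`) with staggered timestamps; it does not write anything to the command file and has not been tested against a live Nagios instance. The host name, service names, and 30-minute window are assumptions for illustration, and the helper `staggered_check_commands` is hypothetical, not part of Nagios.

```python
import time


def staggered_check_commands(host, services, window_seconds, start=None):
    """Build one SCHEDULE_SVC_CHECK external command per service,
    with check times spaced evenly across window_seconds."""
    if start is None:
        start = int(time.time())
    # Even gap between consecutive checks; guard against an empty list
    step = window_seconds / max(len(services), 1)
    commands = []
    for i, service in enumerate(services):
        check_time = start + int(i * step)
        # Format: [entry_time] SCHEDULE_SVC_CHECK;<host>;<service>;<check_time>
        commands.append(
            f"[{start}] SCHEDULE_SVC_CHECK;{host};{service};{check_time}"
        )
    return commands


# Hypothetical host and service names, spread over 30 minutes
services = [f"app-check-{n}" for n in range(1, 5)]
for line in staggered_check_commands("web01", services, 30 * 60, start=0):
    print(line)
```

Each resulting line would then need to be written to the Nagios command file (the `command_file` path set in nagios.cfg) to take effect.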