Alert's history not working after changing concurrent checks

uselessid
Posts: 12
Joined: Wed Jul 16, 2014 3:30 pm

Post by uselessid »

Hello.

Our Nagios (3.5.1) installation has about 10k services, and almost all of them are active service checks.

We were experiencing high load on the OS and (we don't know whether it is related) gaps in random graphs.

So we followed the Nagios performance tuning instructions (http://nagios.sourceforge.net/docs/3_0/tuning.html), which led us to change some parameters, including max_concurrent_checks, until we reached a balance between OS load and check latency. However, lowering max_concurrent_checks from 0 (unlimited) to 250 polluted the log file and caused the alert history to simply stop working (at least most of the time). Nagios keeps logging "Max concurrent service checks (250) has been reached", and this seems to break the alert history as the log grows.

Our check latency was around 170 seconds; after the changes it dropped to between 5 and 15 seconds.

I know that Nagios keeps postponing checks, but this was the best way we found to make active and on-demand checks more responsive.
Last edited by uselessid on Thu Jul 17, 2014 6:33 am, edited 1 time in total.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Alert's history not working after changing concurrent ch

Post by Box293 »

Are you using MRTG?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
uselessid
Posts: 12
Joined: Wed Jul 16, 2014 3:30 pm

Re: Alert's history not working after changing concurrent ch

Post by uselessid »

For general graphing I'm using Nagios Grapher.

But for nagiostats specifically I'm using MRTG, yes.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Alert's history not working after changing concurrent ch

Post by abrist »

uselessid wrote: For general graphing I'm using Nagios Grapher.

But for nagiostats specifically I'm using MRTG, yes.
The reason for the lower load is that a ton of your checks are most likely not running anywhere near their scheduled time. Was your issue mostly due to load or I/O wait?
Former Nagios employee
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Alert's history not working after changing concurrent ch

Post by Box293 »

uselessid wrote: But for nagiostats specifically I'm using MRTG, yes.
I asked this because I've seen situations where a stale MRTG configuration leads to higher load, which can cause gaps in graphs.

If you have decommissioned objects in your MRTG config file, delays while MRTG tries to reach these unreachable devices can make MRTG run longer than expected. With about 10k services, it is possible that decommissioned devices are still in your MRTG configs.

I hope this is of some help.
uselessid
Posts: 12
Joined: Wed Jul 16, 2014 3:30 pm

Re: Alert's history not working after changing concurrent ch

Post by uselessid »

abrist wrote:
uselessid wrote: For general graphing I'm using Nagios Grapher.

But for nagiostats specifically I'm using MRTG, yes.
The reason for the lower load is that a ton of your checks are most likely not running anywhere near their scheduled time. Was your issue mostly due to load or I/O wait?

Mostly related to load.

I know that Nagios was postponing checks when I used a low max_concurrent_checks value, but in general the check latency (for active checks) got considerably lower and on-demand checks became almost instantaneous.

For our scenario, not using unlimited max_concurrent_checks seems to be the best setting.

I only wish there were a way to suppress those log messages (concurrent checks reached).
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Alert's history not working after changing concurrent ch

Post by lmiltchev »

I only wish there were a way to suppress those log messages (concurrent checks reached).
I don't believe there is a way to suppress these messages. I would recommend focusing on finding out what is causing the high load on the system.
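
If the messages cannot be suppressed in Nagios itself, one possible workaround when reading the log by hand is to filter them out of a copy. The following is only a minimal sketch: the function name and the sample lines (including the host name web01) are made up for illustration, and the matched substring is the message text quoted earlier in this thread. It does not change what Nagios writes or what the web interface reads.

```python
from typing import Iterable, Iterator

# Substring of the notice quoted in the thread; the exact full wording
# in a real log may be longer.
NOISE = "Max concurrent service checks"


def drop_concurrency_noise(lines: Iterable[str]) -> Iterator[str]:
    """Yield every log line except the 'max concurrent checks' notices."""
    for line in lines:
        if NOISE not in line:
            yield line


# Illustrative sample lines (hypothetical, not taken from a real log).
sample = [
    "[1405600000] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused\n",
    "[1405600001] Max concurrent service checks (250) has been reached\n",
]

kept = list(drop_concurrency_noise(sample))
print("".join(kept), end="")
```

The same generator can be fed an open file object instead of the sample list, since it only needs an iterable of lines.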
uselessid
Posts: 12
Joined: Wed Jul 16, 2014 3:30 pm

Re: Alert's history not working after changing concurrent ch

Post by uselessid »

The load is possibly due to the large number of services monitored by active checks, about 10k, so there are a lot of concurrent checks.

Besides that, the server also runs the check scripts itself and, since we monitor remote locations over MPLS/Internet where latency and packet loss can occur, some scripts hang waiting for their timeout. Add it all up and you get the high load.

I don't think there's much we can do. Lowering max_concurrent_checks is our best choice!
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Alert's history not working after changing concurrent ch

Post by tmcdonald »

Is there any way you could offload some of the checks to be passive? That would help with both the load and the concurrent checks.

Also, maybe try tweaking the check_interval for some non-critical checks to 10, 15, or higher.
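
To get a rough feel for how much a longer check_interval could help, here is a back-of-the-envelope sketch. The numbers are assumptions, not measurements from this system: a starting interval of 5 minutes for all 10k services, 4,000 of them moved to 15 minutes, and an average check runtime of 7.5 seconds. It also assumes the default interval_length of 60 seconds, so check_interval is in minutes. Average concurrency is estimated as rate times runtime (Little's law).

```python
INTERVAL_LENGTH = 60  # seconds per interval unit (assumed default interval_length)


def checks_per_second(services_by_interval):
    """Average check rate for a mapping {check_interval: number_of_services}."""
    return sum(
        count / (interval * INTERVAL_LENGTH)
        for interval, count in services_by_interval.items()
    )


def average_concurrency(rate, avg_check_seconds):
    """Little's law: checks in flight = arrival rate * time per check."""
    return rate * avg_check_seconds


# Hypothetical scenarios: everything at 5 minutes vs. 4,000 services at 15.
before = checks_per_second({5: 10_000})
after = checks_per_second({5: 6_000, 15: 4_000})

for label, rate in (("before", before), ("after", after)):
    in_flight = average_concurrency(rate, avg_check_seconds=7.5)
    print(f"{label}: {rate:.1f} checks/s, about {in_flight:.0f} running at once")
```

Under these assumptions the rate drops from roughly 33 to roughly 24 checks per second, so hanging checks with long timeouts would still dominate the concurrency figure.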
Former Nagios employee
uselessid
Posts: 12
Joined: Wed Jul 16, 2014 3:30 pm

Re: Alert's history not working after changing concurrent ch

Post by uselessid »

I've changed the check_interval for the most costly services, but it didn't help much.

The load wouldn't be a problem as long as it didn't affect check latency or graphing (I don't know exactly whether it has anything to do with it).