Nagios Scheduling Issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Nagios Scheduling Issue

Post by rajasegar »

This is again related to the same issue discussed in
http://support.nagios.com/forum/viewtop ... 16&t=32043

It was OK for a while after we offloaded everything to a RAM disk.
However, the issue came back.
It is extremely frustrating to babysit Nagios all day because the scheduling will just drop to about 10.

Disabled mod_gearmand. Still same issue.
CPU resources, memory, and I/O are all OK.
FYI, we are running active checks only: about 2050 hosts and 17000 services, with checks every 5 minutes.
Is this a Nagios limitation?

Would appreciate a fast resolution to this issue.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Nagios Scheduling Issue

Post by lmiltchev »

rajasegar wrote:
FYI, we are running all active checks only. About 2050 hosts, 17000 services with checks every 5 minutes.
Is this a Nagios limitation?
We don't have an "official" document on this, but Nagios XI can monitor up to about 20,000 checks in total (hosts plus services), provided the general hardware guidelines for running XI have been followed.

Having said that, I would like to point out that this is all relative. What we are talking about here is a clean, "vanilla" setup with a mixture of active and passive checks, fast hard drives, etc. Each environment is different, though. If you are running mostly or only active checks, and you have lots of CPU-intensive checks (vmware, snmp, perl scripts, etc.), the performance will suffer.

If you have done everything you can to tweak your configs and boost performance but are still having issues, I would recommend adding another XI instance (splitting your existing XI instance).
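
To put the numbers in this thread in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes host checks run on the same 5-minute interval as the service checks, which the original post does not state explicitly, and it treats 20,000 as the loose per-server estimate mentioned above, not a hard limit. The names are illustrative only.

```python
# Figures quoted in the thread; the shared 5-minute interval for host checks is an assumption.
HOSTS = 2050
SERVICES = 17000
CHECK_INTERVAL_SECONDS = 5 * 60
ESTIMATED_CAPACITY = 20000  # rough per-server estimate from this thread, not a hard limit


def check_load(hosts, services, interval_seconds):
    """Return (total checks, average checks per second) for one polling interval."""
    total = hosts + services
    return total, total / interval_seconds


total, per_second = check_load(HOSTS, SERVICES, CHECK_INTERVAL_SECONDS)
print(f"total checks: {total}")
print(f"average rate: {per_second:.1f} checks/second")
print(f"share of the 20000 estimate: {total / ESTIMATED_CAPACITY:.0%}")
```

With these inputs the total is 19050 checks, or about 63.5 checks per second on average, which is already close to the estimate before any CPU-intensive plugins are taken into account.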
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios Scheduling Issue

Post by rajasegar »

The CPU, Memory & I/O are all ok.
Only 20000 limit? Is this a typo?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Nagios Scheduling Issue

Post by jolson »

rajasegar wrote:
Only 20000 limit? Is this a typo?
20,000 is an estimate of the number of checks a single Nagios 4.x-based server can handle without performance-enhancing modifications. I see that you have a RAM disk in place, which would speed things up a bit, and of course there's mod_gearman, which raises that threshold further. I understand that you also have mod_gearman in place.
rajasegar wrote:
Is this a Nagios limitation?
It's not a 'hard' limitation, but there are boundaries to what a single Nagios server can process, which is why lmiltchev suggested splitting your XI server in two.

From your old thread:
Before Upgrade
Nagios 2014R1.2 with check_gearman: version 1.4_nagios4 running on libgearman 0.25
Everything was scheduling fine.

After upgrade
Nagios 2014R2.6 with check_gearman: version 1.5.0b1 running on libgearman 1.1.8
Serious scheduling issues, hovering most of the time around 60, with rare bursts around 800.
CPU seems OK, Memory seems ok so it must be some other bottleneck.
NDOUtils ok, no crashed tables in MySQL.
Just to clarify, you didn't increase the number of checks being run between these upgrades?
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios Scheduling Issue

Post by rajasegar »

Jolson:
We add hosts and services almost every day, some days a lot more than others. Sorry, I do not have the details.

We currently have 2600 hosts and 21742 services in our first instance of Nagios XI.

Some updates: we did a remote session with Andy a few weeks ago.

After the following was set in nagios.cfg, all the scheduling problems went away.
I think it also solved the Apply Configuration problem, as adding back sudo for nagios is still OK now.

auto_reschedule_checks=0
use_retained_scheduling_info=1
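
For anyone who wants to confirm these two values on their own server, here is a minimal Python sketch that parses text in nagios.cfg format and reports any setting that differs from the ones above. The sample text, the `EXPECTED` table, and the function names are illustrative assumptions, not part of Nagios; point it at the contents of your own nagios.cfg instead of the inline sample.

```python
SAMPLE_CFG = """\
# excerpt in nagios.cfg format (illustrative, not a real file)
auto_reschedule_checks=0
use_retained_scheduling_info=1
"""

# Values reported in this thread as resolving the scheduling problem.
EXPECTED = {
    "auto_reschedule_checks": "0",
    "use_retained_scheduling_info": "1",
}


def parse_cfg(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # For repeated keys the last value wins, which is enough for these two settings.
        settings[key.strip()] = value.strip()
    return settings


def find_mismatches(settings, expected):
    """Return {key: (actual, wanted)} for every expected key that differs or is missing."""
    return {
        key: (settings.get(key), wanted)
        for key, wanted in expected.items()
        if settings.get(key) != wanted
    }


mismatches = find_mismatches(parse_cfg(SAMPLE_CFG), EXPECTED)
print(mismatches or "both settings match")
```

Changes to nagios.cfg only take effect after Nagios is restarted, so a mismatch reported here would need a config edit followed by a restart.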
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Nagios Scheduling Issue

Post by jolson »

I'm glad to hear that you and Andy got this worked out. Is there anything further I can help you with here?
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios Scheduling Issue

Post by rajasegar »

jolson wrote:
I'm glad to hear that you and Andy got this worked out. Is there anything further I can help you with here?
No. Please close this ticket. Thanks.