Nagios Scheduling Issue

Post by **rajasegar** » Tue May 12, 2015 3:20 am

This is again related to the same issue discussed in
http://support.nagios.com/forum/viewtop ... 16&t=32043

It was OK for a while after we offloaded everything to a RAM disk.
However, the issue came back.
It is extremely frustrating to babysit Nagios the whole day because the scheduling will just go down to about 10.

Disabled mod_gearman. Still the same issue.
CPU resources, memory, and I/O are all OK.
FYI, we are running all active checks only. About 2050 hosts, 17000 services with checks every 5 minutes.
Is this a Nagios limitation?

Would appreciate a fast resolution to this issue.
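
For a sense of scale, the load described in this post can be turned into an average check rate. This is only a rough sketch: it assumes host checks run on the same 5-minute interval as service checks and that checks are spread evenly over the interval, neither of which is stated in the post. The function name is hypothetical.

```python
def required_checks_per_second(hosts, services, interval_seconds):
    """Average number of active checks per second needed to run
    every host and service check once per interval.

    Assumption: hosts and services share one interval and checks
    are spread evenly across it.
    """
    return (hosts + services) / interval_seconds


# Figures from the post: 2050 hosts, 17000 services, 5-minute interval.
rate = required_checks_per_second(hosts=2050, services=17000, interval_seconds=5 * 60)
print(f"{rate:.1f} checks/second")  # 63.5 checks/second
```

Under those assumptions the server has to sustain roughly 63.5 checks per second on average.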

Post by **lmiltchev** » Tue May 12, 2015 10:55 am

rajasegar wrote:
FYI, we are running all active checks only. About 2050 hosts, 17000 services with checks every 5 minutes.
Is this a Nagios limitation?

We don't have an "official" document on this, but Nagios XI can monitor up to 20000 services (I mean checks in total -> hosts + services), provided the general guidelines on the hardware requirements needed to run XI have been followed.

Having said that, I would like to point out that this is all relative. What we are talking about here is a clean, "vanilla" setup, a mixture of active and passive checks, fast hard drives, etc. Each environment is different though. If you are running mostly or only active checks, and you have lots of CPU-intensive checks (vmware, snmp, perl scripts, etc.), the performance will suffer.

If you have done everything that you could to tweak your configs, and boost the performance but you are still having issues, I would recommend adding another XI instance (splitting your existing XI instance).

Post by **rajasegar** » Thu May 14, 2015 12:06 am

lmiltchev wrote:
FYI, we are running all active checks only. About 2050 hosts, 17000 services with checks every 5 minutes.
Is this a Nagios limitation?
We don't have an "official" document on this, but Nagios XI can monitor up to 20000 services (I mean checks in total -> hosts + services), provided the general guidelines on the hardware requirements needed to run XI have been followed.

Having said that, I would like to point out that this is all relative. What we are talking about here is a clean, "vanilla" setup, a mixture of active and passive checks, fast hard drives, etc. Each environment is different though. If you are running mostly or only active checks, and you have lots of CPU-intensive checks (vmware, snmp, perl scripts, etc.), the performance will suffer.

If you have done everything that you could to tweak your configs, and boost the performance but you are still having issues, I would recommend adding another XI instance (splitting your existing XI instance).

The CPU, Memory & I/O are all ok.
Only 20000 limit? Is this a typo?

Post by **jolson** » Thu May 14, 2015 2:47 pm

rajasegar wrote:
Only 20000 limit? Is this a typo?

20,000 is an estimate of the number of checks a single Nagios 4.x based server can handle without performance-enhancing modifications. I see that you have a RAM disk in place, which would speed things up a bit, and of course there's mod_gearman, which further increases that threshold. I understand that you also have mod_gearman in place.

rajasegar wrote:
Is this a Nagios limitation?

It's not a 'hard' limitation, but there are boundaries to what a single Nagios server can process - which is why lmiltchev suggested splitting your XI server in two.

From your old thread:

Before Upgrade
Nagios 2014R1.2 with check_gearman: version 1.4_nagios4 running on libgearman 0.25
Everything was scheduling fine.

After upgrade
Nagios 2014R2.6 with check_gearman: version 1.5.0b1 running on libgearman 1.1.8
Serious scheduling issues, hovering most of the time around 60 with rare bursts around 800.
CPU seems OK, Memory seems ok so it must be some other bottleneck.
NDOUtils ok, no crashed tables in MySQL.

Just to clarify, you didn't increase the number of checks being done between these upgrades?

Post by **rajasegar** » Mon Jun 01, 2015 3:33 am

Jolson:
We add hosts and services almost every day, some days a lot more than others. Sorry, I don't have the details.

We currently have 2600 hosts and 21742 services in our first instance of Nagios XI.

Some updates. Did a remote session with Andy a few weeks ago.

After the following was set in nagios.cfg all the scheduling problems went away.
I think it also solved the Apply Configuration problem as adding back sudo for nagios is still ok now.

auto_reschedule_checks=0
use_retained_scheduling_info=1
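
For anyone wanting to confirm that their own nagios.cfg carries the same two values, a small check along these lines may help. It is a minimal sketch, not an official tool: it assumes the main config uses plain `key=value` lines with `#` comments, and the helper names are hypothetical. The sample text stands in for the contents of your own nagios.cfg.

```python
# Values reported in this thread as resolving the scheduling problem.
EXPECTED = {
    "auto_reschedule_checks": "0",
    "use_retained_scheduling_info": "1",
}


def parse_nagios_cfg(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def scheduling_mismatches(text, expected=EXPECTED):
    """Return {directive: actual_value} for each directive that differs
    from the expected value (actual_value is None when it is missing)."""
    settings = parse_nagios_cfg(text)
    return {k: settings.get(k) for k, v in expected.items() if settings.get(k) != v}


# Hypothetical sample; replace with the text of your own nagios.cfg.
sample = """
# scheduling options
auto_reschedule_checks=1
use_retained_scheduling_info=1
"""
print(scheduling_mismatches(sample))  # {'auto_reschedule_checks': '1'}
```

An empty result means both directives already match the values above. Changes to nagios.cfg only take effect after Nagios is restarted.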

Post by **jolson** » Mon Jun 01, 2015 9:48 am

I'm glad to hear that you and Andy got this worked out. Is there anything further I can help you with here?

Post by **rajasegar** » Mon Jun 01, 2015 7:04 pm

jolson wrote:
I'm glad to hear that you and Andy got this worked out. Is there anything further I can help you with here?

No. Please close this ticket. Thanks

Nagios Support Forum
