Nagios scheduling queue

klajosh2
Posts: 38
Joined: Thu Jan 16, 2014 5:22 am

Nagios scheduling queue

Post by klajosh2 »

Hi,

My setup is Nagios 3.5.1 with the latest mod_gearman:

on Redhat 6.5: gearmand, mod_gearman_neb + nagios + mod_gearman worker
on Debian 7.8: mod_gearman worker

One check should run frequently, every 5 minutes. It checks the interfaces of the network devices; I use check_multi
to collect all the interface checks per device. Looking at the Scheduling Queue in Nagios, I noticed the following (an example):
the interface check ran at 12:19, and the next check is scheduled for 12:44. Is this related to mod_gearman, or is it a Nagios bug or a configuration problem?
I noticed the problem when I was checking the age of the rrd files: some of them had not been updated for more than 40 minutes (which is not good).
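
To put numbers on the example above, here is a minimal Python sketch of the arithmetic. The function name check_gap is made up for illustration; only the two clock times and the 5 minute interval come from this post.

```python
from datetime import datetime, timedelta

def check_gap(last_run, next_run, interval):
    """Return the gap between two runs and how many scheduled runs fall inside it."""
    gap = next_run - last_run
    # Number of whole intervals in the gap, minus the one run that does happen.
    skipped = max(gap // interval - 1, 0)
    return gap, skipped

# Times from the scheduling queue example; no date is given, so only H:M is parsed.
last_run = datetime.strptime("12:19", "%H:%M")
next_run = datetime.strptime("12:44", "%H:%M")

gap, skipped = check_gap(last_run, next_run, timedelta(minutes=5))
print(gap, skipped)
```

With a 5 minute interval, the 25 minute gap means four scheduled runs in between did not happen.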

I attach the following 2 pictures as an example:

int.jpg:
This is a snippet from the scheduling queue. This check should run every 5 minutes. Why did Nagios schedule it 25 minutes away?
What could be the problem?

rrd-chk.jpg:
This check reports how often certain rrd files are updated.
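
For anyone who wants to reproduce this kind of freshness check, a rough sketch in Python follows. It is not the actual plugin used here; stale_files and the 40 minute threshold are assumptions for illustration, and it only looks at file modification times.

```python
import os
import time

def stale_files(paths, max_age_seconds, now=None):
    """Return (path, age_in_seconds) for every file not modified within the limit."""
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        age = now - os.path.getmtime(path)  # seconds since the last write
        if age > max_age_seconds:
            stale.append((path, age))
    return stale

# Hypothetical usage: flag rrd files that were not updated for more than 40 minutes.
# for path, age in stale_files(["/path/to/some.rrd"], 40 * 60):
#     print(f"{path} not updated for {age / 60:.0f} min")
```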

Can anybody help?

Thank you in advance,

klajosh
Attachments
int.jpg (4.65 KiB) Viewed 4249 times
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Nagios scheduling queue

Post by jdalrymple »

The first thing I'd take a look at is the clock skew across the gearman workers and the Nagios box. I've seen where results come back from a gearman worker whose clock is askew and that impacts the next run time for a host/service check.
klajosh2
Posts: 38
Joined: Thu Jan 16, 2014 5:22 am

Re: Nagios scheduling queue

Post by klajosh2 »

That is a good idea, but I have to monitor devices in different geographical locations. I solved this with one main Nagios server
and several mod_gearman collectors. Those collectors do not just poll the devices; they also do other things (such as running an internal web server). In short: some
of the collectors are in different time zones, and the time zone setting cannot be changed on those machines.

On the other hand, the problem also happens on collectors that are in the same time zone as the main Nagios server and have the same time settings.
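
A side note on time zone versus clock skew, as a sketch only. As far as I know, check results carry Unix epoch timestamps, and an epoch value does not depend on the zone a machine is configured for, so different time zones alone should not shift the schedule; a clock that is actually wrong would. The helper names and the example date below are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def to_epoch(wall_clock, utc_offset_hours):
    """Convert a naive wall-clock time plus its UTC offset to a Unix epoch value."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return wall_clock.replace(tzinfo=zone).timestamp()

def clock_skew_seconds(server_epoch, worker_epoch):
    """Positive result: the worker clock runs ahead of the server clock."""
    return worker_epoch - server_epoch

# The same instant seen on two correctly synced machines in different zones
# (hypothetical date): 12:19 at UTC+2 and 10:19 at UTC.
server = to_epoch(datetime(2015, 4, 24, 12, 19), 2)
worker = to_epoch(datetime(2015, 4, 24, 10, 19), 0)
same_instant_difference = clock_skew_seconds(server, worker)
print(same_instant_difference)
```

If the two epoch values differ by more than a few seconds on machines that are supposedly in sync (for example via NTP), that difference is the skew worth looking at.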
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios scheduling queue

Post by tgriep »

Could you post your mod gearman worker and server config files and the service check that is having problems?
klajosh2
Posts: 38
Joined: Thu Jan 16, 2014 5:22 am

Re: Nagios scheduling queue

Post by klajosh2 »

(Sorry for the late answer, I have been quite busy lately.)
It turned out that I cannot narrow the problem down to a specific service. Services are randomly abandoned by the Nagios scheduler
(instead of being checked every 3 minutes, the gap between 2 checks is 30 minutes).
My guess is that the root of the problem is that I have too many checks running too often, and Nagios core with service_inter_check_delay_method=s
cannot handle that. I mean that Nagios core sees the whole monitoring environment as a single server, but I currently have 7 pollers/collectors (call them whatever you want) in 5 different
locations doing checks for one main server, and Nagios core wants to protect this server from load.
This is just an idea; I do not know whether it is true.
So I think the problem is not in mod_gearman but in Nagios core (3.5.1), which does not schedule the checks properly.
What do you think?
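
For reference, a sketch of what the "smart" setting (service_inter_check_delay_method=s) computes, as I read the Nagios 3 documentation: the delay between consecutive initial service checks is the average check interval divided by the total number of services. The function name and the service counts below are hypothetical, not taken from this installation.

```python
def smart_inter_check_delay(check_intervals_seconds):
    """Average check interval divided by the number of services, in seconds."""
    total = len(check_intervals_seconds)
    if total == 0:
        raise ValueError("no services defined")
    average_interval = sum(check_intervals_seconds) / total
    return average_interval / total

# Hypothetical mix: 3000 services checked every 3 minutes, 3000 every 5 minutes.
intervals = [180] * 3000 + [300] * 3000
delay = smart_inter_check_delay(intervals)
print(f"{delay:.3f} s between initial checks")
```

Under this formula the delay only spreads checks evenly over time; it does not know how many pollers exist, which fits the suspicion above but does not prove it.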
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Nagios scheduling queue

Post by jolson »

If you view the extended details of one of your affected services, what do you see? Please post a screenshot similar to the following:
2015-04-24 12_15_26-Nagios Core.png
I am interested in the check latency/duration - perhaps the check is taking a long time to execute? It's also possible that the latency is high.

Is there any excessive load on the server - are the resources being starved?

top
free -m
klajosh2
Posts: 38
Joined: Thu Jan 16, 2014 5:22 am

Re: Nagios scheduling queue

Post by klajosh2 »

Hi,

the machine is definitely overloaded:

# w
 15:34:17 up 25 days,  1:48,  1 user,  load average: 4.01, 3.48, 3.22
It has 4 CPUs.
Please check the attachments.
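
To relate the load average to the CPU count, a small Python sketch follows. load_per_cpu is a made-up helper; the input line is the `w` output quoted above, and reading a per-CPU value near 1.0 as "busy but not heavily oversubscribed" is only a common rule of thumb.

```python
import re

def load_per_cpu(uptime_line, cpu_count):
    """Extract the 1, 5 and 15 minute load averages and divide each by the CPU count."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) / cpu_count for value in match.groups())

line = " 15:34:17 up 25 days,  1:48,  1 user,  load average: 4.01, 3.48, 3.22"
ratios = load_per_cpu(line, 4)
print(ratios)
```

For these numbers the result is roughly 1.0, 0.87 and 0.8 runnable processes per CPU.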

klajosh
Attachments
$A955695357B5A07.jpg
$1CF00C842961790D.jpg
klajosh2
Posts: 38
Joined: Thu Jan 16, 2014 5:22 am

Re: Nagios scheduling queue

Post by klajosh2 »

same service:

Check Latency / Duration: 1.823 / 34.001 seconds
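
A quick sketch for putting that line in proportion to the check interval. The parser name is hypothetical, and the 180 second interval is an assumption based on the 3 minute interval mentioned earlier in the thread.

```python
import re

def parse_latency_duration(line):
    """Return (latency, duration) in seconds from a 'Check Latency / Duration' line."""
    match = re.search(r"([\d.]+)\s*/\s*([\d.]+)\s*seconds", line)
    if match is None:
        raise ValueError("line does not contain 'latency / duration seconds'")
    return float(match.group(1)), float(match.group(2))

line = "Check Latency / Duration: \t1.823 /34.001 seconds"
latency, duration = parse_latency_duration(line)

interval_seconds = 180  # assumed 3 minute check interval
share = duration / interval_seconds
print(f"latency={latency}s duration={duration}s share of interval={share:.1%}")
```

The check itself takes about 34 seconds, so its run time alone does not account for a 25 to 30 minute gap between checks.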
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Nagios scheduling queue

Post by jolson »

It's possible your issues are being caused by the performance of your box.
I'd suggest upgrading to Nagios 4.x, which is much faster than Nagios 3.x, mostly due to the introduction of 'Core Workers'. You can read more about the enhancements in 4 here: http://labs.nagios.com/2013/09/20/nagio ... available/
Some other performance tweaks: http://nagios.sourceforge.net/docs/nagi ... uning.html

While I do think that service_inter_check_delay_method=s could have something to do with the issues described here, I think that upgrading to 4.x has the possibility of helping you out the most.

As for why the Nagios server is scheduling the checks so far out, could you please post a service configuration of one of the affected services?
klajosh2
Posts: 38
Joined: Thu Jan 16, 2014 5:22 am

Re: Nagios scheduling queue

Post by klajosh2 »

A few things hold me back from upgrading to Nagios 4.0.8:

- PNP4Nagios Broker Module npcdmod.o is not compatible with Nagios Core 4.x
and
- Mod-Gearman works best since version 3.2.2 up to the latest stable Nagios 3.5.1. Nagios 4 is not fully tested yet,

and my environment relies heavily on these 2 broker modules.

So I think my hands are tied here :(

I attach a graph that visualizes host/service latency based on nagiostats.
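
In case it helps others build a similar graph, here is a rough sketch of turning nagiostats output into numbers. It assumes the MRTG mode (something like `nagiostats --mrtg --data=AVGACTSVCLAT,MAXACTSVCLAT`), which as far as I know prints one value per line, in the order requested and in milliseconds. The helper name and the sample values are made up; they are not real measurements from this system.

```python
def parse_mrtg_values(output, names):
    """Map requested variable names to the values printed one per line."""
    values = [line.strip() for line in output.splitlines() if line.strip()]
    if len(values) < len(names):
        raise ValueError("fewer values than requested variables")
    return {name: float(value) for name, value in zip(names, values)}

# Illustrative output only, not taken from the server discussed here.
sample_output = "1823\n9500\n"
stats = parse_mrtg_values(sample_output, ["AVGACTSVCLAT", "MAXACTSVCLAT"])

# Convert milliseconds to seconds for plotting.
latency_seconds = {name: value / 1000.0 for name, value in stats.items()}
print(latency_seconds)
```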
Attachments
latency.jpg