Service run delay/latency problem

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mroter
Posts: 80
Joined: Sun Apr 29, 2012 12:43 pm

Service run delay/latency problem

Post by mroter »

We are testing a directory sync package. We place a file in a source dir every 1 minute and use a Nagios service to delete files from the target dir. If the number of deleted files is zero, the service fails.
We run the service every 3 minutes, with 3 retries at 3-minute intervals.
It works fine for a while, but sometimes it goes out of sync: two instances of the service run in parallel, or very close to each other, so one succeeds (deletes the files) and the other fails (0 files deleted). When that happens it can continue like this for hours, until we restart Nagios.

Any ideas?
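The post doesn't include the check itself. A minimal sketch of the logic as described might look like the following. It assumes a Python plugin; the script name `check_dir_sync.py` and the function names are hypothetical, and only the exit-code convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN) is the standard Nagios plugin one.

```python
import os
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def delete_synced_files(target_dir: str) -> int:
    """Delete every regular file in target_dir; return how many this run removed."""
    deleted = 0
    for name in os.listdir(target_dir):
        path = os.path.join(target_dir, name)
        try:
            if os.path.isfile(path):
                os.remove(path)
                deleted += 1
        except FileNotFoundError:
            # Another instance of the check removed the file first:
            # the overlapping-run case described in the post.
            continue
    return deleted


def check(target_dir: str) -> Tuple[int, str]:
    """Return (exit_code, status_line) in the usual Nagios plugin style."""
    deleted = delete_synced_files(target_dir)
    if deleted == 0:
        return CRITICAL, "CRITICAL - 0 files deleted from %s" % target_dir
    return OK, "OK - %d files deleted from %s" % (deleted, target_dir)


def main(argv: List[str]) -> int:
    """Hypothetical entry point: expects the target directory as the only argument."""
    if len(argv) != 1:
        print("UNKNOWN - usage: check_dir_sync.py TARGET_DIR")
        return UNKNOWN
    code, message = check(argv[0])
    print(message)
    return code
```

As a plugin it would be wired up with `sys.exit(main(sys.argv[1:]))`. With a file arriving every minute and a 3-minute interval, a normal run should find about three files; if two runs land seconds apart, the first empties the directory and the second returns CRITICAL, which matches the symptom above.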
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Service run delay/latency problem

Post by lmiltchev »

Next time this happens, run the following command in terminal and show us the output:

ps -ef | grep /bin/nagios
Be sure to check out our Knowledgebase for helpful articles and solutions!
mroter

Re: Service run delay/latency problem

Post by mroter »

Here you go

[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios    5064     1  1 05:49 ?        00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26366  5064  0 06:18 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     26392 22419  0 06:18 pts/0    00:00:00 grep /bin/nagios
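The PPID column is the useful part of this output: PID 5064 has PPID 1 (the daemon), while PID 26366 has PPID 5064, so it was spawned by that daemon rather than being a second independently started copy. The output alone doesn't show what that child was doing. A small helper that does this grouping is sketched below; `parse_ps_ef` and `nagios_tree` are hypothetical names, and it assumes the standard `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD).

```python
from typing import Dict, List, NamedTuple


class Proc(NamedTuple):
    user: str
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef(text: str) -> List[Proc]:
    """Parse `ps -ef` lines into Proc records, skipping prompts and headers."""
    procs = []
    for line in text.strip().splitlines():
        if line.startswith("[") or line.startswith("UID"):
            continue  # shell prompt or ps header line
        fields = line.split(None, 7)  # CMD (8th field) keeps its spaces
        if len(fields) < 8:
            continue
        procs.append(Proc(fields[0], int(fields[1]), int(fields[2]), fields[7]))
    return procs


def nagios_tree(procs: List[Proc]) -> Dict[int, List[int]]:
    """Map each top-level nagios PID to the PIDs of its direct nagios children."""
    nagios = [p for p in procs
              if "/bin/nagios " in p.cmd and not p.cmd.startswith("grep")]
    pids = {p.pid for p in nagios}
    # Top-level: a nagios process whose parent is not another nagios process.
    tree: Dict[int, List[int]] = {p.pid: [] for p in nagios if p.ppid not in pids}
    for p in nagios:
        if p.ppid in tree:
            tree[p.ppid].append(p.pid)
    return tree
```

For the output above this gives `{5064: [26366]}`; run on the post-restart output, it reports a single daemon with no children.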
mroter

Re: Service run delay/latency problem

Post by mroter »

After restarting the nagiosxi and nagios services, I get only one instance:
[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios    6778     1  0 06:24 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root      7233 22419  0 06:24 pts/0    00:00:00 grep /bin/nagios
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Service run delay/latency problem

Post by scottwilkerson »

Can you post a copy of your nagios.cfg?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
mroter

Re: Service run delay/latency problem

Post by mroter »

Please see the attached nagios.cfg file from our NJ server.
scottwilkerson

Re: Service run delay/latency problem

Post by scottwilkerson »

Can you test whether you get the same results with the livestatus broker module disabled?

I had a feeling that this may be what the problem was.
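Broker modules are loaded through `broker_module=` lines in nagios.cfg, so one way to run this test is to comment out the livestatus line and restart Nagios. The sketch below does that on the file's text. The `disable_livestatus` helper is hypothetical, the module paths in any sample are illustrative (the actual nagios.cfg was only posted as an attachment), and matching on the substring `livestatus` is an assumption about how the module file is named. Keep a backup of the original file.

```python
import re

# Assumption: the livestatus module path contains the word "livestatus".
# Matches only active (uncommented) broker_module lines.
_LIVESTATUS_LINE = re.compile(r"^\s*broker_module\s*=.*livestatus", re.IGNORECASE)


def disable_livestatus(cfg_text: str) -> str:
    """Return nagios.cfg text with livestatus broker_module lines commented out."""
    out = []
    for line in cfg_text.splitlines(keepends=True):
        if _LIVESTATUS_LINE.match(line):
            line = "#" + line  # '#' starts a comment in nagios.cfg
        out.append(line)
    return "".join(out)
```

Lines that are already commented out are left alone, so applying the function twice gives the same result.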
mroter

Re: Service run delay/latency problem

Post by mroter »

I will, but we have the same configuration in the UK with no problems.
Could it be related to check latency?
Can livestatus affect check scheduling?
mroter

Re: Service run delay/latency problem

Post by mroter »

I've changed the service configuration to run every 5 minutes, and the problem disappeared!
I suspect it has to do with the high service check latency we see only on this server:
NJ: 74 hosts, 655 services; check latency min 58 s, max 346 s, avg 108 s
UK: 113 hosts, 956 services; check latency min 3.4 s, max 73 s, avg 11.4 s

So the UK server is busier (running roughly the same application), yet there is a huge gap in service check latency. How can that be?
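To put these figures in proportion, the sketch below expresses each site's average and maximum latency as a fraction of the check interval (180 s for the original 3-minute schedule, 300 s for the new 5-minute one). It only restates the posted numbers; `latency_vs_interval` is a hypothetical helper, and the ratios don't by themselves establish why the overlap happens.

```python
from typing import Dict

# Check latency figures (seconds) as posted in this thread.
LATENCY = {
    "NJ": {"min": 58.0, "max": 346.0, "avg": 108.0},
    "UK": {"min": 3.4, "max": 73.0, "avg": 11.4},
}


def latency_vs_interval(site: str, interval_s: float) -> Dict[str, float]:
    """Express a site's avg and max check latency as a fraction of the interval."""
    stats = LATENCY[site]
    return {"avg": stats["avg"] / interval_s, "max": stats["max"] / interval_s}


for site in LATENCY:
    for interval in (180.0, 300.0):
        r = latency_vs_interval(site, interval)
        print(f"{site} @ {interval:.0f}s: avg {r['avg']:.2f}, max {r['max']:.2f} of interval")
```

At 180 s, NJ's average latency is 60% of the interval and its maximum (346 s) is almost twice the interval, while UK's maximum is about 41% of it. Even at 300 s, NJ's maximum is still slightly above the interval, so these ratios alone don't explain why the 5-minute change helped.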
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Service run delay/latency problem

Post by scottwilkerson »

What kinds of checks do you primarily do at each location (what plugins do you use most)?