
Service run delay/latency problem

Posted: Wed Sep 12, 2012 8:29 am
by mroter
We are testing a directory sync package by placing a file in a source directory every minute and using a Nagios service to delete files from the target directory. If the number of deleted files is zero, the service fails.
We run the service every 3 minutes, with 3 retries at 3-minute intervals.
It works fine for a while, but sometimes it goes out of sync: two instances of the service run in parallel or very close to each other, causing one to succeed (delete files) and the other to fail (0 files deleted). When that happens, it may continue like this for hours, until we restart Nagios.
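To make the failure mode concrete, here is a minimal sketch of a check like the one described, written as a hypothetical Python plugin (the function names and the temp-directory demo are illustrative assumptions, not the poster's actual package). It uses the standard Nagios plugin return codes (0 = OK, 2 = CRITICAL) and shows that two executions with no new file between them produce one OK and one CRITICAL.

```python
import tempfile
from pathlib import Path

# Standard Nagios plugin return codes.
OK, CRITICAL = 0, 2


def delete_synced_files(target_dir: Path) -> int:
    """Delete every regular file in target_dir and return how many were removed."""
    deleted = 0
    # Snapshot the listing first so we never delete while iterating the directory.
    for entry in list(target_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            deleted += 1
    return deleted


def run_check(target_dir: Path) -> tuple[int, str]:
    """Hypothetical check: CRITICAL when no file arrived since the previous run.

    A real plugin would print the message and call sys.exit() with the status.
    """
    deleted = delete_synced_files(target_dir)
    if deleted == 0:
        return CRITICAL, "CRITICAL - 0 files deleted"
    return OK, f"OK - {deleted} files deleted"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp)
        # Three files synced over ~3 minutes (one per minute).
        for i in range(3):
            (target / f"sync_{i}.txt").write_text("test")
        # Two runs back to back, before the next file lands.
        print(run_check(target))  # (0, 'OK - 3 files deleted')
        print(run_check(target))  # (2, 'CRITICAL - 0 files deleted')
```

The second run fails purely because of timing, which matches the pattern reported above when two executions land close together.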

Any ideas?

Re: Service run delay/latency problem

Posted: Wed Sep 12, 2012 9:32 am
by lmiltchev
Next time this happens, run the following command in a terminal and show us the output:

Code:

ps -ef | grep /bin/nagios

Re: Service run delay/latency problem

Posted: Tue Sep 18, 2012 1:22 am
by mroter
Here you go

Code:

[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios    5064     1  1 05:49 ?        00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26366  5064  0 06:18 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     26392 22419  0 06:18 pts/0    00:00:00 grep /bin/nagios
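When reading `ps -ef` output like this, the PPID (third) column tells you whether a second nagios process is an independently started daemon or a process forked by the running one. A minimal sketch that classifies the nagios processes in the output above (copied verbatim); the binary path is taken from that output, and whether a given child is a short-lived fork or something else cannot be told from a single snapshot.

```python
PS_OUTPUT = """\
nagios    5064     1  1 05:49 ?        00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   26366  5064  0 06:18 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     26392 22419  0 06:18 pts/0    00:00:00 grep /bin/nagios
"""

NAGIOS_BIN = "/usr/local/nagios/bin/nagios"


def classify_nagios_processes(ps_text: str) -> dict[str, list[int]]:
    """Split nagios processes from `ps -ef` output into top-level and child PIDs.

    A process whose PPID is itself a nagios PID was forked by that process.
    """
    procs = {}  # pid -> ppid
    for line in ps_text.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[7].startswith(NAGIOS_BIN):
            continue  # skips the grep line and unrelated processes
        procs[int(fields[1])] = int(fields[2])
    top_level = sorted(pid for pid, ppid in procs.items() if ppid not in procs)
    children = sorted(pid for pid, ppid in procs.items() if ppid in procs)
    return {"top_level": top_level, "children": children}


print(classify_nagios_processes(PS_OUTPUT))
# {'top_level': [5064], 'children': [26366]}
```

Here PID 26366 has PPID 5064, so it was forked by the daemon that init (PID 1) started, rather than being a second daemon launched separately.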

Re: Service run delay/latency problem

Posted: Tue Sep 18, 2012 1:27 am
by mroter
After restarting the nagiosxi and nagios services, I now get only one instance:
[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios    6778     1  0 06:24 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root      7233 22419  0 06:24 pts/0    00:00:00 grep /bin/nagios

Re: Service run delay/latency problem

Posted: Tue Sep 18, 2012 7:59 am
by scottwilkerson
Can you post a copy of your nagios.cfg?

Re: Service run delay/latency problem

Posted: Tue Sep 18, 2012 11:52 am
by mroter
Please see the attached nagios.cfg file from our NJ server.
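When comparing nagios.cfg files between servers for a scheduling problem, it can help to pull out just the directives that influence check scheduling. A minimal sketch, assuming the usual `key=value` format with `#` comments; the directive names are real Nagios Core settings, but which ones to inspect is our choice, and the sample values are hypothetical, not taken from the attached file.

```python
SCHEDULING_KEYS = {
    "interval_length",
    "max_concurrent_checks",
    "service_inter_check_delay_method",
    "service_interleave_factor",
    "check_result_reaper_frequency",
    "broker_module",
}


def scheduling_settings(cfg_text: str) -> dict[str, list[str]]:
    """Collect scheduling-related directives from nagios.cfg text.

    Values are lists because some directives (e.g. broker_module) may repeat.
    """
    found: dict[str, list[str]] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in SCHEDULING_KEYS:
            found.setdefault(key, []).append(value)
    return found


# Hypothetical excerpt, not the nagios.cfg attached to this thread.
SAMPLE_CFG = """\
interval_length=60
max_concurrent_checks=0
#service_interleave_factor=s
service_inter_check_delay_method=s
broker_module=/path/to/example_module.o arg=1
log_file=/usr/local/nagios/var/nagios.log
"""

for key, values in scheduling_settings(SAMPLE_CFG).items():
    print(f"{key}: {values}")
```

Running the same function over the NJ and UK files and diffing the results is one quick way to confirm the configurations really match.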

Re: Service run delay/latency problem

Posted: Tue Sep 18, 2012 1:10 pm
by scottwilkerson
Can you test whether you get the same results with the livestatus broker module disabled?

I have a feeling this may be what the problem is.
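Livestatus is loaded as an event broker module through a `broker_module=` line in nagios.cfg, so disabling it amounts to commenting that line out and restarting Nagios. A minimal sketch that does the commenting-out on the file's text; matching on the substring `livestatus` and the sample module path are assumptions, so check the actual line in your nagios.cfg and keep a backup before writing anything back.

```python
def disable_broker_module(cfg_text: str, needle: str = "livestatus") -> str:
    """Comment out active broker_module lines whose value mentions `needle`."""
    out = []
    for line in cfg_text.splitlines(keepends=True):
        stripped = line.lstrip()
        # Already-commented lines start with '#', so they are left untouched.
        if stripped.startswith("broker_module=") and needle in stripped:
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


# Hypothetical lines; the real module path on your server may differ.
SAMPLE = """\
broker_module=/usr/local/nagios/bin/livestatus.o /usr/local/nagios/var/rw/live
broker_module=/usr/local/nagios/bin/other_module.o
"""

print(disable_broker_module(SAMPLE), end="")
```

Only the livestatus line gains a leading `#`; any other broker modules stay active, so the test isolates that one module.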

Re: Service run delay/latency problem

Posted: Tue Sep 18, 2012 1:25 pm
by mroter
I will, but we have the same configuration in the UK with no problems.
Could it be related to check latency?
Can livestatus affect check scheduling?
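One way to see how check latency alone could produce the symptom from the first post is a toy model: each run starts at its due time plus some latency, and a run finds nothing to delete if no file landed since the previous run started. This is a simplifying assumption for illustration, not how the Nagios scheduler works internally, and the latency values are made up (within the range later reported for NJ).

```python
import math

FILE_PERIOD = 60  # seconds between files landing in the source dir (from the first post)


def arrivals_between(start: float, end: float) -> int:
    """Files that land at t = 60, 120, ... seconds inside the window (start, end]."""
    return math.floor(end / FILE_PERIOD) - math.floor(start / FILE_PERIOD)


def empty_runs(check_interval: int, latencies: list[float]) -> list[float]:
    """Start times of runs that find no new file to delete.

    Toy model (an assumption): run i is due at i * check_interval and actually
    starts latencies[i] seconds late; syncing is instant; each run deletes
    everything that has arrived so far.
    """
    starts = sorted(i * check_interval + lat for i, lat in enumerate(latencies))
    return [
        later
        for earlier, later in zip(starts, starts[1:])
        if arrivals_between(earlier, later) == 0
    ]


# Made-up latencies within the 58-346 s range later reported for NJ.
LATENCIES = [100, 300, 130]
print("every 3 min:", empty_runs(180, LATENCIES))  # [490]: one run finds nothing
print("every 5 min:", empty_runs(300, LATENCIES))  # []: no empty runs
```

With a 3-minute interval, a late run followed by a less-late one can start within the same minute; a longer interval makes that less likely but, in this model, does not rule it out.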

Re: Service run delay/latency problem

Posted: Mon Sep 24, 2012 11:59 am
by mroter
I've modified the service configuration so that it runs every 5 minutes, and the problem has disappeared!
I suspect it has to do with the high service check latency we see only on this server:
NJ: 74 hosts, 655 services; service check latency min 58 s, max 346 s, avg 108 s
UK: 113 hosts, 956 services; service check latency min 3.4 s, max 73 s, avg 11.4 s

So the UK server is busier (running roughly the same application), yet there is a huge gap in service check latency. How can that be?
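To cross-check latency figures like these, or to find which services are the most delayed, the per-service `check_latency` values in Nagios Core's status file (the path set by `status_file` in nagios.cfg) can be summarised. A minimal sketch, assuming the status.dat layout of `servicestatus { ... }` blocks containing `key=value` lines; the sample blocks are made up.

```python
from statistics import mean


def parse_blocks(status_text: str, block_type: str) -> list[dict[str, str]]:
    """Return the key=value pairs of every `block_type { ... }` section."""
    blocks = []
    current = None  # dict while inside a matching block, else None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == f"{block_type} {{":
            current = {}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return blocks


def latency_summary(status_text: str) -> tuple[float, float, float]:
    """(min, max, avg) of check_latency over all servicestatus blocks, in seconds."""
    values = [
        float(block["check_latency"])
        for block in parse_blocks(status_text, "servicestatus")
        if "check_latency" in block
    ]
    return min(values), max(values), mean(values)


# Made-up excerpt; host blocks are skipped because only servicestatus is read.
SAMPLE_STATUS = """\
hoststatus {
\thost_name=web01
\tcheck_latency=9.000
\t}
servicestatus {
\thost_name=web01
\tservice_description=Dir Sync
\tcheck_latency=58.000
\t}
servicestatus {
\thost_name=web01
\tservice_description=HTTP
\tcheck_latency=346.000
\t}
"""

print(latency_summary(SAMPLE_STATUS))  # (58.0, 346.0, 202.0)
```

Sorting the parsed service blocks by `check_latency` instead of summarising them would show whether the delay is spread evenly or concentrated on a few hosts.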

Re: Service run delay/latency problem

Posted: Mon Sep 24, 2012 1:46 pm
by scottwilkerson
What kinds of checks do you primarily run at each location (which plugins do you use most)?
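One rough way to answer this per server is to count the check commands in use. A minimal sketch, assuming each `servicestatus` block in status.dat carries a `check_command=` line whose command name precedes the first `!`; note these are Nagios command names, which map to plugins through the command definitions, and the sample commands are hypothetical.

```python
from collections import Counter


def plugin_counts(status_text: str) -> Counter:
    """Count check commands (the part before the first '!') across servicestatus blocks."""
    counts: Counter = Counter()
    in_service = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "servicestatus {":
            in_service = True
        elif line == "}":
            in_service = False
        elif in_service and line.startswith("check_command="):
            command = line.split("=", 1)[1]
            counts[command.split("!", 1)[0]] += 1
    return counts


# Made-up excerpt; command names and arguments are hypothetical.
SAMPLE_STATUS = """\
servicestatus {
\tservice_description=Dir Sync
\tcheck_command=check_dirsync!/data/target
\t}
servicestatus {
\tservice_description=HTTP
\tcheck_command=check_http!-H web01
\t}
servicestatus {
\tservice_description=HTTP 2
\tcheck_command=check_http!-H web02
\t}
"""

print(plugin_counts(SAMPLE_STATUS).most_common())
# [('check_http', 2), ('check_dirsync', 1)]
```

Comparing the two servers' counts side by side would show whether NJ leans on slower plugins than the UK.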