Service run delay/latency problem
Posted: Wed Sep 12, 2012 8:29 am
by mroter
We are testing a directory sync package by placing a file in a source directory every minute and using a Nagios service to delete the files from the target directory. If the number of deleted files is zero, the service fails.
We run the service every 3 minutes, with 3 retries at 3-minute intervals.
It works fine for a while, but sometimes it goes out of sync: two instances of the service run in parallel or very close to each other, causing one to succeed (it deletes the files) and the other to fail (0 files deleted). When that happens, it may continue like this for hours, until we restart Nagios.
Any ideas?
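For readers following along, here is a minimal sketch of the kind of check described above. It is not the poster's actual plugin: the function names and the demo file names are made up for illustration, and only the exit codes (0 for OK, 2 for CRITICAL) follow the standard Nagios plugin convention. It shows why a second run that lands right after the first one reports a failure.

```python
import os
import tempfile

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def delete_synced_files(target_dir):
    """Delete every regular file in target_dir and return how many were removed."""
    deleted = 0
    # Materialise the listing first so we do not delete while iterating.
    for entry in list(os.scandir(target_dir)):
        if entry.is_file():
            os.remove(entry.path)
            deleted += 1
    return deleted


def run_check(target_dir):
    """Return (exit_code, message) the way a Nagios plugin would report it."""
    deleted = delete_synced_files(target_dir)
    if deleted == 0:
        return CRITICAL, "CRITICAL - 0 files deleted"
    return OK, f"OK - {deleted} files deleted"


if __name__ == "__main__":
    # Demo: three files have arrived, then two runs happen back to back.
    with tempfile.TemporaryDirectory() as target:
        for i in range(3):
            open(os.path.join(target, f"sync_{i}.txt"), "w").close()
        print(run_check(target))  # first run finds and deletes the files
        print(run_check(target))  # second run finds nothing left to delete
```

Because the check has a side effect (it removes the files it counts), any two runs that execute less than one file interval apart will produce one OK and one CRITICAL, regardless of whether the sync itself is healthy.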
Re: Service run delay/latency problem
Posted: Wed Sep 12, 2012 9:32 am
by lmiltchev
Next time this happens, run the following command in a terminal and show us the output:
Re: Service run delay/latency problem
Posted: Tue Sep 18, 2012 1:22 am
by mroter
Here you go
Code:
[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios 5064 1 1 05:49 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26366 5064 0 06:18 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 26392 22419 0 06:18 pts/0 00:00:00 grep /bin/nagios
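One way to read output like this is to split the nagios processes by parent PID. The sketch below assumes the usual `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function name is hypothetical. It separates processes whose parent is itself a nagios process from those whose parent is not, and skips the `grep` line. Whether such a child is a normal short-lived fork of the daemon or something else is not settled in this thread, so treat the labels as descriptive only.

```python
PS_OUTPUT = """\
nagios 5064 1 1 05:49 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26366 5064 0 06:18 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 26392 22419 0 06:18 pts/0 00:00:00 grep /bin/nagios
"""


def classify_nagios_processes(ps_output):
    """Return (top_level_pids, child_pids) for nagios processes in ps -ef output."""
    rows = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        if not executable.endswith("/bin/nagios"):
            continue  # skips the grep process itself
        rows.append((int(fields[1]), int(fields[2])))

    nagios_pids = {pid for pid, _ in rows}
    top_level, children = [], []
    for pid, ppid in rows:
        # A process whose parent is also nagios was forked by that parent.
        (children if ppid in nagios_pids else top_level).append(pid)
    return top_level, children


if __name__ == "__main__":
    top_level, children = classify_nagios_processes(PS_OUTPUT)
    print("top-level nagios PIDs:", top_level)
    print("forked from another nagios process:", children)
```

Two entries in the top-level list would point at two independent daemons; one top-level entry plus children, as in the sample above, means the extra process was started by the running daemon.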
Re: Service run delay/latency problem
Posted: Tue Sep 18, 2012 1:27 am
by mroter
Now, after restarting the nagiosxi and nagios services, I get only one instance:
[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios 6778 1 0 06:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 7233 22419 0 06:24 pts/0 00:00:00 grep /bin/nagios
Re: Service run delay/latency problem
Posted: Tue Sep 18, 2012 7:59 am
by scottwilkerson
Can you post a copy of your nagios.cfg?
Re: Service run delay/latency problem
Posted: Tue Sep 18, 2012 11:52 am
by mroter
Please see the attached nagios.cfg file from our NJ server.
Re: Service run delay/latency problem
Posted: Tue Sep 18, 2012 1:10 pm
by scottwilkerson
Can you test whether you get the same results with the livestatus broker module disabled?
I have a feeling that this may be the problem.
Re: Service run delay/latency problem
Posted: Tue Sep 18, 2012 1:25 pm
by mroter
I will, but we have the same configuration in the UK with no problems.
Can it be related to check latency?
Can livestatus affect check scheduling?
Re: Service run delay/latency problem
Posted: Mon Sep 24, 2012 11:59 am
by mroter
I've modified the service configuration so that it runs every 5 minutes, and the problem disappeared!
I suspect it has to do with the high service check latency, which we see only on this server:
NJ: 74 hosts, 655 services; latency min 58s, max 346s, avg 108s
UK: 113 hosts, 956 services; latency min 3.4s, max 73s, avg 11.4s
So the UK server is busier (running roughly the same application), yet there is a huge gap in service check latency. How can that be?
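To put the gap in numbers, here is a small sketch that compares the two servers using only the figures quoted above. The class and function names are made up for illustration; nothing here queries Nagios.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    hosts: int
    services: int
    latency_min_s: float
    latency_max_s: float
    latency_avg_s: float

    @property
    def services_per_host(self):
        return self.services / self.hosts


# Figures copied from the post above.
NJ = ServerStats("NJ", 74, 655, 58.0, 346.0, 108.0)
UK = ServerStats("UK", 113, 956, 3.4, 73.0, 11.4)


def compare(a, b):
    """Return how server a relates to server b in load and average latency."""
    return {
        "service_ratio": a.services / b.services,
        "avg_latency_ratio": a.latency_avg_s / b.latency_avg_s,
    }


if __name__ == "__main__":
    for s in (NJ, UK):
        print(f"{s.name}: {s.services_per_host:.2f} services per host, "
              f"avg latency {s.latency_avg_s}s")
    ratios = compare(NJ, UK)
    print(f"NJ runs {ratios['service_ratio']:.2f}x the services of UK "
          f"but has {ratios['avg_latency_ratio']:.1f}x the average latency")
```

Both servers carry a similar number of services per host, so the difference in load alone does not account for the latency difference. Note also that the NJ maximum latency (346s) is longer than the original 3-minute check interval.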
Re: Service run delay/latency problem
Posted: Mon Sep 24, 2012 1:46 pm
by scottwilkerson
What kinds of checks do you primarily do at each location (what plugins do you use most)?