Service run delay/latency problem
We are testing a directory sync package by placing a file in a source directory every minute and using a Nagios service to delete the files from the target directory. If the number of deleted files is zero, the service fails.
We run the service every 3 minutes, with 3 retries at 3-minute intervals.
It works fine for a while, but sometimes it goes out of sync: two instances of the service run in parallel or very close to each other, so one succeeds (deletes files) and the other fails (0 files deleted). When that happens, it may continue like this for hours, until we restart Nagios.
Any ideas?
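For readers following along, the check described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `check_dirsync_target` and the directory argument are hypothetical, and the actual plugin used in this setup was not posted. Only the logic (delete the files, count them, fail on zero) comes from the description, and the return values follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical sketch of the check described in the post; not the poster's plugin.
# Deletes every regular file directly under the directory given as $1 and prints
# a Nagios-style status line.
# Returns 0 (OK) if at least one file was deleted, 2 (CRITICAL) if none were,
# and 3 (UNKNOWN) if the argument is not a directory.
check_dirsync_target() {
    local target_dir=$1
    local deleted=0
    local f

    if [ ! -d "$target_dir" ]; then
        echo "UNKNOWN - $target_dir is not a directory"
        return 3
    fi

    for f in "$target_dir"/*; do
        # Skip the unexpanded glob (empty directory) and non-regular files.
        [ -f "$f" ] || continue
        if rm -f -- "$f"; then
            deleted=$((deleted + 1))
        fi
    done

    if [ "$deleted" -eq 0 ]; then
        echo "CRITICAL - 0 files deleted from $target_dir"
        return 2
    fi
    echo "OK - $deleted files deleted from $target_dir"
    return 0
}
```

A wrapper script would call the function with the target directory and exit with its return value. Two runs that land close together show the behaviour described: the first deletes the pending files and returns OK, and the second finds nothing and returns CRITICAL.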
Re: Service run delay/latency problem
Next time this happens, run the following command in terminal and show us the output:
Code: Select all
ps -ef | grep /bin/nagios
Re: Service run delay/latency problem
Here you go
Code: Select all
[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios 5064 1 1 05:49 ? 00:00:18 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26366 5064 0 06:18 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 26392 22419 0 06:18 pts/0 00:00:00 grep /bin/nagios
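When reading output like this, it can help to separate top-level daemons (parent PID 1) from processes whose parent is another process. The helper below is a small sketch, with a hypothetical function name, that assumes the standard `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD). A second line whose parent is the main daemon is not necessarily a second daemon; it may simply be a short-lived child forked by the daemon.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and counts processes whose
# command (column 8) ends in /bin/nagios, split by parent PID (column 3).
# "daemons" have parent PID 1; "children" have any other parent.
# The grep line itself is ignored because its command is "grep".
count_nagios_procs() {
    awk '$8 ~ /\/bin\/nagios$/ {
             if ($3 == 1) daemons++; else children++
         }
         END { printf "daemons=%d children=%d\n", daemons, children }'
}
```

Usage would be `ps -ef | count_nagios_procs`. More than one process with parent PID 1 would point to two independent daemons running at the same time.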
Re: Service run delay/latency problem
Now, after restarting the nagiosxi and nagios services, I get only one instance:
[root@nagiosxi1-NJ ~]# ps -ef | grep /bin/nagios
nagios 6778 1 0 06:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 7233 22419 0 06:24 pts/0 00:00:00 grep /bin/nagios
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Service run delay/latency problem
Can you post a copy of your nagios.cfg?
Re: Service run delay/latency problem
Please see attached our NJ server nagios.cfg file
scottwilkerson
Re: Service run delay/latency problem
Can you test to see if you get the same results if you disable the livestatus broker module?
I had a feeling that this may be what the problem was.
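As a rough sketch of how to find what to disable: broker modules are loaded through `broker_module` lines in the main config file, so listing the active ones shows whether livestatus is loaded. The function name below is hypothetical, and the exact module path and arguments in this installation are not known from the thread.

```shell
# Hypothetical helper: prints the active (uncommented) broker_module lines
# of the Nagios main config file passed as $1.
# grep exits non-zero when no line matches.
list_broker_modules() {
    grep -E '^[[:space:]]*broker_module[[:space:]]*=' "$1"
}
```

For example, `list_broker_modules /usr/local/nagios/etc/nagios.cfg`. To run the test, comment out the line that mentions livestatus with a leading `#` and restart Nagios.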
Re: Service run delay/latency problem
I will, but we have the same configuration in the UK with no problems.
Could it be related to check latency?
Can livestatus affect check scheduling?
Re: Service run delay/latency problem
I've modified the service configuration so that it runs every 5 minutes, and the problem disappeared!
I suspect it has to do with the high service check latency, which we see only on this server:
NJ: 74 hosts, 655 services, latency min 58s, max 346s, avg 108s
UK: 113 hosts, 956 services, latency min 3.4s, max 73s, avg 11.4s
So UK is busier (running roughly the same application), yet there is a huge gap in service latency. How can that be?
scottwilkerson
Re: Service run delay/latency problem
What kinds of checks do you primarily do at each location (what plugins do you use most)?