Hi, we are experiencing high Service Check Latency on one of our distributed Nagios Core servers. The Nagios configuration is quite small (see the tactical overview output below), but before I restarted Nagios the service check latency figures were in the 1000s!
Service Check Execution Time: 0.01 / 6.52 / 1.295 sec
Service Check Latency: 0.62 / 223.07 / 118.967 sec
Host Check Execution Time: 4.00 / 4.22 / 4.080 sec
Host Check Latency: 0.01 / 306.80 / 135.256 sec
# Active Host / Service Checks: 50 / 488
# Passive Host / Service Checks: 0 / 0
The server is a small VM with 4 CPUs and 4 GB of memory, and the load average is consistently close to zero (see the top snapshot):
top - 20:19:44 up 87 days, 1:34, 2 users, load average: 0.32, 0.14, 0.05
Tasks: 155 total, 1 running, 154 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4043664k total, 1353964k used, 2689700k free, 184248k buffers
Swap: 2097144k total, 96k used, 2097048k free, 587180k cached
I know I could try altering the following parameters:
max_concurrent_checks=0
max_check_result_reaper_time=30
check_result_reaper_frequency=10
but is that required for such a small number of hosts/services? Is there something fundamental I've missed in the basic configuration?
Thanks
Steve.
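Before changing any of the three directives mentioned in the post above, it can help to confirm what the running configuration actually contains. The following is a minimal sketch, not an official Nagios tool: `read_directives` and `SAMPLE_CFG` are hypothetical names, and the sample text simply reuses the values quoted in the post. In Nagios Core, `max_concurrent_checks=0` places no limit on the number of concurrent checks.

```python
def read_directives(cfg_text, names):
    """Return {name: value} for the requested directives in nagios.cfg-style text."""
    found = {}
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in names:
            found[key.strip()] = value.strip()
    return found


# Hypothetical excerpt; the values are the ones quoted in the post.
SAMPLE_CFG = """\
# scheduling-related settings
max_concurrent_checks=0
max_check_result_reaper_time=30
check_result_reaper_frequency=10
"""

TUNABLES = {
    "max_concurrent_checks",
    "max_check_result_reaper_time",
    "check_result_reaper_frequency",
}

settings = read_directives(SAMPLE_CFG, TUNABLES)
for name in sorted(settings):
    print(f"{name} = {settings[name]}")
```

To use it against a real server, read the main configuration file into a string first; its location depends on how Nagios was installed.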
High Service Check Latency
Re: High Service Check Latency
What type of checks are you running? With that high of latency, I would assume you have some checks that are taking a very long time to complete (or are not even completing). Run the following command and then post the ps.txt as an attachment:
ps aux > /tmp/ps.txt
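Once the `ps aux` output has been saved, a short script can pick out the check processes that are still running. This is only a sketch: the function names are hypothetical, and the sample listing (including the host name `licsrv`, port `27000`, and `FEATURE_A`) is invented for illustration and is not taken from the attached file.

```python
def parse_ps_aux(ps_text):
    """Parse `ps aux` output into a list of dicts keyed by the header columns."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so limit the split.
        fields = line.rstrip().split(None, len(header) - 1)
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def matching_commands(rows, keyword):
    """Return (PID, COMMAND) pairs whose command line contains the keyword."""
    return [(row["PID"], row["COMMAND"]) for row in rows if keyword in row["COMMAND"]]


# Invented sample data, for illustration only.
SAMPLE_PS = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    2101  0.0  0.1  12345  2345 ?        Ss   Jan01   1:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   30412  0.0  0.0   5432  1200 ?        S    20:19   0:00 lmutil lmstat -c 27000@licsrv -f FEATURE_A
root       999  0.0  0.0   1000   500 ?        S    20:18   0:00 sshd: steve
"""

rows = parse_ps_aux(SAMPLE_PS)
for pid, command in matching_commands(rows, "lmutil"):
    print(pid, command)
```

Many processes matching the same plugin name, or processes with an old START time, would suggest checks that are slow or not completing.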
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
- Posts: 28
- Joined: Tue Aug 16, 2011 8:02 am
Re: High Service Check Latency
Hi, here's the output from ps aux. I'll post an update about the checks later today. Thanks.
Attachment: ps.txt (12.7 KiB)
Re: High Service Check Latency
petronagios,
The issue, as abrist points out, is probably due to the type of checks you are running. For example, if it's an active check over a very low-bandwidth connection, or if the active check runs custom scripts on the remote end that take a long time to complete, that would cause this kind of latency.
-Yancy
Re: High Service Check Latency
OK, I’ve had a look at the type of checks we are running on the server. They are all active checks, and 80 of the 488 are license manager checks running the following command at 2-minute intervals:
lmutil lmstat -c $PORT@$HOST -f $FEATURE
After running some tests, I found that each lmutil command takes four to five seconds to complete! I compared this to the usual Nagios plugins run via NRPE (check_load, disk, mem, etc.), and these complete in less than a second.
Do you think that’s what could be causing the high Service Check Latency?
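A back-of-the-envelope calculation shows how much work the license checks alone demand. This is a rough sketch that uses only the figures from the post (80 checks, four to five seconds each, 2-minute interval); `required_parallelism` is a hypothetical helper, and it does not model how Nagios actually schedules or parallelises checks.

```python
import math


def required_parallelism(num_checks, exec_seconds, interval_seconds):
    """Minimum number of checks that must run at once so that every
    check finishes within one check interval."""
    busy_seconds = num_checks * exec_seconds
    return math.ceil(busy_seconds / interval_seconds)


# 80 license checks at 4 s and 5 s each, every 2 minutes (120 s).
lm_2min_low = required_parallelism(80, 4.0, 120)
lm_2min_high = required_parallelism(80, 5.0, 120)

# The same checks at 5 s each, every 5 minutes (300 s).
lm_5min = required_parallelism(80, 5.0, 300)

print("2-minute interval:", lm_2min_low, "to", lm_2min_high, "checks in parallel")
print("5-minute interval:", lm_5min, "checks in parallel")
```

At a 2-minute interval the license checks need 320 to 400 seconds of execution time in every 120-second window, so they can only keep up if several run concurrently. Anything that serialises them would push latency up for the other checks as well.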
Re: High Service Check Latency
Those could definitely cause more latency. If you change their interval to 5 minutes, does the latency decrease?
Re: High Service Check Latency
Thanks, abrist and Yancy, for your replies. I changed the license manager checks to run every 5 minutes instead of every 2, and the average Service Check Latency has come down slightly:
Service Check Execution Time: 0.01 / 8.41 / 1.338 sec
Service Check Latency: 2.27 / 237.61 / 104.679 sec
Host Check Execution Time: 4.00 / 4.21 / 4.079 sec
Host Check Latency: 0.00 / 366.47 / 192.863 sec
# Active Host / Service Checks: 50 / 488
# Passive Host / Service Checks: 0 / 0
I didn’t realise these checks were taking so long to complete. I’ll see whether all the license feature checks are required; maybe I can reduce the number or stagger the frequency to help improve performance.
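To see what actually changed between the two tactical overview snapshots in this thread, the min / max / avg lines can be compared directly. This is a small sketch with hypothetical helper names; the figures are copied from the two posts, and each snapshot is only a single point in time.

```python
def parse_stat(line):
    """Split 'Label: min / max / avg sec' into (label, min, max, avg)."""
    label, _, values = line.partition(":")
    low, high, avg = (float(part) for part in values.replace("sec", "").split("/"))
    return label.strip(), low, high, avg


def average_changes(before_lines, after_lines):
    """Return {label: change in average} between two sets of stat lines."""
    before = {label: avg for label, _, _, avg in map(parse_stat, before_lines)}
    after = {label: avg for label, _, _, avg in map(parse_stat, after_lines)}
    return {label: round(after[label] - before[label], 3) for label in before}


# Latency lines from the first post (2-minute license checks).
BEFORE = [
    "Service Check Latency: 0.62 / 223.07 / 118.967 sec",
    "Host Check Latency: 0.01 / 306.80 / 135.256 sec",
]

# Latency lines after moving the license checks to 5 minutes.
AFTER = [
    "Service Check Latency: 2.27 / 237.61 / 104.679 sec",
    "Host Check Latency: 0.00 / 366.47 / 192.863 sec",
]

changes = average_changes(BEFORE, AFTER)
for label, delta in changes.items():
    print(f"{label}: {delta:+.3f} sec")
```

Average service check latency drops by roughly 14 seconds, while average host check latency is higher in the second snapshot, so it is worth sampling these figures again after Nagios has been running for a while.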
Re: High Service Check Latency
Sounds good. Let us know if you have any more issues.