High Service Check Latency

petronagios
Posts: 28
Joined: Tue Aug 16, 2011 8:02 am

Post by petronagios »

Hi, we are experiencing high service check latency on one of our distributed Nagios Core servers. The Nagios configuration is quite small (see the tactical overview output below), but before I restarted Nagios the service check latency figures were in the thousands!

Service Check Execution Time: 0.01 / 6.52 / 1.295 sec
Service Check Latency: 0.62 / 223.07 / 118.967 sec
Host Check Execution Time: 4.00 / 4.22 / 4.080 sec
Host Check Latency: 0.01 / 306.80 / 135.256 sec
# Active Host / Service Checks: 50 / 488
# Passive Host / Service Checks: 0 / 0

The server is a small VM with 4 CPUs and 4 GB of memory, and the load average is consistently close to zero (see the top snapshot below).

top - 20:19:44 up 87 days, 1:34, 2 users, load average: 0.32, 0.14, 0.05
Tasks: 155 total, 1 running, 154 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4043664k total, 1353964k used, 2689700k free, 184248k buffers
Swap: 2097144k total, 96k used, 2097048k free, 587180k cached

I know I could try altering the following parameters:

max_concurrent_checks=0
max_check_result_reaper_time=30
check_result_reaper_frequency=10

but is that required for such a small number of hosts/services? Is there something fundamental I've missed in the basic configuration?

Thanks
Steve.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

What type of checks are you running? With latency that high, I would assume you have some checks that are taking a very long time to complete (or are not completing at all). Run the following command and then post ps.txt as an attachment:

ps aux > /tmp/ps.txt
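
Once ps.txt exists, one way to spot checks that are piling up is to count processes per command. The sketch below is only an assumption about how one might summarize the file and is not part of any Nagios tooling. It relies on the usual `ps aux` layout of eleven columns with COMMAND last; the function name `count_commands` and the sample lines (hosts, ports, PIDs) are made up for illustration and are not taken from the attached file.

```python
from collections import Counter


def count_commands(ps_text):
    """Count processes per executable name in `ps aux` output."""
    counts = Counter()
    lines = ps_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # COMMAND is the 11th column and may itself contain spaces
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # ignore malformed or truncated lines
        executable = fields[10].split()[0]
        counts[executable.rsplit("/", 1)[-1]] += 1  # drop any directory prefix
    return counts


# Hypothetical sample, NOT the contents of the attached ps.txt
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    2101  0.0  0.1  10000  1200 ?        Ss   May01   1:02 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    9001  0.0  0.0   5000   800 ?        S    20:19   0:00 lmutil lmstat -c 27000@licsrv -f feature_a
nagios    9002  0.0  0.0   5000   800 ?        S    20:19   0:00 lmutil lmstat -c 27000@licsrv -f feature_b
root         1  0.0  0.0   2000   600 ?        Ss   May01   0:01 /sbin/init
"""

if __name__ == "__main__":
    for name, n in count_commands(SAMPLE).most_common():
        print(f"{n:4d}  {name}")
```

To use it on the real file, read `/tmp/ps.txt` and pass its contents to `count_commands`. A large count for a single plugin or script would suggest checks that are slow or not finishing.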
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Post by petronagios »

Hi, here's the output from ps aux. I'll update you about the checks later today! Thanks.
Attachment: ps.txt (12.7 KiB)
yancy
Posts: 523
Joined: Thu Oct 06, 2011 10:12 am

Post by yancy »

petronagios,
The issue, as abrist points out, is probably due to the type of checks you are running. For example, if it's an active check over a very low-bandwidth connection, or if the active check runs custom scripts on the other end that take a long time to complete, that would be an issue.

-Yancy

Post by petronagios »

OK, I’ve had a look at the type of checks we are running on the server. They are all active checks, and 80 of the 488 are license manager checks running the following command at 2-minute intervals:

lmutil lmstat -c $PORT@$HOST -f $FEATURE

After running some tests, I found that each lmutil command takes four to five seconds to complete! I compared this with the usual Nagios plugins run via NRPE (check_load, disk, mem, etc.), and these complete in less than a second.

Do you think that’s what could be causing the high Service Check Latency?

Post by abrist »

Those could definitely cause more latency. If you change their interval to 5 minutes, does the latency decrease?
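
As a rough back-of-envelope illustration of why the interval matters, the sketch below estimates how many lmutil checks are running at any moment, on average, for the figures given in the thread (80 checks, four to five seconds each). The function name `mean_concurrency` is made up here, and the calculation is plain arithmetic (total run time per cycle divided by cycle length). It does not model how Nagios actually schedules or reaps checks, so treat it as a relative comparison only.

```python
def mean_concurrency(num_checks, exec_seconds, interval_seconds):
    """Average number of checks in flight: total run time per cycle / cycle length."""
    return num_checks * exec_seconds / interval_seconds


NUM_LICENSE_CHECKS = 80   # from the thread
EXEC_RANGE = (4.0, 5.0)   # observed lmutil run time in seconds, from the thread

if __name__ == "__main__":
    for minutes in (2, 5):
        low, high = (
            mean_concurrency(NUM_LICENSE_CHECKS, t, minutes * 60)
            for t in EXEC_RANGE
        )
        print(f"{minutes}-minute interval: "
              f"{low:.2f} to {high:.2f} lmutil checks running on average")
```

With these inputs the estimate is 2.67 to 3.33 concurrent lmutil checks at a 2-minute interval and 1.07 to 1.33 at a 5-minute interval, so the longer interval cuts the time spent in these checks by a factor of 2.5.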

Post by petronagios »

Thanks, abrist and yancy, for your replies. I changed the license manager checks to run every 5 minutes instead of every 2, and the average Service Check Latency has come down slightly:

Service Check Execution Time: 0.01 / 8.41 / 1.338 sec
Service Check Latency: 2.27 / 237.61 / 104.679 sec
Host Check Execution Time: 4.00 / 4.21 / 4.079 sec
Host Check Latency: 0.00 / 366.47 / 192.863 sec
# Active Host / Service Checks: 50 / 488
# Passive Host / Service Checks: 0 / 0

I didn’t realise these checks were taking so long to complete. I’ll see whether all the license feature checks are required; maybe I can reduce their number or stagger them to help improve performance.
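
To make the staggering idea concrete, here is a minimal sketch that spreads the 80 license checks evenly across one 5-minute interval. The helper `staggered_offsets` is hypothetical and only illustrates the arithmetic. Nagios spreads checks out with its own scheduling logic, which this does not reproduce.

```python
def staggered_offsets(num_checks, interval_seconds):
    """Evenly spaced start offsets (in seconds) within one check interval."""
    spacing = interval_seconds / num_checks
    return [round(i * spacing, 2) for i in range(num_checks)]


if __name__ == "__main__":
    # 80 license checks over a 5-minute interval (figures from the thread)
    offsets = staggered_offsets(80, 300)
    print(f"spacing: {offsets[1] - offsets[0]:.2f} s")
    print("first five offsets:", offsets[:5])
```

The spacing works out to 3.75 seconds, which is shorter than the four to five seconds each lmutil call takes, so even a perfectly even spread would still leave consecutive checks overlapping briefly. Reducing the number of feature checks would widen that spacing.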
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

Sounds good. Let us know if you have any more issues.