Re: [Nagios-devel] High latencies problem.

Guest · Post by **Guest** » Tue Feb 17, 2009 7:06 pm

On 2/17/2009 3:15 PM, D. Emmanuel Feinsmith wrote:
> Dear Alessandro,
>
> You are more than likely eating up the CPU and memory with the
> memcpy's executed by each fork of your check_nrpe and check_icmp
> services. You can prove this to yourself by using top to observe
> the behaviour of the Nagios processes. I would also suggest making
> sure that nothing else is eating up CPU and memory on your Nagios
> server box, and keeping the box dedicated. Running top will show if
> there is resource contention on your monitoring server. Keep in mind
> that check_nrpe is amongst the slowest commands Nagios can execute,
> because it may have to wait up to whatever timeout period you entered
> in your client nrpe.cfg for the nrpe daemon to respond. This can take
> seconds in some cases. A much more scalable solution is to enable
> passive checks (using nsca/send_nsca) on some or all of your clients.
>
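To make the passive-check suggestion a bit more concrete, here is a minimal sketch of the client side. It only builds the tab-delimited line (host, service description, return code, plugin output) that send_nsca reads on standard input; actually piping it to send_nsca is left out. The function name and the host and service names are made up for illustration.

```python
def format_passive_result(host, service, return_code, output):
    """Build one tab-delimited passive service check result line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError(
            "return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)"
        )
    # Tabs separate fields and a newline ends the record, so collapse any
    # whitespace inside the plugin output into single spaces.
    clean = " ".join(str(output).split())
    return f"{host}\t{service}\t{return_code}\t{clean}\n"


if __name__ == "__main__":
    # "web01" and "Disk /" are hypothetical names.
    line = format_passive_result("web01", "Disk /", 1, "DISK WARNING - 12% free")
    print(line, end="")
```

The client would run its check locally on its own schedule and push a line like this to the server, so the Nagios box does not fork a plugin and wait on the network for that service.
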
> I would suggest the following things (from the Nagios performance
> tuning guide):
>
> # *Check service latencies* to determine best value for maximum
> concurrent checks. Nagios can restrict the maximum number of
> concurrently executing service checks to the value you specify with
> the max_concurrent_checks option. This is good because it gives you
> some control over how much load Nagios will impose on your monitoring
> host, but it can also slow things down. If you are seeing high latency
> values (> 10 or 15 seconds) for the majority of your service checks
> (via the extinfo CGI), you are probably starving Nagios of the checks
> it needs. That's not Nagios's fault - it's yours. Under ideal
> conditions, all service checks would have a latency of 0, meaning they
> were executed at the exact time that they were scheduled to be
> executed. However, it is normal for some checks to have small latency
> values. I would recommend taking the minimum number of maximum
> concurrent checks reported when running Nagios with the -s command
> line argument and doubling it. Keep increasing it until the average
> check latency for your services is fairly low.
>
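The "double it, then keep increasing" procedure above can be sketched as a small loop. This is only an illustration: `measure_avg_latency` is a hypothetical callback standing in for "apply the value, let Nagios run for a while, read the average service check latency", and the step size, the 10 second target and the ceiling are my own assumptions, not values from the tuning guide.

```python
def tune_max_concurrent_checks(minimum, measure_avg_latency,
                               target_seconds=10.0, step=None, ceiling=1024):
    """Start at twice the reported minimum and raise until latency is low."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1")
    # Assumed step: half of the reported minimum, at least 1.
    step = step or max(1, minimum // 2)
    value = 2 * minimum
    # Keep raising the limit while latency is above target and we are
    # still under the (assumed) safety ceiling.
    while measure_avg_latency(value) > target_seconds and value + step <= ceiling:
        value += step
    return value


if __name__ == "__main__":
    # Toy latency model for demonstration only: latency falls as the
    # limit rises. Real numbers have to come from your own server.
    def toy_latency(limit):
        return 400.0 / limit

    print(tune_max_concurrent_checks(10, toy_latency))
```

In practice each iteration means editing max_concurrent_checks, restarting Nagios and watching latency for a while, so the loop is a description of the manual process rather than something to automate blindly.
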
> # *Optimize host check commands*. If you're checking host states using
> the check_ping plugin you'll find that host checks will be performed
> much faster if you break up the checks. Instead of specifying a
> max_check_attempts value of 1 in the host definition and having the
> check_ping plugin send 10 ICMP packets to the host, it would be much
> faster to set max_check_attempts to 10 and only send out 1 ICMP
> packet each time. This is due to the fact that Nagios can often
> determine the status of a host after executing the plugin once, so you
> want to make the first check as fast as possible. This method does
> have its pitfalls in some situations (e.g. hosts that are slow to
> respond may be assumed to be down), but you'll see faster host checks
> if you use it. Another option would be to use a faster plugin (e.g.
> check_fping) as the host_check_command instead of check_ping.
>
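A rough back-of-the-envelope for why splitting the host check helps, for a host that is up. I am assuming packets go out one second apart and that the plugin returns once the last reply is in; the 50 ms round trip is an arbitrary example value, and all names are made up.

```python
def check_duration(packets, interval=1.0, rtt=0.05):
    """Rough wall-clock seconds for one check of a host that answers.

    Assumes packets are sent `interval` seconds apart and the plugin
    returns when the last reply arrives. Both are simplifications.
    """
    if packets < 1:
        raise ValueError("packets must be at least 1")
    return (packets - 1) * interval + rtt


if __name__ == "__main__":
    ten_packets = check_duration(10)  # one attempt sending 10 packets
    one_packet = check_duration(1)    # first of up to 10 attempts, 1 packet each
    print(f"10 packets per check: ~{ten_packets:.2f}s before the first result")
    print(f" 1 packet per check:  ~{one_packet:.2f}s before the first result")
```

Under these assumptions the single-packet check hands Nagios a result after roughly one round trip instead of about nine seconds, which is the "make the first check as fast as possible" point. The retries only cost time when the first packet goes unanswered.
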
> # *Schedule regular host checks.* Scheduling regular checks of hosts
> can actually help performance in Nagios. This is due to the way the
> cached check logic works (see below). Prior to Nagios 3, regularly
> scheduled host checks used to result in a big performance hit. This is
> no longer the case, as host checks are run in parallel - just like
> service checks. To schedule regular checks of a host, set the
> check_interval directive in the host definition to something greater
> than 0.
>
> # *Enable cached host checks*. Beginning in Nagios 3, on-demand host
> checks can benefit from caching. On-demand host checks are performed
> whenever Nagios detects a service state change. These on-demand checks
> are executed because Nagios wants to know if the host associated with
> the service changed state. By enabling cached host checks, you can
> optimize performance. In some cases, Nagios may be able to use the
> old/cached state of the host, rather than actually executing a host
> check command. This can speed things up and reduce load on the
> monitoring server. In order for cached checks to be effective, you

...[email truncated]...
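For the cached host check item in the quoted message, here is a minimal sketch of the idea, not Nagios's actual code: reuse the last known host state if it is recent enough, otherwise fall back to running a real check. The time window is my assumption, modelled on the cached_host_check_horizon option in the Nagios 3 main config, which the quoted text does not name; the function and argument names are hypothetical.

```python
import time


def usable_cached_state(last_check_time, last_state, horizon_seconds, now=None):
    """Return the cached host state if it is fresh enough, else None.

    None means "no usable cache, run the host check command".
    """
    now = time.time() if now is None else now
    age = now - last_check_time
    # A state is reusable only if one exists and was recorded within
    # the horizon (a negative age would mean a clock problem).
    if last_state is not None and 0 <= age <= horizon_seconds:
        return last_state
    return None


if __name__ == "__main__":
    # Host last checked 10 seconds ago, horizon of 15 seconds.
    print(usable_cached_state(100.0, "UP", 15, now=110.0))
    # Same host 20 seconds later: too old, a real check is needed.
    print(usable_cached_state(100.0, "UP", 15, now=120.0))
```

This also shows why regular host checks and caching go together: the more recently a host was checked, the more often an on-demand check can be answered from the cached state.
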

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]