Re: [Nagios-devel] High latencies problem.





On 2/17/2009 3:15 PM, D. Emmanuel Feinsmith wrote:
> Dear Alessandro,
>
> You are more than likely eating up the cpu and memory with the
> memcpy's executed by each fork of your check_nrpe and check_icmp
> services. You can prove this out to yourself by using top to observe
> the behaviour of the nagios processes. I would also suggest making sure
> that nothing else is eating up CPU and memory on your nagios server box,
> and keeping the box dedicated. Running top will show if there is resource
> contention on your monitoring server. Keep in mind that check_nrpe is
> amongst the slowest possible commands nagios can execute because it
> has to wait for whatever timeout period you entered in your client
> nrpe.cfg for the nrpe daemon to respond. This can take seconds in some
> cases. A much more scalable solution is to enable passive checks
> (using nsca/send_nsca) on some or all of your clients.
>
> I would suggest the following things (from the nagios performance
> tuning guide):
>
> # *Check service latencies* to determine best value for maximum
> concurrent checks. Nagios can restrict the number of maximum
> concurrently executing service checks to the value you specify with
> the max_concurrent_checks option. This is good because it gives you
> some control over how much load Nagios will impose on your monitoring
> host, but it can also slow things down. If you are seeing high latency
> values (> 10 or 15 seconds) for the majority of your service checks
> (via the extinfo CGI), you are probably starving Nagios of the checks
> it needs. That's not Nagios's fault - it's yours. Under ideal
> conditions, all service checks would have a latency of 0, meaning they
> were executed at the exact time that they were scheduled to be
> executed. However, it is normal for some checks to have small latency
> values. I would recommend taking the minimum value for maximum
> concurrent checks that Nagios reports when run with the -s command
> line argument and doubling it. Keep increasing it until the average
> check latency for your services is fairly low.
>
> # *Optimize host check commands*. If you're checking host states using
> the check_ping plugin you'll find that host checks will be performed
> much faster if you break up the checks. Instead of specifying a
> max_attempts value of 1 in the host definition and having the
> check_ping plugin send 10 ICMP packets to the host, it would be much
> faster to set the max_attempts value to 10 and only send out 1 ICMP
> packet each time. This is due to the fact that Nagios can often
> determine the status of a host after executing the plugin once, so you
> want to make the first check as fast as possible. This method does
> have its pitfalls in some situations (e.g. hosts that are slow to
> respond may be assumed to be down), but you'll see faster host checks
> if you use it. Another option would be to use a faster plugin (e.g.
> check_fping) as the host_check_command instead of check_ping.
>
> # *Schedule regular host checks.* Scheduling regular checks of hosts
> can actually help performance in Nagios. This is due to the way the
> cached check logic works (see below). Prior to Nagios 3, regularly
> scheduled host checks used to result in a big performance hit. This is
> no longer the case, as host checks are run in parallel - just like
> service checks. To schedule regular checks of a host, set the
> check_interval directive in the host definition to something greater
> than 0.
>
> # *Enable cached host checks*. Beginning in Nagios 3, on-demand host
> checks can benefit from caching. On-demand host checks are performed
> whenever Nagios detects a service state change. These on-demand checks
> are executed because Nagios wants to know if the host associated with
> the service changed state. By enabling cached host checks, you can
> optimize performance. In some cases, Nagios may be able to use the
> old/cached state of the host, rather than actually executing a host
> check command. This can speed things up and reduce load on the
> monitoring server. In order for cached checks to be effective, you

...[email truncated]...
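
The latency rule of thumb in the quoted tuning advice (latency is how late a check ran compared with its scheduled time, and there is a problem if the majority of checks exceed roughly 10 to 15 seconds) could be sketched as below. This is only an illustration in Python: the CheckRun record, the summarize_latency helper, the 10 second threshold default and the sample timestamps are all hypothetical and are not part of Nagios. Real latency figures would come from the extinfo CGI mentioned above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CheckRun:
    """One executed service check (hypothetical record, not a Nagios structure)."""
    service: str
    scheduled_at: float  # epoch seconds when the check was due to run
    executed_at: float   # epoch seconds when the check actually started

    @property
    def latency(self) -> float:
        # Latency as described in the quoted text: how late the check ran
        # compared with its scheduled time (0 means it ran exactly on time).
        return max(0.0, self.executed_at - self.scheduled_at)


def summarize_latency(runs, threshold=10.0):
    """Return (average latency, share of checks above threshold, majority late?)."""
    if not runs:
        raise ValueError("need at least one check run")
    latencies = [run.latency for run in runs]
    late_share = sum(1 for value in latencies if value > threshold) / len(latencies)
    return mean(latencies), late_share, late_share > 0.5


# Made-up sample data, purely for illustration.
runs = [
    CheckRun("PING", 1000.0, 1000.0),
    CheckRun("NRPE load", 1000.0, 1012.0),
    CheckRun("NRPE disk", 1060.0, 1078.0),
    CheckRun("HTTP", 1060.0, 1074.0),
]
average, late_share, majority_late = summarize_latency(runs)
print(f"average latency: {average:.1f}s, late share: {late_share:.0%}, "
      f"majority late: {majority_late}")
```

With the sample data, three of the four checks run more than 10 seconds late, so the majority flag is set. Following the quoted advice, that would be the cue to raise max_concurrent_checks and measure again.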


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]