NRPE master high load issue

snapier3
Posts: 62
Joined: Tue Apr 23, 2019 7:12 pm

Re: NRPE master high load issue

Post by snapier3 »

What check is running every few minutes that's spiking the system?

Some service checks require more resources than others due to the plugin/nature of what's being checked.

Also, I see you're up 100+ and not constantly pegged on the CPU, so there's progress in your tuning :)
ahiya
Posts: 11
Joined: Wed Nov 20, 2019 1:04 am

Post by ahiya »

snapier3 wrote:What check is running every few minutes that's spiking the system?

Some service checks require more resources than others due to the plugin/nature of what's being checked.

Also, I see you're up 100+ and not constantly pegged on the CPU, so there's progress in your tuning :)

Hi,

Thank you for responding.

The checks that are running are mostly NRPE (fping) checks that are supposed to be sent to the remote NRPE agent.
My server has 4 CPUs, 16 GB of RAM, and a 100 GB SSD.
Yes, I did some tuning, but it doesn't work as expected.

Does anyone here have experience with the mod_gearman plugin?
snapier3
Posts: 62
Joined: Tue Apr 23, 2019 7:12 pm

Post by snapier3 »

Can you help me better understand the overall goal of what you're trying to accomplish?


Situation 1:
The Nagios server (origin) executes the fping command against the target host (destination). Nagios evaluates the telemetry gathered by this check to determine whether the target host (destination) is up or down and whether it is within the defined latency limits.

Situation 2:
Nagios sends a command via NRPE to the target host (where your agent is installed) to execute an fping against one or more external hosts. This host (origin) then runs fping against the external hosts (the destinations) and returns the telemetry collected (the ping times) to the Nagios server. That telemetry is then evaluated to determine the availability and performance of network connectivity from the origin to the destination.

--SN
ahiya
Posts: 11
Joined: Wed Nov 20, 2019 1:04 am

Post by ahiya »

Hi,

Option two is what I'm doing.

I now realize that I didn't really offload tasks from the master; I just replaced many fping checks with many NRPE checks.

What I'm trying to do is offload the tasks to remote hosts, but I don't really know the right way to do so.


Thanks
snapier3
Posts: 62
Joined: Tue Apr 23, 2019 7:12 pm

Post by snapier3 »

Reading through the documentation for check_fping, it looks like some of the tuning options available in current fping are not exposed.

https://www.monitoring-plugins.org/doc/ ... fping.html

https://github.com/schweikert/fping/blo ... /fping.pod


-T in check_fping is actually referring to "-p" in fping, which is a little confusing:

    -p, --period=MSEC
    In looping or counting modes (-l, -c, or -C), this parameter sets the time in milliseconds that fping waits between successive packets to an individual target. Default is 1000 and minimum is 10.

The interval timeout is not an option in check_fping:

    -i, --interval=MSEC
    The minimum amount of time (in milliseconds) between sending a ping packet to any target (default is 10, minimum is 1).

The timeout interval is not an option with check_fping:

    -t, --timeout=MSEC

    Initial target timeout in milliseconds. In the default, non-loop mode, the default timeout is 500ms, and it represents the amount of time that fping waits for a response to its first request. Successive timeouts are multiplied by the backoff factor specified with -B.

    In loop/count mode, the default timeout is automatically adjusted to match the "period" value (but not more than 2000ms). You can still adjust the timeout value with this option, if you wish to, but note that setting a value larger than "period" produces inconsistent results, because the timeout value can be respected only for the last ping.

    Also note that any received replies that are larger than the timeout value, will be discarded

Your task will be to create a library/script that efficiently pings a list of servers from a remote host, executed via NRPE or possibly via the batch execute options if Nagios XI is available.
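
As a rough starting point, such a script could look something like the sketch below. It is only an illustration, not a tested plugin: the function names, the 100/500 ms thresholds, and the choice of 3 pings at a 200 ms period are assumptions made for the example. It also assumes fping is on the PATH of the remote host and that `fping -C N -q` prints one `host : rtt rtt -` line per target on stderr. The idea is that a single NRPE call runs one fping process for the whole list instead of one check per target.

```python
#!/usr/bin/env python3
"""Hypothetical NRPE plugin sketch: ping many targets with one fping run."""
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def parse_fping(output):
    """Parse `fping -C N -q` lines like 'host : 1.2 - 0.9' into {host: [rtt or None]}."""
    results = {}
    for line in output.splitlines():
        host, sep, samples = line.partition(" : ")
        if not sep:
            continue  # not a per-host result line (e.g. an ICMP error message)
        rtts = []
        for token in samples.split():
            try:
                rtts.append(float(token))
            except ValueError:
                rtts.append(None)  # '-' marks a lost packet
        results[host.strip()] = rtts
    return results


def evaluate(results, targets, warn_ms, crit_ms):
    """Return (exit_code, status_line) using the average RTT of each target."""
    worst, problems, perf = OK, [], []
    for host in targets:
        replies = [r for r in results.get(host, []) if r is not None]
        if not replies:
            worst = CRITICAL
            problems.append(f"{host} no reply")
            continue
        avg = sum(replies) / len(replies)
        perf.append(f"'{host}_rta'={avg:.2f}ms;{warn_ms};{crit_ms}")
        if avg >= crit_ms:
            worst = CRITICAL
            problems.append(f"{host} rta {avg:.2f}ms")
        elif avg >= warn_ms:
            worst = max(worst, WARNING)
            problems.append(f"{host} rta {avg:.2f}ms")
    label = ("OK", "WARNING", "CRITICAL")[worst]
    summary = ", ".join(problems) if problems else f"{len(targets)} hosts responding"
    line = f"FPING {label} - {summary}"
    if perf:
        line += " | " + " ".join(perf)
    return worst, line


def main(argv):
    """argv[1:] is the list of targets to ping in a single fping run."""
    targets = argv[1:]
    if not targets:
        print("FPING UNKNOWN - no targets given")
        return UNKNOWN
    # -t is left unset so that, in count mode, the timeout follows the period.
    cmd = ["fping", "-C", "3", "-q", "-p", "200", *targets]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"FPING UNKNOWN - could not run fping: {exc}")
        return UNKNOWN
    code, line = evaluate(parse_fping(proc.stderr), targets, 100.0, 500.0)
    print(line)
    return code

# When saved as a plugin script, finish the file with:
#   import sys; sys.exit(main(sys.argv))
```

The parsing and evaluation are kept separate from the fping call so they can be checked against captured output without sending any packets. Any target missing from the fping output (for example, a name that does not resolve) is reported as "no reply".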

--SN
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Post by gormank »

I think that doing pings from the Nagios host (if it works for the OP) will be way more efficient than using nrpe.
From what I've read about check_fping and check_icmp, both claim to be able to ping multiple hosts from a single check. (I'm actually getting set to do this myself via nrpe, since I've been asked to check from various remote hosts to other remote hosts.)
As others have mentioned, doing these checks multiple times a minute seems like it is/will be an issue.