Receiving hundreds of "CHECK_NRPE: Socket timeout after 60 s

mrussi
Posts: 6
Joined: Thu Sep 07, 2017 2:24 pm

Re: Receiving hundreds of "CHECK_NRPE: Socket timeout after

Post by mrussi »

I tested single-quoting the argument list; unfortunately, there was no change. I'm not surprised, though, as the other Nagios host (BRAVO) runs the same checks without issue. :(
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

mrussi wrote: BRAVO, with the exact same physical server
I'm assuming so, based on the OS details further down in that same post, but can you confirm that the OS install is also the same? You asked specifically whether there was an OS setting, which is why I'm asking.

Also...that's an absolute ton of checks. I am not at all surprised that you have issues with that many *active* checks. Have you considered just having two servers at the ALPHA location?

Post by mrussi »

dwhitfield wrote: I'm assuming so, based on the OS details further down in that same post, but can you confirm that the OS install is also the same? You asked specifically whether there was an OS setting, which is why I'm asking.

Also...that's an absolute ton of checks. I am not at all surprised that you have issues with that many *active* checks. Have you considered just having two servers at the ALPHA location?
Hey @dwhitfield, yes, they're the exact same OS installation.

Ha! You are not alone in thinking that. I inherited most of these checks, which have been around for 5+ years.

We recently considered splitting the services across multiple servers, but that adds significant complexity to the way we handle auto-discovery of our nodes/services. We wanted to first see whether there was some tuning we could do on the Nagios side to help it scale vertically, since the hardware is there.

In the interim, we're going to revisit some of the older checks and see if we can remove any of them. We may have to consider alternative options, though.

Post by dwhitfield »

mrussi wrote: We may have to consider alternative options though.
One option would be to just not check them as often. I know organizations don't always have wiggle room for that, but if yours does, it's worth considering.
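
To put rough numbers on what a longer check interval buys, here is a minimal Python sketch of the arithmetic. The service count and the intervals are hypothetical, not figures from this thread, and `checks_per_second` is an illustrative helper, not part of Nagios. It assumes intervals are expressed in minutes (the Nagios Core default, where `interval_length` is 60 seconds) and that checks are spread evenly over the interval.

```python
def checks_per_second(num_services: int, check_interval_minutes: float) -> float:
    """Average number of active checks the scheduler must start per second.

    Assumes every service uses the same interval and that checks are
    spread evenly across it (an idealization of real scheduling).
    """
    if check_interval_minutes <= 0:
        raise ValueError("check_interval_minutes must be positive")
    return num_services / (check_interval_minutes * 60.0)


# Hypothetical numbers for illustration only, not taken from this thread.
services = 6000
for interval in (1, 5, 10):
    rate = checks_per_second(services, interval)
    print(f"{services} services every {interval} min -> {rate:.1f} checks/s")
# 6000 services every 1 min -> 100.0 checks/s
# 6000 services every 5 min -> 20.0 checks/s
# 6000 services every 10 min -> 10.0 checks/s
```

Under these assumptions the load falls in direct proportion to the interval, so doubling the interval on the least critical checks roughly halves the number of NRPE connections they generate.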