Re: Receiving hundreds of "CHECK_NRPE: Socket timeout after
Posted: Mon Sep 18, 2017 4:39 pm
by mrussi
I tested single-quoting the argument list and, unfortunately, there was no change. I'm not surprised, though, since the other Nagios host (BRAVO) runs the same checks without issue.

Re: Receiving hundreds of "CHECK_NRPE: Socket timeout after
Posted: Mon Sep 18, 2017 8:13 pm
by dwhitfield
mrussi wrote: BRAVO, with the exact same physical server
I'm assuming so, based on the OS you list further down in that post, but can you confirm that the OS install is also the same? You asked specifically whether there was an OS setting, which is why I'm asking.
Also...that's an absolute ton of checks. I am not at all surprised that you have issues with that many *active* checks. Have you considered just having two servers at the ALPHA location?
Re: Receiving hundreds of "CHECK_NRPE: Socket timeout after
Posted: Tue Sep 19, 2017 4:07 pm
by mrussi
dwhitfield wrote: I'm assuming so, based on the OS you list further down in that post, but can you confirm that the OS install is also the same? You asked specifically whether there was an OS setting, which is why I'm asking.
Also...that's an absolute ton of checks. I am not at all surprised that you have issues with that many *active* checks. Have you considered just having two servers at the ALPHA location?
Hey @dwhitfield, yes, they're the exact same OS installation.
Ha! You are not alone in thinking that. I inherited the majority of these checks, which have been around for 5+ years.
We recently considered splitting the services across multiple servers, but that would add significant complexity to the way we handle auto-discovery of our nodes/services. We first wanted to see whether some tuning on the Nagios side could help it scale vertically, since the hardware is there.
For the interim, we're going to revisit some of the older checks and see if we can remove any of them. We may have to consider alternative options though.
Re: Receiving hundreds of "CHECK_NRPE: Socket timeout after
Posted: Tue Sep 19, 2017 4:22 pm
by dwhitfield
mrussi wrote: We may have to consider alternative options though.
One would be to just not check them as often. I know there's not always wiggle room for that in organizations, but if there is, it's an option.
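To put rough numbers on that idea, here is a minimal Python sketch. It assumes the active checks are spread evenly across the check interval, which real scheduling only approximates. The function name and the service count are hypothetical and not taken from this thread; plug in your own figures.

```python
def checks_per_second(num_services: int, check_interval_min: float) -> float:
    """Average number of active checks that must start each second,
    assuming checks are spread evenly over the interval."""
    if check_interval_min <= 0:
        raise ValueError("check_interval_min must be positive")
    return num_services / (check_interval_min * 60.0)


# Hypothetical service count, for illustration only.
services = 30_000
for interval in (1, 5, 10):
    rate = checks_per_second(services, interval)
    print(f"{interval:>2} min interval: {rate:.1f} checks/s")
```

Under that assumption, doubling the interval halves the average rate of checks the host has to launch, so lengthening the interval on lower-priority services may relieve some pressure before any split across servers.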