Huge Service Group: all checks fail, return code 7

uavife
Posts: 2
Joined: Wed Oct 26, 2016 3:46 am

Huge Service Group: all checks fail, return code 7

Post by uavife »

Hi,

We have a large Service Group with a couple thousand services, and it works without problems. When we put another "big" Service Group (100 services) inside the large one, all services in the combined group fail:

"Warning: Return code of 7 for check of service..."

The checks don't get executed at all, and this happens for every kind of check. I also tried adding all services from group 2 directly as members of group 1, instead of nesting the groups. The result is the same.

It's like the Service Group is too big to handle.

I googled a bit and found this among the Linux error codes:

#define E2BIG 7 /* Arg list too long */

But what is too long? We don't use any macros like $SERVICEGROUPMEMBERS$. I could understand that that would give problems.
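
A minimal sketch of what "too long" can mean here, assuming the return code really is E2BIG: on Linux, starting a process fails with errno 7 when the arguments plus the environment exceed the kernel's limits, so an oversized environment alone is enough. The variable name HYPOTHETICAL_BIG_MACRO and the 8 MiB size are made up for illustration; this is not how Nagios itself launches checks.

```python
import errno
import subprocess
import sys


def try_exec_with_env_size(n_bytes):
    """Start a trivial child with one environment variable of n_bytes.

    Returns 0 if the child could be started, otherwise the errno of the
    failed exec.
    """
    # Hypothetical variable name; only the size of the value matters.
    env = {"HYPOTHETICAL_BIG_MACRO": "x" * n_bytes}
    try:
        subprocess.run([sys.executable, "-c", "pass"], env=env)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    print("small env:", try_exec_with_env_size(100))
    # 8 MiB is assumed to be well above the exec limits of a typical Linux box.
    print("huge env: ", try_exec_with_env_size(8 * 1024 * 1024))
    print("E2BIG is  ", errno.E2BIG)
```

The point of the sketch is that the child never runs in the second case: the failure happens while the process is being created, which would match checks that fail before producing any plugin output.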

I tried setting the debug level to 2048 and the debug verbosity to 2, but that gave no useful info. I guess this is because the check fails while the environment is being set up, before anything gets logged.

Is there anything I can do to debug this? Or better, to fix this? :)

Thank you

Vincent
Last edited by dwhitfield on Thu Oct 27, 2016 9:42 am, edited 1 time in total.
Reason: marking with green check mark
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Huge Service Group: all checks fail, return code 7

Post by dwhitfield »

To help the community troubleshoot this, could you tell us a bit more about the system?

Specifically:
On what distribution are you running Nagios?
What version of Nagios are you using?
Did you compile Nagios from source or did you use distribution repos?

Thanks!
uavife
Posts: 2
Joined: Wed Oct 26, 2016 3:46 am

Re: Huge Service Group: all checks fail, return code 7

Post by uavife »

OS: CentOS Linux release 7.2.1511 (Core)
Nagios, from EPEL repo: nagios-devel-4.0.8-2.el7.x86_64

I found a fix this morning by setting the following in the nagios.cfg file:

enable_environment_macros=0

We didn't use the environment macros except in one check, which we changed to take its info from the command line instead of from the environment. So I think this confirms that there were indeed too many and/or too large environment variables being set. I suspect things like $SERVICEGROUPMEMBERS$ got so big that setting up the environment failed?
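
For anyone who wants to check this suspicion on their own system, here is a rough sketch of a script that could be run as a check command while the group is still small enough to work. It only relies on the fact that Nagios exports environment macros with a NAGIOS_ prefix; the function name and the report format are made up for illustration.

```python
import os


def nagios_env_report(environ, prefix="NAGIOS_", top=5):
    """Return (total_bytes, largest) for the variables starting with prefix.

    Each variable is counted as "KEY=VALUE" plus a terminating NUL byte,
    assuming one byte per character.
    """
    sizes = {
        key: len(key) + len(value) + 2
        for key, value in environ.items()
        if key.startswith(prefix)
    }
    total = sum(sizes.values())
    largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return total, largest


if __name__ == "__main__":
    total, largest = nagios_env_report(os.environ)
    details = ", ".join(f"{name}={size}B" for name, size in largest)
    # Plugin-style one-line output; exit code 0 means OK to Nagios.
    print(f"OK - NAGIOS_* environment uses {total} bytes; largest: {details}")
```

If one or two variables dominate the total and grow with the size of the Service Group, that would support the explanation above.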

I did find in the documentation that setting enable_environment_macros=0 is recommended for big environments. However, I expected Nagios to fail more gracefully.

So I guess case closed.

Hope this might help others with the return code 7 errors in the future.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Huge Service Group: all checks fail, return code 7

Post by dwhitfield »

Fantastic. Thanks for posting the fix! I'll go ahead and close.