execvp(/bin/sh, ...) failed. errno is 7: Argument list too long

sherm4n
Posts: 8
Joined: Sun Mar 12, 2017 7:36 pm

execvp(/bin/sh, ...) failed. errno is 7: Argument list too long

Post by sherm4n »

Hi
Recently we ran into an issue with Nagios: it intermittently failed to send notifications. Going through nagios.log, we found the following errors:
[1699879778] wproc: NOTIFY job 92385 from worker Core Worker 5369 is a non-check helper but exited with return code 7
[1699879778] wproc: host=sv960-lbp5.eq; service=Keepalived State; contact=opsgenie_xxx_team
[1699879778] wproc: early_timeout=0; exited_ok=1; wait_status=1792; error_code=0;
[1699879778] wproc: stderr line 01: execvp(/bin/sh, ...) failed. errno is 7: Argument list too long

However, nagios.debug showed that the argument was 208 bytes, well under the ARG_MAX limit. Digging into the source code (lib/runcmd.c), we added explain_execvp() and explain_message_execvp() after the execvp() calls and got a more detailed error message:
failed, Argument list too long (7, E2BIG) because the total number of bytes in the argument list (argv) plus the environment (envp) is too large (143372 > 5242880)

It indicated that the arguments plus all the environment variables came to about 140k, much larger than we expected, as we had never seen that many environment variables in any of our notifications.

Reading further about execvp(), we found that it is limited to 128k by MAX_ARG_STRLEN, which would explain why it failed with ~140k of arguments and environment. At that point, though, we still had no idea where the extra bytes were coming from.
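For anyone who wants to compare their own numbers against these limits, here is a rough sketch of how the sizes could be measured from inside a notification script. The helper names (env_total_bytes, max_arg_strlen, env_largest_vars) are made up for this post, and the "32 pages" figure is an assumption based on the MAX_ARG_STRLEN definition on Linux.

```shell
#!/bin/sh
# Sketch only: the helper names below are made up for illustration.

# Total bytes used by the environment: every "NAME=value" string plus one
# terminator byte each (env prints a newline where the kernel counts a NUL).
env_total_bytes() {
    echo $(( $(env | wc -c) ))
}

# Per-string limit on Linux (MAX_ARG_STRLEN), assumed here to be 32 pages.
max_arg_strlen() {
    echo $(( $(getconf PAGESIZE) * 32 ))
}

# The N largest variables (default 10) as "size name", biggest first.
# awk's length() counts characters, which equals bytes for ASCII values.
env_largest_vars() {
    awk 'BEGIN {
        for (name in ENVIRON)
            printf "%d %s\n", length(name) + length(ENVIRON[name]) + 2, name
    }' | sort -rn | head -n "${1:-10}"
}
```

Calling env_total_bytes and env_largest_vars 5 from the notification command, next to the output of getconf ARG_MAX, should show quickly whether one variable dominates the environment.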

After investigating the environment variables that Nagios populates, we found that many of the variables set during a notification were never used, and one of them, NAGIOS_SERVICEGROUPMEMBERS, took up more than 44k. After we added some code to runcmd_setenv() to filter out NAGIOS_SERVICEGROUPMEMBERS, Nagios sends out every notification and the error above no longer appears.
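The snippet below is not the change we made to runcmd_setenv(). It is only a rough shell-level sketch of the same filtering idea, which may be handy when reproducing a notification command by hand to see which variable matters. run_without_vars is a made-up helper name. It cannot replace the code change, because in our case the exec of /bin/sh itself failed, so the variable has to be dropped before Nagios calls execvp().

```shell
#!/bin/sh
# Sketch only: run_without_vars is a made-up helper, not part of Nagios.

# Usage: run_without_vars "VAR1 VAR2 ..." command [args...]
# Runs the command in a subshell with the listed variables removed from its
# environment; the calling shell keeps them.
run_without_vars() {
    drop_list=$1
    shift
    (
        for name in $drop_list; do
            unset "$name"
        done
        exec "$@"
    )
}
```

For example, run_without_vars "NAGIOS_SERVICEGROUPMEMBERS" followed by the notification command runs that command with everything except that one variable.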

I understand that many Nagios users work at much bigger companies than ours and monitor many more devices. I would like to know whether anyone has run into the same problem and how they addressed it. Is there a better solution than hacking the code? Most importantly, how does Nagios deal with the execvp() limitation?

Thanks in advance.

Sherman
sherm4n
Posts: 8
Joined: Sun Mar 12, 2017 7:36 pm

Re: execvp(/bin/sh, ...) failed. errno is 7: Argument list too long

Post by sherm4n »

Hi kg2857
We did, and we are aware of the 128k limit from MAX_ARG_STRLEN (page size x 32). It is a hard limit that cannot be changed without editing the kernel header file and recompiling the kernel. Most of the time 128k is sufficient. However, the environment variable NAGIOS_SERVICEGROUPMEMBERS takes up a huge amount of environment space with information that is not closely related, or at least not useful for notifications, and it caused execvp() to fail. You can dump the environment by setting up a notification command that runs a shell script like this:

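#!/bin/sh -x
# Append stdout and stderr (including the -x trace) to the debug file
exec >>/tmp/debug.out 2>&1
# Arguments passed in by the notification command
echo "$@"
# Shell variables, including the exported NAGIOS_* environment
set
exit 0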

Any other thoughts on how to get around the problem? Nagios developers?
kg2857
Posts: 237
Joined: Wed Apr 12, 2023 5:48 pm

Re: execvp(/bin/sh, ...) failed. errno is 7: Argument list too long

Post by kg2857 »

There was a similar recent post that wound up being a service that was doing a find on a dir that contained a massive number of files. 100k is a lot for an argument. I think I'd be looking for something similar.
sherm4n
Posts: 8
Joined: Sun Mar 12, 2017 7:36 pm

Re: execvp(/bin/sh, ...) failed. errno is 7: Argument list too long

Post by sherm4n »

Push
Anyone able to shed some light?
kg2857
Posts: 237
Joined: Wed Apr 12, 2023 5:48 pm

Re: execvp(/bin/sh, ...) failed. errno is 7: Argument list too long

Post by kg2857 »

I think you'll need to figure out what service is creating the issue and why the arg list is so long. The error is from the shell, suggesting the service command runs a shell script, and isn't really a nagios issue.