Re: [Nagios-devel] RFC Proof of concept patch: Restarting embedded


Stanley Hopcroft wrote:
> Dear Ladies and Gentlemen,
>
> Nag 2.x attempts unsuccessfully (on my bad advice) to limit the maximum
> memory used by the embedded Perl Nag (ePN) process by periodically
> deallocating the Perl interpreter and re-initialising it.
>
> Since 1.2 is my Nag test bed, these changes were backported to it and
> the negative results noted in a former letter.
>
> However, changes to the reinit mechanism used by 2.x appear to deal with
> the problem of increasing memory usage by an ePN by _restarting_ Nagios
> periodically.
>
> The changes are
>
> 1 In utils.c/reinit_embedded_perl(void)
>
> fork, and in the child process exec the Nag startup script with the
> 'restart' parameter.
>

About the ugliest solution I've heard of so far. How does it handle
flushing initial status data to logs? Will this cause logfiles to grow
at an alarming rate instead? I think you need to rethink this. Also,
leaving the dirty work of cleaning up a process' memory space to the
kernel is generally (and rightly so) considered bad practice. This
routine takes bad practice to the next level.

> int reinit_embedded_perl(void){
>
> #ifdef EMBEDDEDPERL
>     char buffer[MAX_INPUT_BUFFER];
>     pid_t pid;
>
>     /* log why Nagios is about to restart */
>     snprintf(buffer,sizeof(buffer),
>              "Restarting Nagios (to re-initialize embedded Perl interpreter) "
>              "after %d uses ...\n",embedded_perl_calls);
>     buffer[sizeof(buffer)-1]='\x0';
>     write_to_logs_and_console(buffer,NSLOG_INFO_MESSAGE,TRUE);
>
>     pid=fork();
>
>     if(pid==-1){
>         /* fork failed */
>         exit(STATE_UNKNOWN);
>
>     } else if(pid==0){
>         /* child: replace this process with the startup script */
>         execlp("/usr/local/etc/rc.d/nagios.sh",
>                "/usr/local/etc/rc.d/nagios.sh", "restart", (char *)NULL);
>
>         /* only reached if execlp() failed */
>         _exit(STATE_UNKNOWN);
>
>     } else {
>         /* this process exits; the startup script handles the restart */
>         exit(STATE_OK);
>     }
> #endif
>     return OK;
>
> }
>
>
> 2 Make the Nag startup script suid root.
>

Dangerous, since it doesn't do a lot of checking to ensure it doesn't
clobber anything. A user gaining write access to the Nagios binary would
be in for a walk in the park to escalate his privileges.

> 2.1 minor changes to the startup script (to remove the su) and have the
> startup script append debug output to a file.
>
> As with the 2.x code, reinit_embedded_perl() is called in checks.c
> whenever the number of calls to the embedded interpreter exceeds a
> threshold value.
>
> It may well be that the restart is better done by the daemon process,
> rather than in a child forked to perform a service check. (This way
> seemed to me to be the fastest way to proceed [since there was already
> 2.x code with this structure]).
>
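
The alternative raised above, having the daemon rather than a forked check child perform the restart, could look roughly like the sketch below. This is not code from Nagios or from the patch: `note_epn_call`, `restart_is_due`, `epn_calls` and `EPN_RESTART_THRESHOLD` are hypothetical names, and the threshold of 100 is chosen only to match the "after 101 uses" in the test log. It assumes the counter is incremented in the daemon before it forks the check, since a counter updated in a forked child is not visible to the parent.

```c
#include <signal.h>

/* Hypothetical threshold; the test log reports a restart "after 101 uses". */
#define EPN_RESTART_THRESHOLD 100

static unsigned long epn_calls = 0;               /* ePN calls since startup */
static volatile sig_atomic_t restart_pending = 0; /* raised once over limit  */

/* Call once per embedded Perl check, in the daemon process:
 * count the call and raise the flag when the threshold is exceeded. */
void note_epn_call(void)
{
    epn_calls++;
    if (epn_calls > EPN_RESTART_THRESHOLD)
        restart_pending = 1;
}

/* Polled from the daemon's main loop; returns 1 when a restart is due. */
int restart_is_due(void)
{
    return restart_pending != 0;
}
```

With this split the check path only counts, and the decision to restart stays in one place in the daemon's main loop.
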
> Here is an extract from the Nagios log showing some test results
>
> [1095429760] Restarting Nagios (to re-initialize embedded Perl
> interpreter) after 101 uses ...
> [1095429760] Caught SIGTERM, shutting down...
> [1095429760] Nagios 1.2 starting... (PID=83831)
> [1095429760] Successfully shutdown... (PID=81306)
> [1095429760] Finished daemonizing... (New PID=83832)
>
> [1095430344] Restarting Nagios (to re-initialize embedded Perl
> interpreter) after 101 uses ...
> [1095430344] Caught SIGTERM, shutting down...
> [1095430344] Successfully shutdown... (PID=83832)
> [1095430344] Nagios 1.2 starting... (PID=86358)
> [1095430344] Finished daemonizing... (New PID=86359)
>

So... that's about 10 minutes between restarts. How many service checks is
this for? What happens in networks large enough for the idea of the EPN to be
really useful (5000+ services)?
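
For a rough sense of scale, the arithmetic can be sketched as below. The two "Restarting Nagios" entries in the quoted log are 1095430344 - 1095429760 = 584 seconds apart. The helper names and the example check rate are assumptions for illustration, not measurements; `restart_interval_seconds` estimates how long a given threshold lasts at a steady rate of ePN checks.

```c
/* Seconds between two epoch timestamps taken from log lines. */
long seconds_between(long earlier, long later)
{
    return later - earlier;
}

/* Estimated seconds between restarts for a call threshold and a steady
 * rate of embedded Perl checks per second (assumed constant). */
double restart_interval_seconds(long threshold, double checks_per_second)
{
    return (double)threshold / checks_per_second;
}
```

As an example under assumed numbers: 5000 services each checked every 5 minutes is about 16.7 checks per second if every check goes through the ePN, so a threshold of 100000 would be reached in roughly 6000 seconds, or 100 minutes.
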

> I am now testing my prod Nag with this change and a threshold of 100_000
> checks (should be about a week or a mem usage of 40-60 MB).
>

40-60 MB * 50 concurrent Nagios processes. I don't have that much RAM.
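
Spelled out, that worst case is 2000 to 3000 MB. A minimal sketch of the arithmetic, assuming each of the 50 processes holds its own private copy of the interpreter's memory (any pages shared copy-on-write after fork would lower the real figure); `worst_case_mb` is a hypothetical helper.

```c
/* Worst-case memory in MB if no pages are shared between processes. */
long worst_case_mb(long per_process_mb, long processes)
{
    return per_process_mb * processes;
}
```
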

Is dropping the EPN completely out of the question? It seems to me like
it's been given a lot of work and has only gone from bad to worse, and I
for one won't enable it with the requirements you just mentioned.

> Yours sincerely.
>

--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Lead Developer





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]