Re: [Nagios-devel] RFC Proof of concept patch: Restarting embedded
Posted: Fri Sep 17, 2004 6:57 pm
Stanley Hopcroft wrote:
> Dear Ladies and Gentlemen,
>
> Nag 2.x attempts unsuccessfully (on my bad advice) to limit the maximum
> memory used by the embedded Perl Nag (ePN) process by periodically
> deallocating the Perl interpreter and re-initialising it.
>
> Since 1.2 is my Nag test bed, these changes were backported to it and
> the negative results noted in a former letter.
>
> However, changes to the reinit mechanism used by 2.x appear to deal with
> the problem of increasing memory usage by an ePN by _restarting_ Nagios
> periodically.
>
> The changes are
>
> 1 In utils.c/reinit_embedded_perl(void)
>
> fork, and in the child process exec the Nag startup script with the
> 'restart' parameter.
>
About the ugliest solution I've heard of so far. How does it handle
flushing initial status data to logs? Will this cause logfiles to grow
at an alarming rate instead? I think you need to rethink this. Also,
leaving the dirty work of cleaning up a process' memory space to the
kernel is generally (and rightly so) considered bad practice. This
routine takes bad practice to the next level.
> int reinit_embedded_perl(void){
>
> #ifdef EMBEDDEDPERL
> char buffer[MAX_INPUT_BUFFER];
> pid_t pid ;
>
> snprintf(buffer,sizeof(buffer),
> "Restarting Nagios (to re-initialize embedded Perl interpreter) "
> "after %d uses ...\n",embedded_perl_calls);
> buffer[sizeof(buffer)-1]='\x0';
> write_to_logs_and_console(buffer,NSLOG_INFO_MESSAGE,TRUE);
>
> pid=fork();
>
> if(pid==-1)
> exit(STATE_UNKNOWN) ;
>
> else if(pid==0){
>
> execlp("/usr/local/etc/rc.d/nagios.sh",
> "/usr/local/etc/rc.d/nagios.sh", "restart", (char *)NULL) ;
> /* only reached if execlp() fails */
> _exit(STATE_UNKNOWN) ;
>
> } else {
>
> exit(STATE_OK) ;
> }
> #endif
> return OK ;
>
> }
>
>
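For reference, a minimal sketch of the fork/exec step with the failure paths handled explicitly. This is not the patch above: run_restart_script() is a hypothetical helper, and unlike the patch (where the parent exits straight away) it waits for the child, purely so the result can be observed in isolation. The exit code 127 for a failed exec is a shell convention I am assuming here, not something Nagios defines.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of Nagios: run `script restart` in a child
 * and return the child's exit status, or -1 if fork()/waitpid() fails or
 * the child did not exit normally. */
int run_restart_script(const char *script)
{
    pid_t pid = fork();
    int status;

    if (pid == -1)
        return -1;              /* fork failed: report it to the caller */

    if (pid == 0) {
        /* The argument list must end with a null pointer of type char *. */
        execlp(script, script, "restart", (char *)NULL);
        _exit(127);             /* reached only if execlp() failed */
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the _exit() after execlp(), a child whose exec fails falls through and carries on running the caller's code as a second copy of the process.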
> 2 Make the Nag startup script suid root.
>
Dangerous, since it doesn't do a lot of checking to ensure it doesn't
clobber anything. A user gaining write access to the Nagios binary would
be in for a walk in the park to escalate his privileges.
> 2.1 minor changes to the startup script (to remove the su) and have the
> startup script append debug output to a file.
>
> As with the 2.x code, reinit_embedded_perl() is called in checks.c
> whenever the number of calls to the embedded interpreter exceeds a
> threshold value.
>
> It may well be that the restart is better done by the daemon process,
> rather than in a child forked to perform a service check. (This way
> seemed to me to be the fastest way to proceed, since there was already
> 2.x code with this structure.)
>
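A rough sketch of what "done by the daemon process" could look like: the counting code only sets a flag, and the main loop acts on it at a point where a restart is safe. Every name below is hypothetical and nothing here is taken from the Nagios source. It also assumes the call counter is kept in the daemon itself (incremented before a check is forked), since a counter bumped inside a forked child is not visible to the parent.

```c
#include <signal.h>

/* All names here are hypothetical; none come from the Nagios source. */
static volatile sig_atomic_t restart_requested = 0;
static unsigned long epn_calls = 0;

/* Record one use of the embedded interpreter and flag a restart once the
 * count exceeds the threshold. Returns 1 if a restart is now pending. */
int epn_note_call(unsigned long threshold)
{
    epn_calls++;
    if (epn_calls > threshold)
        restart_requested = 1;
    return restart_requested;
}

/* Meant to be polled from the daemon's main loop: returns 1 exactly once
 * per pending request and resets the counter, otherwise returns 0. */
int epn_restart_due(void)
{
    if (!restart_requested)
        return 0;
    restart_requested = 0;
    epn_calls = 0;
    return 1;
}
```

With a threshold of 100 the flag is raised on the 101st call, which would match the "after 101 uses" lines in the log extract if the test threshold was 100 (an assumption on my part).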
> Here is an extract from the Nagios log showing some test results
>
> [1095429760] Restarting Nagios (to re-initialize embedded Perl
> interpreter) after 101 uses ...
> [1095429760] Caught SIGTERM, shutting down...
> [1095429760] Nagios 1.2 starting... (PID=83831)
> [1095429760] Successfully shutdown... (PID=81306)
> [1095429760] Finished daemonizing... (New PID=83832)
>
> [1095430344] Restarting Nagios (to re-initialize embedded Perl
> interpreter) after 101 uses ...
> [1095430344] Caught SIGTERM, shutting down...
> [1095430344] Successfully shutdown... (PID=83832)
> [1095430344] Nagios 1.2 starting... (PID=86358)
> [1095430344] Finished daemonizing... (New PID=86359)
>
So... that's about 10 minutes between restarts, going by the timestamps.
How many service checks is this for? What happens in networks large enough
for the idea of the EPN to be really useful (5000+ services)?
> I am now testing my prod Nag with this change and a threshold of 100_000
> checks (should be about a week or a mem usage of 40-60 MB).
>
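As a sanity check on those figures, here is the arithmetic spelled out. The function names are made up for illustration, and the projection assumes the check rate seen in the test log also holds on the production box, which it may well not.

```c
/* Seconds between two epoch timestamps taken from the log. */
long restart_interval_seconds(long first, long second)
{
    return second - first;
}

/* Estimated seconds between restarts at a different threshold, assuming
 * the rate observed (uses checks in interval seconds) stays the same. */
double projected_interval_seconds(long interval, long uses, long threshold)
{
    return (double)interval * (double)threshold / (double)uses;
}
```

restart_interval_seconds(1095429760, 1095430344) gives 584 seconds for 101 uses. At that rate, projected_interval_seconds(584, 101, 100000) comes to roughly 578,000 seconds, or about 6.7 days, which is in line with the "about a week" estimate.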
40-60 MB * 50 concurrent Nagios processes. I don't have that much RAM.
Is dropping the EPN completely out of the question? It seems to me like
it's been given a lot of work and has only gone from bad to worse, and I
for one won't enable it with the requirements you just mentioned.
> Yours sincerely.
>
--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Lead Developer
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]