Re: [Nagios-devel] Problems with many hanging Nagios processes
Posted: Mon Dec 18, 2006 4:42 pm
We had a similar issue. We have a distributed environment with one
master and 4 slaves, monitoring a total of 1900+ hosts and 20000+
services spread across the 4 slaves.
At times we saw 14K or more results per second being sent from the
slaves. This resulted in 100+ Nagios processes being created.
We changed the reaper frequency to 2 seconds and played with all the
tunables, but nothing seemed to help.
Looking at the Nagios source, this is what I found was happening:
Nagios has a command file worker thread. When it wakes up, it checks
whether there is data in the pipe (nagios.cmd) and, if there is, forks
a child process. It does this in a loop, repeatedly checking the pipe
for data.
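The "is there data in the pipe?" check can be sketched with POSIX poll() and a zero timeout. This is a minimal illustration, not the Nagios source: pipe_has_data is a hypothetical name, and in a quick test an anonymous pipe can stand in for the nagios.cmd FIFO.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: returns true if the read end of a pipe/FIFO has
 * data waiting. A 0 ms timeout makes poll() check and return at once,
 * so the caller never blocks here. */
static bool pipe_has_data(int read_fd)
{
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 0);
    return rc > 0 && (pfd.revents & POLLIN);
}
```

A worker loop that calls a check like this and forks whenever it returns true will keep forking for as long as data stays unread in the pipe, which is the pattern described in this post.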
Now, what does the forked Nagios child process do?
It reads the data from the pipe one message at a time and puts each
message in the command buffer. If it is able to write to the buffer,
it just exits.
The problem was that the command buffer had a limited size of 1024
slots. This is the default setting in include/nagios.h.in, on the
line #define COMMAND_BUFFER_SLOTS 1024.
This was not enough, so the child process started waiting for buffer
space to be freed so that the data it had read from the pipe could be
put in the buffer. While this child was waiting, the command worker
thread woke up, saw that there was still data in the pipe, and forked
another child. This repeated until the server eventually ran out of
memory.
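The stall described above is the full-buffer case of a fixed-size ring buffer. A minimal sketch, assuming the buffer holds pointers to command strings; the struct and function names are hypothetical, not the ones in the Nagios source. Once all COMMAND_BUFFER_SLOTS entries are occupied, nothing more can be stored until a consumer removes an entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as the default in include/nagios.h.in. */
#define COMMAND_BUFFER_SLOTS 1024

/* Hypothetical fixed-size ring buffer of command strings. */
typedef struct {
    const char *slots[COMMAND_BUFFER_SLOTS];
    size_t head;   /* index of the next slot to write */
    size_t tail;   /* index of the next slot to read */
    size_t items;  /* number of occupied slots */
} command_buffer;

/* Store one command; returns false when every slot is occupied.
 * A producer that must not drop data would have to wait at this point. */
static bool buffer_put(command_buffer *buf, const char *cmd)
{
    if (buf->items == COMMAND_BUFFER_SLOTS)
        return false;
    buf->slots[buf->head] = cmd;
    buf->head = (buf->head + 1) % COMMAND_BUFFER_SLOTS;
    buf->items++;
    return true;
}

/* Remove and return the oldest command, or NULL if the buffer is empty. */
static const char *buffer_get(command_buffer *buf)
{
    if (buf->items == 0)
        return NULL;
    const char *cmd = buf->slots[buf->tail];
    buf->tail = (buf->tail + 1) % COMMAND_BUFFER_SLOTS;
    buf->items--;
    return cmd;
}
```

Here buffer_put returns false instead of blocking so the full condition is easy to see. In the scenario above, the child waited at exactly this point, and that wait is what gave the worker thread time to fork more children.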
Here is what we did to resolve it:
1. Edit include/nagios.h.in and change
#define COMMAND_BUFFER_SLOTS 1024
to
#define COMMAND_BUFFER_SLOTS 60000
and change
#define SERVICE_BUFFER_SLOTS 1024
to
#define SERVICE_BUFFER_SLOTS 60000
2. Run ./configure
(make sure you don't have nanosecond sleep enabled, and also disable
the embedded Perl interpreter)
3. make all; make install
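A rough way to see why 1024 slots were too few is to compare slot counts with the burst rate quoted in this post. This sketch assumes, as a simplification, that the buffer is emptied only once per 2-second reaper interval and not at all in between; the function names are hypothetical.

```c
#include <stdbool.h>

/* Seconds of burst traffic a buffer can absorb if nothing drains it. */
static double seconds_to_fill(double slots, double results_per_second)
{
    return slots / results_per_second;
}

/* True if everything arriving during one drain interval fits in the buffer,
 * assuming the buffer is emptied only at the end of each interval. */
static bool fits_between_drains(long slots, long results_per_second,
                                long interval_seconds)
{
    return results_per_second * interval_seconds <= slots;
}
```

At 14000 results per second, 1024 slots absorb roughly 73 ms of results, while 60000 slots hold about 4.3 seconds, which is more than the 28000 results that arrive during one 2-second reaper interval.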
- Mahesh Kunjal (maheshk)
-----------------------
This thread is located in the archive at this URL:
http://www.nagiosexchange.org/nagios-de ... ttofaq_pi1[showUid]=13177
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]