[Nagios-devel] New Nagios implementation proposal
Posted: Tue Dec 01, 2009 5:12 pm
Hi list,
I would like your feedback on an (unfinished) reimplementation of
Nagios named "Shinken". I wrote it in Python, and it is faster and
more modular than the current Nagios implementation in C (yes, faster,
you read that correctly; I was the first to be surprised by it).
== Shinken's history ==
A few months ago, I started working on a proof of concept for Nagios
focused on distributed environments and performance. The main goal was
to find a distributed, highly available architecture. I also thought
that Nagios' performance was quite good, but that we could get more.
For quick testing and development, I used Python. I thought a process
pool could make Nagios quicker than forking a new process for each
check only to kill it a few seconds later. I also bypassed Nagios' way
of reaping results: reading flat files is just too slow. Instead, each
result is a structure that is sent directly to the scheduler. No
files, more performance. To be on par with Nagios, I added the same
monitoring logic to the scheduler: HARD/SOFT states, dependencies
(parents, servicedep, hostdep, etc.) and database export (Merlin).
Shinken uses the standard Nagios configuration file.
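
As a rough illustration of the idea above (not Shinken's actual code),
here is a minimal sketch of a fixed pool of workers that run checks and
hand back each result as an in-memory structure instead of writing it
to a flat file. The names CheckResult, run_check and run_checks are
hypothetical, and threads stand in for the pool's worker processes to
keep the sketch short; each check command still runs as its own child
process.

```python
import subprocess
from collections import namedtuple
from concurrent.futures import ThreadPoolExecutor

# Hypothetical result structure; the field names are illustrative only.
CheckResult = namedtuple("CheckResult", ["command", "exit_status", "output"])


def run_check(command):
    """Run one check command and return its result as an in-memory structure."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=10)
    return CheckResult(command, proc.returncode, proc.stdout.strip())


def run_checks(commands, workers=4):
    """Hand the checks to a fixed pool of workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_check, commands))


if __name__ == "__main__":
    # Same kind of trivial check as in the benchmark: an echo and an exit.
    for result in run_checks(["echo OK; exit 0", "echo WARNING; exit 1"]):
        print(result)
```

The results never touch the disk: whoever called run_checks (the
scheduler, in this picture) receives them directly as Python objects.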
And the performance is quite good: with Nagios 3, a small check (an
echo + exit) and a mid-range server, I run 10,000 checks in 5 minutes
(latency near 1 s), and 30K with full tweaks. With my tool, I run
150K!
== The global architecture ==
For the architecture, I think we must follow the Unix way of doing
things: one tool per task. For now, Nagios does nearly everything: it
reads the configuration, schedules, launches checks and raises
notifications. I tried an architecture where the administrator can
have whatever hosts/services he wants, and the daemons are just
resources to manage them. The architecture I propose is the following:
*Arbiter: a daemon that reads the configuration and automatically cuts
it into N confs (keeping relations like parents in the same conf),
where N is the number of schedulers we have. It dispatches the
configurations, and it also reads the orders from nagios.cmd and
dispatches them to the schedulers.
*Schedulers: do the scheduling by looking at the states of
hosts/services. They only queue checks/notifications/event handlers
for the other daemons. Same thing for event broker information: it's
just a queue.
*Pollers: use a process pool, get the checks to launch from the
schedulers and return the results to the schedulers.
*Reactionners: same as pollers, but for notifications and event
handlers.
*Brokers: get event broker information from the schedulers and "do
things" with it (like creating the service-perfdata file, or filling
databases).
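
To make the Arbiter's cutting step more concrete, here is a minimal
sketch under stated assumptions; it is not the algorithm Shinken uses.
It assumes hosts are plain names, that parent relations are given as a
dict, and that "balanced" simply means a similar number of hosts per
part. The function name split_configuration is hypothetical. Hosts
linked by a parent relation are first grouped together, then the groups
are spread over the N parts, largest first.

```python
from collections import defaultdict


def split_configuration(hosts, parents, n_schedulers):
    """Split hosts into n_schedulers parts, keeping parent-linked hosts together.

    hosts: list of host names.
    parents: dict mapping a host name to the list of its parent host names
             (every name used here must also appear in hosts).
    """
    # Union-find: every host starts as the leader of its own group.
    leader = {h: h for h in hosts}

    def find(h):
        while leader[h] != h:
            leader[h] = leader[leader[h]]  # path halving keeps lookups short
            h = leader[h]
        return h

    # Merge the group of each host with the groups of its parents.
    for child, host_parents in parents.items():
        for parent in host_parents:
            leader[find(child)] = find(parent)

    groups = defaultdict(list)
    for h in hosts:
        groups[find(h)].append(h)

    # Greedy balancing: largest group first, into the least loaded part.
    parts = [[] for _ in range(n_schedulers)]
    for group in sorted(groups.values(), key=len, reverse=True):
        min(parts, key=len).extend(group)
    return parts


if __name__ == "__main__":
    hosts = ["router", "srv1", "srv2", "db1", "web1"]
    parents = {"srv1": ["router"], "srv2": ["router"]}
    print(split_configuration(hosts, parents, 2))
```

In this example the router and the two servers behind it always land in
the same part, so one scheduler sees the whole parent chain.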
The poller approach is like DNX, nothing new here. The reactionners
allow the administrator to have a single daemon that sends all the
notifications of all his schedulers (useful for SMTP authorizations,
or for filling a single RSS file with all notifications). The
schedulers launch neither checks nor notifications, so sending
notifications or running event handlers does not add check latency.
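
The "it's just a queue" role of the scheduler can be sketched as
follows. This is an illustration only: the class SchedulerQueues and
its method names are hypothetical, and the real daemons talk over the
network rather than through direct method calls. The point is that the
scheduler only stores pending work, and other daemons come and take it.

```python
from collections import deque


class SchedulerQueues:
    """Holds pending work; it never executes anything itself."""

    def __init__(self):
        self.checks = deque()   # taken by pollers
        self.actions = deque()  # notifications and event handlers, taken by reactionners

    def schedule_check(self, command):
        self.checks.append(command)

    def schedule_action(self, command):
        self.actions.append(command)

    @staticmethod
    def _take(queue, max_items):
        # Pop at most max_items entries, oldest first.
        return [queue.popleft() for _ in range(min(max_items, len(queue)))]

    def get_checks(self, max_items):
        """Called by a poller that has free workers."""
        return self._take(self.checks, max_items)

    def get_actions(self, max_items):
        """Called by a reactionner."""
        return self._take(self.actions, max_items)


if __name__ == "__main__":
    scheduler = SchedulerQueues()
    scheduler.schedule_check("check_ping host1")
    scheduler.schedule_action("notify-by-email admin")
    print(scheduler.get_checks(10), scheduler.get_actions(10))
```

Because checks and actions live in separate queues, a slow
notification only keeps a reactionner busy; the pollers keep draining
the check queue.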
The load balancing is automatic: the Arbiter cuts the conf and
dispatches the parts. For high availability, there can be spare
daemons: if a daemon dies, another one takes its configuration (the
Arbiter "pings" the daemons, and if a daemon has failed, it just sends
the configuration to a spare). The daemons are reached over the
network, so they can all be on different servers (and for high
availability it is better not to put all daemons on the same server).
For now, the Arbiter does not have a spare, but one will be added in
the future.
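
The spare mechanism described above can be sketched like this. It is a
simplified illustration, not Shinken's code: the function
reassign_failed is hypothetical, and the is_alive callable stands in
for the Arbiter's network "ping".

```python
def reassign_failed(assignments, spares, is_alive):
    """Give the configuration of every dead daemon to an available spare.

    assignments: dict mapping a daemon name to the configuration it holds.
    spares: list of idle spare daemon names, used in order.
    is_alive: callable taking a daemon name and returning True or False
              (a stand-in for the network ping).
    Returns (new_assignments, orphaned), where orphaned lists the
    configurations that could not be placed because no spare was left.
    """
    spares = list(spares)  # work on a copy, leave the caller's list alone
    new_assignments = {}
    orphaned = []
    for daemon, conf in assignments.items():
        if is_alive(daemon):
            new_assignments[daemon] = conf
        elif spares:
            new_assignments[spares.pop(0)] = conf
        else:
            orphaned.append(conf)
    return new_assignments, orphaned


if __name__ == "__main__":
    assignments = {"scheduler-1": "conf-0", "scheduler-2": "conf-1"}
    alive = lambda name: name != "scheduler-2"  # pretend scheduler-2 died
    print(reassign_failed(assignments, ["scheduler-spare"], alive))
```

A real Arbiter would run such a pass periodically and would also have
to decide what to do when the failed daemon comes back.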
You can see this architecture in the file shinken-architecture.png.
If the user configuration does not define such daemons, Shinken
automatically creates default ones (on localhost with default ports).
== Advanced architecture ==
In the architecture we saw, all reactionners/pollers/brokers take
orders from ALL schedulers. This can be a problem with reactionners
(with 3 SMTP servers (USA, Europe, Asia), it's hard to force Asia
notifications to go to the Asia SMTP s
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]