Re: [Nagios-devel] A different way?



It's very similar.
I call it semi-passive (although someone mentioned passive-aggressive might be a better name for it).
You still have an active Nagios instance, and it still checks that checks executed on time (similar to active). It just doesn't do the actual execution anymore (similar to passive), and instead of processing the "meaning" of the results of a check it would just process the outcome as directed by the rules for the host/service being monitored.

Let me give a for instance...
Under my current setup I dispatch a check to a DNX worker node, the check executes, and the result is handed wholesale back to Nagios.
Nagios parses the result, tries to divine whether the service is up, down, flapping, etc., and then takes appropriate action.
Here's a breakdown of where time is spent.

Nagios event loop: approx. 0.07 seconds handing the service check to DNX
DNX: average of 3 seconds round trip
Nagios: up to 10 seconds to process the result, depending on how many dependencies are involved, and as much as 30 seconds if a host check is required

Now obviously this is because all of my service checks are active and not passive, and I have 3,000 hosts and 30,000 service checks.

Under the proposed design it would look more like this.

Nagios initializes and pushes all schedule pieces to all hosts.
Next, Nagios enters a passive mode where it listens for results, and an audit mode where it watches the schedule looking for results that haven't come in yet.
On the flip side, the execution daemon is running on each host. It executes the checks, determines what the check result means (service up, down, flapping, etc.) and passes that meaning back to Nagios, which subsequently takes the appropriate action.
All the while the auditor is watching for checks that were scheduled but haven't come in yet, and contacting hosts to find out what's up, etc.
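
A minimal sketch of the audit mode described above, in Python. This is not DNX or Nagios code: the class names, the `grace` tolerance and the `(host, service)` key are all assumptions made for illustration. It only shows the bookkeeping: each scheduled check has a time by which a result is expected, an arriving result pushes that time forward, and anything past its due time is reported so the auditor can go and ask the host what happened.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduledCheck:
    host: str
    service: str
    interval: float   # seconds between expected results
    next_due: float   # time (in seconds) by which the next result is expected


@dataclass
class Auditor:
    # Hypothetical tolerance before a missing result counts as overdue.
    grace: float = 5.0
    checks: dict = field(default_factory=dict)

    def schedule(self, host, service, interval, now):
        # Register a check; its first result is expected one interval from now.
        self.checks[(host, service)] = ScheduledCheck(
            host, service, interval, now + interval
        )

    def record_result(self, host, service, now):
        # A result arrived: expect the next one a full interval later.
        check = self.checks[(host, service)]
        check.next_due = now + check.interval

    def overdue(self, now):
        # Checks whose result has not arrived within the grace period.
        return sorted(
            key for key, check in self.checks.items()
            if now > check.next_due + self.grace
        )


auditor = Auditor(grace=5.0)
auditor.schedule("web01", "HTTP", interval=60, now=0)
auditor.schedule("db01", "MySQL", interval=60, now=0)
auditor.record_result("web01", "HTTP", now=58)
print(auditor.overdue(now=70))  # prints [('db01', 'MySQL')]
```

Time is passed in explicitly rather than read from the clock so the behaviour is easy to follow; a real auditor would also have to contact the overdue hosts, which is left out here.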

So really in some ways this is an expansion of the current passive model for checks, but in some ways this is a whole new model (compared to what we do now anyways).

Those are my thoughts on the matter, what do you think?

Sincerely,
Steve
________________________________________
From: hemebond [[email protected]]
Sent: Friday, September 25, 2009 2:19 AM
To: Nagios Developers List
Subject: Re: [Nagios-devel] A different way?

Isn't this the same as using passive checks? It sounds like what I've set up. I wrote a simple agent (script) that has its own schedule and runs the checks, sending the result back to a Nagios server.
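
The agent mentioned above is not shown in the thread, so the following is only a sketch of what one step of such a script might look like, in Python. It runs a plugin command, takes the exit code and the first line of output, and formats them as a Nagios `PROCESS_SERVICE_CHECK_RESULT` external command line. The function names, host name and service name are hypothetical, and a small Python one-liner stands in for a real plugin. How the line reaches the Nagios server (for example NSCA or the external command file) is left out.

```python
import subprocess
import sys
import time


def run_check(command):
    """Run a plugin command; return (exit code, first line of its output)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    lines = proc.stdout.strip().splitlines()
    return proc.returncode, (lines[0] if lines else "")


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build a passive service check result in external command syntax."""
    ts = int(time.time() if timestamp is None else timestamp)
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}"
    )


# Stand-in for a real plugin: prints a status line and exits with code 1,
# which plugins use to signal WARNING.
fake_plugin = [sys.executable, "-c",
               "print('WARNING - load is high'); raise SystemExit(1)"]

code, text = run_check(fake_plugin)
print(format_passive_result("web01", "Load", code, text))
```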

2009/9/25 Steven D. Morrey:
Hello everyone,

I've decided to take a break for a bit from multi-threading Nagios to focus on DNX, since that is my day job after all :)
While working on all of this I had a few thoughts that might make some good ideas if Nagios is ever re-designed again, say for a 4.x branch.

As you know, under Nagios, all checks are dispatched by Nagios to be executed on the local machine at set intervals.
Under a distributed Nagios setup, you have multiple Nagios instances running on various machines executing checks and passing the results back to a passive master controller.

Under DNX, we distribute the load to "worker nodes" which then execute the checks and hand the results back to an active master controller that then processes the result etc.

Profiling shows that (under DNX at least) two thirds of our time is spent in the reaper processing results, so wouldn't it make more sense to flip the process around?

The checks are already executing on the local machine, so how about a daemon on each machine? The daemon would keep the schedule and execute service checks locally, processing the result and returning the results and the required actions (based on a local policy) to Nagios, which would then do the actual work of handling notifications and so forth.
This way Nagios could be an auditor: if it doesn't receive a result on time as expected, it could query the daemon to see what's gone wrong, and if that fails then it could initiate

...[email truncated]...
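
The quoted message does not say what a "local policy" would contain, so this is only one possible reading, sketched in Python. The daemon maps the plugin exit code to a state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), counts consecutive non-OK results, and tells Nagios which action to take. `LocalPolicy`, `max_attempts` and the action names are hypothetical, and treating any other exit code as UNKNOWN is an assumption.

```python
from dataclasses import dataclass

# Standard plugin exit codes and the states they stand for.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


@dataclass
class LocalPolicy:
    # Hypothetical rule: notify after this many consecutive non-OK results.
    max_attempts: int = 3
    failures: int = 0
    notified: bool = False

    def evaluate(self, return_code):
        # Assumption: exit codes outside 0-3 are treated as UNKNOWN.
        state = STATES.get(return_code, "UNKNOWN")
        if state == "OK":
            # Only announce a recovery if a problem was announced earlier.
            action = "notify_recovery" if self.notified else "none"
            self.failures = 0
            self.notified = False
        else:
            self.failures += 1
            if self.failures >= self.max_attempts and not self.notified:
                action = "notify_problem"
                self.notified = True
            else:
                action = "none"
        # This is what the daemon would hand back to Nagios: the meaning of
        # the result plus the action to carry out.
        return {"state": state, "attempt": self.failures, "action": action}


policy = LocalPolicy(max_attempts=2)
for code in (0, 2, 2, 0):
    print(policy.evaluate(code))
```

With `max_attempts=2`, the second consecutive CRITICAL result yields `notify_problem` and the following OK yields `notify_recovery`; the other results yield `none`.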


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: emebond [[email protected]