Re: [Nagios-devel] Adding more advanced correlation to nagios with sec (any interest?)



Dear Sir,

I am writing to thank you for your well conceived and expressed letter
and say,

On Sat, Jun 28, 2003 at 03:48:16PM -0400, John P. Rouillard wrote:

> However, I have some things that I want to do that are not easily
> done within nagios. E.G.
>
> If a system jumpstart is in progress, ignore warnings about high
> interface usage (on one interface), or dropped packets (on the
> hub).
>
> If an index operation of the HTTP server is in progress, ignore
> warnings about the http interface being slow.
>
> I also want to show a host/service down if a given system went down,
> (as determined by a syslog message) but I want the report done
> ONLY if the system isn't back up in 5 minutes.
>
> Note that none of the rebooting, indexing, or jumpstarting operations
> occur at fixed times, so I can't schedule these in advance.
>

that this, as you say, demonstrates the case for Nagios being able to
provide better event correlation than it does now.

However, would you please spell out which events, and their origins, are
correlated by Sec to avoid spurious alarms in these cases, especially
the first two? Is Sec correlating plugin failures with syslog messages?

> Other things can sort of be done in nagios, but it is a bit tough to
> configure. E.G. I have a single snmp_trap service defined for my
> hosts. The service is considered volatile, and is state_stalked. I
> want to do the following:
>
> If an (particular range of) interfaces on a switch goes down (and
> sends a trap) ignore it unless it has gone down/up 3 times in
> five minutes. Don't clear it until it has stayed up for at least
> 15 minutes.
>
> Other interfaces on the same switch should be reported immediately
>
> I could do part of this by adding every one of my 20 interfaces on the
> switch as services, but that doesn't really handle the timing aspects.
> It makes the services a lot more difficult to read and configure.
>
> Another thing I want to do is:
>
> Synthesize an event that notes if two of my three links to
> a remote site are having problems. That is two of my three
> routers may be in a warn state, and I want to place the
> "Access to 16 net" service in a critical state.
>
> This can be done by event handlers, but you end up writing a portion
> of sec to do it, so you might just as well use sec in the first place.
>

Agreed.
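
To make the "two of three links" case concrete, here is a minimal Python
sketch of the aggregation logic alone. It is not Sec rule syntax and not
anyone's actual handler; the router names and the function
aggregate_link_state are hypothetical, and the only thing taken from
Nagios is the usual plugin state numbering (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def aggregate_link_state(link_states, threshold=2):
    """Return CRITICAL when at least `threshold` links are not OK, else OK.

    link_states maps a (hypothetical) router name to its current state code.
    """
    troubled = sum(1 for state in link_states.values() if state != OK)
    return CRITICAL if troubled >= threshold else OK


# Hypothetical example: two of the three routers are in a warn state.
states = {"router-a": WARNING, "router-b": WARNING, "router-c": OK}
print(aggregate_link_state(states))
```

An event handler would still need to keep the per-router states somewhere
between invocations, which is exactly the part that Sec already provides.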

> I have a method of integrating sec
> into nagios to handle these issues and more.
>
> Using sec to process traps (or other passive checks) is straight
> forward. The trap collector running from snmptrapd just dumps the trap
> report (formatted as a nagios passive service check) into sec's input
> fifo and then sec processes it, and reports it (if needed) into the
> nagios.cmd pipe.
>

And a very attractive means of handling SNMP traps it is too.
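
For anyone following along, the line that ends up in the command pipe is
a Nagios external command of the PROCESS_SERVICE_CHECK_RESULT form. A
minimal Python sketch of building and submitting one follows; the helper
names are hypothetical, and the command file path is deliberately left as
a parameter because it depends on the installation.

```python
import time


def passive_check_line(host, service, return_code, output, now=None):
    """Format one passive service check result as a Nagios external command."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit(line, command_file):
    """Write one command line to the Nagios command file (a named pipe).

    command_file is whatever path the local nagios.cfg configures.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Hypothetical host, service, and trap text, for illustration only.
print(passive_check_line("switch1", "snmp_trap", 2, "linkDown on port 3"), end="")
```

In the arrangement described above, a line of this shape would go to
Sec's input fifo first, and Sec would decide whether to pass it on.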

Sec has become, for me, the standard way of providing event and trap
handlers.

For example, I have a general host and service handler that updates a
MySQL DB with the outage interval. To do this it must retain state (and
does so with a Perl hash tied to a DB file) so it can determine if there
has been a transition and if so, how long it was.

This would probably be easier to do with Sec contexts.
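
The state such a handler has to keep is small. As an illustration only,
here is a Python sketch of the transition bookkeeping; it uses an
in-memory dict rather than a hash tied to a DB file, leaves out the MySQL
update, and the class name OutageTracker and the key format are
hypothetical.

```python
class OutageTracker:
    """Remember when each host/service went down and report outage length."""

    def __init__(self):
        # key -> timestamp of the first "down" report in the current outage
        self._down_since = {}

    def record(self, key, is_up, timestamp):
        """Return the outage duration in seconds on a down-to-up transition.

        Returns None for every other report (still down, or was already up).
        """
        if not is_up:
            # Keep the earliest timestamp; repeated "down" reports do not reset it.
            self._down_since.setdefault(key, timestamp)
            return None
        started = self._down_since.pop(key, None)
        return None if started is None else timestamp - started


tracker = OutageTracker()
tracker.record("web1/HTTP", False, 100)
tracker.record("web1/HTTP", False, 160)
print(tracker.record("web1/HTTP", True, 400))
```

A real handler needs this state to survive between invocations, which is
why it has to be persisted, or held by a long-running process such as Sec.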

> However, for polled items, it is more difficult. I don't want to have a
> flapping service where the plugin determines that there is a problem,
> nagios reacts to that, and then sec reacts to that (being fed its info
> by an event handler) by clearing the service because sec determines
> that there is not yet a problem. This leads to a flapping service as
> nagios and sec disagree on what is a true problem, and leads to
> spurious notifications because I can't put in a high
> max_check_attempts and have nagios respond to sec when it has a real
> problem (unless I define yet another service yech).
>
> What I did was write a plugin in perl (sec_filter) that runs the
> nagios command (sort of like check_ssh). It always passes the output
> of the plugin to sec's input pipe. However, depending o

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]