Re: [Nagios-devel] Nagios acknowledgement enhancement request

Posted: Thu Nov 13, 2008 6:31 pm
by Guest
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jim Winkle wrote:
> On Wed, 12 Nov 2008 at 6:22pm, Thomas Guyot-Sionnest wrote:
>> On 12/11/08 04:45 PM, Jim Winkle wrote:
>>> Hi,
>>>
>>> I have a suggestion for a future enhancement of Nagios.
>>>
>>> In short, I'd like there to be a way to have Nagios send notifications
>>> until we acknowledge a problem -- for certain unique plugins -- without
>>> ignoring future problems. Background and more details follow.
>>>
>>> We're using the check_logfiles plugin to monitor syslogs (e.g.
>>> /var/adm/messages on Solaris). check_logfiles returns CRITICAL when it
>>> detects a problem, but then normally clears itself (returns OK) the next
>>> time it runs. Nagios notifies us only once under this scenario, and since
>>> it's possible that pagers might miss just one page (paging services aren't
>>> 100% reliable), we'd rather get notified until we explicitly acknowledge
>>> the problem.
>>>
>>> The check_logfiles plugin does have the capability to continue to report the
>>> error (using its "sticky" option). This is good since then we're notified
>>> longer, but if we then use the Nagios "Acknowledge" link to acknowledge the
>>> problem, new problems (e.g. new errors in /var/adm/messages) reported by the
>>> check_logfiles plugin get ignored.
>>>
>>> I asked on the nagios-users list if there was a way to acknowledge a problem
>>> reported by a plugin like check_logfiles without ignoring future problems.
>>> Nobody came up with a way, so I assume this is new functionality needed in
>>> Nagios.
>>>
>>> I realize we can syslog an "okpattern" string and check_logfiles will then
>>> clear, but I'm looking for something using the Nagios web (and external
>>> command_file) interfaces. Using the Nagios "Acknowledge" link would be ideal,
>>> since that's what folks are going to be using to acknowledge other problems.
>>>
>>> I'm using Nagios version 3.0.5 and check_logfiles version 2.4.1.3. We configure
>>> check_logfiles as a volatile service and use state stalking.
>>>
>>> Thanks for providing these great tools! Please let me know if something doesn't
>>> make sense or if I'm missing something.
>> You could most likely achieve what you want with adaptive monitoring.
>> When the service goes to HARD CRITICAL, run an event handler that changes
>> the service command to a dummy critical check. To change it back you
>> could either submit a passive check that triggers the event handler to
>> re-apply the check command, or use a dummy contact whose notification
>> command does it upon receiving an acknowledgement.
>>
>>
>> Some useful links:
>> http://nagios.sourceforge.net/docs/3_0/ ... dlers.html
>> http://nagios.sourceforge.net/docs/3_0/adaptive.html
>> http://www.nagios.org/developerinfo/ext ... ndlist.php
>
> Adaptive monitoring... interesting, dynamic... a little complicated, but
> I'll think about going that route. Thanks for that response.
>
> Nonetheless, it would still be cool if the Acknowledge function could
> handle unique plugins like check_logfiles. I can think of two ways this
> could be done:
>
> 1) Using check_logfiles "sticky" option (the plugin continues to report
> the problem): If Nagios would store the string that the plugin returned
> when a user clicks "Acknowledge", then if the plugin returns a *new*
> CRITICAL string, Nagios would go through its notification routine, run event
> handlers, etc. When the user again clicks "Acknowledge", Nagios stores this
> new string (discarding the old) to be ready for the next problem. Pretty
> simple from a user standpoint.

You should rather implement that in your plugin. You can easily pass the
check output and/or performance data back to the next check. I did it
for a Windows CSV perfmon log counter check to monitor incremental
counters (I think I forgot to release it... anyone interested?) and I'd
like to make something similar for check_snmp.
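
A minimal sketch of that idea, not the unreleased perfmon check itself. It assumes the command definition passes the previous result's performance data (the $SERVICEPERFDATA$ macro) as the first argument, and that the current counter value arrives as the second. The names parse_counter and evaluate and the perfdata label "count" are hypothetical.

```python
import sys

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def parse_counter(perfdata, label="count"):
    """Return the integer stored under `label` in a perfdata string, or None."""
    for item in perfdata.split():
        name, _, value = item.partition("=")
        if name == label:
            # Perfdata values look like "42c;warn;crit;min;max"; keep the number.
            digits = value.split(";")[0].rstrip("c")
            if digits.isdigit():
                return int(digits)
    return None


def evaluate(previous_perfdata, current_count, threshold=0):
    """Compare the current counter with the one reported by the last check."""
    previous = parse_counter(previous_perfdata)
    # First run (no previous value) or a counter reset is treated as "no new events".
    delta = 0 if previous is None else max(current_count - previous, 0)
    status = CRITICAL if delta > threshold else OK
    word = "CRITICAL" if status == CRITICAL else "OK"
    # Echo the counter back as perfdata so the next check can read it.
    return status, f"{word} - {delta} new events | count={current_count}c"


if __name__ == "__main__":
    # Assumed invocation: plugin.py "$SERVICEPERFDATA$" <current counter value>
    if len(sys.argv) != 3:
        print("UNKNOWN - usage: plugin.py PREVIOUS_PERFDATA CURRENT_COUNT")
        sys.exit(3)
    code, message = evaluate(sys.argv[1], int(sys.argv[2]))
    print(message)
    sys.exit(code)
```

Because the state lives in the plugin's own output, nothing has to change in Nagios: each run only needs the perfdata string from the run before it.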

> 2) Not using check_logfiles "sticky" option (the plugin fires just once):
> Create a new option like is_volatile called is_transient. Transient services
> would differ from

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]