
Composite rule for Alert

Posted: Tue May 26, 2015 7:52 am
by maddev
Hi,

Is it possible to use the results of multiple checks to put a service into a WARNING or CRITICAL state? I really need to implement this.

For example, let's say:

- I want swap to be marked WARNING only if its usage stays above 80% for 15 minutes, and then to go CRITICAL if it stays above 80% for 15 more minutes after reaching WARNING.

- I want to alert only if physical memory usage is above 90% for around 30 minutes and swap usage is above 80% throughout that period. How do I go about doing this?
This will help us prevent a lot of false positives and help the admins focus on real issues.

Re: Composite rule for Alert

Posted: Tue May 26, 2015 9:11 am
by tmcdonald
For this, a simple set of retry intervals and max check attempts should suffice.

If you want a 15-minute window, you can set it to recheck every 3 minutes, up to 5 times, before alerting as WARNING or CRITICAL. This is set in the CCM under the Check Settings tab for a host or service.
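As a rough sketch of that timing, assuming Nagios Core's usual soft/hard state behaviour (the first non-OK result counts as attempt 1, and each further attempt follows after `retry_interval` minutes), the attempts can be laid out like this. The function names `retry_timeline` and `hard_state_delay` are hypothetical and only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: list when each soft-state attempt runs, in minutes
# counted from the first failed check. Assumes attempt 1 is the check that
# first detected the problem and every later attempt waits retry_interval.
retry_timeline() {
    local retry_interval=$1 max_attempts=$2 attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        echo "attempt $attempt at minute $(( (attempt - 1) * retry_interval ))"
    done
}

# Hypothetical helper: minutes from the first failed check until the
# state turns HARD (the point where Nagios normally sends a notification).
hard_state_delay() {
    local retry_interval=$1 max_attempts=$2
    echo $(( (max_attempts - 1) * retry_interval ))
}

retry_timeline 3 5
hard_state_delay 3 5
```

Under these assumptions the fifth attempt lands 12 minutes after the first failed check. The problem itself may have started up to one check interval before that first failure, which is where a figure of roughly 15 minutes comes from when regular checks also run every 3 minutes.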

Re: Composite rule for Alert

Posted: Wed May 27, 2015 1:32 am
by maddev
So the following settings should alert after a 15-minute breach; correct me if I'm wrong: check interval=3 minutes, retry interval=1 and max check attempts=5.

Also, is there any way to achieve the second scenario I mentioned, i.e. alerting based on a combination of two service checks?


Re: Composite rule for Alert

Posted: Wed May 27, 2015 9:15 am
by tmcdonald
maddev wrote: check interval=3 minutes, retry interval=1 and max check attempts=5
This should be "check interval=X minutes, retry interval=3 and max check attempts=5"

The retry interval is what we are concerned with, since it controls the timing once a problem state is detected. The check interval can be basically anything, but the lower you set it, the sooner a problem will be detected. As an extreme example, setting it to 1440 (minutes) means it will check once per day, so a problem could potentially go on for 24 hours before it is detected.
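To put rough numbers on that, here is a sketch assuming the problem starts just after a regular check has passed: it goes unnoticed for up to one `check_interval`, and the retries then add `(max_check_attempts - 1) × retry_interval` before the state turns HARD. The function name `worst_case_delay` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: worst-case minutes from problem onset to HARD state.
# Assumes the problem begins right after a passing regular check, so it is
# first seen one full check_interval later and then retried until HARD.
worst_case_delay() {
    local check_interval=$1 retry_interval=$2 max_attempts=$3
    echo $(( check_interval + (max_attempts - 1) * retry_interval ))
}

# Same retry settings (retry 3, attempts 5), different check intervals.
for ci in 1 3 1440; do
    echo "check_interval=$ci -> up to $(worst_case_delay "$ci" 3 5) minutes"
done
```

With a 3-minute check interval this gives 15 minutes, while 1440 pushes it past a day. Plugging in the settings first proposed in the thread (check 3, retry 1, attempts 5) gives a worst case of only 7 minutes, which is why the retry interval is the value to raise.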

Re: Composite rule for Alert

Posted: Wed May 27, 2015 1:09 pm
by maddev
This makes sense, I get it clearly now. Thank you.


Anything on my other query?

Re: Composite rule for Alert

Posted: Wed May 27, 2015 4:36 pm
by jdalrymple
Wrappers in Nagios are simple. Here is my canned logic wrapper; modify it to suit your needs. It does nothing but AND two check results, but it gives you an idea of how to build something more robust:


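#!/bin/bash
# Minimal AND wrapper: reports a problem only when BOTH NRPE checks agree.
# Usage: <this script> <nrpe_command_one> <nrpe_command_two>

nrpe="/usr/local/nagios/libexec/check_nrpe"
hostname="127.0.0.1"

if [ $# -ne 2 ]; then
        echo "UNKNOWN - usage: $0 <nrpe_command_one> <nrpe_command_two>"
        exit 3
fi
check_one="$1"
check_two="$2"

# Discard the checks' own output: Nagios uses the first line this script
# prints as the service status text, so it must come from the echo below.
"$nrpe" -H "$hostname" -c "$check_one" > /dev/null
return_one=$?

"$nrpe" -H "$hostname" -c "$check_two" > /dev/null
return_two=$?

# Plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# Default to OK; escalate only when both checks are non-OK.
output=0
status="OK"

# WARNING when both checks are in any non-OK state.
if [ "$return_one" -ge 1 ] && [ "$return_two" -ge 1 ]; then
        output=1
        status="WARNING"
fi
# CRITICAL when both checks are CRITICAL.
if [ "$return_one" -eq 2 ] && [ "$return_two" -eq 2 ]; then
        output=2
        status="CRITICAL"
fi

echo "$status - $check_one returned $return_one, $check_two returned $return_two"
exit $output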
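To see what the wrapper's AND logic reports for each combination of results without a live NRPE agent, the decision block can be pulled into a function and fed exit codes directly. This is a sketch; `combine_and` is a hypothetical name, and the codes follow the usual plugin convention (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).

```shell
#!/bin/bash
# Hypothetical standalone copy of the wrapper's AND logic.
# Prints the combined exit code for two plugin exit codes.
combine_and() {
    local return_one=$1 return_two=$2 output=0
    # Both non-OK (WARNING, CRITICAL or UNKNOWN) -> WARNING.
    if [ "$return_one" -ge 1 ] && [ "$return_two" -ge 1 ]; then
        output=1
    fi
    # Both CRITICAL -> CRITICAL.
    if [ "$return_one" -eq 2 ] && [ "$return_two" -eq 2 ]; then
        output=2
    fi
    echo "$output"
}

# Print the full table for the four standard plugin states.
for a in 0 1 2 3; do
    for b in 0 1 2 3; do
        echo "$a AND $b -> $(combine_and "$a" "$b")"
    done
done
```

Two consequences of this particular logic stand out: a CRITICAL check paired with an OK check stays OK, and two UNKNOWN results come out as WARNING rather than UNKNOWN. For the memory-and-swap scenario the first is exactly the intended behaviour; the second may be worth handling explicitly. If the wrapper runs as an ordinary service with the retry interval and max check attempts settings described above, a single OK attempt resets the soft state, so the combined condition has to hold on every attempt before it turns HARD.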