
Help with critical subsystem alerts

Posted: Mon Mar 11, 2019 9:19 am
by mkojder
Hey all,

So we run a pair of VMs, and our ESX team was messing with the SAN cabling, which caused issues with the underlying storage.

We didn't notice until I tried to access Nagios Log Server today and saw unallocated shards and the elasticsearch service saying it was waiting to start. I rebooted the VMs and allocated the shards, but I lost three days of logs...

There were no critical alerts during the outage...

SIS Nagios Critical Alert Mon, 11 Mar 2019 07:42:26 -0400 critical CRITICAL: 12 matching entries found |logs=12;5;10 5m 15m
SIS Nagios Critical Alert Mon, 11 Mar 2019 07:37:12 -0400 critical CRITICAL: 12 matching entries found |logs=12;5;10 5m 15m
SIS Nagios Critical Alert Mon, 11 Mar 2019 07:32:09 -0400 warning WARNING: 6 matching entries found |logs=6;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:16:26 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:16:26 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:11:21 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:06:11 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:01:11 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m 5 10
SIS Nagios Critical Alert Fri, 08 Mar 2019 15:56:06 -0500 ok OK: 0 matching entries found |logs=0;5;10
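The text after the `|` in these alerts looks like standard Nagios performance data (`label=value;warn;crit`). The sketch below is an illustration, not Log Server's code. It assumes plain-number thresholds follow the Nagios plugin convention, where `N` means "alert when the value is outside 0..N". Under that reading it reproduces the states logged above. `classify` is a hypothetical helper name.

```python
import re

# Hypothetical helper: pull label, value, warn and crit out of perfdata
# such as "logs=12;5;10" (label=value;warn;crit).
PERFDATA = re.compile(r"(\w+)=(\d+);(\d+);(\d+)")

def classify(alert_line: str) -> str:
    """Return the state implied by the perfdata, assuming a plain number N
    as a threshold means 'alert when the value is outside 0..N'."""
    match = PERFDATA.search(alert_line)
    if match is None:
        raise ValueError("no perfdata found")
    _label, value, warn, crit = (match.group(i) for i in range(1, 5))
    value, warn, crit = int(value), int(warn), int(crit)
    if not 0 <= value <= crit:   # critical range checked first
        return "CRITICAL"
    if not 0 <= value <= warn:
        return "WARNING"
    return "OK"

for line in ["|logs=12;5;10 5m 15m", "|logs=6;5;10 5m 15m", "|logs=0;5;10 5m 15m"]:
    print(line.split()[0], classify(line))
```

Under this reading, a count of 0 falls inside both the 0..5 and 0..10 ranges and reports OK. Thresholds like 5;10 therefore can never flag logs dropping to zero.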

Is there any way to set a threshold so that, say, the number of logs received dropping to ZERO triggers a critical alert?

Thanks,

Matt

Re: Help with critical subsystem alerts

Posted: Mon Mar 11, 2019 10:01 am
by scottwilkerson
When you create an alert, if you hover over the ? next to the thresholds, you will see a note saying to use 1: in the warning and critical values to alert when nothing is found.

Re: Help with critical subsystem alerts

Posted: Mon Mar 11, 2019 10:55 am
by mkojder
So if I -

Put :1 in both thresholds,

and select 'Only alert when Warning or Critical threshold is met' while putting the check and interval periods to 5 minutes,

I should get an alert saying no logs received for 5 minutes?

Matt

Re: Help with critical subsystem alerts

Posted: Mon Mar 11, 2019 10:57 am
by scottwilkerson
mkojder wrote: So if I -

Put :1 in both thresholds,

and select 'Only alert when Warning or Critical threshold is met' while putting the check and interval periods to 5 minutes,

I should get an alert saying no logs received for 5 minutes?

Matt
Yes, but it is 1: NOT :1
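This sketch shows why the colon's position matters. It assumes Log Server uses the standard Nagios plugin range syntax, which the "1:" hint suggests but this thread does not confirm. In that syntax, "N:" means N..infinity and an omitted start is read as 0, so ":1" means 0..1. An alert fires when the value is outside the range, unless the range starts with "@". `parse_range` and `should_alert` are hypothetical names for illustration.

```python
import math

def parse_range(spec: str):
    """Parse a Nagios-style threshold range into (start, end, alert_inside).

    "N" -> 0..N, "N:" -> N..inf, "~:N" -> -inf..N, "M:N" -> M..N,
    and a leading "@" means alert when the value is INSIDE the range.
    """
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        start_str, end_str = spec.split(":", 1)
        if start_str == "~":
            start = -math.inf
        else:
            # An omitted start (as in ":1") is treated as 0.
            start = float(start_str) if start_str else 0.0
        end = float(end_str) if end_str else math.inf
    else:
        start, end = 0.0, float(spec)
    return start, end, alert_inside

def should_alert(spec: str, value: float) -> bool:
    start, end, alert_inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if alert_inside else not in_range

for spec in ("1:", ":1"):
    print(spec, [should_alert(spec, n) for n in (0, 1, 12)])
```

Under these assumptions, "1:" fires only when the count is 0. ":1" stays quiet at 0 and 1 and fires once more than one entry matches, which is the opposite of an "no logs received" alert.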

Re: Help with critical subsystem alerts

Posted: Mon Apr 01, 2019 10:36 am
by mkojder
Thanks!

Re: Help with critical subsystem alerts

Posted: Mon Apr 01, 2019 10:44 am
by scottwilkerson
mkojder wrote: Thanks!
Glad to help