Hey all,
So we run a pair of VMs, and our ESX team was messing with the SAN cabling, which caused issues with the underlying storage.
We didn't notice until I tried to access Nagios Log Server today and saw unallocated shards and the Elasticsearch service saying it was waiting to start. I rebooted the VMs and allocated the shards, but I lost three days of logs...
There were no critical alerts during the outage...
SIS Nagios Critical Alert Mon, 11 Mar 2019 07:42:26 -0400 critical CRITICAL: 12 matching entries found |logs=12;5;10 5m 15m
SIS Nagios Critical Alert Mon, 11 Mar 2019 07:37:12 -0400 critical CRITICAL: 12 matching entries found |logs=12;5;10 5m 15m
SIS Nagios Critical Alert Mon, 11 Mar 2019 07:32:09 -0400 warning WARNING: 6 matching entries found |logs=6;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:16:26 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:16:26 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:11:21 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:06:11 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m
SIS Nagios Critical Alert Fri, 08 Mar 2019 16:01:11 -0500 ok OK: 0 matching entries found |logs=0;5;10 5m 15m 5 10
SIS Nagios Critical Alert Fri, 08 Mar 2019 15:56:06 -0500 ok OK: 0 matching entries found |logs=0;5;10
Is there any way to set an alert for, say, logs received going to ZERO, and have that be a critical alert?
Thanks,
Matt
Help with critical subsystem alerts
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Help with critical subsystem alerts
When you create an alert, if you hover over the ? next to the thresholds, you will see a note stating to use 1: in the warning and critical values to alert if nothing is found.
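To illustrate why 1: fires on zero results, here is a rough Python sketch of the threshold range notation from the Nagios plugin guidelines, where a bare N means "alert outside 0..N" and N: means "alert below N". It assumes Log Server alert thresholds follow that notation, as the tooltip suggests. The function names (parse_range, breaches, status) are made up for this example and are not part of any Nagios product, and treating an empty start (as in :1) as 0 is an assumption about how the parser behaves.

```python
import math

def parse_range(spec):
    """Parse a Nagios-style threshold range into (low, high, inside)."""
    inside = spec.startswith("@")  # '@' inverts: alert when value is INSIDE the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start, end = spec.split(":", 1)
    else:
        start, end = "0", spec     # a bare number N means the range 0..N
    if start == "~":
        low = -math.inf            # '~' stands for negative infinity
    elif start == "":
        low = 0.0                  # assumption: an empty start is treated as 0
    else:
        low = float(start)
    high = float(end) if end else math.inf  # an empty end means "no upper bound"
    return low, high, inside

def breaches(value, spec):
    """Return True if value should raise an alert for this threshold."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range

def status(value, warning, critical):
    """Map a result count to a state, checking critical first."""
    if breaches(value, critical):
        return "CRITICAL"
    if breaches(value, warning):
        return "WARNING"
    return "OK"

# Thresholds 5 / 10, as in the alert history in the first post
print(status(0, "5", "10"))    # zero matches is inside 0..5, so no alert
print(status(6, "5", "10"))
print(status(12, "5", "10"))

# Thresholds 1: / 1:, which alert when fewer than 1 entry is found
print(status(0, "1:", "1:"))
print(status(12, "1:", "1:"))
```

With thresholds of 5 and 10, a count of 0 sits inside the accepted range, which would explain the OK results during the outage. With 1: the accepted range starts at 1, so a count of 0 falls outside it and raises the alert.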
Re: Help with critical subsystem alerts
So if I -
Put :1 in both thresholds,
and select 'Only alert when Warning or Critical threshold is met' while putting the check and interval periods to 5 minutes,
I should get an alert saying no logs received for 5 minutes?
Matt
-
scottwilkerson
Re: Help with critical subsystem alerts
mkojder wrote:
So if I -
Put :1 in both thresholds,
and select 'Only alert when Warning or Critical threshold is met' while putting the check and interval periods to 5 minutes,
I should get an alert saying no logs received for 5 minutes?
Matt

Yes, but it is 1: NOT :1
-
scottwilkerson
Re: Help with critical subsystem alerts
mkojder wrote:
Thanks!

Glad to help