Monitoring thresholds and alerts

bramassendorp · Post by **bramassendorp** » Thu Mar 11, 2021 1:17 pm

Hi,

Small question: how do you define thresholds on monitored values, and when do you decide to send an alert? I'm trying to set my values so that I don't get spammed by alerts while still being proactive towards end users.

Most service checks are for CPU, memory and disk space.
Citrix is the user environment.
We also run Exchange and SQL (MSSQL, msql, Oracle).

Thank you.

dchurch · Post by **dchurch** » Thu Mar 11, 2021 5:57 pm

The Notification Interval in the service or host definition sets how long Nagios waits before sending another notification for the same problem. You'll find it under Configure (top menu) -> Core Config Manager; open the service or host and click on the Alert tab.

There's also First notification delay to delay sending out the first notification.
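As a rough illustration of how these two settings interact, here is a small sketch that lists when notifications would go out for a problem that stays unresolved. It assumes the time unit is one minute, and it ignores notification periods, escalations and the check schedule itself, so treat it as a simplified model. `notification_times` is a hypothetical helper, not something Nagios provides.

```python
def notification_times(first_notification_delay, notification_interval, problem_duration):
    """Return the minutes (counted from the start of the problem) at which
    notifications would be sent, under a simplified model.

    Assumptions: the first notification goes out after the delay, then one
    more every notification_interval minutes. An interval of 0 means the
    problem is notified only once.
    """
    times = []
    t = first_notification_delay
    if t > problem_duration:
        return times  # problem recovered before the first notification
    times.append(t)
    if notification_interval <= 0:
        return times  # no re-notification
    t += notification_interval
    while t <= problem_duration:
        times.append(t)
        t += notification_interval
    return times


# A problem lasting 3 hours, first notification delayed by 5 minutes,
# then repeated every 60 minutes.
print(notification_times(5, 60, 180))  # [5, 65, 125]
# No delay and no re-notification: a single notification.
print(notification_times(0, 0, 180))   # [0]
```

A longer delay filters out short spikes, and a longer interval cuts down on repeat messages for a problem you already know about.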

Also, most checks accept "critical" and "warning" thresholds with lower and upper bounds in the form of [LOWER]:[UPPER]. That is, if critical=1:5, then any value outside the range of 1-5 inclusive (for example 0 or 7) is considered "critical." When UPPER is empty, it's assumed to be infinity. Likewise, if LOWER is empty, it's assumed to be 0.

Most of the time, you can disable the critical and warning thresholds by simply not specifying them. But if you're dealing with a check script that falls back to a default critical or warning threshold when none is specified, you can use the value critical=0: to keep the check from going critical. That range runs from 0 to infinity, so no non-negative value ever falls outside it.
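To make the range rule concrete, here is a minimal sketch of how a [LOWER]:[UPPER] threshold could be evaluated. It only covers the form described above (plus a bare number, treated as the upper bound with a lower bound of 0) and is not the code any particular plugin uses. `parse_range` and `is_alert` are hypothetical names.

```python
import math


def parse_range(spec):
    """Parse a '[LOWER]:[UPPER]' threshold into a (lower, upper) tuple.

    An empty LOWER defaults to 0 and an empty UPPER to infinity.
    A bare number such as '90' is treated as '0:90' (assumption based on
    common plugin behaviour).
    """
    if ":" in spec:
        lower_text, upper_text = spec.split(":", 1)
    else:
        lower_text, upper_text = "", spec
    lower = float(lower_text) if lower_text else 0.0
    upper = float(upper_text) if upper_text else math.inf
    return lower, upper


def is_alert(value, spec):
    """Return True when value falls outside the inclusive range."""
    lower, upper = parse_range(spec)
    return not (lower <= value <= upper)


print(is_alert(0, "1:5"))    # True: below the lower bound
print(is_alert(7, "1:5"))    # True: above the upper bound
print(is_alert(3, "1:5"))    # False: inside the range
print(is_alert(100, "0:"))   # False: '0:' never alerts on non-negative values
```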

bramassendorp · Post by **bramassendorp** » Fri Mar 12, 2021 2:03 am

Hi,

Thank you for replying, but I was more interested in how other users are setting this up. For example, at what level exactly do you alert on CPU checks (100%? 95%?), and how long should the value stay there before you alert: 1 minute, or 5 minutes?

I'm looking for best practices and examples of how others have set this up in Nagios.

dchurch · Post by **dchurch** » Fri Mar 12, 2021 5:06 pm

It takes some finesse and tuning, and ultimately knowledge about what constitutes an error. Every monitoring situation is different: CPU usage on a web server means something completely different from CPU usage on a Windows workstation, which in turn means something different from CPU usage on a server that runs CPU-intensive compiler jobs.

It can even be useful to turn off the critical and warning thresholds for CPU usage altogether, and just use the recorded metric to cross-reference the times when the server had an error or became unresponsive.

bramassendorp wrote: I'm looking for best practices and examples of how others have set this up in Nagios.

Perhaps you'll want to ask this question over on the Community Support Forum. If you have any specific questions about how to change your thresholds or notifications, I can help.
