Meaning of "potential problem is first detected"

Posted: Thu Mar 12, 2015 11:17 pm
by michaelli
Hi,

In Configure Host, under the monitoring settings, I don't understand the meaning of "potential problem is first detected".

In normal server monitoring, we check the services every 2 minutes. This means that once a service reaches the threshold value, it triggers the alarm.

But how is a "potential problem" defined in the statement below?

When a potential problem is first detected ...
Re-check the host every 1 minutes up to 5 times before generating an alert.

Re: Meaning of "potential problem is first detected"

Posted: Fri Mar 13, 2015 9:34 am
by tmcdonald
That setting lets you re-check a host/service more frequently once a problem is detected, to avoid false positives. People usually check 5 times with a minute between each check in order to make sure the issue wasn't temporary. This setting determines how often the re-checks run. "Max Check Attempts" defines how many times to check before alerting, and you can set it to 1 to alert immediately.
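
The arithmetic behind those two settings can be sketched roughly as follows. This is only an illustration: the function names are made up for this example, it assumes the first failed check counts as attempt 1, and it ignores check execution time and scheduling latency.

```python
def minutes_from_detection_to_alert(retry_interval, max_check_attempts):
    """Minutes between the first failed check and the alert.

    Assumption: the first failed check counts as attempt 1, so only
    (max_check_attempts - 1) re-checks happen before the alert.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval


def worst_case_minutes_from_failure_to_alert(check_interval, retry_interval,
                                             max_check_attempts):
    """Upper bound on the delay between the real failure and the alert.

    Worst case: the service breaks right after a normal check, so almost a
    full check_interval passes before the problem is first detected.
    """
    return check_interval + minutes_from_detection_to_alert(
        retry_interval, max_check_attempts)


# 5 check attempts with 1 minute between re-checks, normal checks every 2 minutes
print(minutes_from_detection_to_alert(1, 5))
print(worst_case_minutes_from_failure_to_alert(2, 1, 5))
# Max Check Attempts = 1: the alert goes out on the first failed check
print(minutes_from_detection_to_alert(1, 1))
```

With Max Check Attempts set to 1 there are no re-checks at all, so the retry interval no longer affects when the alert is sent.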

Re: Meaning of "potential problem is first detected"

Posted: Sat Mar 14, 2015 1:07 am
by michaelli
Hi tmcdonald,

I would like to confirm the setting with the example below.

I set the service to be checked every 3 minutes, and

"When a potential problem is first detected ...
Re-check the host every 1 minutes up to 1 times before generating an alert."

If a problem is detected, will Nagios generate the alert after 3 minutes or after 4 minutes?

Re: Meaning of "potential problem is first detected"

Posted: Sun Mar 15, 2015 10:15 pm
by Box293
Here's how the interval and retry settings work in a scenario:

Check Interval: 2m
Retry Interval: 1m
Number of Retries: 5

1.01 Nagios checks service, service is OK, next check is 1.03, attempt 1/5
1.03 Nagios checks service, service is OK, next check is 1.05, attempt 1/5
1.03.30 service breaks somehow, Nagios does not know about it yet
1.05 Nagios checks service, detects thresholds have been triggered, SOFT state, NEXT check 1.06, attempt 1/5
1.06 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.07, attempt 2/5
1.07 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.08, attempt 3/5
1.08 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.09, attempt 4/5
1.09 Nagios checks service, thresholds still triggered, HARD state, notifications sent, NEXT check 1.11 (back to the normal check interval), attempt 5/5

So it's only when the service reaches the maximum number of check attempts that it enters a HARD state and starts sending notifications.
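
The timeline above can be reproduced with a small simulation. This is a simplified sketch, not Nagios code: the function name and tuple layout are invented for the example, times are minutes past 1:00, the service is assumed to stay broken once it breaks, and it assumes that checks return to the normal check interval once the state is HARD.

```python
def simulate_checks(check_interval, retry_interval, max_check_attempts,
                    breaks_at, first_check, until):
    """Return a list of (minute, status, state_type, attempt, notified) tuples.

    Simplifying assumptions: the service never recovers after breaks_at,
    checks take no time, and only the first HARD problem sends a notification.
    """
    events = []
    t = first_check
    attempt = 1
    notified = False
    while t <= until:
        if t < breaks_at:
            # Service is still fine: stay on the normal check interval.
            events.append((t, "OK", "HARD", 1, False))
            t += check_interval
        elif attempt < max_check_attempts:
            # Problem seen but not yet confirmed: SOFT state, use retry interval.
            events.append((t, "PROBLEM", "SOFT", attempt, False))
            attempt += 1
            t += retry_interval
        else:
            # Final attempt reached: HARD state, notify once, normal interval again.
            events.append((t, "PROBLEM", "HARD", attempt, not notified))
            notified = True
            t += check_interval
    return events


# Check Interval 2m, Retry Interval 1m, 5 attempts, service breaks at 1.03.30
for event in simulate_checks(2, 1, 5, breaks_at=3.5, first_check=1, until=9):
    print(event)
```

Calling the same function with `max_check_attempts=1` puts the very first failed check straight into the HARD branch, so the notification is sent without any re-checks.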

Re: Meaning of "potential problem is first detected"

Posted: Mon Mar 16, 2015 1:48 am
by michaelli
So how can we disable the "retry interval"? We would like to get the alert as soon as the first error is triggered.
Some errors, e.g. system log messages, are triggered only once, or it takes a long time before the same error is generated a second time.

Re: Meaning of "potential problem is first detected"

Posted: Mon Mar 16, 2015 10:39 am
by jolson
From our documentation: http://nagios.sourceforge.net/docs/nagi ... tions.html
max_check_attempts: This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check. Note: If you do not want to check the status of the host, you must still set this to a minimum value of 1. To bypass the host check, just leave the check_command option blank.

Re: Meaning of "potential problem is first detected"

Posted: Mon Mar 23, 2015 4:21 am
by michaelli
Hi jolson,

Thanks for your answer.

Re: Meaning of "potential problem is first detected"

Posted: Mon Mar 23, 2015 9:32 am
by jolson
No problem - would it be alright if I locked this thread and marked it as resolved?

Re: Meaning of "potential problem is first detected"

Posted: Mon Mar 30, 2015 9:50 pm
by michaelli
Sorry for the late reply. Sure, the case can be closed. Thank you for your help.