Hi,
In Configure Host, under the monitoring settings, I don't understand the meaning of "potential problem is first detected".
In normal server monitoring, we check the services every 2 minutes. This means that once a service reaches its threshold value, the alarm is triggered.
But how is the "potential problem" defined in the statement below?
When a potential problem is first detected ...
Re-check the host every 1 minutes up to 5 times before generating an alert.
Meaning of "potential problem is first detected"
Re: Meaning of "potential problem is first detected"
That gives you the option to re-check a host/service at a faster speed when a problem is detected to avoid false positives. People usually check 5 times with a minute between each check in order to make sure the issue wasn't temporary. This setting is what determines how fast the check runs. The "Max Check Attempts" defines how many times to re-check, and you can set this to 1 to alert immediately.
Former Nagios employee
Re: Meaning of "potential problem is first detected"
Hi tmcdonald,
I would like to confirm the setting with the example below.
I set the service to be checked every 3 minutes, and
"When a potential problem is first detected ...
Re-check the host every 1 minutes up to 1 times before generating an alert."
Finally, if a problem is detected, will Nagios generate the alert after 3 minutes or 4 minutes?
- Box293
Re: Meaning of "potential problem is first detected"
Here's how the interval and retry settings work in a scenario:
Check Interval: 2m
Retry Interval: 1m
Number of Retries: 5
1.01 Nagios checks service, service is OK, next check is 1.03, attempt 1/5
1.03 Nagios checks service, service is OK, next check is 1.05, attempt 1/5
1.03.30 service breaks somehow, Nagios does not know about it yet
1.05 Nagios checks service, detects thresholds have been triggered, SOFT state, NEXT check 1.06, attempt 1/5
1.06 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.07, attempt 2/5
1.07 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.08, attempt 3/5
1.08 Nagios checks service, thresholds still triggered, SOFT state, NEXT check 1.09, attempt 4/5
1.09 Nagios checks service, thresholds still triggered, HARD state, notifications sent, NEXT check 1.11 (back to the 2m check interval), attempt 5/5
So the service only enters a HARD state and starts sending notifications once it has failed the configured number of check attempts in a row.
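For anyone who wants to play with the numbers, the scenario above can be sketched in a few lines of Python. This is only a rough model, not Nagios code: the function name `simulate_checks` and the tuple layout are made up for illustration, it assumes scheduling goes back to the check interval once the HARD state is reached, and it only flags the first HARD result as a notification (re-notifications are ignored).

```python
def simulate_checks(outcomes, check_interval=2, retry_interval=1,
                    max_check_attempts=5, start=1):
    """Return one (minute, attempt, state, notify, next_check) tuple per check.

    outcomes: iterable of booleans, True when the check came back OK.
    Times are whole minutes (e.g. 5 stands for 1.05 in the scenario).
    """
    timeline = []
    minute = start
    failures = 0  # consecutive non-OK results seen so far
    for ok in outcomes:
        if ok:
            failures = 0
            attempt, state, notify = 1, "OK", False
            delay = check_interval
        else:
            failures += 1
            attempt = min(failures, max_check_attempts)
            if failures < max_check_attempts:
                # Problem not confirmed yet: re-check at the retry interval.
                state, notify, delay = "SOFT", False, retry_interval
            else:
                # Retries exhausted: problem confirmed, alert once.
                state = "HARD"
                notify = failures == max_check_attempts
                delay = check_interval  # assumption: normal schedule resumes
        timeline.append((minute, attempt, state, notify, minute + delay))
        minute += delay
    return timeline


if __name__ == "__main__":
    # Two OK checks, then the service stays broken for five checks.
    for row in simulate_checks([True, True] + [False] * 5):
        print(row)
```

Passing `max_check_attempts=1` makes the very first failed check a HARD state, which is the "alert immediately" case mentioned earlier in the thread.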
Re: Meaning of "potential problem is first detected"
So how do we disable the "retry interval"? We would like to get the alert as soon as the first error is triggered.
Some errors, e.g. system log messages, are triggered only once, or it may be a long time before the same error is generated a second time.
Re: Meaning of "potential problem is first detected"
From our documentation: http://nagios.sourceforge.net/docs/nagi ... tions.html
max_check_attempts: This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check. Note: If you do not want to check the status of the host, you must still set this to a minimum value of 1. To bypass the host check, just leave the check_command option blank.
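To put numbers on the effect of max_check_attempts, here is a small arithmetic sketch in Python. The function names are hypothetical and it is not part of Nagios; it assumes every re-check keeps failing and that a problem can appear at any point between two regular checks.

```python
def minutes_from_detection_to_alert(retry_interval, max_check_attempts):
    """Minutes between the first failed check and the alert (HARD state)."""
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # The first failed check already counts as attempt 1, so only
    # (max_check_attempts - 1) re-checks happen before the alert.
    return (max_check_attempts - 1) * retry_interval


def worst_case_minutes_from_failure_to_alert(check_interval, retry_interval,
                                             max_check_attempts):
    """Upper bound: the problem starts right after a regular check passed."""
    return check_interval + minutes_from_detection_to_alert(
        retry_interval, max_check_attempts)


if __name__ == "__main__":
    # Check every 2 minutes, retry every 1 minute, 5 attempts.
    print(minutes_from_detection_to_alert(1, 5))
    # max_check_attempts = 1: the alert fires on the first failed check.
    print(minutes_from_detection_to_alert(1, 1))
```

With max_check_attempts set to 1 the retry interval never comes into play, so the only delay left is however long the problem goes unnoticed before the next regular check.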
Re: Meaning of "potential problem is first detected"
Hi jolson,
Thanks for your answer.
Re: Meaning of "potential problem is first detected"
No problem - would it be alright if I locked this thread and marked as resolved?
Re: Meaning of "potential problem is first detected"
Sorry for the late reply. Sure, the case can be closed. Thank you for your help.