I wanted to give a solution that does not use a notification delay.
These settings will give you:
- Realtime monitoring (in the sense that the service is checked as frequently as possible)
- A delay of about 15 minutes before the notification is sent
- No notification at all if the program recovers within 14 minutes
This uses the following service settings (intervals are in minutes, given the default interval_length of 60 seconds):
check_interval = 1
max_check_attempts = 15
retry_interval = 1
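
For context, here is a rough sketch of how those three directives might sit inside a service definition. The host name, service description, check command and the generic-service template are placeholders I made up for illustration, so substitute whatever your own configuration uses:

```
# Hypothetical service definition; only the last three directives come from this answer.
define service {
    use                   generic-service   ; assumed template supplying contacts, periods, etc.
    host_name             app-server        ; hypothetical host name
    service_description   APP               ; the service referred to as APP in this answer
    check_command         check_app         ; hypothetical check command
    check_interval        1                 ; check every minute under normal conditions
    retry_interval        1                 ; re-check every minute while in a soft state
    max_check_attempts    15                ; enter a hard state on the 15th consecutive failure
}
```

Because no notification delay is involved, the wait comes entirely from the number of retries: the notification only goes out once the service reaches a hard state.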
Here is how that plays out:
1.10pm - Service is checked (we'll call it APP) and detected as OK, next check is 1.11pm
1.11pm - APP breaks, Nagios does not know about it yet
1.11pm - Service check fails, retry interval is 1 so next attempt is 1.12pm (soft state) [check attempt #1]
1.12pm - Service check retry fails, retry interval is 1 so next attempt is 1.13pm (soft state) [check attempt #2]
1.13pm - Service check retry fails, retry interval is 1 so next attempt is 1.14pm (soft state) [check attempt #3]
1.14pm - Service check retry fails, retry interval is 1 so next attempt is 1.15pm (soft state) [check attempt #4]
1.15pm - Service check retry fails, retry interval is 1 so next attempt is 1.16pm (soft state) [check attempt #5]
1.16pm - Service check retry fails, retry interval is 1 so next attempt is 1.17pm (soft state) [check attempt #6]
1.17pm - Service check retry fails, retry interval is 1 so next attempt is 1.18pm (soft state) [check attempt #7]
1.18pm - Service check retry fails, retry interval is 1 so next attempt is 1.19pm (soft state) [check attempt #8]
1.19pm - Service check retry fails, retry interval is 1 so next attempt is 1.20pm (soft state) [check attempt #9]
1.20pm - Service check retry fails, retry interval is 1 so next attempt is 1.21pm (soft state) [check attempt #10]
1.21pm - Service check retry fails, retry interval is 1 so next attempt is 1.22pm (soft state) [check attempt #11]
1.22pm - Service check retry fails, retry interval is 1 so next attempt is 1.23pm (soft state) [check attempt #12]
1.23pm - Service check retry fails, retry interval is 1 so next attempt is 1.24pm (soft state) [check attempt #13]
1.24pm - Service check retry fails, retry interval is 1 so next attempt is 1.25pm (soft state) [check attempt #14]
1.25pm - Service check retry fails, max_check_attempts reached so alert is sent (hard state) [check attempt #15]