Hi.
I'm trying to set up a way in Nagios to monitor the completion of some Helpdesk checks that are carried out manually each day. The HD member completes the check, whatever it may be, and submits a passive check result in Nagios to record it as complete.
Could somebody advise whether this is the best way to do it:
1. Hosts are the days of the week, with active checks disabled; a passive check result was submitted once, when each host was created, to mark it as UP.
2. Services on each host are the daily checks that need to be done that day. These have both active and passive checks enabled.
3. The service check command is a simple script I wrote that checks whether the current day of the week matches the day of the check: if it does, it returns CRITICAL; if not, OK.
4. The check period is 23:45-24:00 on the previous day and 00:00-00:15 on the current day. The checks start at quarter to midnight and return OK; when the check runs after midnight it returns CRITICAL, so when the Helpdesk arrive in the morning the required tasks are already set to CRITICAL.
5. HD staff submit a passive OK result once the check has been done, and the service then remains OK until the following week.
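For reference, the day-of-week check from step 3 might look something like the following Python plugin. This is a hypothetical re-creation, not the actual script from the post: the file name `check_day.py`, its single weekday argument and the status messages are all assumptions. It follows the usual Nagios plugin convention of printing one status line and exiting 0 for OK, 2 for CRITICAL and 3 for UNKNOWN.

```python
#!/usr/bin/env python3
"""Hypothetical day-of-week check: check_day.py <weekday>, e.g. check_day.py Monday.

Prints one status line and exits CRITICAL (2) when today is <weekday>, else OK (0).
"""
import datetime
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Index matches datetime.date.weekday(): Monday == 0 ... Sunday == 6.
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def check_day(target, today=None):
    """Return (exit_code, message) for the weekday name in `target`."""
    target = target.strip().lower()
    if target not in DAYS:
        return UNKNOWN, f"UNKNOWN - '{target}' is not a day of the week"
    if today is None:
        today = datetime.date.today()
    if DAYS[today.weekday()] == target:
        return CRITICAL, f"CRITICAL - {target.capitalize()} checks are due"
    return OK, f"OK - today is not {target.capitalize()}"


def main(args):
    if len(args) != 1:
        print("UNKNOWN - usage: check_day.py <weekday>")
        return UNKNOWN
    code, message = check_day(args[0])
    print(message)
    return code


# Act as a plugin only when given arguments, so check_day() stays importable and testable.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

On the Monday host it would be called as `check_day.py Monday`, and because active checks only run inside the 23:45-00:15 check period, the result it returns just after midnight is what the Helpdesk see in the morning.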
This works. The tricky part is getting the notifications right. This is what I want from them:
1. A delayed notification is sent to the Helpdesk at 08:30, giving them 30 minutes to do the checks before getting nagged.
2. If still CRITICAL at 13:00, send an email to the HD Manager. I'm having trouble with this because, to use escalations, the service must generate more than one notification, which by design it doesn't: each time the active check runs it changes the state to CRITICAL and would override any passive OK that had been submitted, so as far as I can see this just wouldn't work.
I've set a first_notification_delay of 510 minutes. In my mind that is 8.5 hours from the last known OK state, which would be midnight or shortly before, meaning an email sent at around 08:30. This doesn't work; in fact it's not sending ANY notifications at all. Have I got the logic wrong here?
So, to clarify, I'm after help with the notification delays, the escalations (if they're even possible!) and, generally, whether I'm handling this kind of scenario in the correct manner. If anybody has any better ideas or suggestions, I would be very grateful!
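The arithmetic behind the 08:30 expectation, as a small Python sketch. It assumes, as the post does, that the delay is counted from the last time the service was OK, and that `interval_length` is left at its default of 60 seconds, so one delay unit is one minute. The 780-minute line is only there to show the offset a 13:00 deadline would correspond to; it is not a setting from the post.

```python
from datetime import datetime, timedelta

# Default interval_length in nagios.cfg: one "time unit" = 60 seconds.
INTERVAL_LENGTH = 60


def first_notification_due(last_ok, delay_units, interval_length=INTERVAL_LENGTH):
    """Earliest time the first problem notification becomes eligible, assuming
    the delay is counted from the last time the service was OK."""
    return last_ok + timedelta(seconds=delay_units * interval_length)


midnight = datetime(2024, 1, 1, 0, 0)  # hypothetical last OK at exactly midnight
print(first_notification_due(midnight, 510))  # 2024-01-01 08:30:00
print(first_notification_due(midnight, 780))  # 2024-01-01 13:00:00
# If the last OK came from a hypothetical 23:50 check the night before, 510 minutes lands earlier:
print(first_notification_due(datetime(2023, 12, 31, 23, 50), 510))  # 2024-01-01 08:20:00
```

The sketch only computes the earliest time a first notification becomes eligible; it does not model when Nagios actually re-evaluates the service and decides to notify.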
Many thanks,
Malcolm
Using Nagios to monitor completion of daily Helpdesk tasks
sreinhardt
Re: Using Nagios to monitor completion of daily Helpdesk tasks
It sounds like you have a pretty good handle on what is happening here. However, regarding your escalation/notification issue, I would suggest trying freshness checks. The document linked below relates specifically to Nagios XI; however, the same ideas and configuration can be applied to Core. You could do a couple of things:
1) If there is a way to verify that the task has been done, apply a freshness check tied to your 1pm deadline and, if it fails, use escalations to forward an email to the manager.
2) If there is no way to verify it has been done, use check_dummy with 2 as its first argument ($ARG1$) so that it returns CRITICAL. If the HD person has completed the task and submitted the result, the passive result is still fresh, so Nagios should not run this check and nobody is alerted.
The potential issue with both of these is that freshness is measured from the last time a result was received, so the deadline may drift depending on when the passive results were last submitted. You could instead run an hourly freshness check and use a helper script that calls check_dummy only if it is within your timeframe and the HD person has not submitted within that timeframe, or depending on the last host state.
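A rough Python model of option 2 and of the drift described above. It assumes Nagios treats a result as stale once the current time is past the last result time plus the freshness threshold, and at that point runs the service's check command, here `check_dummy 2`, which reports CRITICAL. The 13-hour threshold and the timestamps are made-up values for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

CRITICAL = 2  # the state that 'check_dummy 2' reports


def stale_after(last_result: datetime, threshold: timedelta) -> datetime:
    """Moment after which the last result counts as stale."""
    return last_result + threshold


def freshness_check(last_result: datetime, threshold: timedelta, now: datetime) -> Optional[int]:
    """None while the last result is fresh (nothing runs); CRITICAL once it is
    stale, standing in for Nagios running check_dummy 2."""
    if now > stale_after(last_result, threshold):
        return CRITICAL
    return None


threshold = timedelta(hours=13)  # hypothetical threshold aimed at a 13:00 deadline
midnight = datetime(2024, 1, 1, 0, 0)
print(stale_after(midnight, threshold))                          # 2024-01-01 13:00:00
print(stale_after(midnight + timedelta(minutes=10), threshold))  # 2024-01-01 13:10:00
print(freshness_check(midnight, threshold, datetime(2024, 1, 1, 12, 0)))  # None
print(freshness_check(midnight, threshold, datetime(2024, 1, 1, 13, 1)))  # 2
```

The second line shows the offset problem: the same threshold gives 13:10 instead of 13:00 simply because the last result arrived ten minutes later.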
Nagios Macros - Specifically, you might want to pass a helper script $LASTSERVICESTATEID$, $LASTSERVICESTATE$, $SERVICEDURATIONSEC$ or $SERVICEDURATION$, depending on the logic you want.
Configuring passive checks within XI - Specifically, look towards the bottom regarding freshness.
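A sketch of the hourly helper idea, in Python. Everything here is hypothetical: the script name `helper.py`, the argument order (due weekday as 0-6 with Monday as 0, then a state ID macro, then `$SERVICEDURATIONSEC$`) and the 08:30-17:00 alerting timeframe. Whether you pass `$SERVICESTATEID$` or `$LASTSERVICESTATEID$` depends on the logic you want; the sketch simply treats the value it receives as the service's current state. It also assumes the service is already non-OK at the start of the due day (for example via the midnight check in the original setup), so an OK state that began today can only have come from a Helpdesk submission.

```python
#!/usr/bin/env python3
"""Hypothetical hourly helper: helper.py <due weekday 0-6> <state id> <duration sec>"""
import sys
from datetime import datetime, time, timedelta

# Standard Nagios plugin exit codes / state IDs.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed alerting timeframe on the due day.
WINDOW_START = time(8, 30)
WINDOW_END = time(17, 0)


def decide(due_weekday, state_id, duration_sec, now):
    """Return (exit_code, message) for one run of the helper."""
    if now.weekday() != due_weekday:  # datetime.weekday(): Monday == 0
        return OK, "OK - task not due today"
    # $SERVICEDURATIONSEC$ = seconds since the last state change.
    changed_at = now - timedelta(seconds=duration_sec)
    if state_id == OK and changed_at.date() == now.date():
        return OK, "OK - task recorded as done today"
    if WINDOW_START <= now.time() <= WINDOW_END:
        # Same outcome as calling check_dummy 2.
        return CRITICAL, "CRITICAL - task not recorded as done yet"
    # Outside the timeframe: report the state Nagios already has.
    return state_id, "No change - outside alerting timeframe"


def main(args):
    try:
        due_weekday, state_id, duration_sec = (int(a) for a in args)
    except ValueError:
        print("UNKNOWN - usage: helper.py <due weekday 0-6> <state id> <duration sec>")
        return UNKNOWN
    if state_id not in (OK, WARNING, CRITICAL, UNKNOWN):
        print(f"UNKNOWN - unexpected state id {state_id}")
        return UNKNOWN
    code, message = decide(due_weekday, state_id, duration_sec, datetime.now())
    print(message)
    return code


# Act as a plugin only when given arguments, so decide() stays importable and testable.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

On the Monday service it would be invoked as something like `helper.py 0 $SERVICESTATEID$ $SERVICEDURATIONSEC$` from the command definition. Because an OK recorded today is passed back unchanged, running it hourly should not override the Helpdesk's passive result.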
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source Nagios projects, in which case I am happy to help! Please PM me or use other channels to alert me to issues, as I no longer track the forum.