Using Nagios to monitor completion of daily Helpdesk tasks
Posted: Thu Jun 20, 2013 4:42 am
Hi.
I'm trying to set up a way in Nagios to monitor the completion of some Helpdesk checks that are carried out manually each day. The Helpdesk (HD) member completes the check, whatever it may be, and submits a passive check in Nagios to record it as complete.
Could somebody advise if this is the best way to do it:
1. Hosts are the days of the week, with active checks disabled and a passive check submitted once when created to mark it as OK and UP.
2. Services on each host are the daily checks that need to be done that day. These have active checks and passive checks enabled.
3. The service's check command is a simple script I wrote that checks whether the current day of the week matches the day of the check. If it does, it returns CRITICAL; if not, it returns OK.
4. The check period is 23:45-24:00 the previous day and 00:00-00:15 the current day. The checks start at quarter to midnight and return OK; when the check runs after midnight it returns CRITICAL, so by the time the Helpdesk arrives in the morning the required tasks are already set to CRITICAL.
5. A passive service check is submitted by HD staff when the check has been done, and the service then remains OK until the following week.
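For illustration, the check script from step 3 and the passive result from step 5 could look roughly like the Python sketch below. This is not my actual script: the function names, the message text and the example host/service names are made up for the example. The only Nagios-specific parts are the standard plugin return codes (0 = OK, 2 = CRITICAL) and the PROCESS_SERVICE_CHECK_RESULT external command format.

```python
import datetime

# Standard Nagios plugin return codes.
OK = 0
CRITICAL = 2

# Index matches datetime.date.weekday(), where Monday is 0.
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def check_day(target_day, today=None):
    """Return (code, message): CRITICAL if today is target_day, else OK."""
    if today is None:
        today = datetime.date.today()
    current_day = DAYS[today.weekday()]
    if current_day == target_day.lower():
        # Hypothetical message text; a real plugin would print this and
        # exit with the return code.
        return CRITICAL, "CRITICAL - %s task not yet completed" % target_day
    return OK, "OK - not due today (%s)" % current_day


def passive_ok_command(host, service, timestamp,
                       output="Completed by Helpdesk"):
    """Format an external command line that reports the service as OK."""
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, OK, output)
```

The string returned by `passive_ok_command` would be appended to the Nagios external command file, whose location depends on the installation (the `command_file` setting in nagios.cfg).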
This works. The tricky part is getting the notifications right. This is what I want from them:
1. Delayed notification is sent to the Helpdesk at 08:30, giving them 30 minutes to do the checks before getting nagged.
2. If still CRITICAL at 13:00, send an email to the HD Manager. I'm having trouble with this because escalations only work when more than one notification is generated, which by design doesn't happen here: each time the active check ran, it would change the state to CRITICAL and override any passive OK that had been submitted, so in my mind this just wouldn't work.
I've set a "first notification delay" option of 510 minutes. In my mind that is 8.5 hours from the last known OK state, which would be midnight or shortly before, meaning an email would be sent at around 08:30. This doesn't work; in fact, it's not sending ANY notifications at all. Have I got the logic wrong here?
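To show the arithmetic I have in mind, here is a small Python sketch. It assumes the delay is counted in minutes from a reference time of midnight; whether Nagios actually counts the delay that way is exactly what I'm unsure about. The function name is made up for the example.

```python
import datetime


def expected_notification_time(reference_time, delay_minutes):
    """Return the time at which a delayed notification would be due."""
    return reference_time + datetime.timedelta(minutes=delay_minutes)


# Assumed reference point: the check goes CRITICAL at about midnight.
midnight = datetime.datetime(2013, 6, 20, 0, 0)

# 510 minutes = 8.5 hours, the delay configured for the Helpdesk email.
helpdesk_due = expected_notification_time(midnight, 510)

# 780 minutes = 13 hours, the offset a 13:00 manager email would need.
manager_due = expected_notification_time(midnight, 780)

print(helpdesk_due.strftime("%H:%M"), manager_due.strftime("%H:%M"))
```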
So to clarify, I'm after help with the notification delays, the escalations (if even possible!?) and, more generally, whether I'm handling this kind of scenario in the correct manner. If anybody has any better ideas or suggestions, I would be so grateful to you!
Many thanks,
Malcolm