Email on server reboot
Posted: Wed Apr 23, 2014 5:03 am
by stebbo
Hi All,
I would like to receive an email when a server reboots unexpectedly. I've read the thread
http://support.nagios.com/forum/viewtop ... 16&t=24936 but didn't understand the solution that pacmag implemented so am hoping for some guidance there or other ideas.
I am mostly monitoring Windows servers and am monitoring the uptime variable on those servers. I was hoping it would be relatively easy to either
a) receive a notification if the returned uptime is lower than a previous return value, or
b) receive a notification if the returned uptime is lower than x minutes.
Often a reboot (due to Windows updates or an administrator trying the perennial "a reboot will fix it") completes in far less time than the 5 minute check interval, and so happens without my knowledge.
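Option (a) above can be sketched in a few lines of Python. This is only an illustration of the comparison, assuming uptime samples in seconds are already being collected somewhere; the function names `rebooted_since` and `find_reboots` are hypothetical and are not part of Nagios or NSClient++.

```python
from typing import List, Optional


def rebooted_since(previous_uptime: Optional[float], current_uptime: float) -> bool:
    """Option (a): report a reboot when uptime has gone backwards."""
    if previous_uptime is None:
        # First sample ever seen: nothing to compare against.
        return False
    return current_uptime < previous_uptime


def find_reboots(samples: List[float]) -> List[int]:
    """Return the index of every sample whose uptime is lower than the one before it."""
    return [
        i for i in range(1, len(samples))
        if rebooted_since(samples[i - 1], samples[i])
    ]


# Hypothetical samples in seconds, taken five minutes apart.
samples = [8400.0, 8700.0, 120.0, 420.0]
print(find_reboots(samples))  # [2]
```

The comparison catches a reboot even when the server is back up well before the next check, because the uptime counter restarts from zero. Option (b) needs no stored state, which is why a plain threshold is usually simpler to set up.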
Cheers,
Chris.
Re: Email on server reboot
Posted: Wed Apr 23, 2014 9:52 am
by sreinhardt
Do you have a preferred plugin or preferred way of monitoring (agent, WMI, SNMP) those systems? There definitely are ways of doing this, but if you are already monitoring in one way or another it's probably best to work with that existing setup, rather than send you down a different path.
Re: Email on server reboot
Posted: Wed Apr 23, 2014 4:58 pm
by stebbo
Hi Spenser,
I am using NSClient++ on these Windows servers, generally 0.4.1, and using check_nt for the command to retrieve the uptime data.
I only do this because this is what the server wizard sets up - if there's an easier or better way I'm happy to set it up differently.
Cheers,
Chris.
Re: Email on server reboot
Posted: Wed Apr 23, 2014 9:52 pm
by Box293
Hi Chris,
You can do what you're saying relatively easily. However, using the check_nt command for uptime is not going to work because it doesn't trigger warning or critical thresholds.
However, all is not lost. Instead you can use the check_nrpe command to query your Windows servers for the System Up Time performance counter (still using NSClient++) and then trigger alerts based on thresholds set on that counter.
From your Nagios host command line:
check_nrpe -H <windows_server> -c CheckCounter -a "Counter=\System\System Up Time" ShowAll MinCrit=600
Which should respond with:
OK: \System\System Up Time: 8921.99|'System Up Time'=8921.994197;0;600;
The number it returns is in seconds, so my system has been up for about 148 minutes.
The MinCrit value of 600 equals 10 minutes. So when this check executes and the uptime is less than 10 minutes, it will trigger a critical status. Once the uptime is past 10 minutes the service will return to an OK state.
So when the service enters a critical state, whoever is a contact for that service will receive an alert (hence an email).
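The arithmetic and the threshold logic described above can be checked with a small Python sketch. It parses the sample output shown earlier and applies the 600 second limit. The helper names `parse_uptime_seconds` and `status_for` are hypothetical, and the sketch assumes the check goes critical when uptime is strictly below MinCrit; the exact boundary behaviour of CheckCounter is not confirmed here.

```python
import re


def parse_uptime_seconds(plugin_output: str) -> float:
    """Extract the 'System Up Time' value from the perfdata after the '|'."""
    _, _, perfdata = plugin_output.partition("|")
    match = re.search(r"'System Up Time'=([0-9.]+)", perfdata)
    if match is None:
        raise ValueError("no 'System Up Time' perfdata found")
    return float(match.group(1))


def status_for(uptime_seconds: float, min_crit: float = 600.0) -> str:
    """Assumption: CRITICAL when uptime is strictly below min_crit, otherwise OK."""
    return "CRITICAL" if uptime_seconds < min_crit else "OK"


# The sample output quoted in this post.
output = r"OK: \System\System Up Time: 8921.99|'System Up Time'=8921.994197;0;600;"
seconds = parse_uptime_seconds(output)
print(int(seconds // 60), status_for(seconds))  # 148 OK
```

With a 5 minute check interval and a 10 minute threshold, at least one check should land inside the critical window after a reboot, which is what makes the alert reliable.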
FYI when you set up your services for uptime, the performance counter will require double backslashes ...
Let me know how you go.
Troy
Re: Email on server reboot
Posted: Thu Apr 24, 2014 1:51 am
by WillemDH
Useful info Troy! Gonna try this too some day.

Re: Email on server reboot
Posted: Thu Apr 24, 2014 9:34 am
by sreinhardt
Agreed, good post Troy! I'll have to make a note to add warning and critical values to that portion of check_nt.. unless we did that with plugins 2.0, I forget..

Re: Email on server reboot
Posted: Thu Apr 24, 2014 7:40 pm
by Box293
Cheers
I'm a big fan of performance counters ...
Re: Email on server reboot
Posted: Fri Apr 25, 2014 2:59 am
by stebbo
Hi Troy,
thanks for the response. I tried the command as you entered above from the command line and initially received an error
Request contained arguments (not currently allowed, check the allow arguments option).
I searched the forum and found some advice to add a line to the ini file enabling the allow arguments option, which fixed that problem.
Then I got an illegal metacharacter error, so I also allowed nasty characters, and all appears well. I shall add these services to this host and see how it goes over time.
Is there any down-side to allowing the arguments / nasty characters?
Thanks again,
Chris.
Re: Email on server reboot
Posted: Fri Apr 25, 2014 9:10 am
by tmcdonald
Arguments on their own are fine, but allowing metacharacters can be a security risk. We block those characters to keep people from chaining commands and potentially compromising a system. For example, you could have check_nrpe call something like:
check_disk -w 20% -c 30% && cat /etc/passwd
which is why we disallow the ampersand.
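As a rough illustration of that kind of filtering, here is a Python sketch that flags shell metacharacters in an argument string. The character set is an illustrative assumption, not copied from the NRPE or NSClient++ source, and `find_metachars` is a hypothetical helper.

```python
# Illustrative set only; the real list used by NRPE or NSClient++ may differ.
ILLUSTRATIVE_METACHARS = set("|`&><'\"\\[]{};")


def find_metachars(argument: str) -> list:
    """Return the distinct metacharacters found in the argument, sorted."""
    return sorted(set(argument) & ILLUSTRATIVE_METACHARS)


# The chained command from the example above is caught by its ampersands.
print(find_metachars("check_disk -w 20% -c 30% && cat /etc/passwd"))  # ['&']

# A performance counter path contains backslashes, which this set also flags.
print(find_metachars(r"Counter=\System\System Up Time"))  # ['\\']
```

If the real filter also treats the backslash as a metacharacter, that would explain why the counter path earlier in the thread was rejected until nasty characters were allowed. That is an inference from the thread, not something it confirms.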
Re: Email on server reboot
Posted: Mon May 05, 2014 12:46 am
by stebbo
Thanks for all the help everyone. This seems to be working quite well. I especially like that I can graph the uptime - the graphs don't look particularly nice but it keeps a history of the reboots which is a bonus.