
Unexpected reboot not registered in Nagios

Posted: Tue Jan 23, 2018 5:21 pm
by lpereira
Hello all,
Today a server suffered an unexpected reboot due to a blue screen (kernel failure). The server came back online almost immediately. We did not receive any alert for this.

The Duration of all the services did not change (they all say 4 days).
(Attached image: bdp01 general.jpg)

The strange thing is that the Duration of the uptime service says 4 days, but its Status Information says 7 hours. The graph also shows a drop in the uptime.
(Attached image: uptime.jpg)
My question is: why didn't we receive any alert for this?

I have attached two images: one of the services and one of the uptime graph.

Re: Unexpected reboot not registered in Nagios

Posted: Tue Jan 23, 2018 5:52 pm
by lmiltchev
lpereira wrote:
Today a server suffered an unexpected reboot due to a blue screen (kernel failure). The server came back online almost immediately. We did not receive any alert for this.
Not receiving a notification could be caused by many things...

First off, did the host check go to a hard DOWN state? You said the server came back online almost immediately. Maybe it didn't have time to go to a hard state. Can you go to Reports > State History, select your host from the "Limit To" drop-down menu, click "Run", and post a screenshot of the page?

How is the host configured? Can you show us its configuration? You can view it by going to CCM > Hosts and clicking the "View Config" icon next to the host.

Re: Unexpected reboot not registered in Nagios

Posted: Tue Jan 23, 2018 5:56 pm
by lpereira
The State History report does not show anything.


###############################################################################
#
# Host configuration file
#
# Created by: Nagios Core Config Manager 2.6.10
# Date:	      2018-01-23 19:53:25
# Version:    Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND --- 
# Nagios CCM will overwrite all manual settings during the next update if you 
# would like to edit files manually, place them in the 'static' directory or 
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define host {
	host_name			HOST
	use				xiwizard_windowsserver_host
	address				HOSTIP
	max_check_attempts		3
	check_interval			10
	retry_interval			5
	check_period			xi_timeperiod_24x7
	contacts			nagiosadmin
	notification_interval		120
	notification_period		xi_timeperiod_24x7
	icon_image			win_server.png
	statusmap_image			win_server.png
	_xiwizard			windowsserver
	register			1
	}	

###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################
Does the uptime check send alerts if its state changes?

Re: Unexpected reboot not registered in Nagios

Posted: Wed Jan 24, 2018 10:23 am
by lmiltchev
lpereira wrote:
The State History report does not show anything.
If the State History report doesn't show a state change, this means that the host was down for a VERY short period of time (between two checks), and Nagios didn't catch it.

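How often do you run the host check? The "default" check interval is usually 5 minutes. If the server was down for, say, 1-2 minutes between two checks, this would not show up in the State History report. If the server isn't down long enough for all the retry checks to run (up to max_check_attempts), it will not change to a "hard" non-OK state, and no notification will be sent. With the host definition you posted (check_interval 10, retry_interval 5, max_check_attempts 3, in minutes with the default interval_length), the host has to fail three checks in a row, spanning about 10 minutes from the first failed check, before it goes to a hard DOWN state.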
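The timing can be illustrated with a minimal Python sketch of the soft/hard logic, using the values from the host definition posted above (check_interval 10, retry_interval 5, max_check_attempts 3). It is a simplified model, not Nagios source code: it assumes the intervals are in minutes (default interval_length of 60 seconds), that checks run exactly on schedule and take no time, and that a check fails only while the host is actually down. The helper name first_hard_down is hypothetical.

```python
from typing import Optional


def first_hard_down(outage_start: float, outage_end: float,
                    check_interval: float = 10, retry_interval: float = 5,
                    max_check_attempts: int = 3,
                    horizon: float = 24 * 60) -> Optional[float]:
    """Return the minute at which the host reaches a HARD DOWN state,
    or None if it never does within `horizon` minutes.

    Simplified model: the first check runs at minute 0, checks take no
    time, and a check fails exactly when it runs inside the outage
    window [outage_start, outage_end).
    """
    t = 0.0
    attempt = 0  # consecutive failed checks so far
    while t <= horizon:
        if outage_start <= t < outage_end:
            attempt += 1
            if attempt >= max_check_attempts:
                return t  # HARD DOWN: only now can a notification go out
            t += retry_interval  # SOFT DOWN: re-check sooner
        else:
            attempt = 0  # UP (or soft recovery): the counter resets
            t += check_interval
    return None


# A 2-minute reboot that falls between scheduled checks is never seen.
print(first_hard_down(12, 14))   # None
# A 2-minute reboot spanning the check at minute 20 only causes a SOFT DOWN.
print(first_hard_down(19, 21))   # None
# The host must stay down through three consecutive checks to go HARD.
print(first_hard_down(12, 31))   # 30.0
```

Real check scheduling is not this exact, so treat the numbers as an approximation; the point is that a reboot lasting a few minutes produces at most a SOFT DOWN with this configuration, and soft states do not trigger notifications.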

A notification also has to pass several conditions (filters) before it is sent: program-wide filters, service and host filters, and contact filters. Read more about these filters here:

https://assets.nagios.com/downloads/nag ... tions.html
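As a rough illustration of how these filters stack, here is a Python sketch of a simplified subset of the host notification checks. The names HostAlert, Filters, and should_notify are hypothetical, and the real logic includes more conditions than shown here; the sketch only shows that every filter has to pass, starting with the state being hard.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Map a host state to the letter used in notification options (d, u, r).
OPTION_LETTER = {"DOWN": "d", "UNREACHABLE": "u", "UP": "r"}


def _all_host_options() -> Set[str]:
    return {"d", "u", "r"}


@dataclass
class HostAlert:
    state: str            # "DOWN", "UNREACHABLE", or "UP" (treated as a recovery)
    state_type: str       # "SOFT" or "HARD"
    in_downtime: bool = False
    is_flapping: bool = False


@dataclass
class Filters:
    program_notifications_enabled: bool = True   # program-wide filter
    host_notifications_enabled: bool = True      # host filter
    host_notification_options: Set[str] = field(default_factory=_all_host_options)
    host_in_notification_period: bool = True
    contact_notification_options: Set[str] = field(default_factory=_all_host_options)
    contact_in_notification_period: bool = True  # contact filter


def should_notify(alert: HostAlert, f: Filters) -> Tuple[bool, str]:
    """Return (send?, reason) for a simplified subset of the filters."""
    if alert.state_type != "HARD":
        return False, "soft state: no notification"
    if not f.program_notifications_enabled:
        return False, "notifications disabled program-wide"
    if not f.host_notifications_enabled:
        return False, "notifications disabled for this host"
    if alert.in_downtime or alert.is_flapping:
        return False, "host in scheduled downtime or flapping"
    letter = OPTION_LETTER[alert.state]
    if letter not in f.host_notification_options:
        return False, f"host notification options exclude '{letter}'"
    if not f.host_in_notification_period:
        return False, "outside the host notification period"
    if letter not in f.contact_notification_options:
        return False, f"contact notification options exclude '{letter}'"
    if not f.contact_in_notification_period:
        return False, "outside the contact notification period"
    return True, "notification sent"


print(should_notify(HostAlert("DOWN", "SOFT"), Filters()))
# (False, 'soft state: no notification')
print(should_notify(HostAlert("DOWN", "HARD"), Filters()))
# (True, 'notification sent')
```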
lpereira wrote:
Does the uptime check send alerts if its state changes?
The state needs to change to a "hard" non-OK state for notifications to go out. This applies not only to the uptime check but to any other active check. Read more about notifications here:
https://assets.nagios.com/downloads/nag ... tions.html
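Applied to the uptime check: an uptime that resets to a few minutes is still just an OK result unless the check is configured to treat a low uptime as non-OK. As a hedged illustration, here is a hypothetical Python function (evaluate_uptime, not an existing Nagios or XI plugin) that follows the standard plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and reports WARNING while the uptime is below an assumed 60-minute threshold.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_uptime(uptime_seconds, warn_minutes=60):
    """Map an uptime reading to (exit_code, status_line).

    Hypothetical threshold: report WARNING while the uptime is below
    warn_minutes, i.e. for a while after every reboot.
    """
    if uptime_seconds is None or uptime_seconds < 0:
        return UNKNOWN, "UNKNOWN - uptime not available"
    minutes = uptime_seconds / 60
    if minutes < warn_minutes:
        return WARNING, f"WARNING - uptime is {minutes:.0f} minutes (recent reboot?)"
    return OK, f"OK - uptime is {minutes / 1440:.1f} days"


print(evaluate_uptime(4 * 86400))  # (0, 'OK - uptime is 4.0 days')
print(evaluate_uptime(10 * 60))    # (1, 'WARNING - uptime is 10 minutes (recent reboot?)')
```

Because a low uptime persists after a reboot, a threshold like this keeps the check non-OK long enough for its retries to complete (assuming check and retry intervals similar to the host's), so it can reach a hard state even when the host check never noticed the outage.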