unexpected Reboot not registered in Nagios

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
lpereira
Posts: 143
Joined: Thu Jul 27, 2017 4:23 pm

unexpected Reboot not registered in Nagios

Post by lpereira »

Hello all,
Today a server suffered an unexpected reboot due to a bluescreen kernel failure. The server came back online almost immediately. We did not receive any alert for this.

The duration of all the services did not change (they all say 4 days).
bdp01 general.jpg

The strange thing is that the duration of the uptime service says 4 days, but its status information says 7 hours. The graph also shows a drop in the uptime.
uptime.jpg
My question is: why didn't we receive any alert for this?

I have attached 2 images showing the services and the uptime graph.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: unexpected Reboot not registered in Nagios

Post by lmiltchev »

Today a server suffered an unexpected reboot due to a bluescreen kernel failure. The server came back online almost immediately. We did not receive any alert for this.
Not receiving a notification could be caused by many things...

First off, did the host check go to a hard down state? You said the server came back online almost immediately. Maybe it didn't have time to reach a hard state. Can you go to Reports > State History, select your host from the "Limit To" drop-down menu, click on "Run", and post a screenshot of the page?

How is the host configured? Can you show us the configuration file? You can view it by going to CCM > Hosts, then clicking on the "View Config" icon next to the host.
Be sure to check out our Knowledgebase for helpful articles and solutions!
lpereira
Posts: 143
Joined: Thu Jul 27, 2017 4:23 pm

Re: unexpected Reboot not registered in Nagios

Post by lpereira »

The State History report does not show anything.


###############################################################################
#
# Host configuration file
#
# Created by: Nagios Core Config Manager 2.6.10
# Date:	      2018-01-23 19:53:25
# Version:    Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND --- 
# Nagios CCM will overwrite all manual settings during the next update if you 
# would like to edit files manually, place them in the 'static' directory or 
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define host {
	host_name			HOST
	use				xiwizard_windowsserver_host
	address				HOSTIP
	max_check_attempts		3
	check_interval			10
	retry_interval			5
	check_period			xi_timeperiod_24x7
	contacts			nagiosadmin
	notification_interval		120
	notification_period		xi_timeperiod_24x7
	icon_image			win_server.png
	statusmap_image			win_server.png
	_xiwizard			windowsserver
	register			1
	}	

###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################
Does the uptime check send alerts if the state changes?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: unexpected Reboot not registered in Nagios

Post by lmiltchev »

The State History report does not show anything.
If the state history report doesn't show a state change, the host was most likely down for a VERY short period of time (between two checks), so the outage was never caught by Nagios.

How often do you run the host check? The "default" check interval is usually 5 min. If the server was down for, say, 1-2 min between two checks, this would not show up in the state history report. If the server hasn't been down long enough for the last retry check to be performed (max_check_attempts), then it would not change to a "hard" non-ok state, and a notification wouldn't be sent.
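To put rough numbers on this, here is a minimal Python sketch of the timing, not an exact model of the Nagios scheduler. It assumes interval_length is left at its default of 60 seconds (so the config values are minutes), and it ignores on-demand host checks, check latency and timeouts. The function name hard_state_window is made up for illustration.

```python
def hard_state_window(check_interval, retry_interval, max_check_attempts):
    """Estimate how long a host must stay down to reach a HARD state.

    Returns a tuple (best_case, worst_case) in the same unit as the inputs:
      best_case  - the outage starts right before a scheduled check, so only
                   the retries have to fail while the host is still down
      worst_case - the outage starts right after a scheduled check, so a full
                   check_interval passes before the first failed check
    """
    # The first failed check is attempt 1; the remaining attempts are retries.
    retry_span = (max_check_attempts - 1) * retry_interval
    best_case = retry_span
    worst_case = check_interval + retry_span
    return best_case, worst_case


if __name__ == "__main__":
    # Values taken from the host definition posted in this thread.
    best, worst = hard_state_window(check_interval=10,
                                    retry_interval=5,
                                    max_check_attempts=3)
    print(f"Outage must last between {best} and {worst} minutes "
          f"to be sure of reaching a HARD DOWN state")
```

Under these assumptions, a reboot that completes in a couple of minutes can fall entirely between two checks, and even a caught failure would recover before the last retry.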

There are some conditions/filters to be passed in order for a notification to be sent, e.g. program-wide, service & host, and contact filters. Read more about these filters here:

https://assets.nagios.com/downloads/nag ... tions.html
Does the uptime check send alerts if the state changes?
The state needs to change to a "hard" non-ok state in order for the notifications to go out. This applies not only to the uptime check, but to any other active check. Read more about notifications here:
https://assets.nagios.com/downloads/nag ... tions.html
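As an illustration of how an uptime check could produce a non-ok state after a reboot, here is a hedged Python sketch of plugin-style logic. The thread does not show how the uptime service is actually defined, so the function name evaluate_uptime and the threshold are hypothetical. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_uptime(uptime_seconds, warn_below_seconds):
    """Map an uptime reading to a (return_code, message) pair.

    Hypothetical helper: reports WARNING while the uptime is below the
    threshold, which is what a recent reboot looks like.
    """
    if uptime_seconds < 0:
        return UNKNOWN, "UNKNOWN - invalid uptime value"
    if uptime_seconds < warn_below_seconds:
        minutes = uptime_seconds // 60
        return WARNING, f"WARNING - uptime is only {minutes} min, recent reboot?"
    hours = uptime_seconds // 3600
    return OK, f"OK - uptime is {hours} h"


if __name__ == "__main__":
    # 7 hours of uptime (as in the status information above), 12 hour threshold.
    code, message = evaluate_uptime(7 * 3600, 12 * 3600)
    print(code, message)
```

The threshold would need to be longer than the time the service takes to go through all of its check attempts, otherwise the state recovers before it becomes "hard" and no notification goes out.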