Hello all,
Today a server suffered an unexpected reboot due to a bluescreen kernel failure. The server came back online almost immediately. We did not receive any alert for this.
The duration shown for all the services did not change (they all say 4 days).
The strange thing is that the uptime check's duration says 4 days, but its status information says 7 hours. The graph also shows a drop in the uptime.
My question is: why didn't we receive any alert for this?
I have attached 2 images showing the services and the uptime graph.
unexpected Reboot not registered in Nagios
Re: unexpected Reboot not registered in Nagios
Not receiving a notification could be caused by many things...
First off, did the host check go to a hard down state? You said the server came back online almost immediately. Maybe it didn't have time to reach a hard state. Can you go to Reports > State History, select your host from the "Limit To" drop-down menu, click "Run", and post a screenshot of the page?
How is the host configured? Can you show us the configuration file? You can view it by going to CCM > Hosts and clicking the "View Config" icon next to the host.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: unexpected Reboot not registered in Nagios
The State History report does not show anything. Here is the host configuration:
###############################################################################
#
# Host configuration file
#
# Created by: Nagios Core Config Manager 2.6.10
# Date: 2018-01-23 19:53:25
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################
define host {
    host_name               HOST
    use                     xiwizard_windowsserver_host
    address                 HOSTIP
    max_check_attempts      3
    check_interval          10
    retry_interval          5
    check_period            xi_timeperiod_24x7
    contacts                nagiosadmin
    notification_interval   120
    notification_period     xi_timeperiod_24x7
    icon_image              win_server.png
    statusmap_image         win_server.png
    _xiwizard               windowsserver
    register                1
}
###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################
Re: unexpected Reboot not registered in Nagios
If the State History report doesn't show a state change, this means that the host was down for a VERY short period of time (between two checks), and Nagios didn't catch it.
How often do you run the host check? The "default" check interval is usually 5 minutes. If the server was down for, say, 1-2 minutes between two checks, this would not show up in the State History report. If the server wasn't down long enough for the last retry check to be performed (max_check_attempts), the host would not change to a "hard" non-OK state, and a notification wouldn't be sent.
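To make the timing concrete, here is a rough sketch of that reasoning using the values from the posted host definition (check_interval 10, retry_interval 5, max_check_attempts 3). It is a simplified model, not how the Nagios scheduler is actually implemented: it assumes the intervals are in minutes, that regular checks fall exactly on multiples of check_interval, and that the outage is one continuous window. The function name reaches_hard_down is made up for this example.

```python
import math


def reaches_hard_down(outage_start, outage_length,
                      check_interval=10, retry_interval=5,
                      max_check_attempts=3):
    """Return True if every one of max_check_attempts checks lands in the outage.

    Simplified model (an assumption, not Nagios internals): regular checks run
    at 0, check_interval, 2 * check_interval, ... and after a failed check the
    next attempts run every retry_interval. All times are in minutes.
    """
    outage_end = outage_start + outage_length
    # First regular check at or after the moment the host went down.
    t = math.ceil(outage_start / check_interval) * check_interval
    for _attempt in range(max_check_attempts):
        if not (outage_start <= t < outage_end):
            # The host answered this check: no hard DOWN state is reached.
            return False
        t += retry_interval
    return True


# A 2-minute reboot that falls between two regular checks is never seen.
print(reaches_hard_down(outage_start=1, outage_length=2))
# A 2-minute reboot caught by one check recovers before the first retry.
print(reaches_hard_down(outage_start=9, outage_length=2))
# A 12-minute outage covers the checks at minutes 10, 15 and 20.
print(reaches_hard_down(outage_start=9, outage_length=12))
```

In this model the outage has to span at least (max_check_attempts - 1) * retry_interval = 10 minutes before a hard state is possible, so a reboot that takes a minute or two cannot trigger a host notification no matter when it happens.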
A notification also has to pass several conditions/filters before it is sent, e.g. program-wide, service and host, and contact filters. Read more about these filters here:
https://assets.nagios.com/downloads/nag ... tions.html
You asked: "Does the uptime send alerts if the state changes?"
The state needs to change to a "hard" non-OK state in order for notifications to go out. This is true not only for the uptime check, but for any other active check. Read more about notifications here:
https://assets.nagios.com/downloads/nag ... tions.html
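Since a quick reboot is too short for the host check to catch, one possible approach is to have the uptime check itself return a non-OK state while the uptime is low. The snippet below is only an illustration of that idea, not the plugin your uptime service actually runs: the function name uptime_status and the thresholds warn_below and crit_below are hypothetical. The exit codes 0, 1 and 2 are the standard plugin return codes for OK, WARNING and CRITICAL.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def uptime_status(uptime_seconds, warn_below=3600, crit_below=900):
    """Map an uptime value to a plugin state: low uptime means a recent reboot.

    warn_below and crit_below are hypothetical thresholds in seconds.
    """
    minutes = uptime_seconds // 60
    if uptime_seconds < crit_below:
        return CRITICAL, f"CRITICAL - host rebooted {minutes} minutes ago"
    if uptime_seconds < warn_below:
        return WARNING, f"WARNING - host rebooted {minutes} minutes ago"
    return OK, f"OK - host has been up for {minutes} minutes"


# An uptime of 7 hours with a 24-hour warning threshold.
code, message = uptime_status(7 * 3600, warn_below=86400)
print(code, message)
```

The same rule about hard states still applies here: the threshold has to be long enough that the check keeps returning non-OK through all of its retries, otherwise the service recovers while still in a soft state and no notification goes out.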