
Recovery does not send notification, part 2

Posted: Wed Nov 06, 2019 10:29 am
by Bitflogger
doitnumber_problem.png
Hello,

I'm running v5.6.7, on a 64-bit CentOS 7 VM.

I have a previous case titled "Recovery does not send notification".

I opened a github case on that: https://github.com/NagiosEnterprises/na ... issues/651

It looks like they think it was fixed in Nagios Core 4.4.4, and they closed the case.

According to the change log for v5.6.7, you are using Nagios Core 4.4.5.

I have a new occurrence of the same problem. I tried submitting a passive "OK" result, but the service stayed in a SOFT state.

Earl

Re: Recovery does not send notification, part 2

Posted: Wed Nov 06, 2019 10:41 am
by scottwilkerson
If the service is in a SOFT CRITICAL state and you send an OK result, it will be a SOFT recovery.

This can happen in two cases: either the service hasn't reached a HARD state yet, or the HARD CRITICAL was reached early, for example at attempt 1/3, because the host was down.

This is normal and expected behavior.
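
To illustrate the first case above, here is a minimal Python sketch of how a service moves between SOFT and HARD states as check results arrive. It is a simplified model, not Nagios Core's actual code: the `Service` class and `process_result` function are hypothetical names, and the host-down case (a HARD state at attempt 1/3) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical, simplified stand-in for a Nagios service object."""
    max_check_attempts: int = 3
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1


def process_result(svc, new_state):
    """Apply one check result and return a label describing what happened."""
    if new_state == "OK":
        if svc.state == "OK":
            event = "still OK"
        elif svc.state_type == "SOFT":
            # The problem never reached a HARD state, so this is a SOFT recovery.
            event = "SOFT recovery"
        else:
            event = "HARD recovery"
        svc.state, svc.state_type, svc.attempt = "OK", "HARD", 1
        return event

    if svc.state == "OK":
        # First non-OK result after an OK state starts the attempt counter.
        svc.attempt = 1
    elif svc.state_type == "SOFT":
        # Still retrying: count one more attempt.
        svc.attempt += 1
    svc.state = new_state
    svc.state_type = "HARD" if svc.attempt >= svc.max_check_attempts else "SOFT"
    return f"{svc.state_type} {new_state} {svc.attempt}/{svc.max_check_attempts}"


if __name__ == "__main__":
    svc = Service(max_check_attempts=3)
    for result in ["CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]:
        print(result, "->", process_result(svc, result))
```

In this model, an OK result that arrives after only one CRITICAL result (attempt 1/3) is reported as a SOFT recovery, while an OK result after three consecutive CRITICAL results is a HARD recovery.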

Re: Recovery does not send notification, part 2

Posted: Wed Nov 06, 2019 11:39 am
by Bitflogger
doitnumber_problem2.png
Hello,

There was a critical e-mail, no recovery e-mail.

Earl

Re: Recovery does not send notification, part 2

Posted: Wed Nov 06, 2019 11:49 am
by scottwilkerson
What happened to the host after 1:24? This is important to understand.

Can you share the host and service definitions along with any underlying templates? (obfuscating any sensitive info)

Re: Recovery does not send notification, part 2

Posted: Wed Nov 06, 2019 2:21 pm
by Bitflogger
doitnumber_problem3.png
I'm unsure about the first question. The server showed green for the service when I took the screen prints. You can see the duration in the attached screen print.

###############################################################################
#
# Hosts configuration file
#
# Created by: Nagios CCM 2.7.2
# Date: 2018-11-02 15:37:37
# Version: Nagios Core 4.x
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define host {
host_name x*****.doit.wisc.edu
use windows-server
address x****.doit.wisc.edu
max_check_attempts 1
check_interval 5
retry_interval 1
active_checks_enabled 1
contact_groups se-windows,sm_alerts_monitoring
notification_period win_wed_0300
register 1
}

###############################################################################
#
# Hosts configuration file
#
# END OF FILE
#
###############################################################################

###############################################################################
#
# Services configuration file
#
# Created by: Nagios CCM 3.0.3
# Date: 2019-10-21 17:05:04
# Version: Nagios Core 4.x
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define service {
host_name x*****.doit.wisc.edu
service_description w_vugen_doitnumber
use generic-vugen-service
servicegroups _vugen_messages
check_command check_nscp_vugen!read_vugen -a s=$_SERVICESCRIPT$!!!!!!!
check_interval 5
notification_interval 5760
notification_period win_wed_0300
notification_options w,c,r,
contact_groups 93fe6121d79bee3ee5e3ee4f55863056d0530bcded,ovo-ss-vugen
_alert_description Check doitnumber web page
_ci (ci:93fe6121d79bee3ee5e3ee4f55863056d0530bcded) (cdwh)
_script doitnumber
_support_info https://a***.fina****.doit.wisc.edu
register 1
}

define host {
name windows-server
hostgroups _windows
check_command check-host-alive!!!!!!!!
use generic-host
max_check_attempts 10
check_interval 5
retry_interval 1
active_checks_enabled 1
check_period 24x7
contact_groups se-windows
notification_period 24x7
notification_options d,r,s,
notifications_enabled 1
register 0
}

define host {
name generic-host
check_command check-host-alive!!!!!!!!
max_check_attempts 3
check_interval 5
retry_interval 1
active_checks_enabled 1
check_period 24x7
event_handler_enabled 1
flap_detection_enabled 0
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 120
notification_period 24x7
notification_options d,u,r,s,
notifications_enabled 1
register 0
}

define service {
name generic-vugen-service
use generic-service
max_check_attempts 3
retry_interval 5
process_perf_data 1
notification_interval 5760
notification_period 24x7
notification_options w,c,r,
notifications_enabled 1
register 0
}

define service {
name generic-service
service_description generic-service
is_volatile 0
max_check_attempts 3
check_interval 5
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
parallelize_check 1
obsess_over_service 1
check_freshness 0
event_handler_enabled 1
flap_detection_enabled 0
process_perf_data 0
retain_status_information 1
retain_nonstatus_information 1
notification_interval 60
notification_period 24x7
notification_options w,c,u,r,
notifications_enabled 1
register 0
}

define timeperiod {
timeperiod_name win_wed_0300
alias Automatic Updates - Wednesday - 03:00
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-02:55,03:40-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

define timeperiod {
timeperiod_name 24x7
alias 24 Hours A Day,7 Days A Week
saturday 00:00-24:00
friday 00:00-24:00
thursday 00:00-24:00
wednesday 00:00-24:00
tuesday 00:00-24:00
monday 00:00-24:00
sunday 00:00-24:00
}
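
Both the host and the service above use notification_period win_wed_0300, which leaves a gap on Wednesdays between 02:55 and 03:40. As a rough aid for checking whether a given moment falls inside that period, here is a minimal Python sketch. The names `WIN_WED_0300` and `in_timeperiod` are hypothetical, and treating each range as start-inclusive and end-exclusive is an assumption, not something taken from the Nagios documentation.

```python
from datetime import datetime

# Ranges are (start_minute, end_minute) pairs, keyed by datetime.weekday()
# where Monday is 0. Every day is covered in full except Wednesday.
FULL_DAY = [(0, 24 * 60)]
WIN_WED_0300 = {day: FULL_DAY for day in range(7)}
# wednesday 00:00-02:55,03:40-24:00
WIN_WED_0300[2] = [(0, 2 * 60 + 55), (3 * 60 + 40, 24 * 60)]


def in_timeperiod(moment, period=WIN_WED_0300):
    """Return True if the datetime falls inside one of the day's ranges."""
    minute = moment.hour * 60 + moment.minute
    return any(start <= minute < end for start, end in period[moment.weekday()])


if __name__ == "__main__":
    # Wed Nov 06, 2019 at 03:00 falls in the Wednesday gap.
    print(in_timeperiod(datetime(2019, 11, 6, 3, 0)))
    # The same clock time on Thursday is inside the period.
    print(in_timeperiod(datetime(2019, 11, 7, 3, 0)))
```

A state change whose notification would go out inside the Wednesday gap is outside the notification period in this model.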

Re: Recovery does not send notification, part 2

Posted: Wed Nov 06, 2019 2:36 pm
by scottwilkerson
Bitflogger wrote: I'm unsure of the first question, the server shows green for the service when I did the screen prints. You can see the duration in the attached screen print.
I was referring to the host for this service, if you look at the state history for the host, did the host go down at all?

Re: Recovery does not send notification, part 2

Posted: Wed Nov 06, 2019 3:25 pm
by Bitflogger
Hello,

We are checking a web page, using an intermediate server that makes many similar checks of other web pages.

We had a firewall fail-over and reset last night.

The Nagios XI server has been up 29 days:

uptime
14:21:59 up 29 days, 10:09, 6 users, load average: 1.46, 1.58, 1.64

Earl

Re: Recovery does not send notification, part 2

Posted: Thu Nov 07, 2019 7:32 am
by scottwilkerson
scottwilkerson wrote: I was referring to the host for this service, if you look at the state history for the host, did the host go down at all?
If you look at the state history for x*****.doit.wisc.edu, did the host go down at all during this time?

Re: Recovery does not send notification, part 2

Posted: Fri Nov 08, 2019 1:59 pm
by Bitflogger
Hello,

I don't see the significance of that, since the state change is on the NagiosXI server.

The x**** server uptime probe reports: uptime: 4w 2d 10:8h, boot: 2019-10-09 08:38:34 (UTC)

Earl

Re: Recovery does not send notification, part 2

Posted: Fri Nov 08, 2019 2:09 pm
by scottwilkerson
Bitflogger wrote: I don't see the significance of that, since the state change is on the NagiosXI server.
The significance is that Nagios doesn't send notifications for services when a host is in a non-UP state.
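
The rule can be pictured as a filter that runs before any service notification goes out. Here is a minimal Python sketch of that idea, combined with the SOFT/HARD distinction discussed earlier in the thread. The function name `service_notification_allowed` and its return shape are hypothetical, and Nagios Core applies further filters (notification period, notification options, and others) that are not modeled here.

```python
def service_notification_allowed(host_state, service_state_type):
    """Return (allowed, reason) for a service notification in this simplified model."""
    if host_state != "UP":
        # Host is DOWN or UNREACHABLE: service notifications are suppressed.
        return False, f"host is {host_state}"
    if service_state_type != "HARD":
        # SOFT states, including SOFT recoveries, do not notify.
        return False, f"service state type is {service_state_type}"
    return True, "host is UP and service state is HARD"


if __name__ == "__main__":
    for host_state, state_type in [("UP", "HARD"), ("DOWN", "HARD"), ("UP", "SOFT")]:
        print(host_state, state_type, "->",
              service_notification_allowed(host_state, state_type))
```

This is why the host's state history matters: if the host was DOWN when the service changed state, the host check fails in this model before the service's own settings are considered.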