[SOLVED] Notifications not sending on HARD failures

nagiosrox
Posts: 2
Joined: Wed Sep 02, 2015 10:04 am

Post by nagiosrox »

Solution at the very bottom


Nnnnoooooooo, Google Chrome crashed as I was hitting submit on this the first time. That sucks. Here it goes again! This time I'm writing this in OneNote and will paste it in :D

Background information: I'm a very experienced Windows sysadmin and network admin. I like to dip my toes in Linux from time to time, but I'm not nearly as experienced with it as I am with other aspects of IT. I've set up a RHEL server on AWS to serve as our external monitoring and alert system, to alert us when our major network services (email, internet, etc.) are down and thus cannot send us alerts. I've configured the latest (as of last week) Nagios Core with Pushover alerts using https://github.com/jedda/OSX-Monitoring ... ushover.sh

Here's what's happening: The PING service check reaches a HARD failure, but no alerts get sent out. I've tried both email alerts and Pushover alerts, and neither works. Neither even shows up in the logs, which makes me think that I've got something misconfigured in the hosts.cfg file or contacts.cfg file.

nagios.log :: I've updated the hosts.cfg from a reachable IP to an unreachable IP, restarted the nagios service, and then clicked "Re-schedule the next check of this service". This is the only way I've found to simulate a failure. Please let me know if there's an easier way to simulate failures!

Code:

[1441166679] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;thisisatestthingy;PING;1441166675
[1441166684] SERVICE ALERT: thisisatestthingy;PING;CRITICAL;SOFT;1;CRITICAL - Network Unreachable (<unreachable_IP>)
[1441166749] SERVICE ALERT: thisisatestthingy;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
[1441166799] SERVICE ALERT: thisisatestthingy;PING;CRITICAL;SOFT;3;CRITICAL - Network Unreachable (<unreachable_IP>)
[1441166859] SERVICE ALERT: thisisatestthingy;PING;CRITICAL;HARD;4;CRITICAL - Network Unreachable (<unreachable_IP>)

Now, if I click "Send custom service notification" in the Nagios web interface, the alert does reach my phone, and the log shows this:

Code:

[1441208968] EXTERNAL COMMAND: SEND_CUSTOM_SVC_NOTIFICATION;thisisatestthingy;PING;0;nagiosadmin;jkl
[1441208968] SERVICE NOTIFICATION: admin1;thisisatestthingy;PING;CUSTOM (CRITICAL);notify-service-by-pushover;CRITICAL - Network Unreachable (<unreachable_IP>);nagiosadmin;jkl

hosts.cfg

Code:

define host{
    use my-template
    address <unreachable_IP>
    host_name thisisatestthingy
    contact_groups admins
}

define service{
        use local-service ; Name of service template to use
        host_name thisisatestthingy
        service_description PING
        check_command check_ping!100.0,20%!500.0,60%
        notifications_enabled 1
        check_period 24x7
        max_check_attempts 4
        normal_check_interval 5
        retry_check_interval 1
        contact_groups admins
        notification_options w,u,c,r
        notification_interval 960
        notification_period 24x7
        flap_detection_enabled 0
        flap_detection_options o,w,c,u
}

templates.cfg

Code:

define host{
        name                            my-template     ; The name of this host template
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        notification_period             24x7            ; Send host notifications at any time
        notification_options            d,r
        contact_groups                  admins
        max_check_attempts              99999
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

contacts.cfg :: I have tried with email and also with Pushover. Currently Pushover is set as the active alert type.

Code:

define contact{
        contact_name                    admin1
        alias                           admin1 pushover notifications
        service_notification_period     24x7
        host_notification_period        24x7
        host_notification_options       d,u,r,f,s,n
        service_notification_options    w,u,c,r,f,s,n
        service_notification_commands   notify-service-by-pushover
;        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-pushover
;        host_notification_commands      notify-host-by-email
        _pushover_userkey               <pushover_user_key>
        _pushover_appkey                <pushover_app_key>
        _pushover_device                admin1-iphone
        email                           <email_address>
        }

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 admin1
        }

commands.cfg :: This file also contains the default email notification commands, and I've verified that the server can send email just fine. Yes, I know that I saved the Pushover script with dashes instead of underscores, but I highly doubt that's the issue. Again, the alerts do go out when I tell Nagios to send a custom notification.

Code:

define command{
       command_name notify-host-by-pushover
       command_line /usr/local/nagios/plugins/notify-by-pushover.sh -u "$_CONTACTPUSHOVER_USERKEY$" -a "$_CONTACTPUSHOVER_APPKEY$" -s "spacealarm" -t "$HOSTNAME$ is $HOSTSTATE$" -m "Status: $HOSTOUTPUT$"
       }
define command{
       command_name notify-service-by-pushover
       command_line /usr/local/nagios/plugins/notify-by-pushover.sh -u "$_CONTACTPUSHOVER_USERKEY$" -a "$_CONTACTPUSHOVER_APPKEY$" -s "spacealarm" -t "$SERVICEDESC$ on $HOSTNAME$ is $SERVICESTATE$" -m "Status: $SERVICEOUTPUT$"
       }

I really think that my issue has something to do with my contacts.cfg or hosts.cfg setup, but I have been banging my head against the wall for an embarrassing number of hours on this and can't quite get it working.

Thank you for taking the time to read this lengthy post. Hopefully I've provided enough information for this question to be answered without my having to provide anything additional :)

Sincerely,
A Windows admin working out of his element


Issue solved. I needed to remove the "n" flag from both service_notification_options and host_notification_options in my contact definition (contacts.cfg).
Last edited by nagiosrox on Wed Sep 02, 2015 8:49 pm, edited 1 time in total.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Notifications not sending on HARD failures

Post by tmcdonald »

If you go to the Reports -> Notifications page, is anything listed?
Former Nagios employee
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Notifications not sending on HARD failures

Post by tgriep »

In your contact definition, both host_notification_options and service_notification_options include the letter n, which means "do not send any notifications". Remove it from both and that should fix it for you.
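For anyone who wants to scan their own config for the same mistake, here is a minimal sketch in Python. It is not from the original thread: the function name find_muted_contacts and the sample text are hypothetical, and it assumes plain "define contact{ ... }" blocks with no nested braces and ";" used only for comments.

```python
import re

# Matches "define contact{ ... }" blocks (but not "define contactgroup{").
CONTACT_BLOCK = re.compile(r"define\s+contact\s*\{(.*?)\}", re.DOTALL)
OPTION_KEYS = ("host_notification_options", "service_notification_options")


def find_muted_contacts(config_text):
    """Return (contact_name, directive) pairs whose option list contains 'n'."""
    problems = []
    for block in CONTACT_BLOCK.findall(config_text):
        directives = {}
        for raw_line in block.splitlines():
            line = raw_line.split(";", 1)[0].strip()  # drop ';' comments
            if not line:
                continue
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        for key in OPTION_KEYS:
            flags = [f.strip() for f in directives.get(key, "").split(",")]
            if "n" in flags:
                problems.append((directives.get("contact_name", "?"), key))
    return problems


# Hypothetical sample modelled on the contact definition posted above.
sample = """
define contact{
        contact_name                    admin1
        host_notification_options       d,u,r,f,s,n
        service_notification_options    w,u,c,r,f,s,n
        }
"""

for name, directive in find_muted_contacts(sample):
    print(f"{name}: {directive} contains 'n' (no notifications will be sent)")
```

You would read contacts.cfg into a string and pass it to the function. An empty result only means that no contact lists n; it does not rule out other notification filters.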
nagiosrox
Posts: 2
Joined: Wed Sep 02, 2015 10:04 am

Re: Notifications not sending on HARD failures

Post by nagiosrox »

tgriep wrote: In your contact, both the host and service notification options have the letter n set, which means do not send any notifications. Remove that and it should fix it for you.

Hooray! You have found what I overlooked!! This fixed the issue :D I knew it was going to be something stupid like that, but I couldn't spot it myself. Oh well, it gave me a lot of time to figure out the true processing order, and it's a good lesson learned!