Nagios sending notifications to Slack outside of workhours

Posted: Wed Jan 29, 2020 3:46 pm
by IT-Biscuit
Hello.

I am receiving Slack notifications outside of work hours, even though I have intentionally set Nagios to send notifications only during work hours.
I have set my contact notification period to 'workhours', which is defined as follows:

Code:

# 'workhours' timeperiod definition
define timeperiod{
        timeperiod_name workhours
        alias           Normal Work Hours

        use             us-holidays             ; Get holiday exceptions from other timeperiod

        monday          07:00-19:00
        tuesday         07:00-19:00
        wednesday       07:00-19:00
        thursday        07:00-19:00
        friday          07:00-19:00
        }
I have also included the us-holidays timeperiod:

Code:

# Some U.S. holidays
# Note: The timeranges for each holiday are meant to *exclude* the holidays from being
# treated as a valid time for notifications, etc.  You probably don't want your pager 
# going off on New Year's.  Although your employer might... :-)
define timeperiod{
        name                    us-holidays
        timeperiod_name         us-holidays
        alias                   U.S. Holidays

        january 1               00:00-00:00     ; New Years
        monday -1 may           00:00-00:00     ; Memorial Day (last Monday in May)
        july 4                  00:00-00:00     ; Independence Day
        monday 1 september      00:00-00:00     ; Labor Day (first Monday in September)
        thursday 4 november     00:00-00:00     ; Thanksgiving (4th Thursday in November)
        december 25             00:00-00:00     ; Christmas
        }
For my contact group defined under objects/contacts.cfg:

Code:

###############################################################################



###############################################################################
###############################################################################
#
# CONTACTS
#
###############################################################################
###############################################################################

# Just one contact defined by default - the Nagios admin (that's you)
# This contact definition inherits a lot of default values from the 'generic-contact' 
# template which is defined elsewhere.

define contact{
        contact_name                    nagiosadmin             ; Short name of user
        use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias                           Nagios Admin            ; Full name of user

        email                           nagios@localhost        ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }



###############################################################################
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################
###############################################################################

# We only have one contact in this simple configuration file, so there is
# no need to create more than one contact group.

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin,slack
        }

generic-contact is defined in objects/templates.cfg:

Code:


###############################################################################
# TEMPLATES.CFG - SAMPLE OBJECT TEMPLATES
#
#
# NOTES: This config file provides you with some example object definition
#        templates that are referred to by other host, service, contact, etc.
#        definitions in other config files.
#       
#        You don't need to keep these definitions in a separate file from your
#        other object definitions.  This has been done just to make things
#        easier to understand.
#
###############################################################################



###############################################################################
###############################################################################
#
# CONTACT TEMPLATES
#
###############################################################################
###############################################################################

# Generic contact definition template - This is NOT a real contact, just a template!

define contact{
        name                            generic-contact    	; The name of this contact template
        host_notifications_enabled      1
        service_notifications_enabled   1
        service_notification_period     workhours		; service notifications will be sent during workhours
        host_notification_period        workhours		; host notifications will be sent during workhours
        service_notification_options    w,u,c,r,f,s		; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s		; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email	; send service notifications via email
        host_notification_commands      notify-host-by-email	; send host notifications via email
        register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }




###############################################################################
###############################################################################
#
# HOST TEMPLATES
#
###############################################################################
###############################################################################

# Generic host definition template - This is NOT a real host, just a template!

define host{
        name                            generic-host    ; The name of this host template
	check_period			24x7		; Checks will be performed 24x7
        notifications_enabled           1       	; Host notifications are enabled
        event_handler_enabled           1       	; Host event handler is enabled
        flap_detection_enabled          1       	; Flap detection is enabled
        process_perf_data               1       	; Process performance data
        retain_status_information       1       	; Retain status information across program restarts
        retain_nonstatus_information    1       	; Retain non-status information across program restarts
	notification_period		workhours	; Send host notifications during workhours
        register                        0       	; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }


# Linux host definition template - This is NOT a real host, just a template!

define host{
	name				linux-server	; The name of this host template
	use				generic-host	; This template inherits other values from the generic-host template
	check_period			24x7		; By default, Linux hosts are checked round the clock
	check_interval			5		; Actively check the host every 5 minutes
	retry_interval			1		; Schedule host check retries at 1 minute intervals
	max_check_attempts		10		; Check each Linux host 10 times (max)
        check_command           	check-host-alive ; Default command to check Linux hosts
	notification_period		workhours	; Linux admins hate to be woken up, so we only notify during the day
							; Note that the notification_period variable is being overridden from
							; the value that is inherited from the generic-host template!
	notification_interval		120		; Resend notifications every 2 hours
	notification_options		d,u,r		; Only send notifications for specific host states
	contact_groups			admins		; Notifications get sent to the admins by default
	register			0		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
	}



# Windows host definition template - This is NOT a real host, just a template!

define host{
	name			windows-server	; The name of this host template
	use			generic-host	; Inherit default values from the generic-host template
	check_period		24x7		; By default, Windows servers are monitored round the clock
	check_interval		5		; Actively check the server every 5 minutes
	retry_interval		1		; Schedule host check retries at 1 minute intervals
	max_check_attempts	10		; Check each server 10 times (max)
	check_command		check-host-alive	; Default command to check if servers are "alive"
	notification_period	workhours	; Send notification out during workhours
	notification_interval	30		; Resend notifications every 30 minutes
	notification_options	d,r		; Only send notifications for specific host states
	contact_groups		admins		; Notifications get sent to the admins by default
	hostgroups		windows-servers ; Host groups that Windows servers should be a member of
	register		0		; DONT REGISTER THIS - ITS JUST A TEMPLATE
	}


# We define a generic printer template that can be used for most printers we monitor

define host{
	name			generic-printer	; The name of this host template
	use			generic-host	; Inherit default values from the generic-host template
	check_period		24x7		; By default, printers are monitored round the clock
	check_interval		5		; Actively check the printer every 5 minutes
	retry_interval		1		; Schedule host check retries at 1 minute intervals
	max_check_attempts	10		; Check each printer 10 times (max)
	check_command		check-host-alive	; Default command to check if printers are "alive"
	notification_period	workhours		; Printers are only used during the workday
	notification_interval	30		; Resend notifications every 30 minutes
	notification_options	d,r		; Only send notifications for specific host states
	contact_groups		admins		; Notifications get sent to the admins by default
	register		0		; DONT REGISTER THIS - ITS JUST A TEMPLATE
	}


# Define a template for switches that we can reuse
define host{
	name			generic-switch	; The name of this host template
	use			generic-host	; Inherit default values from the generic-host template
	check_period		24x7		; By default, switches are monitored round the clock
	check_interval		5		; Switches are checked every 5 minutes
	retry_interval		1		; Schedule host check retries at 1 minute intervals
	max_check_attempts	10		; Check each switch 10 times (max)
	check_command		check-host-alive	; Default command to check if routers are "alive"
	notification_period	workhours		; Send notifications only during workhours
	notification_interval	30		; Resend notifications every 30 minutes
	notification_options	d,r		; Only send notifications for specific host states
	contact_groups		admins		; Notifications get sent to the admins by default
	register		0		; DONT REGISTER THIS - ITS JUST A TEMPLATE
	}




###############################################################################
###############################################################################
#
# SERVICE TEMPLATES
#
###############################################################################
###############################################################################

# Generic service definition template - This is NOT a real service, just a template!

define service{
        name                            generic-service 	; The 'name' of this service template
        active_checks_enabled           1       		; Active service checks are enabled
        passive_checks_enabled          1    		   	; Passive service checks are enabled/accepted
        parallelize_check               1       		; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       		; We should obsess over this service (if necessary)
        check_freshness                 0       		; Default is to NOT check service 'freshness'
        notifications_enabled           1       		; Service notifications are enabled
        event_handler_enabled           1       		; Service event handler is enabled
        flap_detection_enabled          1       		; Flap detection is enabled
        process_perf_data               1       		; Process performance data
        retain_status_information       1       		; Retain status information across program restarts
        retain_nonstatus_information    1       		; Retain non-status information across program restarts
        is_volatile                     0       		; The service is not volatile
        check_period                    24x7			; The service can be checked at any time of the day
        max_check_attempts              3			; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10			; Check the service every 10 minutes under normal conditions
        retry_check_interval            2			; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins			; Notifications get sent out to everyone in the 'admins' group
	notification_options		w,u,c,r			; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60			; Re-notify about service problems every hour
        notification_period             workhours		; Notifications are sent only during workhours
         register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }


# Local service definition template - This is NOT a real service, just a template!

define service{
	name				local-service 		; The name of this service template
	use				generic-service		; Inherit default values from the generic-service definition
        max_check_attempts              4			; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval           5			; Check the service every 5 minutes under normal conditions
        retry_check_interval            1			; Re-check the service every minute until a hard state can be determined
        register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
	}
The contact slack is defined in slack.cfg:

Code:

define contact {
  contact_name                             slack
  alias                                    Slack
  host_notifications_enabled      1
  service_notifications_enabled   1
  service_notification_period              workhours
  host_notification_period                 workhours
  service_notification_options             c,r
  host_notification_options                d,r
  service_notification_commands            notify-service-by-slack
  host_notification_commands               notify-host-by-slack
}

define contactgroup {
  contactgroup_name systems
  alias Systems
  members slack
}

# if you're interested in writing new checks, here's the list of macros available from nagios
# https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/macrolist.html

define command {
  command_name  notify-service-by-slack
  command_line  /usr/local/bin/slack_nagios_service.sh $SERVICESTATE$ $HOSTNAME$ "$SERVICEDESC$" "$LONGSERVICEOUTPUT$" "$SERVICEOUTPUT$"
}

define command {
  command_name  notify-host-by-slack
  command_line  /usr/local/bin/slack_nagios_host.sh $HOSTSTATE$ $HOSTNAME$
}
Any idea as to what's wrong?

Re: Nagios sending notifications to Slack outside of workhours

Posted: Wed Jan 29, 2020 5:53 pm
by Box293
Can you have a look in your objects.cache file (maybe in /usr/local/nagios/var/)? This file contains the combination of all the templates and configs that make up the final config. Can you check whether the timeperiod definition is correct, along with the other objects?
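
For anyone following along, the check suggested above can be sketched in shell. This is only a sketch: `show_timeperiod` is a hypothetical helper (not a Nagios tool), and the objects.cache path is an assumption based on the "maybe /usr/local/nagios/var/" hint, so adjust it to your install.

```shell
#!/bin/sh
# Hypothetical helper: print every "define timeperiod" block in file $1
# whose timeperiod_name equals $2.
show_timeperiod() {
    awk -v name="$2" '
        /^define timeperiod/ { inblock = 1; buf = ""; hit = 0 }
        inblock {
            buf = buf $0 "\n"
            if ($1 == "timeperiod_name" && $2 == name) hit = 1
            if ($1 == "}") {
                if (hit) printf "%s", buf
                inblock = 0
            }
        }
    ' "$1"
}

cache=/usr/local/nagios/var/objects.cache   # assumed location, may differ
if [ -r "$cache" ]; then
    show_timeperiod "$cache" workhours
fi
```

The printed block should list the same weekday ranges as the 'workhours' definition in the config files; a mismatch would point at a stale or overridden definition.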

Re: Nagios sending notifications to Slack outside of workhours

Posted: Fri Jan 31, 2020 1:23 pm
by IT-Biscuit
Hi.

The time periods seem to be correct, but I am still getting alerts well outside of work hours.

I added the objects.cache file as an attachment to the post. It's too large to post the code in one post.

Note that I also edited to remove some confidential information.

Re: Nagios sending notifications to Slack outside of workhours

Posted: Mon Feb 03, 2020 12:41 pm
by IT-Biscuit
Any other ideas?

Re: Nagios sending notifications to Slack outside of workhours

Posted: Tue Feb 04, 2020 6:03 pm
by Box293
Configs look OK.

What is your version of Nagios?

You may have multiple nagios processes running.

Can you please follow this guide, under the section "Check For Multiple Nagios Processes":

https://support.nagios.com/kb/article.php?id=19

Are you able to send a custom notification out of work hours and does this send the alert? If you can reproduce it this way then I would enable debug logging and see what is logged.

Try turning debug logging on and then restarting Nagios.

Code:

sed -i 's/.*debug_level=.*/debug_level=-1/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart
The debug output is written to the file /usr/local/nagios/var/nagios.debug

When you are finished, this turns debugging off:

Code:

sed -i 's/.*debug_level=.*/debug_level=0/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart
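
The custom-notification test mentioned above can also be driven from the shell through the Nagios external command interface. A minimal sketch: `custom_svc_notification` is a hypothetical helper, and the host `web01`, the service `HTTP`, and the author are placeholders. In the options field, 0 means the normal rules apply (including timeperiods) and 2 means forced.

```shell
#!/bin/sh
# Hypothetical helper: build a SEND_CUSTOM_SVC_NOTIFICATION external command line.
# $1 host, $2 service description, $3 options (0 = normal, 2 = forced),
# $4 author, $5 comment, $6 optional epoch timestamp (defaults to now)
custom_svc_notification() {
    ts=${6:-$(date +%s)}
    printf '[%s] SEND_CUSTOM_SVC_NOTIFICATION;%s;%s;%s;%s;%s\n' \
        "$ts" "$1" "$2" "$3" "$4" "$5"
}

# Placeholder host/service names: one normal and one forced test notification.
custom_svc_notification web01 HTTP 0 nagiosadmin "timeperiod test (not forced)"
custom_svc_notification web01 HTTP 2 nagiosadmin "timeperiod test (forced)"
```

To actually submit a line, redirect it into the file named by `command_file` in nagios.cfg (often /usr/local/nagios/var/rw/nagios.cmd on source installs, but check yours). If only the forced variant arrives, Nagios considers the current time to be outside the notification period.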

Re: Nagios sending notifications to Slack outside of workhours

Posted: Thu Feb 06, 2020 5:32 pm
by IT-Biscuit
Processes seem okay:

Code:

~$ ps -ef | grep nagios.cfg | grep -v grep
nagios   14145     1  0 Jan27 ?        00:09:54 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   14151 14145  0 Jan27 ?        00:01:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Here is something odd. I haven't enabled debugging yet; however, I did attempt to send a test notification just now, during work hours.

The first attempt was not received by Slack.
For my second attempt, I selected the 'forced' option, which ignores all time-related rules such as workhours or non-workhours, etc. That attempt succeeded. The time on the Nagios server is set to GMT and seems to be correct, however.

I am going to enable debugging and see what comes of that...
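
For the process check earlier in this post, a rough way to count instances rather than eyeball the listing is sketched below. It assumes that a daemonized Nagios master is re-parented to PID 1 (as in the listing above, where 14145 has parent 1 and 14151 is its child); that may not hold in containers or under some service managers. `count_nagios_masters` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and count processes that
# run with nagios.cfg and whose parent PID (third column) is 1.
count_nagios_masters() {
    awk '$3 == 1 && /nagios\.cfg/ { n++ } END { print n + 0 }'
}

ps -ef | count_nagios_masters
```

Under that assumption, a result greater than 1 would suggest more than one Nagios instance is running.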

Re: Nagios sending notifications to Slack outside of workhours

Posted: Thu Feb 06, 2020 5:44 pm
by Box293
OK great let us know what information that produces.

Re: Nagios sending notifications to Slack outside of workhours

Posted: Thu Feb 06, 2020 5:56 pm
by IT-Biscuit
Doing a quick test on the debugging, in case I'm missing something.

The debugging didn't seem to capture any of the notifications I sent out, regardless of whether they were successful or not. I cleared the debug log and tried again, but the debug log is currently empty.

This is what I had before clearing it:
Attachment: Debug.log (483.06 KiB)

Re: Nagios sending notifications to Slack outside of workhours

Posted: Thu Feb 06, 2020 7:13 pm
by Box293
What is the name of the host and service that should be appearing in this log that shows the service entering a warn/crit state?

You may need to increase the debug log size if the logs are being rotated too quickly.
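
The debug log size is controlled by the `max_debug_file_size` directive in nagios.cfg. A sketch of raising it, in the same style as the earlier sed commands: `set_max_debug_size` is a hypothetical helper, the 100000000-byte value is only an example, and GNU `sed -i` is assumed.

```shell
#!/bin/sh
# Hypothetical helper: set max_debug_file_size (bytes) in the nagios.cfg-style
# file $1, replacing an existing setting or appending one if none exists.
set_max_debug_size() {
    file=$1
    bytes=$2
    if grep -q '^max_debug_file_size=' "$file"; then
        sed -i "s/^max_debug_file_size=.*/max_debug_file_size=$bytes/" "$file"
    else
        echo "max_debug_file_size=$bytes" >> "$file"
    fi
}

cfg=/usr/local/nagios/etc/nagios.cfg   # path used earlier in this thread
if [ -w "$cfg" ]; then
    set_max_debug_size "$cfg" 100000000   # example value only
fi
```

Restart Nagios afterwards. According to the comments in the sample nagios.cfg, `debug_level=32` limits debug output to notification logic, which may also keep the file from filling up as quickly as `-1` (everything) does.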

Re: Nagios sending notifications to Slack outside of workhours

Posted: Mon Feb 17, 2020 12:35 pm
by IT-Biscuit
I figured out what it was >_> ... the server's timezone was misconfigured, so Nagios was sending notifications at the right hours but in the wrong timezone.
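
For anyone who lands here with the same symptom: timeperiods such as 07:00-19:00 are evaluated against the server's clock, so it is worth comparing what that clock says with the zone you actually work in. A minimal sketch, with `America/New_York` used purely as a placeholder (the intended zone is not stated in this thread); `in_workhours` and `check_zone` are hypothetical helpers, not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ISO weekday $1 (1=Mon .. 7=Sun) and the
# hour $2 (00-23) fall inside the Monday-Friday 07:00-19:00 window.
in_workhours() {
    dow=$1
    hour=${2#0}          # strip one leading zero so 08 and 09 compare cleanly
    hour=${hour:-0}
    [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] && [ "$hour" -ge 7 ] && [ "$hour" -lt 19 ]
}

# Report what a clock in zone $1 shows right now and whether it is in the window.
check_zone() {
    # %u = ISO weekday, %H = hour of day
    set -- "$1" $(TZ="$1" date '+%u %H')
    if in_workhours "$2" "$3"; then
        echo "$1: weekday $2, hour $3 -> inside workhours"
    else
        echo "$1: weekday $2, hour $3 -> outside workhours"
    fi
}

check_zone UTC
check_zone America/New_York   # placeholder zone, replace with your own
```

If the two lines disagree at a time when alerts are arriving unexpectedly, the server's timezone setting is the likely culprit, as it was here.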