Understanding notifications.....

Post by **sandsdenver** » Sat Jul 11, 2015 10:02 pm

I am relatively new to Nagios, but I have a working system to learn from, with more than 30,000 devices. The person in charge of the Nagios system has departed, and until a replacement is found I am attempting to fill the gaps. The version is Nagios® Core™ Version 4.0.8.

All I want is that when one device goes down hard, a notification is sent to one particular person.
I have defined that person in the contact.cfg file.

And I have found the service that triggers the event, called devPing (I think).

I have also found the host.
define host {
# Site: ViaWest DataCenter - CIS Cage
use critical-router-XDS2 ; From: Host: XDS2-RTR-01
host_name XDS2-RTR-01
alias XDS2-DataSideCIS (XDS2-RTR-01)
display_name XDS2-DataSideCIS (XDS2-RTR-01)
address 10.X.X.X
;_device_id 4411
_community_string XXXXXXXXXX
_site_code XDS2
;_type Router.Ent
notes_url http://ns-cacti-1.ad.abcs.net/tools/vie ... DS2-RTR-01
_site_alias XDS2-DataSideCIS
parents XDS1-RTR-01, XDS2-RTR-02
hostgroups +Connection.Other, DataCenter.DataSideCIS, gLink.10_X_X_X-XDS2-XDS2-v, HW.Foundry.NetIron MLXe-8, Router.Data
Router.Ent, Site.XDS1, Site.XDS2
}

What I cannot find is the thing I believe triggers the event: the check_command. We do have host groups, and when this device goes down the admins do get an email, so I know that part works. We just have so many .cfg files, such a huge directory structure, and over 1,600,000 service checks a day that I am getting mired down.

Post by **Box293** » Mon Jul 13, 2015 1:42 am

So this line here:

use critical-router-XDS2 ; From: Host: XDS2-RTR-01

The "use" directive means the host uses a template, in this case critical-router-XDS2.

The check_command must be defined in this template. Keep in mind that a template can itself have a "use" directive referencing another template, so you may need to follow the chain to find the check_command.

The check_command references a command definition, which is used to construct the command line that is executed (a plugin is run). The exit code returned by the plugin determines the state: Up/Down for hosts, OK/Warning/Critical for services. Different options for the plugin let you define different thresholds. Once a host enters a hard state, notifications are sent to the contacts.
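To make that relationship concrete, here is a minimal sketch of a command definition and the host directives that control when a state becomes hard. The names example-check-ping and example-template are hypothetical, and the thresholds are placeholders rather than values from your system.

```
# Hypothetical command: Nagios expands the macros and runs the plugin.
define command {
    command_name    example-check-ping
    command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w 100.0,20% -c 500.0,60%
}

# Plugin exit codes by convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
define host {
    name                example-template    ; hypothetical template name
    check_command       example-check-ping
    max_check_attempts  3    ; a problem result must repeat this many times before the state is hard
    retry_interval      1    ; time units between rechecks while the state is still soft
    register            0    ; template only, not a real host
}
```

Notifications are only considered once the state is hard, which is why max_check_attempts matters as much as the plugin thresholds.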

This host definition does not have any contacts or contact_groups assigned, so they too must be assigned through the template.
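As an illustration of how such a chain can look, here is a small sketch in which every name and the address are hypothetical. The host at the bottom never mentions check_command or contact_groups itself but still ends up with both.

```
define host {
    name            example-base          ; hypothetical names throughout
    check_command   example-check-ping    ; defined once, at the root of the chain
    contact_groups  admins
    register        0
}

define host {
    name            example-router
    use             example-base          ; inherits check_command
    contact_groups  +network-team         ; leading + appends to the inherited groups
    register        0
}

define host {
    use             example-router        ; the real host gets everything above
    host_name       EXAMPLE-RTR-01
    address         192.0.2.10
}
```

Without the leading +, a value set in a child replaces the inherited one instead of adding to it, so it is worth checking each level of the chain for that prefix.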

Let us know if this helps. Post some more configs of the templates being used and the check_command definition, and we should be able to get you versed in how Nagios works.

Post by **sandsdenver** » Tue Jul 14, 2015 9:34 am

It helped immensely. Thank you very much for even the simplest explanations you gave. This weekend I pored over our config files to follow the chain. The host XDS2-RTR-01 is using critical-router-XDS2:

define host {
# Site: ViaWest DataCenter - CIS Cage
use critical-router-XDS2 ; From: Host: XDS2-RTR-01
host_name XDS2-RTR-01
alias XDS2-DataSideCIS (XDS2-RTR-01)
display_name XDS2-DataSideCIS (XDS2-RTR-01)
address 10.X.X.X
;_device_id 4411
_community_string XXXXXXXXXX
_site_code XDS2
;_type Router.Ent
notes_url http://ns-cacti-1.ad.abcs.net/tools/vie ... DS2-RTR-01
_site_alias XDS2-DataSideCIS
parents XDS1-RTR-01, XDS2-RTR-02
hostgroups +Connection.Other, DataCenter.DataSideCIS, gLink.10_X_X_X-XDS2-XDS2-v, HW.Foundry.NetIron MLXe-8, Router.Data
Router.Ent, Site.XDS1, Site.XDS2
}

Next, find the host template critical-router-XDS2:

define host{
name critical-router-XDS2
use critical-router
hostgroups HostTemplate.critical-router-XDS2,HostPriority.critical.24x7.network
contact_groups +CIS-sysadmins
flap_detection_enabled 0 ; DISABLE flap detection for this template
register 0
}

Now find critical-router:

define host{
name critical-router ;host template
use ABCD-generic-host
hostgroups HostTemplate.critical-router,HostPriority.critical.24x7.network
check_period 24x7
notification_period 24x7
notification_options d,r
notification_interval 120 ; 2 hour renotification
contact_groups admins,NOC,ABCDNS-Network,ABCDNS-Network-pager-24x7
check_interval 1
max_check_attempts 3
retry_interval 0.25
flap_detection_enabled 0 ; DISABLE flap detection for critical-router
register 0
}

That one uses ABCD-generic-host:

define host{
name ABCD-generic-host ;host template
use dist-hostset01-T,generic-host ; use this template to determine active/passive check status (centralized check by default), then inherit generic-host
##check_command ##check-host-alive
##check_command ##ABCD-check-host-alive
check_command ABCD-check-host-alive-icmp
}

Here is the check:
# custom 'check-host-alive' command definition using check_icmp
# for nagios to use the check_icmp plugin, setuid must be enabled
define command {
command_name ABCD-check-host-alive-icmp
##command_line $USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100%
command_line $USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,48% -c 5000.0,49% -n 10
}

Now, if that check_command fails (3 pings that do not reply) and the host goes down hard, I want it to notify a contact I have created in the contact file:

define contact{
contact_name ERwithBrocade-PingsFail-24x7
alias ERwithBrocade
service_notification_period 24x7
host_notification_period 24x7
service_notification_options u,w,c,r,s,f ;; unknown,warning,critical,recovery,flap,scheduled (n,w,u,c,r,s) ;;none (n)
host_notification_options d,r,f,u,s ;; down,recover,flap,unreachable,scheduled (n,d,r,f,u,s) ;;none (n)
service_notification_commands ABCD-svc-notify-by-email
host_notification_commands ABCD-host-notify-by-email
email [email protected]
}

So, the grand finale: would I now define a new service or a new host? I just want this one host to notify the contact when it doesn't respond to pings.

This is where I am stuck. Do I create another host or a service? I am thinking a service with the check_command ABCD-check-host-alive-icmp.

Thanks again for any input.

Post by **jdalrymple** » Tue Jul 14, 2015 10:38 am

You're on the right path.

Every host has an "implied service": you can have a zillion hosts and no services if you only want to know whether the host is up or down. The host check_command is intended entirely to verify the up/downness of the actual server/router/whatever.

So with that, in order to achieve your goal, you're pretty much done. No need to define a service at all. If your contact isn't in one of the contact groups admins, NOC, ABCDNS-Network, ABCDNS-Network-pager-24x7 or CIS-sysadmins, you'll want to either add it to one of them or explicitly place it in a host definition you deem appropriate.
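As a rough sketch of placing the contact directly on the host, assuming the contact name from your post and assuming none of the templates in the chain already set a contacts directive, the existing host definition could gain one line:

```
define host {
    use         critical-router-XDS2
    host_name   XDS2-RTR-01
    # ... existing directives unchanged ...
    contacts    ERwithBrocade-PingsFail-24x7    ; notified in addition to the inherited contact_groups
}
```

The host inherits notification_options d,r from critical-router, so this contact would hear about down and recovery events only, even though the contact definition allows more. Running nagios -v against your main config file before reloading should catch any typo in the contact name.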

Nagios Support Forum
