No notification on host down

Posted: Tue Feb 09, 2016 8:39 am
by Tron911
Hello, I've got a problem with one of my Nagios Core installations.

I've checked everything (or almost everything, except the right thing, I assume :D) but I can't find the cause...

The problem is that when one particular host goes down, Nagios doesn't notify me of the HARD DOWN state.

I'll attach some data...

This is what I get when running /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg:

Code: Select all

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 225 services.
        Checked 460 hosts.
        Checked 40 host groups.
        Checked 4 service groups.
        Checked 11 contacts.
        Checked 1 contact groups.
        Checked 44 commands.
        Checked 15 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 460 hosts
        Checked 0 service dependencies
        Checked 324 host dependencies
        Checked 15 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

This is the host template:

Code: Select all

define host{
	name                            ls-host
	hostgroups						Tutti
	notifications_enabled           1
	event_handler_enabled           1
	flap_detection_enabled          0
	process_perf_data               1
	retain_status_information       1
	retain_nonstatus_information    1
	check_period					24x7
	check_interval					5
	retry_interval					2
	max_check_attempts				3
	check_command					check-ls-host-alive
	notification_period				24x7
	notification_interval			0
	notification_options			d,r
	contacts						beppe,lscontact,reperibility-network-day,reperibility-system-day,customercontact
	register                        0
}
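
As a rough sketch of what the template's retry settings imply, the snippet below estimates how long a host stays in SOFT states before Nagios declares it HARD. It assumes the default interval_length of 60 seconds (nagios.cfg is not shown in this thread) and ignores check runtime, scheduling latency and forced checks; the function name is made up for illustration.

```python
def seconds_until_hard(max_check_attempts, retry_interval, interval_length=60):
    """Approximate delay between the first failed check and the HARD state.

    The first failure is SOFT attempt 1. Each further attempt is scheduled
    retry_interval time units later, so (max_check_attempts - 1) retries
    are needed before the state becomes HARD.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval * interval_length


# Values from the ls-host template: max_check_attempts 3, retry_interval 2
print(seconds_until_hard(3, 2))  # 240
```

With notification_interval 0, only one problem notification is expected once that HARD state is reached, so a missed first notification is never followed by a reminder.
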
This is the host definition of the problematic device (name of host modified):

Code: Select all

define host{
	use ls-host
	host_name XX-YYY-STORAG01
	address 172.16.25.19
	hostgroups 25-XX-YYY,NAS
}

This is the check_command definition:

Code: Select all

define command{
	command_name    check-ls-host-alive
	command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 1500.0,80% -c 2000.0,100% -p 5 -t 3 -4
}

One of the contacts (address modified):

Code: Select all

define contact{
	contact_name					beppe
	alias							Beppe
	service_notification_period     24x7
	host_notification_period        24x7
	service_notification_options    w,u,c,r,f,s
	host_notification_options       d,u,r,f,s
	service_notification_commands   service-mail-noCC
	host_notification_commands      host-mail-noCC
	email							[email protected]
	}

And the host notification command used:

Code: Select all

define command{
	command_name	host-mail-noCC
	command_line	/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\n$HOSTALIAS$\n$HOSTNOTES$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n\nTime in state $HOSTSTATE$: $HOSTDURATION$\n" | /bin/mailx -s "** $HOSTNAME$ [$HOSTADDRESS$] $HOSTSTATE$ - Status: $NOTIFICATIONTYPE$ ** Time in state $HOSTSTATE$: $HOSTDURATION$ **" -r [email protected] $CONTACTEMAIL$
	}

This is the part of the nagios.log file where the host goes down (host names modified):
[Tue Feb 9 11:25:17 2016] HOST ALERT: PLUTO;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 13.01 ms
[Tue Feb 9 11:25:17 2016] HOST NOTIFICATION: beppe;PLUTO;UP;host-mail-noCC;PING OK - Packet loss = 0%, RTA = 13.01 ms
[Tue Feb 9 11:25:17 2016] HOST NOTIFICATION: lscontact;PLUTO;UP;autoticket-ls;PING OK - Packet loss = 0%, RTA = 13.01 ms
[Tue Feb 9 11:25:17 2016] HOST NOTIFICATION: lscontact;PLUTO;UP;host-mail-noCC;PING OK - Packet loss = 0%, RTA = 13.01 ms
[Tue Feb 9 11:25:17 2016] HOST NOTIFICATION: customercontact;PLUTO;UP;customailerhost;PING OK - Packet loss = 0%, RTA = 13.01 ms
[Tue Feb 9 11:40:06 2016] HOST ALERT: XX-YYY-STORAG-01;DOWN;SOFT;1;CRITICAL - Host Unreachable (172.16.25.19)
[Tue Feb 9 11:40:28 2016] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG-01;1455014427
[Tue Feb 9 11:40:31 2016] HOST ALERT: XX-YYY-STORAG-01;DOWN;SOFT;2;CRITICAL - Host Unreachable (172.16.25.19)
[Tue Feb 9 11:40:44 2016] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG-01;1455014443
[Tue Feb 9 11:40:47 2016] HOST ALERT: XX-YYY-STORAG-01;DOWN;HARD;3;CRITICAL - Host Unreachable (172.16.25.19)
[Tue Feb 9 12:53:54 2016] HOST ALERT: SATURN;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
[Tue Feb 9 12:56:09 2016] HOST ALERT: SATURN;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
[Tue Feb 9 12:58:24 2016] HOST ALERT: SATURN;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%
[Tue Feb 9 13:03:28 2016] HOST ALERT: SATURN;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 9.60 ms
I've just discovered that this isn't limited to a single host: SATURN also reached DOWN;HARD in the log above with no HOST NOTIFICATION line after it!
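
To find every affected host rather than spotting them by eye, a small script can scan nagios.log for HARD DOWN alerts that are not followed by a DOWN notification. This is only a sketch: the regular expressions are modelled on the log lines quoted above, and the function name is hypothetical.

```python
import re

# [Tue Feb 9 11:40:47 2016] HOST ALERT: host;DOWN;HARD;3;output
ALERT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<type>[^;]+);")
# [Tue Feb 9 11:25:17 2016] HOST NOTIFICATION: contact;host;state;command;output
NOTIF_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] HOST NOTIFICATION: "
    r"(?P<contact>[^;]+);(?P<host>[^;]+);(?P<state>[^;]+);")


def unnotified_hard_downs(lines):
    """Return (timestamp, host) for each HARD DOWN alert with no DOWN
    notification logged for that host before its next HOST ALERT."""
    pending = {}   # host -> timestamp of a HARD DOWN still awaiting a notification
    missing = []
    for line in lines:
        alert = ALERT_RE.match(line)
        if alert:
            host = alert["host"]
            if host in pending:
                # State changed again and no notification was ever logged.
                missing.append((pending.pop(host), host))
            if alert["state"] == "DOWN" and alert["type"] == "HARD":
                pending[host] = alert["ts"]
            continue
        notif = NOTIF_RE.match(line)
        if notif and notif["state"] == "DOWN":
            pending.pop(notif["host"], None)
    # Hosts still down at the end of the log with nothing sent.
    missing.extend((ts, host) for host, ts in pending.items())
    return missing


sample = [
    "[Tue Feb 9 11:40:47 2016] HOST ALERT: XX-YYY-STORAG-01;DOWN;HARD;3;CRITICAL - Host Unreachable (172.16.25.19)",
    "[Tue Feb 9 12:58:24 2016] HOST ALERT: SATURN;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%",
    "[Tue Feb 9 13:03:28 2016] HOST ALERT: SATURN;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 9.60 ms",
]
for ts, host in unnotified_hard_downs(sample):
    print(ts, host)
```

Because the timestamp pattern accepts anything between the brackets, the same function should work on the raw file with epoch timestamps as well as on the converted output.
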


Any ideas or suggestions?

In the meantime I'm rebooting...


Thank you in advance.

Re: No notification on host down

Posted: Tue Feb 09, 2016 11:06 am
by rkennedy
Can you post a tail of your maillog file? I wonder if it's getting filtered somewhere.

Code: Select all

tail -n200 /var/log/maillog

Re: No notification on host down

Posted: Wed Feb 10, 2016 4:08 am
by Tron911
Yes sir, but there's nothing there. I suppose the problem is upstream of Postfix: it seems that Nagios doesn't send the notification at all after the host check command fails.

Code: Select all

[root@zz-sed-monit01 ~]# tail -n200 /var/log/maillog
Feb  9 06:08:38 zz-sed-monit01 postfix/pickup[19408]: 005E84011142: uid=1000 from=<[email protected]>
Feb  9 06:08:38 zz-sed-monit01 postfix/cleanup[23480]: 005E84011142: message-id=<56b97455.Lw34oFKssnHg/asf%[email protected]>
Feb  9 06:08:38 zz-sed-monit01 postfix/qmgr[2572]: 005E84011142: from=<[email protected]>, size=633, nrcpt=1 (queue active)
Feb  9 06:08:39 zz-sed-monit01 postfix/smtp[23484]: 005E84011142: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.8, delays=0.02/0.03/0.01/1.7, dsn=2.0.0, status=sent (250 OK)
Feb  9 06:08:39 zz-sed-monit01 postfix/qmgr[2572]: 005E84011142: removed
Feb  9 06:08:40 zz-sed-monit01 postfix/smtp[23482]: EE2D240AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.1, delays=0.08/0.02/0.01/2, dsn=2.0.0, status=sent (250 OK)
Feb  9 06:08:40 zz-sed-monit01 postfix/qmgr[2572]: EE2D240AEAAF: removed
Feb  9 06:08:40 zz-sed-monit01 postfix/smtp[23483]: F2C7940AEAB4: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.4, delays=0.04/0.02/0.02/2.3, dsn=2.0.0, status=sent (250 OK)
Feb  9 06:08:40 zz-sed-monit01 postfix/qmgr[2572]: F2C7940AEAB4: removed
Feb  9 06:16:24 zz-sed-monit01 postfix/pickup[19408]: 86E284011142: uid=1000 from=<[email protected]>
Feb  9 06:16:24 zz-sed-monit01 postfix/cleanup[25481]: 86E284011142: message-id=<56b97628.5/zES9ubEb6RLJbP%[email protected]>
Feb  9 06:16:24 zz-sed-monit01 postfix/qmgr[2572]: 86E284011142: from=<[email protected]>, size=594, nrcpt=1 (queue active)
Feb  9 06:16:26 zz-sed-monit01 postfix/smtp[25483]: 86E284011142: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.4, delays=0.05/0.01/0.01/2.4, dsn=2.0.0, status=sent (250 OK)
Feb  9 06:16:26 zz-sed-monit01 postfix/qmgr[2572]: 86E284011142: removed
Feb  9 10:15:02 zz-sed-monit01 postfix/pickup[4506]: 695414011142: uid=1000 from=<[email protected]>
Feb  9 10:15:02 zz-sed-monit01 postfix/cleanup[21394]: 695414011142: message-id=<56b9ae16.dd841tO0+OYuYPpm%[email protected]>
Feb  9 10:15:02 zz-sed-monit01 postfix/pickup[4506]: 6CD3F401114E: uid=1000 from=<[email protected]>
Feb  9 10:15:02 zz-sed-monit01 postfix/qmgr[2572]: 695414011142: from=<[email protected]>, size=811, nrcpt=1 (queue active)
Feb  9 10:15:02 zz-sed-monit01 postfix/cleanup[21394]: 6CD3F401114E: message-id=<56b9ae16.c2OJgPLr353I34gf%[email protected]>
Feb  9 10:15:02 zz-sed-monit01 postfix/qmgr[2572]: 6CD3F401114E: from=<[email protected]>, size=806, nrcpt=1 (queue active)
Feb  9 10:15:03 zz-sed-monit01 postfix/smtp[21396]: 695414011142: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.3, delays=0.05/0.01/0.01/1.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 10:15:03 zz-sed-monit01 postfix/qmgr[2572]: 695414011142: removed
Feb  9 10:15:03 zz-sed-monit01 postfix/smtp[21397]: 6CD3F401114E: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.6, delays=0.05/0.02/0.01/1.5, dsn=2.0.0, status=sent (250 OK)
Feb  9 10:15:03 zz-sed-monit01 postfix/qmgr[2572]: 6CD3F401114E: removed
Feb  9 11:25:17 zz-sed-monit01 postfix/pickup[29723]: 32F4340AEAAF: uid=1000 from=<[email protected]>
Feb  9 11:25:17 zz-sed-monit01 postfix/cleanup[6946]: 32F4340AEAAF: message-id=<56b9be8d.bTtMkuvEYKOPSFdV%[email protected]>
Feb  9 11:25:17 zz-sed-monit01 postfix/qmgr[2572]: 32F4340AEAAF: from=<[email protected]>, size=813, nrcpt=1 (queue active)
Feb  9 11:25:17 zz-sed-monit01 postfix/pickup[29723]: 3770640AEAB4: uid=1000 from=<[email protected]>
Feb  9 11:25:17 zz-sed-monit01 postfix/cleanup[6946]: 3770640AEAB4: message-id=<56b9be8d.546bNj/Ue8s1Ihbr%[email protected]>
Feb  9 11:25:17 zz-sed-monit01 postfix/qmgr[2572]: 3770640AEAB4: from=<[email protected]>, size=808, nrcpt=1 (queue active)
Feb  9 11:25:19 zz-sed-monit01 postfix/smtp[6951]: 3770640AEAB4: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.1, delays=0.05/0.01/0.02/2, dsn=2.0.0, status=sent (250 OK)
Feb  9 11:25:19 zz-sed-monit01 postfix/qmgr[2572]: 3770640AEAB4: removed
Feb  9 11:25:19 zz-sed-monit01 postfix/smtp[6950]: 32F4340AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.4, delays=0.06/0.01/0.01/2.3, dsn=2.0.0, status=sent (250 OK)
Feb  9 11:25:19 zz-sed-monit01 postfix/qmgr[2572]: 32F4340AEAAF: removed
Feb  9 12:14:48 zz-sed-monit01 postfix/pickup[29723]: 2185A40AEAAF: uid=0 from=<root>
Feb  9 12:14:48 zz-sed-monit01 postfix/cleanup[19809]: 2185A40AEAAF: message-id=<[email protected]>
Feb  9 12:14:48 zz-sed-monit01 postfix/qmgr[2572]: 2185A40AEAAF: from=<[email protected]>, size=691, nrcpt=4 (queue active)
Feb  9 12:14:48 zz-sed-monit01 postfix/local[19812]: 2185A40AEAAF: to=<[email protected]>, relay=local, delay=0.07, delays=0.03/0.03/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Feb  9 12:14:48 zz-sed-monit01 postfix/local[19814]: 2185A40AEAAF: to=<[email protected]>, orig_to=<Raggiungile>, relay=local, delay=0.12, delays=0.03/0.04/0/0.05, dsn=5.1.1, status=bounced (unknown user: "raggiungile")
Feb  9 12:14:48 zz-sed-monit01 postfix/error[19811]: 2185A40AEAAF: to=<[email protected]>, orig_to=<-r>, relay=none, delay=0.12, delays=0.03/0.05/0/0.04, dsn=5.1.3, status=bounced (bad address syntax)
Feb  9 12:14:51 zz-sed-monit01 postfix/smtp[19813]: 2185A40AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=3.2, delays=0.03/0.03/0.01/3.1, dsn=2.0.0, status=sent (250 OK)
Feb  9 12:14:51 zz-sed-monit01 postfix/cleanup[19809]: 42D2340AEAB4: message-id=<[email protected]>
Feb  9 12:14:51 zz-sed-monit01 postfix/bounce[19816]: 2185A40AEAAF: sender non-delivery notification: 42D2340AEAB4
Feb  9 12:14:51 zz-sed-monit01 postfix/qmgr[2572]: 42D2340AEAB4: from=<>, size=3021, nrcpt=1 (queue active)
Feb  9 12:14:51 zz-sed-monit01 postfix/qmgr[2572]: 2185A40AEAAF: removed
Feb  9 12:14:51 zz-sed-monit01 postfix/local[19812]: 42D2340AEAB4: to=<[email protected]>, relay=local, delay=0.01, delays=0/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Feb  9 12:14:51 zz-sed-monit01 postfix/qmgr[2572]: 42D2340AEAB4: removed
Feb  9 12:15:09 zz-sed-monit01 postfix/pickup[29723]: 651AA40AEAAF: uid=0 from=<root>
Feb  9 12:15:09 zz-sed-monit01 postfix/cleanup[19809]: 651AA40AEAAF: message-id=<[email protected]>
Feb  9 12:15:09 zz-sed-monit01 postfix/qmgr[2572]: 651AA40AEAAF: from=<[email protected]>, size=691, nrcpt=4 (queue active)
Feb  9 12:15:09 zz-sed-monit01 postfix/error[19811]: 651AA40AEAAF: to=<[email protected]>, orig_to=<-r>, relay=none, delay=0.01, delays=0.01/0/0/0, dsn=5.1.3, status=bounced (bad address syntax)
Feb  9 12:15:09 zz-sed-monit01 postfix/local[19814]: 651AA40AEAAF: to=<[email protected]>, relay=local, delay=0.01, delays=0.01/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Feb  9 12:15:10 zz-sed-monit01 postfix/local[19812]: 651AA40AEAAF: to=<[email protected]>, orig_to=<Raggiungile>, relay=local, delay=1, delays=0.01/0/0/0.99, dsn=5.1.1, status=bounced (unknown user: "raggiungile")
Feb  9 12:15:11 zz-sed-monit01 postfix/smtp[19813]: 651AA40AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.7, delays=0.01/0/0.01/1.7, dsn=2.0.0, status=sent (250 OK)
Feb  9 12:15:11 zz-sed-monit01 postfix/cleanup[19809]: 1641F40AEAB4: message-id=<[email protected]>
Feb  9 12:15:11 zz-sed-monit01 postfix/bounce[19815]: 651AA40AEAAF: sender non-delivery notification: 1641F40AEAB4
Feb  9 12:15:11 zz-sed-monit01 postfix/qmgr[2572]: 1641F40AEAB4: from=<>, size=3021, nrcpt=1 (queue active)
Feb  9 12:15:11 zz-sed-monit01 postfix/qmgr[2572]: 651AA40AEAAF: removed
Feb  9 12:15:11 zz-sed-monit01 postfix/local[19814]: 1641F40AEAB4: to=<[email protected]>, relay=local, delay=0.01, delays=0/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Feb  9 12:15:11 zz-sed-monit01 postfix/qmgr[2572]: 1641F40AEAB4: removed
Feb  9 12:15:19 zz-sed-monit01 postfix/pickup[29723]: A5E5E40AEAAF: uid=0 from=<root>
Feb  9 12:15:19 zz-sed-monit01 postfix/cleanup[19809]: A5E5E40AEAAF: message-id=<[email protected]>
Feb  9 12:15:19 zz-sed-monit01 postfix/qmgr[2572]: A5E5E40AEAAF: from=<[email protected]>, size=691, nrcpt=4 (queue active)
Feb  9 12:15:19 zz-sed-monit01 postfix/error[19811]: A5E5E40AEAAF: to=<[email protected]>, orig_to=<-r>, relay=none, delay=0.01, delays=0.01/0/0/0, dsn=5.1.3, status=bounced (bad address syntax)
Feb  9 12:15:19 zz-sed-monit01 postfix/local[19812]: A5E5E40AEAAF: to=<[email protected]>, relay=local, delay=0.01, delays=0.01/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Feb  9 12:15:20 zz-sed-monit01 postfix/smtp[19813]: A5E5E40AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.2, delays=0.01/0/0.01/1.1, dsn=2.0.0, status=sent (250 OK)
Feb  9 12:15:20 zz-sed-monit01 postfix/local[19814]: A5E5E40AEAAF: to=<[email protected]>, orig_to=<Raggiungile>, relay=local, delay=1.2, delays=0.01/0/0/1.2, dsn=5.1.1, status=bounced (unknown user: "raggiungile")
Feb  9 12:15:20 zz-sed-monit01 postfix/cleanup[19809]: E1FF040AEAB4: message-id=<[email protected]>
Feb  9 12:15:20 zz-sed-monit01 postfix/qmgr[2572]: E1FF040AEAB4: from=<>, size=3021, nrcpt=1 (queue active)
Feb  9 12:15:20 zz-sed-monit01 postfix/bounce[19816]: A5E5E40AEAAF: sender non-delivery notification: E1FF040AEAB4
Feb  9 12:15:20 zz-sed-monit01 postfix/qmgr[2572]: A5E5E40AEAAF: removed
Feb  9 12:15:20 zz-sed-monit01 postfix/local[19812]: E1FF040AEAB4: to=<[email protected]>, relay=local, delay=0.01, delays=0/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Feb  9 12:15:20 zz-sed-monit01 postfix/qmgr[2572]: E1FF040AEAB4: removed
Feb  9 12:26:27 zz-sed-monit01 postfix/pickup[21791]: 3EEC940AEAAF: uid=0 from=<[email protected]>
Feb  9 12:26:27 zz-sed-monit01 postfix/cleanup[22887]: 3EEC940AEAAF: message-id=<56b9cce3.eFE9W0m46uLFBo75%[email protected]>
Feb  9 12:26:27 zz-sed-monit01 postfix/qmgr[2572]: 3EEC940AEAAF: from=<[email protected]>, size=598, nrcpt=1 (queue active)
Feb  9 12:26:28 zz-sed-monit01 postfix/smtp[22889]: 3EEC940AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.6, delays=0.03/0.01/0.01/1.5, dsn=2.0.0, status=sent (250 OK)
Feb  9 12:26:28 zz-sed-monit01 postfix/qmgr[2572]: 3EEC940AEAAF: removed
Feb  9 12:26:57 zz-sed-monit01 postfix/pickup[21791]: 5167E40AEAAF: uid=0 from=<[email protected]>
Feb  9 12:26:57 zz-sed-monit01 postfix/cleanup[22887]: 5167E40AEAAF: message-id=<56b9cd01.UptSc1xL0l5M1PA3%[email protected]>
Feb  9 12:26:57 zz-sed-monit01 postfix/qmgr[2572]: 5167E40AEAAF: from=<[email protected]>, size=613, nrcpt=1 (queue active)
Feb  9 12:26:58 zz-sed-monit01 postfix/smtp[22889]: 5167E40AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.2, delays=0.01/0/0.01/1.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 12:26:58 zz-sed-monit01 postfix/qmgr[2572]: 5167E40AEAAF: removed
Feb  9 13:14:39 zz-sed-monit01 postfix/pickup[21791]: 257C140AEAAF: uid=1000 from=<[email protected]>
Feb  9 13:14:39 zz-sed-monit01 postfix/cleanup[2823]: 257C140AEAAF: message-id=<56b9d82f.bDdpra9q2mybhtS1%[email protected]>
Feb  9 13:14:39 zz-sed-monit01 postfix/qmgr[2572]: 257C140AEAAF: from=<[email protected]>, size=889, nrcpt=1 (queue active)
Feb  9 13:14:39 zz-sed-monit01 postfix/pickup[21791]: 29CE940AEAB4: uid=1000 from=<[email protected]>
Feb  9 13:14:39 zz-sed-monit01 postfix/cleanup[2823]: 29CE940AEAB4: message-id=<56b9d82f.6pRSJsg4oJ+wdfI6%[email protected]>
Feb  9 13:14:39 zz-sed-monit01 postfix/qmgr[2572]: 29CE940AEAB4: from=<[email protected]>, size=884, nrcpt=1 (queue active)
Feb  9 13:14:40 zz-sed-monit01 postfix/smtp[2827]: 29CE940AEAB4: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.2, delays=0.06/0.01/0.03/1.1, dsn=2.0.0, status=sent (250 OK)
Feb  9 13:14:40 zz-sed-monit01 postfix/qmgr[2572]: 29CE940AEAB4: removed
Feb  9 13:14:40 zz-sed-monit01 postfix/smtp[2826]: 257C140AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.3, delays=0.07/0.02/0.02/1.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 13:14:40 zz-sed-monit01 postfix/qmgr[2572]: 257C140AEAAF: removed
Feb  9 13:24:38 zz-sed-monit01 postfix/pickup[21791]: 31FA540AEAAF: uid=1000 from=<[email protected]>
Feb  9 13:24:38 zz-sed-monit01 postfix/cleanup[5368]: 31FA540AEAAF: message-id=<56b9da86.djyl9Cw6yJp+Ue0M%[email protected]>
Feb  9 13:24:38 zz-sed-monit01 postfix/qmgr[2572]: 31FA540AEAAF: from=<[email protected]>, size=869, nrcpt=1 (queue active)
Feb  9 13:24:38 zz-sed-monit01 postfix/pickup[21791]: 3544040AEAB4: uid=1000 from=<[email protected]>
Feb  9 13:24:38 zz-sed-monit01 postfix/cleanup[5368]: 3544040AEAB4: message-id=<56b9da86.AoXuj7+Uj2b6sAdw%[email protected]>
Feb  9 13:24:38 zz-sed-monit01 postfix/qmgr[2572]: 3544040AEAB4: from=<[email protected]>, size=874, nrcpt=1 (queue active)
Feb  9 13:24:39 zz-sed-monit01 postfix/smtp[5371]: 3544040AEAB4: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.3, delays=0.05/0.02/0.04/1.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 13:24:39 zz-sed-monit01 postfix/qmgr[2572]: 3544040AEAB4: removed
Feb  9 13:24:40 zz-sed-monit01 postfix/smtp[5370]: 31FA540AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.1, delays=0.05/0.01/0.01/2, dsn=2.0.0, status=sent (250 OK)
Feb  9 13:24:40 zz-sed-monit01 postfix/qmgr[2572]: 31FA540AEAAF: removed
Feb  9 14:31:23 zz-sed-monit01 postfix/pickup[13309]: 216F340AEAAF: uid=1000 from=<[email protected]>
Feb  9 14:31:23 zz-sed-monit01 postfix/cleanup[22222]: 216F340AEAAF: message-id=<56b9ea2b.EGixDRIYXBaoMV0V%[email protected]>
Feb  9 14:31:23 zz-sed-monit01 postfix/qmgr[2572]: 216F340AEAAF: from=<[email protected]>, size=594, nrcpt=1 (queue active)
Feb  9 14:31:24 zz-sed-monit01 postfix/smtp[22225]: 216F340AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1, delays=0.03/0.01/0.01/0.98, dsn=2.0.0, status=sent (250 OK)
Feb  9 14:31:24 zz-sed-monit01 postfix/qmgr[2572]: 216F340AEAAF: removed
Feb  9 14:42:44 zz-sed-monit01 postfix/postfix-script[2178]: starting the Postfix mail system
Feb  9 14:42:44 zz-sed-monit01 postfix/master[2201]: daemon started -- version 2.10.1, configuration /etc/postfix
Feb  9 14:43:16 zz-sed-monit01 postfix/pickup[2235]: 55A9C40AEAAF: uid=1000 from=<[email protected]>
Feb  9 14:43:16 zz-sed-monit01 postfix/cleanup[2904]: 55A9C40AEAAF: message-id=<56b9ecf4.P3FADXPYczO0gD6V%[email protected]>
Feb  9 14:43:16 zz-sed-monit01 postfix/qmgr[2236]: 55A9C40AEAAF: from=<[email protected]>, size=781, nrcpt=1 (queue active)
Feb  9 14:43:16 zz-sed-monit01 postfix/pickup[2235]: 601FA40AEAB4: uid=1000 from=<[email protected]>
Feb  9 14:43:16 zz-sed-monit01 postfix/cleanup[2904]: 601FA40AEAB4: message-id=<56b9ecf4.q8HnJqV9Ud46rJAe%[email protected]>
Feb  9 14:43:16 zz-sed-monit01 postfix/qmgr[2236]: 601FA40AEAB4: from=<[email protected]>, size=776, nrcpt=1 (queue active)
Feb  9 14:43:16 zz-sed-monit01 postfix/pickup[2235]: 61759401114D: uid=1000 from=<[email protected]>
Feb  9 14:43:16 zz-sed-monit01 postfix/cleanup[2904]: 61759401114D: message-id=<56b9ecf4.vnHrM2Y3q721BYZO%[email protected]>
Feb  9 14:43:16 zz-sed-monit01 postfix/qmgr[2236]: 61759401114D: from=<[email protected]>, size=622, nrcpt=1 (queue active)
Feb  9 14:43:17 zz-sed-monit01 postfix/smtp[2908]: 61759401114D: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.3, delays=0.11/0.03/0.04/1.1, dsn=2.0.0, status=sent (250 OK)
Feb  9 14:43:17 zz-sed-monit01 postfix/qmgr[2236]: 61759401114D: removed
Feb  9 14:43:17 zz-sed-monit01 postfix/smtp[2906]: 55A9C40AEAAF: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.4, delays=0.11/0.03/0.01/1.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 14:43:17 zz-sed-monit01 postfix/qmgr[2236]: 55A9C40AEAAF: removed
Feb  9 14:43:18 zz-sed-monit01 postfix/smtp[2907]: 601FA40AEAB4: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.2, delays=0.11/0.03/0.01/2, dsn=2.0.0, status=sent (250 OK)
Feb  9 14:43:18 zz-sed-monit01 postfix/qmgr[2236]: 601FA40AEAB4: removed
Feb  9 14:46:40 zz-sed-monit01 postfix/pickup[2235]: 7C1BA401114D: uid=1000 from=<[email protected]>
Feb  9 14:46:40 zz-sed-monit01 postfix/cleanup[3824]: 7C1BA401114D: message-id=<56b9edc0.jU/QA89vTlx5i5qG%[email protected]>
Feb  9 14:46:40 zz-sed-monit01 postfix/pickup[2235]: 7FA0D401114E: uid=1000 from=<[email protected]>
Feb  9 14:46:40 zz-sed-monit01 postfix/qmgr[2236]: 7C1BA401114D: from=<[email protected]>, size=795, nrcpt=1 (queue active)
Feb  9 14:46:40 zz-sed-monit01 postfix/cleanup[3824]: 7FA0D401114E: message-id=<56b9edc0.dxh9qKhgz9zcrMJt%[email protected]>
Feb  9 14:46:40 zz-sed-monit01 postfix/qmgr[2236]: 7FA0D401114E: from=<[email protected]>, size=790, nrcpt=1 (queue active)
Feb  9 14:46:42 zz-sed-monit01 postfix/smtp[3827]: 7FA0D401114E: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.2, delays=0.03/0.01/0.03/2.1, dsn=2.0.0, status=sent (250 OK)
Feb  9 14:46:42 zz-sed-monit01 postfix/qmgr[2236]: 7FA0D401114E: removed
Feb  9 14:46:45 zz-sed-monit01 postfix/smtp[3826]: 7C1BA401114D: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=5, delays=0.05/0.01/0.01/5, dsn=2.0.0, status=sent (250 OK)
Feb  9 14:46:45 zz-sed-monit01 postfix/qmgr[2236]: 7C1BA401114D: removed
Feb  9 15:04:19 zz-sed-monit01 postfix/pickup[2235]: B768E401114D: uid=0 from=<root>
Feb  9 15:04:19 zz-sed-monit01 postfix/cleanup[8361]: B768E401114D: message-id=<[email protected]>
Feb  9 15:04:19 zz-sed-monit01 postfix/qmgr[2236]: B768E401114D: from=<[email protected]>, size=1112, nrcpt=1 (queue active)
Feb  9 15:04:19 zz-sed-monit01 postfix/local[8364]: B768E401114D: to=<[email protected]>, orig_to=<root>, relay=local, delay=0.1, delays=0.03/0.05/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Feb  9 15:04:19 zz-sed-monit01 postfix/qmgr[2236]: B768E401114D: removed
Feb  9 15:24:40 zz-sed-monit01 postfix/pickup[2235]: 063B6401114D: uid=1000 from=<[email protected]>
Feb  9 15:24:40 zz-sed-monit01 postfix/cleanup[13667]: 063B6401114D: message-id=<56b9f6a7.NPagiXFrrhGwdy89%[email protected]>
Feb  9 15:24:40 zz-sed-monit01 postfix/qmgr[2236]: 063B6401114D: from=<[email protected]>, size=784, nrcpt=1 (queue active)
Feb  9 15:24:40 zz-sed-monit01 postfix/pickup[2235]: 0A5EB401114E: uid=1000 from=<[email protected]>
Feb  9 15:24:40 zz-sed-monit01 postfix/cleanup[13667]: 0A5EB401114E: message-id=<56b9f6a7.en28ZE2ogpT93NHh%[email protected]>
Feb  9 15:24:40 zz-sed-monit01 postfix/qmgr[2236]: 0A5EB401114E: from=<[email protected]>, size=779, nrcpt=1 (queue active)
Feb  9 15:24:40 zz-sed-monit01 postfix/pickup[2235]: 0D4FB4011151: uid=1000 from=<[email protected]>
Feb  9 15:24:40 zz-sed-monit01 postfix/cleanup[13667]: 0D4FB4011151: message-id=<56b9f6a7.5plMiMcqfal8QivX%[email protected]>
Feb  9 15:24:40 zz-sed-monit01 postfix/qmgr[2236]: 0D4FB4011151: from=<[email protected]>, size=629, nrcpt=1 (queue active)
Feb  9 15:24:40 zz-sed-monit01 postfix/smtp[13672]: 0D4FB4011151: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=0.92, delays=0.03/0.01/0.04/0.84, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:24:40 zz-sed-monit01 postfix/qmgr[2236]: 0D4FB4011151: removed
Feb  9 15:24:41 zz-sed-monit01 postfix/smtp[13671]: 0A5EB401114E: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.2, delays=0.07/0.01/0.01/1.1, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:24:41 zz-sed-monit01 postfix/qmgr[2236]: 0A5EB401114E: removed
Feb  9 15:24:41 zz-sed-monit01 postfix/smtp[13670]: 063B6401114D: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.4, delays=0.06/0.02/0.01/1.4, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:24:41 zz-sed-monit01 postfix/qmgr[2236]: 063B6401114D: removed
Feb  9 15:28:24 zz-sed-monit01 postfix/pickup[2235]: 4E9B74035067: uid=1000 from=<[email protected]>
Feb  9 15:28:24 zz-sed-monit01 postfix/cleanup[14519]: 4E9B74035067: message-id=<56b9f788.dEIIfvG05npnHZ3L%[email protected]>
Feb  9 15:28:24 zz-sed-monit01 postfix/qmgr[2236]: 4E9B74035067: from=<[email protected]>, size=782, nrcpt=1 (queue active)
Feb  9 15:28:24 zz-sed-monit01 postfix/pickup[2235]: 530B0403506A: uid=1000 from=<[email protected]>
Feb  9 15:28:24 zz-sed-monit01 postfix/cleanup[14519]: 530B0403506A: message-id=<56b9f788.O1j3nGU/fq64B3bg%[email protected]>
Feb  9 15:28:24 zz-sed-monit01 postfix/qmgr[2236]: 530B0403506A: from=<[email protected]>, size=787, nrcpt=1 (queue active)
Feb  9 15:28:24 zz-sed-monit01 postfix/pickup[2235]: 55E51401114D: uid=1000 from=<[email protected]>
Feb  9 15:28:24 zz-sed-monit01 postfix/cleanup[14519]: 55E51401114D: message-id=<56b9f788.rel3ooawkPZEQH4J%[email protected]>
Feb  9 15:28:24 zz-sed-monit01 postfix/qmgr[2236]: 55E51401114D: from=<[email protected]>, size=635, nrcpt=1 (queue active)
Feb  9 15:28:25 zz-sed-monit01 postfix/smtp[14527]: 55E51401114D: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.3, delays=0.02/0.01/0.01/1.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:28:25 zz-sed-monit01 postfix/qmgr[2236]: 55E51401114D: removed
Feb  9 15:28:26 zz-sed-monit01 postfix/smtp[14525]: 4E9B74035067: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.2, delays=0.04/0.01/0.01/2.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:28:26 zz-sed-monit01 postfix/qmgr[2236]: 4E9B74035067: removed
Feb  9 15:28:26 zz-sed-monit01 postfix/smtp[14526]: 530B0403506A: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=2.3, delays=0.05/0.01/0.02/2.2, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:28:26 zz-sed-monit01 postfix/qmgr[2236]: 530B0403506A: removed
Feb  9 15:31:28 zz-sed-monit01 postfix/pickup[2235]: 021B04022F13: uid=1000 from=<[email protected]>
Feb  9 15:31:28 zz-sed-monit01 postfix/cleanup[15494]: 021B04022F13: message-id=<56b9f83f.YcegDfwudffPAJfM%[email protected]>
Feb  9 15:31:28 zz-sed-monit01 postfix/qmgr[2236]: 021B04022F13: from=<[email protected]>, size=611, nrcpt=1 (queue active)
Feb  9 15:31:29 zz-sed-monit01 postfix/smtp[15496]: 021B04022F13: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.1, delays=0.02/0.01/0.01/1.1, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:31:29 zz-sed-monit01 postfix/qmgr[2236]: 021B04022F13: removed
Feb  9 15:38:46 zz-sed-monit01 postfix/pickup[2235]: 12BEF4022F13: uid=1000 from=<[email protected]>
Feb  9 15:38:46 zz-sed-monit01 postfix/cleanup[17226]: 12BEF4022F13: message-id=<56b9f9f5.gGRR0oCF9CMEiraD%[email protected]>
Feb  9 15:38:46 zz-sed-monit01 postfix/qmgr[2236]: 12BEF4022F13: from=<[email protected]>, size=789, nrcpt=1 (queue active)
Feb  9 15:38:46 zz-sed-monit01 postfix/pickup[2235]: 1911A401114D: uid=1000 from=<[email protected]>
Feb  9 15:38:46 zz-sed-monit01 postfix/cleanup[17226]: 1911A401114D: message-id=<56b9f9f5.//k1xaRGt2xuXcZq%[email protected]>
Feb  9 15:38:46 zz-sed-monit01 postfix/qmgr[2236]: 1911A401114D: from=<[email protected]>, size=784, nrcpt=1 (queue active)
Feb  9 15:38:46 zz-sed-monit01 postfix/pickup[2235]: 1A58F401114E: uid=1000 from=<[email protected]>
Feb  9 15:38:46 zz-sed-monit01 postfix/cleanup[17226]: 1A58F401114E: message-id=<56b9f9f6.6AeY+l5IUOYKHHGy%[email protected]>
Feb  9 15:38:46 zz-sed-monit01 postfix/qmgr[2236]: 1A58F401114E: from=<[email protected]>, size=631, nrcpt=1 (queue active)
Feb  9 15:38:47 zz-sed-monit01 postfix/smtp[17238]: 1911A401114D: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.4, delays=0.07/0.03/0.01/1.3, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:38:47 zz-sed-monit01 postfix/qmgr[2236]: 1911A401114D: removed
Feb  9 15:38:47 zz-sed-monit01 postfix/smtp[17237]: 12BEF4022F13: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.6, delays=0.07/0.02/0.02/1.5, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:38:47 zz-sed-monit01 postfix/qmgr[2236]: 12BEF4022F13: removed
Feb  9 15:38:48 zz-sed-monit01 postfix/smtp[17240]: 1A58F401114E: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.9, delays=0.01/0.03/0.04/1.8, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:38:48 zz-sed-monit01 postfix/qmgr[2236]: 1A58F401114E: removed
Feb  9 15:45:00 zz-sed-monit01 postfix/pickup[2235]: 4F98E401114D: uid=1000 from=<[email protected]>
Feb  9 15:45:00 zz-sed-monit01 postfix/cleanup[18930]: 4F98E401114D: message-id=<56b9fb6c.tDDG6Mf8L4IplZTl%[email protected]>
Feb  9 15:45:00 zz-sed-monit01 postfix/qmgr[2236]: 4F98E401114D: from=<[email protected]>, size=784, nrcpt=1 (queue active)
Feb  9 15:45:00 zz-sed-monit01 postfix/pickup[2235]: 5294C4016EF3: uid=1000 from=<[email protected]>
Feb  9 15:45:00 zz-sed-monit01 postfix/cleanup[18930]: 5294C4016EF3: message-id=<56b9fb6c.sbXRCHMz+pABlxaJ%[email protected]>
Feb  9 15:45:00 zz-sed-monit01 postfix/qmgr[2236]: 5294C4016EF3: from=<[email protected]>, size=789, nrcpt=1 (queue active)
Feb  9 15:45:01 zz-sed-monit01 postfix/smtp[18933]: 5294C4016EF3: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=1.4, delays=0.05/0.02/0.01/1.3, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:45:01 zz-sed-monit01 postfix/qmgr[2236]: 5294C4016EF3: removed
Feb  9 15:45:04 zz-sed-monit01 postfix/smtp[18932]: 4F98E401114D: to=<[email protected]>, relay=172.16.2.12[172.16.2.12]:25, delay=4.6, delays=0.07/0.01/3/1.5, dsn=2.0.0, status=sent (250 OK)
Feb  9 15:45:04 zz-sed-monit01 postfix/qmgr[2236]: 4F98E401114D: removed
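
Rather than reading 200 lines by eye, the maillog can be filtered to a window around the time of the HARD DOWN alert (11:40:47 in the nagios.log excerpt). The sketch below assumes syslog-style timestamps like those above, an English locale for the month abbreviation, and the year 2016 (syslog lines carry no year); the function name is hypothetical.

```python
from datetime import datetime, timedelta


def maillog_lines_near(lines, when, window_seconds=120, year=2016):
    """Return syslog-style lines stamped within window_seconds of `when`."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        parts = line.split(None, 3)   # month, day, time, rest of the line
        if len(parts) < 4:
            continue
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S")
        except ValueError:
            continue                  # not a syslog-formatted line
        if abs(stamp - when) <= window:
            hits.append(line)
    return hits


log = [
    "Feb  9 11:25:19 zz-sed-monit01 postfix/qmgr[2572]: 32F4340AEAAF: removed",
    "Feb  9 12:14:48 zz-sed-monit01 postfix/pickup[29723]: 2185A40AEAAF: uid=0 from=<root>",
]
# Nothing was handed to Postfix around the HARD DOWN alert:
print(maillog_lines_near(log, datetime(2016, 2, 9, 11, 40, 47)))
```

An empty result for that window is consistent with the notification command never being run, rather than with mail being filtered later.
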
I've enabled notification debugging (level 32, very detailed) and simulated a host going down by adding a fake route. In that test the notification was sent.

I've seen a lot of messages about notification suppression in the nagios.debug file, but they appear to be correct:

Code: Select all

[omitted output]
[Wed Feb 10 09:34:22 2016.682666] [032.0] [pid=18098] ** Service Notification Attempt ** Host: 'VENUS', Service: 'CHECK ESX DATASTORE', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Thu Jan 28 16:55:16 2016
[Wed Feb 10 09:34:22 2016.682708] [032.1] [pid=18098] We shouldn't re-notify contacts about this service problem.
[Wed Feb 10 09:34:22 2016.682713] [032.0] [pid=18098] Notification viability test failed.  No notification will be sent out.
[omitted output]
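
Since nagios.debug gets large quickly, a short script can pull out just the suppressed attempts together with the reason logged right before the "viability test failed" line. This is a sketch based only on the three lines quoted above; the "Host Notification Attempt" form is an assumption (only the service form appears in this thread), and the function name is hypothetical.

```python
import re

ATTEMPT_RE = re.compile(
    r"\*\* (?P<kind>Host|Service) Notification Attempt \*\* Host: '(?P<host>[^']*)'")
# Strips "[timestamp] [level.verbosity] [pid=N] " from the start of a line.
PREFIX_RE = re.compile(r"^\[[^\]]*\] \[\d+\.\d+\] \[pid=\d+\] ")
FAILED = "Notification viability test failed"


def suppression_reasons(lines):
    """Yield (kind, host, reason) for each notification attempt that was suppressed."""
    current = None      # (kind, host) of the attempt currently being read
    last_message = ""   # most recent message seen since that attempt started
    for line in lines:
        message = PREFIX_RE.sub("", line.rstrip("\n"))
        attempt = ATTEMPT_RE.search(message)
        if attempt:
            current = (attempt["kind"], attempt["host"])
            last_message = ""
        elif FAILED in message:
            if current:
                yield (current[0], current[1], last_message)
            current = None
        elif current:
            last_message = message


debug = [
    "[Wed Feb 10 09:34:22 2016.682666] [032.0] [pid=18098] ** Service Notification Attempt ** Host: 'VENUS', Service: 'CHECK ESX DATASTORE', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Thu Jan 28 16:55:16 2016",
    "[Wed Feb 10 09:34:22 2016.682708] [032.1] [pid=18098] We shouldn't re-notify contacts about this service problem.",
    "[Wed Feb 10 09:34:22 2016.682713] [032.0] [pid=18098] Notification viability test failed.  No notification will be sent out.",
]
for kind, host, reason in suppression_reasons(debug):
    print(kind, host, reason)
```

Filtering the result on kind == "Host" and the affected host name would show which check, if any, blocks the DOWN notification.
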
Any ideas?

Thank you.

Re: No notification on host down

Posted: Wed Feb 10, 2016 5:56 pm
by tgriep
Is that host in a host dependency definition?
If so, can you post the settings for it?

Also, can you open this file, find the host in it and post that here?

Code: Select all

/usr/local/nagios/var/objects.cache
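
If the file is large, a small script can pull out a single definition. This is a minimal sketch, assuming the cache uses "define <type> {" blocks with one whitespace-separated directive per line; the function names are invented for this example.

```python
def parse_objects(text, object_type="host"):
    """Return one dict per 'define <object_type> { ... }' block in text."""
    objects = []
    current = None  # dict being filled while inside a matching block
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # Accept both "define host {" and "define host{".
            if line.startswith("define ") and line.endswith("{"):
                kind = line[len("define "):-1].strip()
                if kind == object_type:
                    current = {}
        elif line == "}":
            objects.append(current)
            current = None
        elif line:
            parts = line.split(None, 1)  # directive, then the rest as value
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return objects


def find_host(text, host_name):
    """Return the directive dict for host_name, or None if it is absent."""
    for obj in parse_objects(text, "host"):
        if obj.get("host_name") == host_name:
            return obj
    return None
```

For example, `find_host(open("/usr/local/nagios/var/objects.cache").read(), "MYHOST")` returns the directives for that host as a dict (MYHOST is a placeholder), and `parse_objects(text, "hostdependency")` lists the dependency definitions.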

Re: No notification on host down

Posted: Thu Feb 11, 2016 5:07 am
by Tron911
This is the dependency tree:

Code: Select all

define hostdependency {
	host_name	XX-YYY-FIREWA01
	dependent_host_name	XX-YYY-STORAG01
	inherits_parent	1
	notification_failure_options	d,u
	}

define hostdependency {
	host_name	XX-YYY-MCLINK
	dependent_host_name	XX-YYY-FIREWA01
	inherits_parent	1
	notification_failure_options	d,u
	}

define hostdependency {
	host_name	XX-YYY-TELECOM
	dependent_host_name	XX-YYY-FIREWA01
	inherits_parent	1
	notification_failure_options	d,u
	}
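
To make it clearer what this chain is supposed to do, here is a simplified Python model of how I understand notification dependencies from the documentation: notifications for the dependent host are suppressed when a master host is in one of the states listed in notification_failure_options, and with inherits_parent set the master's own dependencies are checked as well. This is only a sketch of the documented behaviour, not the actual Nagios code, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostDependency:
    master: str                              # host_name
    dependent: str                           # dependent_host_name
    inherits_parent: bool
    notification_failure_options: frozenset  # letters such as {"d", "u"}


# Option letters from the hostdependency documentation.
STATE_LETTER = {"UP": "o", "DOWN": "d", "UNREACHABLE": "u", "PENDING": "p"}


def notifications_suppressed(host, deps, states, _seen=None):
    """Return True if any dependency of 'host' fails for notifications.

    'states' maps host names to "UP", "DOWN", "UNREACHABLE" or "PENDING";
    hosts missing from the map are treated as UP (an assumption of this
    sketch).
    """
    seen = _seen if _seen is not None else set()
    if host in seen:  # defensive guard; Nagios rejects circular paths
        return False
    seen.add(host)
    for dep in deps:
        if dep.dependent != host:
            continue
        letter = STATE_LETTER[states.get(dep.master, "UP")]
        if letter in dep.notification_failure_options:
            return True
        if dep.inherits_parent and notifications_suppressed(
            dep.master, deps, states, seen
        ):
            return True
    return False
```

With the three definitions above and every master host UP, this model says notifications for XX-YYY-STORAG01 should not be suppressed; they would only be suppressed if XX-YYY-FIREWA01, XX-YYY-MCLINK or XX-YYY-TELECOM were DOWN or UNREACHABLE.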

This is the host in the /usr/local/nagios/var/objects.cache file:

Code: Select all

define host {
	host_name	XX-YYY-STORAG01
	alias	XX-YYY-STORAG01
	address	172.16.25.19
	check_period	24x7
	check_command	check-ls-host-alive
	contacts	customercontact,reperibility-system-day,reperibility-network-day,lscontact,beppe
	notification_period	24x7
	initial_state	o
	importance	0
	check_interval	5.000000
	retry_interval	2.000000
	max_check_attempts	3
	active_checks_enabled	1
	passive_checks_enabled	1
	obsess	1
	event_handler_enabled	1
	low_flap_threshold	0.000000
	high_flap_threshold	0.000000
	flap_detection_enabled	0
	flap_detection_options	a
	freshness_threshold	0
	check_freshness	0
	notification_options	r,d
	notifications_enabled	1
	notification_interval	0.000000
	first_notification_delay	0.000000
	stalking_options	n
	process_perf_data	1
	retain_status_information	1
	retain_nonstatus_information	1
	}
Another notification was lost... This is the output of running cat /nagios/var/nagios.log | perl -pe 's/(\d+)/localtime($1)/e' | grep "Feb 11 06:":

Code: Select all

[Thu Feb 11 06:29:40 2016] HOST ALERT: QQ-RRR-MCLINK-SECONDARY;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
[Thu Feb 11 06:31:55 2016] HOST ALERT: QQ-RRR-MCLINK-SECONDARY;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
[Thu Feb 11 06:34:10 2016] HOST ALERT: QQ-RRR-MCLINK-SECONDARY;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%
[Thu Feb 11 06:44:29 2016] HOST ALERT: QQ-RRR-MCLINK-SECONDARY;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 9.38 ms
[Thu Feb 11 06:47:41 2016] Auto-save of retention data completed successfully.
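
For reference, roughly the same timestamp conversion can be done in Python. This is a sketch that assumes every nagios.log line starts with "[epoch]"; the function name and the optional tz parameter are my own additions, and the day of the month is zero-padded here, so the output will not always match the perl version character for character.

```python
import re
from datetime import datetime, timezone

# nagios.log lines start with the Unix timestamp in square brackets.
EPOCH_PREFIX = re.compile(r"^\[(\d+)\]")


def humanize(line, tz=None):
    """Replace a leading [epoch] with a readable date.

    tz=None converts to the machine's local time, like perl's localtime;
    pass timezone.utc for a result that does not depend on the machine.
    """
    def replace(match):
        stamp = datetime.fromtimestamp(int(match.group(1)), tz=tz)
        return "[" + stamp.strftime("%a %b %d %H:%M:%S %Y") + "]"

    return EPOCH_PREFIX.sub(replace, line, count=1)
```

Lines without a leading timestamp are returned unchanged, so the function can be mapped over the whole file before filtering on the date.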
This is the host definition in objects.cache; it is not a dependent host, but it is the master host of another one:

Code: Select all

define host {
	host_name	QQ-RRR-MCLINK-SECONDARY
	alias	QQ-RRR-MCLINK-SECONDARY
	address	172.16.29.100
	check_period	24x7
	check_command	check-ls-host-alive
	contacts	customercontact,reperibility-system-day,reperibility-network-day,lscontact,beppe
	notification_period	24x7
	initial_state	o
	importance	0
	check_interval	5.000000
	retry_interval	2.000000
	max_check_attempts	3
	active_checks_enabled	1
	passive_checks_enabled	1
	obsess	1
	event_handler_enabled	1
	low_flap_threshold	0.000000
	high_flap_threshold	0.000000
	flap_detection_enabled	0
	flap_detection_options	a
	freshness_threshold	0
	check_freshness	0
	notification_options	r,d
	notifications_enabled	1
	notification_interval	0.000000
	first_notification_delay	0.000000
	stalking_options	n
	process_perf_data	1
	retain_status_information	1
	retain_nonstatus_information	1
	}

define hostdependency {
	host_name	QQ-RRR-MCLINK-SECONDARY
	dependent_host_name	QQ-RRR-FIREWALL
	inherits_parent	1
	notification_failure_options	d,u
	}
Thank you.

Re: No notification on host down

Posted: Thu Feb 11, 2016 6:09 pm
by ssax
You have notification_interval set to 0 in the template. If you set this value to 0, Nagios will not re-notify contacts about problems for this host; only one problem notification will be sent out. Are you sure no emails have been sent out? Can you try setting it to something else for testing, to see if any of them get through?

Thank you

Re: No notification on host down

Posted: Thu Feb 11, 2016 6:11 pm
by tgriep
Can you remove that host from the Host Dependency and see if the notifications start to work?

Re: No notification on host down

Posted: Tue Feb 16, 2016 4:00 am
by Tron911
You found it (tgriep)!

@ssax: the notification_interval was set to 0 intentionally. I'm also sure that no notifications were sent earlier.

These are the logs from the requested test.

With dependencies:

Code: Select all

DOWN
Feb 16 08:49:09 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455608947
Feb 16 08:49:24 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
Feb 16 08:49:31 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455608970
Feb 16 08:49:47 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
Feb 16 08:49:59 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455608998
Feb 16 08:50:14 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%

UP
Feb 16 08:51:51 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455609109
Feb 16 08:51:55 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 20.17 ms
Without dependencies:

Code: Select all

DOWN
Feb 16 09:39:30 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455611969
Feb 16 09:39:45 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
Feb 16 09:39:57 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455611996
Feb 16 09:40:12 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
Feb 16 09:41:18 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455612077
Feb 16 09:41:34 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;DOWN;HARD;3;CRITICAL - Plugin timed out after 15 seconds
Feb 16 09:41:34 zz-sed-monit01 nagios: HOST NOTIFICATION: beppe;XX-YYY-STORAG01;DOWN;host-mail-noCC;CRITICAL - Plugin timed out after 15 seconds
Feb 16 09:41:34 zz-sed-monit01 nagios: HOST NOTIFICATION: lscontact;XX-YYY-STORAG01;DOWN;autoticket-ls;CRITICAL - Plugin timed out after 15 seconds
Feb 16 09:41:34 zz-sed-monit01 nagios: HOST NOTIFICATION: lscontact;XX-YYY-STORAG01;DOWN;host-mail-noCC;CRITICAL - Plugin timed out after 15 seconds
Feb 16 09:41:34 zz-sed-monit01 nagios: HOST NOTIFICATION: customercontact;XX-YYY-STORAG01;DOWN;customailerhost;CRITICAL - Plugin timed out after 15 seconds

UP
Feb 16 09:43:18 zz-sed-monit01 nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_CHECK;XX-YYY-STORAG01;1455612197
Feb 16 09:43:22 zz-sed-monit01 nagios: HOST ALERT: XX-YYY-STORAG01;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 15.24 ms
Feb 16 09:43:22 zz-sed-monit01 nagios: HOST NOTIFICATION: beppe;XX-YYY-STORAG01;UP;host-mail-noCC;PING OK - Packet loss = 0%, RTA = 15.24 ms
Feb 16 09:43:22 zz-sed-monit01 nagios: HOST NOTIFICATION: lscontact;XX-YYY-STORAG01;UP;autoticket-ls;PING OK - Packet loss = 0%, RTA = 15.24 ms
Feb 16 09:43:22 zz-sed-monit01 nagios: HOST NOTIFICATION: lscontact;XX-YYY-STORAG01;UP;host-mail-noCC;PING OK - Packet loss = 0%, RTA = 15.24 ms
Feb 16 09:43:22 zz-sed-monit01 nagios: HOST NOTIFICATION: customercontact;XX-YYY-STORAG01;UP;customailerhost;PING OK - Packet loss = 0%, RTA = 15.24 ms
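
To spot the same pattern across the whole syslog, I put together a small Python sketch that lists every HARD host alert that was not followed by a HOST NOTIFICATION for the same host and state before that host's next alert. It only relies on the line layout visible in the logs above; the function name is made up, and it knows nothing about notification options, time periods or downtime, so a hit is a candidate to investigate rather than proof of a problem.

```python
import re

# Field layout taken from the syslog lines in this post.
ALERT = re.compile(
    r"nagios: HOST ALERT: ([^;]+);(UP|DOWN|UNREACHABLE);(SOFT|HARD);"
)
# HOST NOTIFICATION: contact;host;state;command;output
NOTIFY = re.compile(r"nagios: HOST NOTIFICATION: [^;]+;([^;]+);([^;]+);")


def unnotified_hard_alerts(lines):
    """Return (host, state) pairs for hard alerts that got no notification."""
    pending = {}  # host -> state of a hard alert still waiting for a notification
    missing = []
    for line in lines:
        alert = ALERT.search(line)
        if alert:
            host, state, kind = alert.groups()
            if host in pending:
                # A new alert arrived before any notification was logged.
                missing.append((host, pending.pop(host)))
            if kind == "HARD":
                pending[host] = state
            continue
        note = NOTIFY.search(line)
        if note:
            host, state = note.groups()
            if pending.get(host) == state:
                del pending[host]
    missing.extend(pending.items())  # alerts never notified by end of log
    return missing
```

Run against the "With dependencies" log it reports both the DOWN and the UP alert for XX-YYY-STORAG01 as unnotified, while the "Without dependencies" log comes back empty.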

So what is the problem? Host dependencies are almost mandatory in the customer's network...

Thank you,
Giuseppe

Re: No notification on host down

Posted: Tue Feb 16, 2016 3:49 pm
by rkennedy
This is strange. Can you please provide a couple of files for us to review? They are:

Code: Select all

/usr/local/nagios/var/status.dat
/usr/local/nagios/var/objects.cache
With those we can take a deeper look at what's going on.

Re: No notification on host down

Posted: Thu Feb 18, 2016 8:00 am
by Tron911
Hello, I've attached the requested files. I captured them with the host configured both with and without the dependency.
Let me know if I can do something more.

Thank you,
Giuseppe