
Re: Random emails

Posted: Thu Aug 07, 2014 8:23 pm
by Box293
Getting there. The only notification attempt that was triggered was for a host that had notifications disabled:

Code:

[1407425418.741244] [016.1] [pid=17742] HOST: G1VPSQL15, ATTEMPT=1/3, CHECK TYPE=ACTIVE, STATE TYPE=HARD, OLD STATE=1, NEW STATE=1
[1407425418.741247] [016.1] [pid=17742] Host was DOWN.
[1407425418.741250] [016.1] [pid=17742] Host is still DOWN.
[1407425418.741252] [001.0] [pid=17742] determine_host_reachability(host=G1VPSQL15)
[1407425418.741255] [016.1] [pid=17742] Pre-handle_host_state() Host: G1VPSQL15, Attempt=1/3, Type=HARD, Final State=1 (DOWN)
[1407425418.741258] [001.0] [pid=17742] handle_host_state()
[1407425418.741260] [001.0] [pid=17742] obsessive_compulsive_host_check_processor()
[1407425418.741263] [001.0] [pid=17742] run_host_performance_data_command()
[1407425418.741266] [001.0] [pid=17742] update_host_performance_data_file()
[1407425418.741268] [001.0] [pid=17742] clear_volatile_macros_r()
[1407425418.741276] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407425418.741283] [001.0] [pid=17742] check_host_notification_viability()
[1407425418.741286] [001.0] [pid=17742] check_time_against_period()
[1407425418.741290] [001.0] [pid=17742] _get_matching_timerange()
[1407425418.741294] [032.1] [pid=17742] Notifications are temporarily disabled for this host, so we won't send one out.
[1407425418.741297] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
We need more debug output that covers hosts going down or services going into a warning or critical state.

Re: Random emails

Posted: Fri Aug 08, 2014 10:23 am
by JohnFLi
That part is a problem. My hosts and services are not going down that much, yet I get bogus emails constantly.

As you can see in the attached image, Nagios is sending lots of messages, and it is sending them to a bogus email address.
I have included the debug file that covers the same time frame as the screenshot.
For example, you can see in the screenshot that it attempted to send an email saying host GBI1 is down, then shortly afterward it tried sending an email saying it was up. Yet the debug file never says anything about it being down.

Re: Random emails

Posted: Fri Aug 08, 2014 6:26 pm
by Box293
The image is helpful, but the debug log is for the time period 08:15:46 to 08:15:48, so I can't compare the two.

I'm guessing your timezone is GMT-7 (PDT or MST).

Can you post the debug log for the time period:
07:50:00 = [1407509400.xxxxxx]
to
08:10:00 = [1407510600.xxxxxx]

Then we can compare the screenshot to the debug log.
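The epoch values above can be reproduced with a short script. This is only a sketch: it assumes the server sits at a fixed UTC-7 offset (the guess made above) and that the screenshot is from Aug 8, 2014. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Nagios server is at a fixed UTC-7 offset (PDT), as guessed above.
SERVER_TZ = timezone(timedelta(hours=-7))

def local_to_epoch(year, month, day, hour, minute, second=0):
    """Return the Unix timestamp for a wall-clock time in SERVER_TZ."""
    local = datetime(year, month, day, hour, minute, second, tzinfo=SERVER_TZ)
    return int(local.timestamp())

def epoch_to_local(epoch):
    """Return the SERVER_TZ wall-clock time for a Unix timestamp."""
    return datetime.fromtimestamp(epoch, tz=SERVER_TZ).strftime("%Y-%m-%d %H:%M:%S")

start = local_to_epoch(2014, 8, 8, 7, 50)
end = local_to_epoch(2014, 8, 8, 8, 10)
print(start, end)
```

Lines in nagios.debug whose bracketed timestamp falls between those two values would cover the requested window.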

Re: Random emails

Posted: Mon Aug 11, 2014 11:11 am
by JohnFLi
Those logs are long gone, but here are the newest ones:
http://www.ivhs.us/nagios.debug
http://www.ivhs.us/nagios.debug.old

Re: Random emails

Posted: Mon Aug 11, 2014 4:04 pm
by scottwilkerson
Since all these messages are "Undeliverable", could this be because your system is trying to re-deliver an old message? Would it be possible to see the date in the body of the notification?

Re: Random emails

Posted: Mon Aug 11, 2014 4:21 pm
by JohnFLi
Click on the image and you can see the full email.

Nagios doesn't have any idea whether a message is undeliverable or not. It just sends emails out; it doesn't receive any.
Even if it were trying to resend, the issue is that it is trying to send to an email address that is not set anywhere.
I'm thinking that (for some stupid reason) it sends these bogus emails every time a service or host state changes, but because it hasn't reached the time when it should really send an email, it puts a blank address in the "To:" box.

Also, don't forget: when I go to the Notification link in the web interface, it does not show that it sent anything, nor does the event log.

Re: Random emails

Posted: Mon Aug 11, 2014 8:39 pm
by Box293
How are the emails being sent from Nagios to your Exchange server?

Sendmail?

Can we turn on logging for that and see what is happening?

Also, we can try setting:

Code:

debug_level=32
# DEBUG LEVEL
# This option determines how much (if any) debugging information will
# be written to the debug file. OR values together to log multiple
# types of information.
# Values:
# -1 = Everything
# 0 = Nothing
# 1 = Functions
# 2 = Configuration
# 4 = Process information
# 8 = Scheduled events
# 16 = Host/service checks
# 32 = Notifications
# 64 = Event broker
# 128 = External commands
# 256 = Commands
# 512 = Scheduled downtime
# 1024 = Comments
# 2048 = Macros
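To log more than one type of information, the values are ORed together as the comment says. A minimal sketch of that arithmetic follows; the flag values are copied from the comment above, while the dictionary keys and the helper name are just labels invented for this example, not Nagios identifiers.

```python
# Flag values copied from the nagios.cfg comment above.
# The key names are labels made up for this example.
DEBUG_FLAGS = {
    "functions": 1,
    "configuration": 2,
    "process": 4,
    "events": 8,
    "checks": 16,
    "notifications": 32,
    "event_broker": 64,
    "external_commands": 128,
    "commands": 256,
    "downtime": 512,
    "comments": 1024,
    "macros": 2048,
}

def debug_level(*names):
    """OR the selected flags together into a single debug_level value."""
    level = 0
    for name in names:
        level |= DEBUG_FLAGS[name]
    return level

print(debug_level("notifications"))            # 32
print(debug_level("checks", "notifications"))  # 48
```

So debug_level=48 would log both host/service checks and notifications, which may help when comparing check results against notification attempts.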

Re: Random emails

Posted: Tue Aug 12, 2014 11:19 am
by JohnFLi
Debug is now set to 32.
Postfix is what is on the Nagios server (it came with the OS ISO, CentOS 6.5).

Re: Random emails

Posted: Tue Aug 12, 2014 11:57 am
by JohnFLi
Good grief..... I am at a larger loss than before....

OK, here is the log since I set debug to 32. (G1VPSQL15 and G1VPSPS01 are both off and set as in downtime. On G1VPOSS02 I stopped NSClient to force valid notifications.)

Code:

[1407860309.385096] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860309.385101] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860345.341030] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'CPU_Load', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860345.341046] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860345.341057] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860364.335019] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'C:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860364.335053] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860364.335058] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860365.337301] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'UpTime', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860365.337331] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860365.337336] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860372.333235] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'CPU_Load', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860372.333270] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860372.333275] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860375.349216] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860375.349229] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860375.349233] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860394.349069] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSPS01', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860394.349084] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860394.349089] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860418.334129] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'E:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860418.334173] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860418.334179] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860448.341104] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860448.341122] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860448.341127] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860464.343580] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'UpTime', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860464.343597] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860464.343602] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860464.343778] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'E:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860464.343787] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860464.343792] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860494.350543] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSPS01', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860494.350559] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860494.350563] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860497.333491] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'F:\ Drive Space 85/90', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860497.333528] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860497.333534] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860527.341081] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860527.341098] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860527.341103] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860542.339010] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'C:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860542.339025] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860542.339030] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860572.354122] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860572.354142] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860572.354146] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860576.346120] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'Memory Usage', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860576.346136] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860576.346140] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860578.337173] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'Memory Usage', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860578.337209] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860578.337214] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860606.354165] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860606.354184] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860606.354189] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860608.346088] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSPS01', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860608.346103] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860608.346107] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860645.317746] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'CPU_Load', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860645.317793] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860645.317799] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860664.311372] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'C:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860664.311410] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860664.311416] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860665.314152] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'UpTime', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860665.314190] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860665.314196] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860672.309558] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'CPU_Load', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860672.309598] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860672.309605] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860675.326020] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860675.326032] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860675.326037] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860694.320017] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSPS01', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860694.320033] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860694.320037] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860718.310984] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'E:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860718.311025] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860718.311031] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860748.319129] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860748.319156] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860748.319161] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860764.320053] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'UpTime', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860764.320092] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860764.320098] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860764.321082] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'E:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860764.321128] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860764.321134] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860794.331225] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSPS01', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860794.331249] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860794.331254] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860797.310438] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'F:\ Drive Space 85/90', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860797.310474] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860797.310480] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860827.318083] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860827.318103] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860827.318108] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860842.315172] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'C:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860842.315210] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860842.315216] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860872.322049] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860872.322067] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860872.322072] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860876.323408] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'Memory Usage', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860876.323444] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860876.323450] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860878.313928] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSPS01', Service: 'Memory Usage', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860878.313965] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860878.313970] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860906.330026] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860906.330043] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860906.330048] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860908.321066] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSPS01', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860908.321084] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860908.321089] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860945.294676] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'CPU_Load', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860945.294729] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860945.294734] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407861896.028673] [032.0] [pid=6463] ** Service Notification Attempt ** Host: 'G1VPOSS02', Service: 'C:\ Drive Space', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407861896.028703] [032.1] [pid=6463] The host is either down or unreachable, so we won't notify contacts about this service.
[1407861896.028711] [032.0] [pid=6463] Notification viability test failed.  No notification will be sent out.
[1407861896.030447] [032.0] [pid=6463] ** Host Notification Attempt ** Host: 'G1VPOSS02', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Tue Aug 12 09:43:56 2014
[1407861896.030463] [032.1] [pid=6463] Its not yet time to re-notify the contacts about this host problem...
[1407861896.030471] [032.1] [pid=6463] Next acceptable notification time: Tue Aug 12 09:48:56 2014
[1407861896.030476] [032.0] [pid=6463] Notification viability test failed.  No notification will be sent out.
If you view the attachment, you can see that bogus emails were sent.
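A log this long is easier to compare against the emails once it is summarized per host. Below is a rough sketch that counts, for each host, the reason logged right after a notification attempt. It assumes the line layout seen above ([timestamp] [area.verbosity] [pid=N] message) and that the first verbosity-1 line after an attempt is the reason; the function name and sample are for illustration only.

```python
import re
from collections import Counter

# Layout assumed from the lines above: [timestamp] [area.verbosity] [pid=N] message
LINE_RE = re.compile(r"^\[(\d+)\.\d+\] \[(\d+)\.(\d+)\] \[pid=(\d+)\] (.*)$")
ATTEMPT_RE = re.compile(r"\*\* (Host|Service) Notification Attempt \*\* Host: '([^']+)'")

def summarize(lines):
    """Count (host, reason) pairs for notification attempts in debug log lines."""
    counts = Counter()
    current_host = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        area, verbosity, message = m.group(2), m.group(3), m.group(5)
        if int(area) != 32:  # only look at notification entries
            continue
        attempt = ATTEMPT_RE.search(message)
        if attempt:
            current_host = attempt.group(2)
        elif current_host is not None and verbosity == "1":
            # First detail line after an attempt is taken as the reason.
            counts[(current_host, message)] += 1
            current_host = None
    return counts

# Sample lines copied from the log above.
sample = """\
[1407860345.341030] [032.0] [pid=17742] ** Service Notification Attempt ** Host: 'G1VPSQL15', Service: 'CPU_Load', Type: NORMAL, Options: 0, Current State: 2, Last Notification: Wed Dec 31 16:00:00 1969
[1407860345.341046] [032.1] [pid=17742] The host is either down or unreachable, so we won't notify contacts about this service.
[1407860345.341057] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
[1407860375.349216] [032.0] [pid=17742] ** Host Notification Attempt ** Host: 'G1VPSQL15', Type: NORMAL, Options: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1407860375.349229] [032.1] [pid=17742] This host problem has already been acknowledged, so we won't send a notification out!
[1407860375.349233] [032.0] [pid=17742] Notification viability test failed.  No notification will be sent out.
""".splitlines()

for (host, reason), n in sorted(summarize(sample).items()):
    print(host, n, reason)
```

If a host named in one of the bogus emails never shows up in such a summary for the same time window, that would suggest the email did not come from a notification attempt recorded in this debug file.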

Re: Random emails

Posted: Tue Aug 12, 2014 4:15 pm
by JohnFLi
Just to make this even more fun, I tested something.
I disabled one of the machines it sends emails about (to the bad address that doesn't exist).
I did not get any more emails for a little while. Then, about 1 minute after re-enabling that host, Nagios sent 2 emails saying the host is up. I looked in the nagios.debug log (the level was set to 32 about 5 hours ago), and there is no mention of the server I just re-enabled.