Hello,
I'm building a new Nagios environment and have notification alerts going through PagerDuty.
All alerts are working, except that when I bring down a server I can't get a host notification triggered. Service alerts are notifying fine.
Here's part of the nagios.log showing the host alert detected but no notification sent out. (The notifications you see are service notifications. I need the notification that the host is down, and that isn't working.)
Also, is there a way to trigger this alert without actually bringing down the server to test?
Any help will be greatly appreciated:
[1403305466] HOST ALERT: mr11p01ad-websrvr001.iad.apple.com;DOWN;SOFT;3;CRITICAL - Host Unreachable (17.178.71.24)
[1403305536] HOST ALERT: mr11p01ad-websrvr001.iad.apple.com;DOWN;SOFT;4;CRITICAL - Host Unreachable (17.178.71.24)
[1403305606] HOST ALERT: mr11p01ad-websrvr001.iad.apple.com;DOWN;HARD;5;CRITICAL - Host Unreachable (17.178.71.24)
[1403305676] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;CPU LOAD;CRITICAL;HARD;1;Connection refused or timed out
[1403305776] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;root;CRITICAL;HARD;1;Connection refused or timed out
[1403305796] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;NTP;CRITICAL;HARD;1;NTP CRITICAL: No response from NTP server
[1403306026] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;SSH;CRITICAL;HARD;1;No route to host
[1403306026] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;PING;CRITICAL;HARD;3;CRITICAL - Host Unreachable (17.178.71.24)
[1403306036] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;SWAP;CRITICAL;HARD;3;Connection refused or timed out
[1403306276] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;CPU LOAD;CRITICAL;HARD;1;Connection refused or timed out
[1403306376] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;root;CRITICAL;HARD;1;Connection refused by host
[1403306386] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;NTP;OK;HARD;1;NTP OK: Offset unknown
[1403306396] HOST ALERT: mr11p01ad-websrvr001.iad.apple.com;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.22 ms
[1403306626] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;SSH;OK;HARD;1;SSH OK - OpenSSH_5.3 (protocol 2.0)
[1403306626] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 0.22 ms
[1403306636] SERVICE NOTIFICATION: pagerduty;mr11p01ad-websrvr001.iad.apple.com;SWAP;CRITICAL;notify-service-by-pagerduty;Connection refused by host
[1403306636] SERVICE NOTIFICATION: nagiosadmin;mr11p01ad-websrvr001.iad.apple.com;SWAP;CRITICAL;notify-service-by-email;Connection refused by host
[1403306796] Auto-save of retention data completed successfully.
[1403306876] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;CPU LOAD;CRITICAL;SOFT;1;Connection refused by host
[1403306976] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;root;CRITICAL;SOFT;1;Connection refused by host
[1403306996] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;CPU LOAD;CRITICAL;SOFT;2;Connection refused by host
[1403307096] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;root;CRITICAL;SOFT;2;Connection refused by host
[1403307116] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;CPU LOAD;CRITICAL;HARD;3;Connection refused by host
[1403307116] SERVICE NOTIFICATION: pagerduty;mr11p01ad-websrvr001.iad.apple.com;CPU LOAD;CRITICAL;notify-service-by-pagerduty;Connection refused by host
[1403307116] SERVICE NOTIFICATION: nagiosadmin;mr11p01ad-websrvr001.iad.apple.com;CPU LOAD;CRITICAL;notify-service-by-email;Connection refused by host
[1403307216] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;root;CRITICAL;HARD;3;Connection refused by host
[1403307216] SERVICE NOTIFICATION: pagerduty;mr11p01ad-websrvr001.iad.apple.com;root;CRITICAL;notify-service-by-pagerduty;Connection refused by host
[1403307216] SERVICE NOTIFICATION: nagiosadmin;mr11p01ad-websrvr001.iad.apple.com;root;CRITICAL;notify-service-by-email;Connection refused by host
[1403308056] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;crontab;CRITICAL;SOFT;1;Connection refused by host
[1403308176] SERVICE ALERT: mr11p01ad-websrvr001.iad.apple.com;crontab;CRITICAL;SOFT;2;Connection refused by host
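The symptom in the log above (a HARD DOWN host alert with no HOST NOTIFICATION line before the host recovers) can be checked for mechanically. Below is a minimal Python sketch, not part of the original post. The function name `unnotified_hard_down` is hypothetical, and the `HOST NOTIFICATION` layout (contact;host;state;command;output) is assumed by analogy with the `SERVICE NOTIFICATION` lines shown here, since this excerpt contains no host notification to confirm it.

```python
import re

# Matches lines such as:
# [1403305606] HOST ALERT: <host>;DOWN;HARD;5;<plugin output>
HOST_ALERT = re.compile(r"^\[(\d+)\] HOST ALERT: ([^;]+);(\w+);(\w+);")
# Assumed layout: [ts] HOST NOTIFICATION: <contact>;<host>;<state>;<command>;<output>
HOST_NOTIFICATION = re.compile(r"^\[(\d+)\] HOST NOTIFICATION: [^;]+;([^;]+);")


def unnotified_hard_down(lines):
    """Return sorted (timestamp, host) pairs for HARD DOWN host alerts that
    were not followed by a HOST NOTIFICATION for that host before recovery."""
    pending = {}  # host -> timestamp of its first unnotified HARD DOWN alert
    missed = []
    for line in lines:
        m = HOST_ALERT.match(line)
        if m:
            ts, host, state, state_type = m.groups()
            if state == "DOWN" and state_type == "HARD":
                pending.setdefault(host, int(ts))
            elif state == "UP" and host in pending:
                # Host recovered and nobody was notified in between.
                missed.append((pending.pop(host), host))
            continue
        m = HOST_NOTIFICATION.match(line)
        if m:
            # A notification went out, so this outage is not "missed".
            pending.pop(m.group(2), None)
    # Hosts still down at the end of the log without any notification.
    missed.extend((ts, host) for host, ts in pending.items())
    return sorted(missed)
```

Run against the excerpt above (for example with `unnotified_hard_down(open("nagios.log"))`, where the path is whatever your installation uses), this would report the HARD DOWN alert at 1403305606 for mr11p01ad-websrvr001.iad.apple.com.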
check-host-alive - notification problem
Re: check-host-alive - notification problem
Here's additional info:
I have noticed the following warnings when restarting Nagios. I'm wondering if this is causing the problem, although I'm getting all other alerts except the server-down alert coming from check-host-alive.
[1403441564] Warning: Host 'mr11p01ad-etlsrvr001.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-matchappsrvr001.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-matchappsrvr002.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-ncsdb001.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-ncsdb002.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-ncssas001.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-ncssas002.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-segappsrvr001.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-websrvr001.iad.apple.com' has no default contacts or contactgroups defined!
Re: check-host-alive - notification problem
Are you receiving the pagerduty notices?
- Box293
Re: check-host-alive - notification problem
simonl wrote: Also is there a way to trigger this alert without actually bringing down the server to test?
You can test whether the alerts are working by using the Host Command "Send custom host notification".
simonl wrote: [1403441564] Warning: Host 'mr11p01ad-etlsrvr001.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-matchappsrvr001.iad.apple.com' has no default contacts or contactgroups defined!
[1403441564] Warning: Host 'mr11p01ad-matchappsrvr002.iad.apple.com' has no default contacts or contactgroups defined!
If the test above does not work for these hosts, then you may not have any contacts defined for the hosts. Check your hosts config file for the contacts or contact_groups directives. If they are not present, add them. If you are using a template, check the template config file for the same.
Don't forget to restart Nagios if you make any changes.
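The "Send custom host notification" action in the web interface corresponds to the `SEND_CUSTOM_HOST_NOTIFICATION` external command, so the same test can be scripted. The Python sketch below is only an illustration: the helper names are hypothetical, and the command file path must be taken from the `command_file` directive in your own nagios.cfg (it is not given in this thread). The options field is a bitmask where 1 is broadcast, 2 is forced, and 4 increments the notification number.

```python
import time


def custom_host_notification(host, author, comment, options=0, now=None):
    """Format one SEND_CUSTOM_HOST_NOTIFICATION external command line.

    options: 0 = none, 1 = broadcast, 2 = forced, 4 = increment number
    (values may be added together).
    """
    ts = int(time.time() if now is None else now)
    return f"[{ts}] SEND_CUSTOM_HOST_NOTIFICATION;{host};{options};{author};{comment}\n"


def submit(command_line, command_file):
    """Write the command to the Nagios external command file (a named pipe).

    command_file is an assumption supplied by the caller: use the value of
    the command_file directive in nagios.cfg.
    """
    with open(command_file, "w") as fh:
        fh.write(command_line)
```

For example, `submit(custom_host_notification("mr11p01ad-websrvr001.iad.apple.com", "nagiosadmin", "notification test", options=2), path_from_nagios_cfg)` would request a forced test notification. If the host has no contacts, no notification should go out, which matches the behavior described in this thread.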
Re: check-host-alive - notification problem
I apologize for not getting back to the forum sooner.
I realized that there were no contact groups defined where I was calling the check-host-alive in the linux.cfg file.
Thanks all for looking and thinking about this.
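For anyone hitting the same warning, a rough way to find host definitions that end up with neither `contacts` nor `contact_groups` is sketched below in Python. This is an illustration only, not the fix applied in this thread: the function names are hypothetical, the parser handles only simple `define host { ... }` blocks, and it follows `use` templates in a simplified way compared with Nagios itself. Running Nagios's own config verification remains the authoritative check.

```python
import re

# Simplified: assumes no "}" appears inside a host definition body.
DEFINE_HOST = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)


def parse_body(body):
    """Turn the text between the braces into a {directive: value} dict."""
    directives = {}
    for raw in body.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop inline ';' comments
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return directives


def hosts_without_contacts(config_text):
    """Return host_name values with no contacts/contact_groups, directly
    or through the templates named in their 'use' directive."""
    defs = [parse_body(body) for body in DEFINE_HOST.findall(config_text)]
    templates = {d["name"]: d for d in defs if "name" in d}

    def has_contacts(d, seen=()):
        if "contacts" in d or "contact_groups" in d:
            return True
        for parent in d.get("use", "").split(","):
            parent = parent.strip()
            if parent in templates and parent not in seen:
                if has_contacts(templates[parent], seen + (parent,)):
                    return True
        return False

    return [
        d["host_name"]
        for d in defs
        if "host_name" in d and d.get("register", "1") != "0" and not has_contacts(d)
    ]
```

Feeding it the contents of a file such as linux.cfg would list the hosts that need a `contact_groups` (or `contacts`) line added, either on the host itself or on its template.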
Re: check-host-alive - notification problem
I am glad your issue has been resolved! I am locking this post now. If you have any more questions or issues, please start a new thread.