Server issues when multiple hosts were down
Posted: Mon May 18, 2015 10:54 am
Ok, let me describe what was going on here and what I was seeing out of the Nagios server over this weekend.
- Currently I have ~1000 hosts (avg 9 down) and ~16000 services (avg 800 issues) being monitored by my XI 2014r2.6 server.
- Average load is normally 1.5-3.0, with ~500 total processes.
- This weekend we had major work being performed in one of our datacenters that required us to take down ~300 hosts and ~4800 additional services.
- I scheduled downtimes for all the services and hosts (thank god for scripting!).
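Bulk downtime like this can be scripted through the Nagios external command file. The following is only a minimal sketch, not the exact script used here: it assumes the standard `SCHEDULE_HOST_DOWNTIME` and `SCHEDULE_HOST_SVC_DOWNTIME` external commands, and the function names, host names, and command file path are illustrative assumptions that would need adjusting for a given install.

```python
import time


def downtime_commands(hosts, start, end, author, comment, now=None):
    """Build external command lines that put each host and all of its
    services into fixed downtime between start and end (Unix timestamps)."""
    now = int(time.time()) if now is None else int(now)
    duration = end - start  # only consulted by Nagios for flexible downtime
    lines = []
    for host in hosts:
        for cmd in ("SCHEDULE_HOST_DOWNTIME", "SCHEDULE_HOST_SVC_DOWNTIME"):
            # Fields: host;start;end;fixed;trigger_id;duration;author;comment
            lines.append(
                f"[{now}] {cmd};{host};{start};{end};1;0;{duration};{author};{comment}"
            )
    return lines


def submit(lines, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the commands to the external command file.
    The path is an assumed default; check command_file in nagios.cfg."""
    with open(command_file, "w") as fh:
        for line in lines:
            fh.write(line + "\n")
```

Calling `downtime_commands(["web01", "web02"], start, end, "admin", "DC maintenance")` yields two lines per host, which `submit` then hands to Nagios.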
So, I have 2 questions out of this mess...
1. The check that runs and validates server performance and stuff: can you think of any reason it wouldn't be working properly during this mayhem, or anything in the script it may rely on, when the items are offloaded like I have them?
2. Is there any setting I can make that automatically makes services dependent upon their hosts? I'd love to set that up so checks are not performed while the host is down. I know that isn't default behavior, but I don't want to have to create 1000+ dependency configs.