Multiple duplicate services for the same host
Posted: Fri Aug 01, 2014 9:54 am
We have services set up for each host that send notifications to IT staff when they go to Warning or Critical status. Our IT executives would like to change these to notify only once and to also send a notification to our help desk system to open an automated ticket. They would like our system not to be inundated with notifications every hour for the same problem. For example, when disk usage becomes critical on a server, they would like one notification to go to IT staff and one to go to the help desk, which will generate a ticket to track the problem. After that, notifications from that service should stop.
The problem I'm having is that servers sometimes have small problems that last only a few minutes and then resolve themselves. Say a network problem happens over a five-minute period in the middle of the night and then resolves, but by then Nagios has sent out 60 help desk tickets for it. That's a lot of overhead.
What I thought of doing was setting up separate service checks just for the help desk that would check 5 times over 30 minutes (instead of the normal 5-minute increment). That way, IT staff would be alerted after 5 minutes, but the help desk would get a 30-minute buffer for the problem to work itself out before a ticket is opened.
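To make the idea concrete, here is a rough sketch of what such a help desk service might look like as a Nagios object definition. The host name, the contact name "helpdesk", the "generic-service" template, and the check_nrpe!check_disk command are placeholders, not our real configuration, and the intervals assume the default interval_length of 60 seconds (so the values are minutes).

```
# Hypothetical help desk copy of the disk usage check (all names are placeholders)
define service {
    use                     generic-service        ; assumed template
    host_name               server                 ; placeholder host
    service_description     / Disk Usage - Help Desk
    check_command           check_nrpe!check_disk  ; assumed NRPE command
    check_interval          5      ; normal check every 5 minutes
    retry_interval          6      ; recheck every 6 minutes once a problem is seen
    max_check_attempts      5      ; 5 failed checks before a HARD state and a notification
    notification_interval   0      ; 0 = notify once, do not re-notify
    notification_options    w,c    ; Warning and Critical only
    contacts                helpdesk
}
```

With these values the fifth failed check lands about 24 minutes after the first (four retries at 6 minutes each), so retry_interval would need to be 7 or 8 to get closer to a full 30 minutes.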
I tried setting up a group service that included all hosts (for this test, I used the disk usage check). I copied an existing disk usage check in CCM and edited it to include all servers in the new service check. Then I added the help desk as the only contact and set it to send no further notifications after the first one. I activated the service, then applied the configuration. Now all of my disk usage services are coming up as "SERVICE ALERT: server;/ Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 30 seconds."
Does Nagios XI not allow for separate services checking the same thing?