Taking server out of service while it's already marked DOWN

Posted: Fri Aug 07, 2015 1:54 pm
by gossamer
Hi,

I have a nagios-2.0.1 system on Fedora 21 and a server down at the colo that I won't be able to attend to for at least three days. Nagios is constantly alerting me that it's down, and nothing I've tried will make it stop monitoring the server until I can fix it.

Is there a way to take it out of service until it's back online after it's already down?

Re: Taking server out of service while it's already marked D

Posted: Fri Aug 07, 2015 2:03 pm
by ssax
Have you tried scheduling downtime for the host, or clicking "Disable notifications" for the host? Are you getting any errors when doing that?

If you edit the object definitions for the host and its services, you should be able to set notifications_enabled 0 and then restart the nagios service.

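Worst case scenario, you could set enable_notifications=0 in your /usr/local/nagios/etc/nagios.cfg file and restart the nagios service to stop all notifications. (In the main config file the directive is enable_notifications; notifications_enabled is the per-host and per-service directive.)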
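
As a rough sketch of the per-host edit mentioned above: the template name, host name, and address below are placeholders and do not come from this thread. Note that this directive on a host only covers host notifications; each service definition carries its own notifications_enabled setting.

```cfg
# Hypothetical host definition; every name and value here is a placeholder.
define host {
    use                     generic-host    ; assumed template name, not from this thread
    host_name               colo-server     ; placeholder for the host that is down
    address                 192.0.2.10      ; documentation-range placeholder address
    notifications_enabled   0               ; suppress notifications for this host only
}
```

Before restarting, the edited configuration can be checked with the nagios binary's -v option against the main config file.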

Re: Taking server out of service while it's already marked D

Posted: Fri Aug 07, 2015 2:14 pm
by gossamer
ssax wrote: Have you tried scheduling downtime for the host, or clicking "Disable notifications" for the host? Are you getting any errors when doing that?

If you edit the object definitions for the host and its services, you should be able to set notifications_enabled 0 and then restart the nagios service.

Worst case scenario, you could set enable_notifications=0 in your /usr/local/nagios/etc/nagios.cfg file and restart the nagios service to stop all notifications.
Maybe I haven't set it up right. I have a single services.cfg which defines all the services for all the hosts being monitored. At the top of the file is a "define service" block with "name standard-service-24x7"; it is the only "define service" that has a name, and it is where notifications_enabled is set. Each service for each host then looks similar to this:

define service {
    use                     standard-service-24x7
    host_name               pixie
    service_description     NTP
    check_command           check_nrpe!check_ntp
}

This repeats for each service on each host. So if I set notifications_enabled to 0 in that one named definition, it would disable notifications for every service on every host.

I suppose I could add that to each service, but that's a lot of work...

Re: Taking server out of service while it's already marked D

Posted: Sun Aug 09, 2015 10:09 am
by jdalrymple
You can apply the "notifications_enabled" directive in the very lowest layer, the individual service definition, and it will only apply to the service/host combination there. It will not affect stuff in templates upstream. If the definition you showed applied that service to multiple hosts it could be an issue, but that doesn't appear to be the case.

Actually you have stuff set up quite nicely, not wrong at all.

Does what I described make sense?
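
To make the override concrete, here is a sketch of how the template and one overriding service might look. The template body is an assumption: the thread only says it is named standard-service-24x7 and sets notifications_enabled, so any other directives it holds are left out. The second block is the service entry quoted earlier in the thread with one line added.

```cfg
# Template (assumed contents). Only the name and the presence of
# notifications_enabled come from the thread; other shared directives are omitted.
define service {
    name                    standard-service-24x7
    notifications_enabled   1       ; default inherited by services that use this template
    register                0       ; marks this as a template, not a real service
}

# Override for a single host/service pair.
define service {
    use                     standard-service-24x7
    host_name               pixie
    service_description     NTP
    check_command           check_nrpe!check_ntp
    notifications_enabled   0       ; local value takes precedence over the template
}
```

Other services that inherit from the same template keep the template's value, so only the NTP check on pixie would stop notifying.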

Re: Taking server out of service while it's already marked D

Posted: Mon Aug 10, 2015 9:32 am
by gossamer
jdalrymple wrote: You can apply the "notifications_enabled" directive in the very lowest layer, the individual service definition, and it will only apply to the service/host combination there. It will not affect stuff in templates upstream. If the definition you showed applied that service to multiple hosts it could be an issue, but that doesn't appear to be the case.

Actually you have stuff set up quite nicely, not wrong at all.

Does what I described make sense?
Yes, I believe it makes sense, but changing it for every service is quite laborious. Is there no shorter/faster way to do it?

Is this the equivalent of setting "Disable notifications for all services on this host" from within the web front-end?

Re: Taking server out of service while it's already marked D

Posted: Mon Aug 10, 2015 10:02 am
by jdalrymple
gossamer wrote: and have a server down at the colo that I won't be able to attend to until at least three days from now
Shouldn't take long for "a server"?

Also, the generally accepted way of dealing with this is to just disable notifications in the CGI or put the server into a downtime. Why not use one of those options (suggested above by ssax)?

Re: Taking server out of service while it's already marked D

Posted: Fri Aug 14, 2015 11:41 am
by gossamer
jdalrymple wrote:
gossamer wrote: and have a server down at the colo that I won't be able to attend to until at least three days from now
Shouldn't take long for "a server"?

Also, the generally accepted way of dealing with this is to just disable notifications in the CGI or put the server into a downtime. Why not use one of those options (suggested above by ssax)?
Thanks for your help, guys. I recall trying the "Schedule Downtime" option after the server was already down, and it continued to alert me that the server was down. I guess I'll just have to play with the combination of options until I find the ones that work.

Re: Taking server out of service while it's already marked D

Posted: Fri Aug 14, 2015 1:00 pm
by tgriep
Were you receiving host notifications, service notifications, or both?
When you schedule downtime, you have to schedule it for the host and for its services separately.

Re: Taking server out of service while it's already marked D

Posted: Fri Aug 14, 2015 5:23 pm
by gossamer
tgriep wrote: Were you receiving host notifications, service notifications, or both?
When you schedule downtime, you have to schedule it for the host and for its services separately.
Good to know. Thanks so much for the info.

Re: Taking server out of service while it's already marked D

Posted: Mon Aug 17, 2015 9:19 am
by hsmith
gossamer wrote:
tgriep wrote: Were you receiving host notifications, service notifications, or both?
When you schedule downtime, you have to schedule it for the host and for its services separately.
Good to know. Thanks so much for the info.
Is there anything else we can do to help you, or is this one all right to close?