Service checks when host is down
-
[email protected]
- Posts: 19
- Joined: Tue Feb 07, 2012 7:29 am
Service checks when host is down
I am trying to tidy up our Nagios view of hosts/devices by reducing the number of services marked as critical.
When a host is down, is it possible to selectively tell Nagios not to test the associated service(s) (and mark them as unknown)?
I say selectively because I have one service which is a Wake-on-LAN service, i.e. it tests for ping and, if that fails, sends a WoL packet, so it would still need to operate when the host is down. It tries 3 times over about 10 minutes and then fails.
Looking forward to your response.
Thanks
Liam
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Service checks when host is down
I think this is the document you are looking for
http://nagios.sourceforge.net/docs/3_0/ ... ncies.html
-
[email protected]
- Posts: 19
- Joined: Tue Feb 07, 2012 7:29 am
Re: Service checks when host is down
Thanks Scott,
The task of adding inter-dependencies to 2000 disparate devices is very daunting ... I was looking for a more general switch.
i.e. in most cases, if the host is down, service checks are unnecessary, wasteful of resources, and cause excessive clutter on Nagios monitors.
Any process which actively cleans up the view so that support can home in on the actual faults/problems would be a good thing in my book.
Is there any way, either in the current product or as a possible enhancement, for the services to automatically return 'Unknown' if the host is down?
Thanks
Liam
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Service checks when host is down
They could return unreachable if a dependency relationship was set up, which I believe is close to what Scott was getting at. Please see the following docs:
http://nagios.sourceforge.net/docs/3_0/ ... ility.html
You could use this same method to create a false buffer between the hosts and services by creating a hierarchy of Host > Host > Service: the first host goes down, then the second host becomes unreachable, and then the service becomes unreachable. This is how network reachability works within Nagios, though it would take some tuning based on how your architecture is set up. You could use event handlers for this as well.
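As a rough sketch of the parent/child relationship those docs describe: the `parents` directive on a host definition tells Nagios which device sits between it and the monitoring server. The switch name, its address, and the `generic-switch` template below are placeholders I made up for illustration; only the podium host comes from this thread.

```
# Hypothetical upstream switch; name, address and template are assumptions.
define host {
    use         generic-switch
    host_name   lecture-block-switch
    address     192.0.2.1
}

# Podium host from this thread, now declared as a child of the switch.
define host {
    use         windows-podium
    host_name   BUS-1025-26_Lab_Podium
    alias       Lab_Podium
    address     192.168.91.96
    hostgroups  windows-podiaIPs
    parents     lecture-block-switch   ; Nagios reports UNREACHABLE instead of DOWN when the podium check fails and this parent is down
}
```

Note that this only changes how the host itself is classified. As far as I know, the services on the podium are still checked and still go critical unless something else suppresses them.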
-
[email protected]
- Posts: 19
- Joined: Tue Feb 07, 2012 7:29 am
Re: Service checks when host is down
Sounds good.
I am unclear how I should create these relationships.
Could you explain how I would go about this, with reference to my example below?
I have 70 Windows podia in lecture rooms.
The podia are all in one group called 'windows-podia'.
6 services are set up and operate against the 'windows-podia' group.
If a podium goes down, all the services go red (i.e. fail); I would prefer they returned unknown or unreachable.
Typical host definition:
define host {
use windows-podium ;
host_name BUS-1025-26_Lab_Podium;
check_command check-host-alive;
alias Lab_Podium ;
address 192.168.91.96 ;
hostgroups windows-podiaIPs;
}
Each service is defined similarly to the following:
define service {
use generic-service
service_description CPU Detail
check_command check-wsc!cpu_detail!80%,90%
hostgroups windows-podiaIPs
}
and of course a group definition
define hostgroup{
hostgroup_name windows-podiaIPs ; The name of the hostgroup
alias Windows Podia Desktops ; Long name of the group
}
Looking forward to your reply
Liam
Re: Service checks when host is down
[email protected] wrote: Is there any way either in current product or as a possible enhancement, that the services would auto return 'Unknown' if the host was down.?
Slansing's method would only work if all the podiums were children of a parent networking device, and even then it would only label the podiums as "UNREACHABLE" when the parent networking device was down.
Service dependencies may be the right tool for the job, though I do understand that making those changes is a giant task.
Beyond service dependencies, you could use event handlers to turn checks on and off depending on host state, although this is probably just as much of a time sink to implement as service dependencies:
http://monitoringtt.blogspot.com/2011/0 ... -host.html
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Service checks when host is down
As far as I understand the documentation, that task can hardly be done using service dependencies, for two reasons:
[*]A service cannot be dependent on a host, only on one (or more) service(s), and what we're talking about is making one or more services dependent on their host.
[*]You could work around the previous point by creating a service whose status mirrors that of its host (using check_dummy $HOSTSTATUS$ as the service check). However, again based on the documentation, you would have to define, one by one, a service dependency rule between that service and all of its "brothers" (the rest of the services associated with the host).
I believe that the solution explained at http://monitoringtt.blogspot.com/2011/0 ... -host.html is the easiest one, especially if you configure that handler as global_service_event_handler so that it is used by all your services. You can even program the event handler script to check whether a given inhibition user macro (say $DISCARD_HOSTSTATUS$) exists on the service or host, in order to avoid running it for certain special objects.
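For what it's worth, the dummy-service idea in the second point could be sketched roughly as below. This is only an illustration under some assumptions: the command name `check_host_state` is made up, `$USER1$` is assumed to point at your plugin directory, and I have used `$HOSTSTATEID$` and `$HOSTSTATE$`, which are the names given in the standard macro list (I could not find `$HOSTSTATUS$` there). `check_dummy` takes a numeric state as its first argument, so a DOWN host (state ID 1) would show up as WARNING on the dummy service and an UNREACHABLE host (state ID 2) as CRITICAL.

```
# Assumed command name; $USER1$ is the plugin path from resource.cfg.
define command {
    command_name    check_host_state
    command_line    $USER1$/check_dummy $HOSTSTATEID$ "Host is $HOSTSTATE$"
}

# One dummy service per podium, attached through the hostgroup from this thread.
define service {
    use                  generic-service
    hostgroup_name       windows-podiaIPs
    service_description  dummy-checker
    check_command        check_host_state
}
```

Because the dummy service never contacts the host, it costs next to nothing to run, and it gives the other services something to depend on.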
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Service checks when host is down
Have you decided on or found a solution, [email protected]?
-
[email protected]
- Posts: 19
- Joined: Tue Feb 07, 2012 7:29 am
Re: Service checks when host is down
As suggested, I raised an enhancement request...
[*]One service cannot be dependent on one host, just on one (or more) service(s) and what we're talking about is making one or more services dependent on their host.
All the services I test on a given host are very specific to that host (i.e. most are WMI checks using check_wsc as the engine): memory utilisation, processor utilisation, service running, processes running, etc. It makes sense for us to consider services dependent on their host. In fact, the vast majority of the tests we have configured are of this type. The only tests outside of this are a small number of tests on a Microsoft Cluster, DHCP, DNS and AD Domain servers, which I agree would fall under this model.
Does this mean that a different service model or a modification of the current service model is required to satisfy these needs?
The model described in http://nagios.sourceforge.net/docs/3_0/ ... ncies.html is more complex than what we use; our needs are much simpler.
In fact, are the checks I have listed above considered 'proper' services under this model? And hence, is the current model suitable for what I am trying to achieve?
The only way I could foresee making this happen in a large network would be a dynamic discovery mechanism that creates all the parent/child relationships and updates the config files accordingly. I can't see this happening any time soon.
[*]You could bypass the previous fact by creating a service whose status was the same of its host (using check_dummy $HOSTSTATUS$ as service check). However, and again based on documentation, you might define one by one a service dependency rule between that service and all their "brothers" (the rest of services associated to the host).
I found the link http://nagios.sourceforge.net/docs/3_0/ ... ncies.html confusing to follow for what I was trying to achieve. Using the host and services below, how would I code these to fit your solution? The objective: to force all services to an unknown state if the host is down.
define host {
use windows-podium ;
host_name BUS-1025-26_Lab_Podium;
check_command check-host-alive;
alias Lab_Podium ;
address 192.168.91.96 ;
hostgroups windows-podiaIPs;
}
# dummy service as suggested
define service {
use generic-service
check_command check_dummy $HOSTSTATUS$
hostgroups windows-podiaIPs
}
# example service.
define service {
use generic-service
service_description CPU Detail
check_command check-wsc!cpu_detail!80%,90%
hostgroups windows-podiaIPs
}
... What happens to the dummy service ... what state will it be in if the host is down?
Regards
Liam
Re: Service checks when host is down
The service dependency for this setup would be as follows (note: I had to give the dummy service a service_description in the quoted configuration).
[email protected] wrote:
define host {
use windows-podium ;
host_name BUS-1025-26_Lab_Podium;
check_command check-host-alive;
alias Lab_Podium ;
address 192.168.91.96 ;
hostgroups windows-podiaIPs;
}
# dummy service as suggested
define service {
use generic-service
service_description dummy-checker
check_command check_dummy $HOSTSTATUS$
hostgroups windows-podiaIPs
}
# example service.
define service {
use generic-service
service_description CPU Detail
check_command check-wsc!cpu_detail!80%,90%
hostgroups windows-podiaIPs
}
define servicedependency{
host_name BUS-1025-26_Lab_Podium
service_description dummy-checker
dependent_service_description CPU Detail
execution_failure_criteria w,u,c
notification_failure_criteria w,u,c
}
[email protected] wrote: ... What happens to the dummy service ... what state will it be in if the host is down?
It will reflect the $HOSTSTATUS$ macro, i.e., it will be DOWN.
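To avoid writing one dependency per podium, the same definition could probably be applied to the whole hostgroup. This is a sketch, not something tested in this thread: it assumes your Nagios version treats a `servicedependency` with `hostgroup_name` and no `dependent_host_name`/`dependent_hostgroup_name` as a same-host dependency for every member of the group (see the "same host dependencies" note in the time-saving tricks documentation), so run a config check with `nagios -v` before relying on it.

```
# One definition covering every host in the group; same-host behaviour is assumed.
define servicedependency {
    hostgroup_name                 windows-podiaIPs
    service_description            dummy-checker
    dependent_service_description  CPU Detail      ; a comma-separated list of the other services can go here
    execution_failure_criteria     w,u,c           ; skip active checks while dummy-checker is not OK
    notification_failure_criteria  w,u,c           ; and suppress their notifications
}
```

Any service left out of `dependent_service_description`, such as the Wake-on-LAN check from the first post, keeps running while the host is down. Also bear in mind that a service whose checks are suppressed this way stays in its last known state; it does not switch to UNKNOWN.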
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.