Help with service dependencies.

Posted: Fri Feb 06, 2015 4:17 pm
by confused_IT
Hi everyone, I have what I think is a small issue that I can't figure out. I think I know what the problem is, but I'll get into that shortly. I am having trouble setting up my service dependency. Here is what I have in my dependecies.cfg file:

define servicedependency{
host_name server01
service_description service_Tomcat
dependent_host_name server01
dependent_service_description service01,service02,service03,service04
execution_failure_criteria w,u,c
notification_failure_criteria w,u,c
}

The services (service01, service02, etc.) are dependent on service_Tomcat. They are URLs. If Tomcat goes down, we know that the four services are unreachable. However, the dependency doesn't work when I stop Tomcat.

I've noticed in Nagios that when the services are retrying, they reach their max attempts and then send a notification. So I figured I could just reduce the number of attempts service_Tomcat makes. However, I don't know where to make that change, as I can't find the setting anywhere, so I'm assuming that Nagios will still treat it as a regular service. I've tried making changes in the templates file and our main Nagios file, but they have no effect.

I was thinking that if the services reached their max attempts while service_Tomcat was still checking, it would still at least suppress the notifications. I think there is a 'p' option for execution_failure_criteria that I am going to try now, but otherwise I have no clue what I am missing or overlooking. Thanks in advance.

Re: Help with service dependencies.

Posted: Sun Feb 08, 2015 11:58 pm
by Box293
Your configuration is correct.

The problem lies in your service_Tomcat.

This service needs to enter a HARD state before the dependencies take effect.

For example if your service_Tomcat was:
check_interval 5
retry_interval 1
max_check_attempts 5

AND your other services were:
check_interval 1
retry_interval 1
max_check_attempts 5

Then it would take service_Tomcat up to 9 minutes before it enters a HARD state. During this time the other services will continue to execute.
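
The 9-minute figure can be reproduced with a little arithmetic. The sketch below is only an approximation of the scheduling, not Nagios code: it assumes the default interval_length of 60 seconds (so intervals are in minutes), ignores check latency, and the function name minutes_until_hard is made up for illustration.

```python
def minutes_until_hard(check_interval, retry_interval, max_check_attempts):
    """Return (best, worst) minutes from a failure until the HARD state."""
    # The first failing check counts as attempt 1; the remaining attempts
    # are retries spaced retry_interval apart.
    retries = (max_check_attempts - 1) * retry_interval
    # Best case: a regular check runs right after the failure happens.
    best = retries
    # Worst case: the failure happens right after a successful check, so the
    # first failing check is a full check_interval away.
    worst = check_interval + retries
    return best, worst


# Values from the example above.
print(minutes_until_hard(5, 1, 5))  # service_Tomcat -> (4, 9)
print(minutes_until_hard(1, 1, 5))  # other services -> (4, 5)
```

With these example values the dependent services reach their own HARD state (and notify) after at most 5 minutes, while service_Tomcat may need up to 9, which is why the dependency does not kick in soon enough.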

Re: Help with service dependencies.

Posted: Mon Feb 09, 2015 12:02 pm
by confused_IT
Thanks for the reply. I had monitored the logs to see what was happening, and I knew it had something to do with the hard/soft states, but I wasn't sure where to edit that. I actually edited the file I needed to, but I guess I should have set it to 1. I'll give that a try now, see what happens, and let you know. Thanks again.

Re: Help with service dependencies.

Posted: Mon Feb 09, 2015 1:34 pm
by confused_IT
I'm actually having a bit of difficulty with where to make the change. Well, I think I do know where to make the change, but I think there is some conflict. I tried this:

extra_service_conf["max_check_attempts"] = [
( "5", ALL_HOSTS, ALL_SERVICES ),
( "3", ["server01"], ["service_Tomcat"] )
]

and while Nagios compiles the configuration successfully, the new value doesn't actually get applied to service_Tomcat on server01.

Re: Help with service dependencies.

Posted: Mon Feb 09, 2015 6:17 pm
by Box293
The settings for service_Tomcat need to be defined specifically in the service definition.

For example (this is taken from a Nagios XI box but the settings are the same):

Code:

define service {
        host_name                       10.25.14.2
        service_description             CPU Usage
        use                             xiwizard_windowswmi_service
        check_command                   check_xi_service_wmiplus!'yyyyy'!'xxxxx'!checkcpu!-w '80' -c '90'!!!!
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    xi_timeperiod_24x7
        notification_interval           60
        notification_period             xi_timeperiod_24x7
        contacts                        nagiosadmin
        notes_url                       http://notes.com
        _xiwizard                       windowswmi
        register                        1
        }
You want to change these to the values you want.

Code:

        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
extra_service_conf appears to be a check_mk setting, so you will need to go to the check_mk support forums to get answers on it.

Re: Help with service dependencies.

Posted: Tue Feb 10, 2015 11:19 am
by confused_IT
Yeah, I've tried editing the file that contains the 'define service...' block, but main.mk overwrites whatever I change there with its own values. I will post in the check_mk forum to see if they can help. Thanks.

Re: Help with service dependencies.

Posted: Wed Feb 11, 2015 2:57 pm
by tgriep
Thanks, keep us informed on the status.

Re: Help with service dependencies.

Posted: Thu Feb 12, 2015 12:48 pm
by confused_IT
tgriep wrote:Thanks, keep us informed on the status.
Hi, could you direct me to the check_mk support forums? I took a look at all the forums, and I am unsure which one to post in.
This task has been dropped to low priority since it is taking a lot of time and isn't essential, but I know I'm really close to getting it fixed, as I have spent many hours on it. Thanks.

Re: Help with service dependencies.

Posted: Thu Feb 12, 2015 5:00 pm
by Box293
It looks like you're going to need to join a mailing list:

http://mathias-kettner.com/check_mk_lists.html

Re: Help with service dependencies.

Posted: Wed Mar 04, 2015 11:40 am
by confused_IT
I got the extra_service_conf override for service_Tomcat figured out, as seen in the config below:

extra_service_conf["max_check_attempts"] = [
    ( "2", ["server01"], ["service_Tomcat"] ),
    ( "3", ALL_HOSTS, ALL_SERVICES )
]

extra_service_conf["normal_check_interval"] = [
    ( "1", ["server01"], ["service_Tomcat"] ),
    ( "5", ALL_HOSTS, ALL_SERVICES )
]

extra_service_conf["retry_check_interval"] = [
    ( "1", ["server01"], ["service_Tomcat"] ),
    ( "2", ALL_HOSTS, ALL_SERVICES )
]

The mailing list was able to help me with that, and I can see the changed max check attempts value in the Nagios web GUI. However, the dependent services still manage to hit their max retries with this config. I'm lost now, because the config looks right to me. The mailing list is no longer helping, so I'm hoping someone here may be able to point me in the right direction.
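
For anyone comparing the two extra_service_conf attempts in this thread: the only real difference is the order of the rules. The sketch below assumes that check_mk walks each list from top to bottom and uses the first rule that matches, which would explain why the catch-all had to move below the server01 rule. It is a simplified stand-in, not check_mk's actual code: resolve, the two list names, and the placeholder ALL_HOSTS / ALL_SERVICES objects are invented for illustration, and service names are compared exactly rather than the way check_mk really matches them.

```python
ALL_HOSTS = object()     # stand-in for check_mk's "every host" marker
ALL_SERVICES = object()  # stand-in for check_mk's "every service" marker


def resolve(rules, host, service):
    """Return the value of the first rule matching host and service."""
    for value, hosts, services in rules:
        host_ok = hosts is ALL_HOSTS or host in hosts
        service_ok = services is ALL_SERVICES or service in services
        if host_ok and service_ok:
            return value  # first match wins, later rules are ignored
    return None


# Catch-all rule first: it matches everything, so the specific rule is never reached.
catch_all_first = [
    ("5", ALL_HOSTS, ALL_SERVICES),
    ("3", ["server01"], ["service_Tomcat"]),
]

# Specific rule first: service_Tomcat gets its own value, everything else falls through.
specific_first = [
    ("2", ["server01"], ["service_Tomcat"]),
    ("3", ALL_HOSTS, ALL_SERVICES),
]

print(resolve(catch_all_first, "server01", "service_Tomcat"))  # 5
print(resolve(specific_first, "server01", "service_Tomcat"))   # 2
print(resolve(specific_first, "server01", "service01"))        # 3
```

Under that assumption, any per-service override has to be listed above the ALL_HOSTS / ALL_SERVICES line in every extra_service_conf list.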