Scheduling host and service checks

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
Satyam
Posts: 63
Joined: Mon Oct 24, 2011 8:14 am

Scheduling host and service checks

Post by Satyam »

Hi,

I have been using Nagios for quite a while, but I would still like to clear up one point that confuses me.

For example, consider a host with 7 services configured. If the host goes down, the down alert is raised only by the host check. Until that host check runs, the service checks may run and trigger alerts of their own, which in turn results in 8 alerts/tickets for a single host-down issue.

1. I think this could be fixed if the host check ran first and the service checks for that host completed before the next host check was initiated. This might avoid unnecessary alerts/tickets.

(OR)

2. Please help me configure some kind of service dependency, so that whenever a service check is initiated the corresponding host check also runs, and if the host is down no alert/ticket is registered for the service.

Thanks in advance.

Regards,
Sattanathan.S
Open Source Tools Team, IMS, Tech Mahindra
Electronics City, Bangalore, India
http://in.linkedin.com/pub/sattanathan- ... 49/103/295
Nagios Certified Professional
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Scheduling host and service checks

Post by abrist »

This issue is related to the asynchronous nature of scheduled checks in Nagios. If a host goes down and some of its service checks are scheduled before the next host check, you will receive alerts for those services. You may want to decrease the check and retry intervals for the host check, and verify that you are not being sent alerts for warnings. Ideally, services should not be checked at a shorter check or retry interval than the host itself, so that the host reaches a down state before any of its services are due to send alerts.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Satyam
Posts: 63
Joined: Mon Oct 24, 2011 8:14 am

Re: Scheduling host and service checks

Post by Satyam »

Thanks, abrist.

But could you please tell me why Nagios is designed this way? Is there any other way I can meet my requirement: "The host check should happen first, then its service checks, then the next host check and its service checks, and so on"?

Thanks in advance.
Sattanathan.S
lmiltchev
Former Nagios Staff
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Scheduling host and service checks

Post by lmiltchev »

Even if you checked the host first, there is no guarantee that the host would not go down a very short time after the check (a millisecond later, for example). Nagios does not notify on soft states. "False" alerts are avoided by using the retry interval and max check attempts; you need to tweak these to suit your environment.

Host - check interval

This directive is used to define the number of "time units" between regularly scheduled checks of the host. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.

Parameter name: check_interval

Host - retry interval

This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.

Parameter name: retry_interval

Host - max check attempts

This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check again. Note: If you do not want to check the status of the host, you must still set this to a minimum value of 1. To bypass the host check, just leave the check_command option blank.

Parameter name: max_check_attempts
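
As a rough illustration of how these three directives interact, here is a minimal Python sketch. It assumes the first non-UP result counts as attempt 1, so the host is rechecked max_check_attempts - 1 more times at the retry interval before the state becomes hard. The function names are hypothetical, and the arithmetic ignores check latency and any on-demand checks, so treat the numbers as estimates only.

```python
def minutes_to_hard_state(retry_interval, max_check_attempts, interval_length=60):
    """Estimated minutes from the first failed check to the hard state.

    Assumption: the first failed check is attempt 1, so only
    (max_check_attempts - 1) rechecks remain, spaced retry_interval apart.
    """
    rechecks = max_check_attempts - 1
    seconds = rechecks * retry_interval * interval_length
    return seconds / 60


def worst_case_minutes(check_interval, retry_interval, max_check_attempts,
                       interval_length=60):
    """Estimated worst case: the host fails right after a successful check,
    so a full check_interval passes before the first failed check."""
    wait_for_first_failure = check_interval * interval_length / 60
    return wait_for_first_failure + minutes_to_hard_state(
        retry_interval, max_check_attempts, interval_length
    )


if __name__ == "__main__":
    # Example values only: 10-minute check interval, 1-minute retries, 3 attempts.
    print(minutes_to_hard_state(retry_interval=1, max_check_attempts=3))
    print(worst_case_minutes(check_interval=10, retry_interval=1,
                             max_check_attempts=3))
```

With a 10-minute host check interval, the wait for the first failed check dominates the estimate, which is why shortening the host's check_interval matters more than tuning the retries.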
Be sure to check out our Knowledgebase for helpful articles and solutions!
Satyam
Posts: 63
Joined: Mon Oct 24, 2011 8:14 am

Re: Scheduling host and service checks

Post by Satyam »

Thanks, lmiltchev.

I am aware of what you have posted. Let me explain my concern with the example below:

1. Consider a host that is down and has 15 service checks associated with it.

2. The last host check happened 2 minutes ago, and the host was UP at that poll. The polling interval for the host is 10 minutes, so the next host check will happen 8 minutes from now.

3. About 7 of the service checks that belong to this host run before the next host check and raise critical alerts such as "connection refused".

4. After these 7 service checks complete, the host check runs and triggers the host down alert.

5. So for the remaining 8 services I can suppress the tickets by configuring dependencies, but for the earlier 7 services critical alerts will be raised, which are of no use.

So I was thinking of some kind of tweak so that "the host check happens first, then its service checks, then the next host check and its service checks, and so on", to avoid such false alerts. Please help me with this. Can this be achieved using Nagios?

Thanks in advance..

Regards,
Sattanathan.S
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Scheduling host and service checks

Post by abrist »

As we have stated before, the checks are asynchronous and scheduled. If you properly configure the interval lengths and retries, this scenario will not happen.

1. Do not configure stalking or use event handlers for notifications unless you know what effects those settings will have.
2. Host checks should happen *at least* as often as service checks (check_interval).
3. Host checks should have a smaller max check attempts and/or a shorter retry interval than the host's services.

This way, the host will always enter a hard down state before its services do, which suppresses the service alerts until the host recovers.
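
Rules 2 and 3 above could be sanity-checked with something like the following Python sketch. The class and function names are hypothetical, and rule 3 is approximated by comparing (max_check_attempts - 1) * retry_interval for the host and the service, which is a simplifying assumption rather than how Nagios itself evaluates anything.

```python
from dataclasses import dataclass


@dataclass
class CheckSettings:
    """Hypothetical container for the three directives discussed here."""
    check_interval: float
    retry_interval: float
    max_check_attempts: int

    def time_to_hard(self):
        # Assumption: the first failed check is attempt 1, so only
        # (max_check_attempts - 1) retries remain before the hard state.
        return (self.max_check_attempts - 1) * self.retry_interval


def find_problems(host, service):
    """Return a list of rule violations for one host/service pair."""
    problems = []
    # Rule 2: the host is checked at least as often as the service.
    if host.check_interval > service.check_interval:
        problems.append("host check_interval is longer than the service's")
    # Rule 3 (approximation): the host goes hard sooner than the service.
    if host.time_to_hard() >= service.time_to_hard():
        problems.append("host does not reach a hard state before the service")
    return problems


if __name__ == "__main__":
    # Example values only, in time units (minutes by default).
    host = CheckSettings(check_interval=5, retry_interval=1, max_check_attempts=3)
    service = CheckSettings(check_interval=5, retry_interval=2, max_check_attempts=4)
    print(find_problems(host, service))
```

An empty list means the pair passes both rules under these assumptions; it is not a guarantee that no service alert can ever arrive ahead of the host alert.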
Satyam
Posts: 63
Joined: Mon Oct 24, 2011 8:14 am

Re: Scheduling host and service checks

Post by Satyam »

Thanks, abrist, for your support and ideas.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Scheduling host and service checks

Post by slansing »

Sat,

Let us know if you need more help!