Turn off alerts for "no data received from host"

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.

Post by bowmant »

I thought I had this configured correctly but I guess not. Once in a while, a check is missed/times out for whatever reason and we get notifications emailed like the following. I've tried to find where to turn that off but apparently I've missed it somewhere. Can someone tell me all the templates or whatnot to check to make sure these don't notify us for just a single missed check?

Thanks,

***** Nagios XI Alert *****

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: Drive C: Disk Usage
Host: myserver.mydomain.com
Address: 10.10.10.10
State: WARNING
Info:
No data was received from host!
Date/Time: 2014-09-23 21:12:02

Post by slansing »

Well, you can either edit the service/host alert settings to remove certain state changes from alerting you (bad idea), or you can set max check attempts and a retry interval so that the host/service has to fail a certain number of checks before an alert notification is sent. That is done through the CCM on the host/service/template in the Check Settings and Alert Settings pages. You can click the Documentation dropdown box in there to get a better idea of what those options do.

Post by Box293 »

You can change this using max check attempts and the retry interval.

If your max check attempts is 1, then it will alert as soon as you get a problem.

If your max check attempts is 2, then it will check one more time after X minutes (X being your retry interval) and alert only if that check also fails.

If your max check attempts is 3, then it will check two more times, X minutes apart, and alert only if both of those also fail.

Does this make sense?
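
To make the max check attempts behaviour concrete, here is a minimal Python sketch. It is a simplified model, not Nagios code: the function name `first_notification_attempt` is made up for illustration, and it assumes that any OK result resets the attempt counter and that a notification goes out as soon as the number of consecutive failed checks reaches max check attempts.

```python
def first_notification_attempt(results, max_check_attempts):
    """Return the 1-based index of the check that triggers the alert, or None.

    results: sequence of booleans, True meaning the check came back OK.
    Simplified model (assumption): an OK result resets the attempt counter.
    """
    consecutive_failures = 0
    for index, ok in enumerate(results, start=1):
        if ok:
            consecutive_failures = 0  # recovery resets the counter
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_check_attempts:
            return index  # hard state reached, so this check would alert
    return None


# A single missed check followed by good ones:
glitch = [True, False, True, True]
print(first_notification_attempt(glitch, 1))  # alerts on the 2nd check
print(first_notification_attempt(glitch, 2))  # never alerts
```

With max check attempts at 1 the lone glitch alerts immediately; with 2 or more, the retry comes back OK and nothing is sent.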

Post by jwelch »

I would be tempted to modify the check to return an UNKNOWN state instead of WARNING (more accurate in my opinion) for the 'no data received' result, and then adjust the config not to send notifications for UNKNOWN states for this host/service. That way you would still get WARNING or CRITICAL notifications without adding checks or delays beyond what you already have configured.
If you really do have the check configured to send notifications immediately with no retries or notification delay, you will probably have to get used to some glitch notifications. I think the default for most checks is to check once every 5 minutes and, if there is a problem, to recheck once a minute up to 5 times before the problem is considered a hard state. You can also configure a notification delay (10 minutes, 30 minutes, etc.) to give yourself time to fix or acknowledge the problem before any notifications are sent.
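
As a rough illustration of the "don't notify on UNKNOWN" idea, the Python sketch below models how a service's notification option flags might filter states. The helper `should_notify` and the `STATE_FLAGS` table are hypothetical names for this example. The flag letters (w, u, c, r) follow the usual Nagios service notification options, and flapping and downtime flags are deliberately left out.

```python
# Map each service state to its notification option flag.
# "r" is the recovery flag, used when a service returns to OK.
STATE_FLAGS = {"WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c", "OK": "r"}


def should_notify(state, notification_options):
    """Return True if the given state is enabled in a string like 'w,c,r'."""
    enabled = {flag.strip() for flag in notification_options.split(",")}
    return STATE_FLAGS[state] in enabled


# Options without 'u': WARNING still notifies, UNKNOWN stays quiet.
print(should_notify("WARNING", "w,c,r"))
print(should_notify("UNKNOWN", "w,c,r"))
```

This is why remapping the "no data" result to UNKNOWN would silence it without touching the real WARNING and CRITICAL alerts.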

Post by bowmant »

For this service, the Check Settings checks/retries/max are set to 5/1/2. The Alert Settings are set to no delay. On the Common Settings tab, I see this host is listed on the "Managed Hosts" page, and the "xiwizard_windowsserver_nsclient_service" is listed on the "Managed Templates" page. On that service, I see the check settings are blank and the notification options don't include 'U' and 'S'.

I think jwelch is onto something: this is being reported as an actual WARNING condition, not an UNKNOWN like I thought it was.

That said, what is the easiest way to cut down on these alerts for the big picture? If I edit the xiwizard_windowsserver_nsclient_service template to put the Check Settings to 5/1/3, that should give me 1 more retry before it sends a notification, right? And then that would apply to all services configured by that template?

Or is changing the check to return an unknown instead of a warning easy to do? If so, where do I do that so it affects all my disk checks at once instead of editing each individual service item?

Thanks

Post by jwelch »

Probably your best bet is to change the settings first. Changing the check itself may not be simple. First you have to find out what is being executed and see if you can modify the script. I've noticed that some of the check scripts supplied with XI are obfuscated in some fashion so I can't edit them directly (compiled, pre-compiled... I don't know, I just know I can't see code when I vi them).
In those cases I either live with it, find a substitute, or write my own check. If you can edit the script, look for the section that handles the timeout, change the code to return UNKNOWN (return code 3) instead of WARNING (return code 1), and modify the associated text. What you find could be something as simple as
...
print "WARNING- no data received from host\n";   # change the text to UNKNOWN
exit 1;                                          # change to: exit 3;
...

or
...
my %ERRORS=('OK'=>0,'WARNING'=>1,'CRITICAL'=>2,'UNKNOWN'=>3,'DEPENDENT'=>4);
...
$result='WARNING';                               # change to 'UNKNOWN'
$msg="no data received from host\n";
...
print "$result - $msg\n";
exit $ERRORS{$result};                           # exits 3 once $result is 'UNKNOWN'

or could be much more complicated. In any case I thought it was worth mentioning.
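
If the shipped check cannot be edited, one alternative is a small wrapper that runs the real check and remaps its result. The Python sketch below is only an illustration under assumptions: `remap` and `main` are hypothetical names, the match text is taken from the alert email earlier in the thread, and the exit codes follow the usual plugin convention (1 for WARNING, 3 for UNKNOWN). How the wrapper would be wired into a check command is not covered here.

```python
import subprocess

WARNING, UNKNOWN = 1, 3
NO_DATA_TEXT = "No data was received from host"  # text seen in the alert email


def remap(exit_code, output):
    """Turn a WARNING that only means 'no response' into UNKNOWN."""
    text = output.strip()
    if exit_code == WARNING and NO_DATA_TEXT.lower() in text.lower():
        return UNKNOWN, "UNKNOWN - " + text
    return exit_code, text  # everything else passes through unchanged


def main(argv):
    """Run the real check (argv is its full command line) and remap the result."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    code, text = remap(completed.returncode, completed.stdout)
    print(text)
    return code


# Demonstration of the remapping alone, without running a real check:
print(remap(1, "No data was received from host!"))
print(remap(1, "C:\\ - total: 100 Gb - used: 91 Gb (91%)"))
```

A real wrapper script would finish with something like `sys.exit(main(sys.argv[1:]))` so that the remapped exit code is what the monitoring system sees.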

Post by lmiltchev »

bowmant, let us know if jwelch's suggestion helped.

Post by bowmant »

I really like jwelch's solution but editing a script is above my head if it involves anything more than vi and simple variable or text substitutions.

So I guess my questions still stand:
If I edit the xiwizard_windowsserver_nsclient_service template to put the Check Settings to 5/1/3, that should give me 1 more retry before it sends a notification, right? And then that would apply to all services configured by that template?
Is that right, or is there a better way to do it?

Post by jwelch »

>Check Settings to 5/1/3, that should give me 1 more retry before it sends a notification, right?

Sounds right to me, but then it's easy to verify. Just override the template for this service on one server by modifying its configuration in the CCM. I'd set it to 5/1/5, then set the threshold to some value that will cause the check to fail, apply the config, and sit back for a few minutes to see what happens. You can watch it in the GUI: the 'Attempt' column should increment every minute from 1/5 to 5/5, and then a notification is sent.
You can click on the service to get to the service detail page, then click on the service history icon (looks like a scroll) to see the attempts after the fact. Don't forget to change the 'State Types' dropdown from 'HARD' to 'BOTH' and hit UPDATE to see the retries and the hard fail. That will give you timestamps of when the checks were made. You can look at the notification email to see when it was sent, or go to the 'Incident Management->Notifications' report on the left side of the Home page to see when the notifications were sent.
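
For a rough idea of what the history should show during that test, this Python sketch lists when each attempt would run, counted from the first failed check. `attempt_schedule` is a hypothetical helper, and it assumes every retry runs exactly one retry interval after the previous one. Real scheduling can drift a little.

```python
def attempt_schedule(retry_interval, max_check_attempts):
    """Minutes after the first failed check at which each attempt runs."""
    return [attempt * retry_interval for attempt in range(max_check_attempts)]


# Check Settings of 5/1/5: retry every 1 minute, 5 attempts in total.
max_attempts = 5
schedule = attempt_schedule(retry_interval=1, max_check_attempts=max_attempts)
for number, minute in enumerate(schedule, start=1):
    print(f"attempt {number}/{max_attempts} at +{minute} min")
print(f"notification expected about {schedule[-1]} min after the first failure")
```

Under these assumptions the 5/5 attempt, and with it the notification, lands about four minutes after the first failure.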

Post by bowmant »

OK, this can be marked closed. I changed the service template to 3 max check attempts. That didn't change any of the existing services, so I'm not sure it accomplished anything. Then I used the bulk modification tool to set all my existing services to 3. I always forget about the bulk mods; they're very handy.
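
A likely reason the template change had no visible effect is that the existing services already carried their own max check attempts value (the service above was set to 5/1/2), and a value set directly on a service takes precedence over the template. The Python sketch below illustrates that lookup order. `effective_value` and the dictionaries are made up for the example and are not how the CCM stores anything.

```python
def effective_value(service, template, key):
    """A value set on the service wins; the template only fills the gaps."""
    if service.get(key) is not None:
        return service[key]
    return template.get(key)


template = {"max_check_attempts": 3}            # edited template
explicit_service = {"max_check_attempts": 2}    # value stored on the service itself
inheriting_service = {}                         # nothing set locally

print(effective_value(explicit_service, template, "max_check_attempts"))
print(effective_value(inheriting_service, template, "max_check_attempts"))
```

Only the service with nothing set locally picks up the template's 3, which would explain why the bulk modification was still needed for the existing services.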

Overall, I think changing the check for no response to return unknown instead of warning would be a better solution but this is fine also.

Thanks for everyone's input.