Nagios Support Forum

Posted: **Wed Sep 17, 2014 3:22 pm**

Hi all, been using Nagios since version 2, now using Nagios Core 4.0.7. I have 4.0.8 downloaded and plan to get that upgrade installed soon.

I have a need to do something I'm not sure I can describe clearly, so hopefully someone can interpret.

I have a number of hosts that I check, that are all in the hostgroup "internet" -- I'm not checking services on these per se, all I care about is whether I can reach (ping) them. But they go down from time to time, and that is not convincing evidence that my internet connection is in trouble. What would convince me would be if more than 80% of the hosts in the group became unreachable.

So I need a "composite" host, or perhaps a service, one that is composed of these hosts. It should go to a warning state if more than, say, 75% become unreachable according to the thresholds and timeouts already configured, and critical if more than 80% of them go beyond the thresholds of response time and dropped packets.
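The thresholds described here come down to a little integer arithmetic. A minimal sketch, assuming a hypothetical function name `composite_state` and the 75%/80% figures from this post as defaults; how the count of unreachable hosts is obtained is left open:

```shell
#!/bin/sh
# composite_state DOWN TOTAL [WARN_PCT] [CRIT_PCT]
# Prints a Nagios-style status line and returns the matching plugin
# exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
# The function name and default percentages are illustrative assumptions.
composite_state() {
  down=$1
  total=$2
  warn_pct=${3:-75}
  crit_pct=${4:-80}

  if [ "$total" -le 0 ]; then
    echo "UNKNOWN - no hosts to evaluate"
    return 3
  fi

  # Percentage for display only (rounded down).
  pct=$((down * 100 / total))

  # Compare cross-multiplied values so "more than N%" is exact
  # even though the shell only does integer arithmetic.
  if [ $((down * 100)) -gt $((crit_pct * total)) ]; then
    echo "CRITICAL - $down of $total hosts unreachable (${pct}%)"
    return 2
  elif [ $((down * 100)) -gt $((warn_pct * total)) ]; then
    echo "WARNING - $down of $total hosts unreachable (${pct}%)"
    return 1
  fi
  echo "OK - $down of $total hosts unreachable (${pct}%)"
  return 0
}
```

With five hosts, `composite_state 4 5` gives WARNING (80% is above 75% but not above 80%), and only all five being unreachable gives CRITICAL. A plugin built on this would call the function and `exit` with its return code.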

And here's why: all our notification is done by sending email to e.g. [email protected]. And if the internet connection is down, well, you get the idea.

Ideally, I'd like to manage what hosts constitute this check just by putting them in the magic hostgroup, without having to maintain some external list. We don't use any service groups at present.

What I can do, and have tested, is create a "call file" which describes an outbound telephone call, a call which dials a number and plays a recording. I drop the call file on our Asterisk (open source PBX) server and poof! It calls the support cell phones and plays the recording, stating that we think the internet connection might be in trouble and needs checking.
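For anyone who has not seen one, an Asterisk call file is a short text file of key/value lines dropped into the outgoing spool directory. A rough sketch of generating one from a script, assuming a hypothetical function name, channel string, and recording name (none of these come from this post) and the usual spool location:

```shell
#!/bin/sh
# drop_call_file CHANNEL RECORDING [SPOOL_DIR]
# Writes an Asterisk call file that dials CHANNEL and plays RECORDING,
# then moves it into the outgoing spool directory.
# The function name, CHANNEL and RECORDING values are placeholders.
drop_call_file() {
  channel=$1
  recording=$2
  spool=${3:-/var/spool/asterisk/outgoing}

  # Build the file outside the spool first, then move it into place,
  # so Asterisk is less likely to pick up a half-written file.
  # A move is only atomic within one filesystem, so a temp directory
  # on the same filesystem as the spool is the safer choice.
  tmp=$(mktemp "${TMPDIR:-/tmp}/nagios-call.XXXXXX") || return 1
  cat > "$tmp" <<EOF
Channel: $channel
MaxRetries: 2
RetryTime: 60
WaitTime: 30
Application: Playback
Data: $recording
EOF
  mv "$tmp" "$spool/nagios-alert-$$.call"
}
```

A call such as `drop_call_file "SIP/trunk/5551234" "internet-down"` would queue one outbound call. File ownership and permissions may need adjusting so that the Asterisk user can read and remove the file.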

But I haven't figured out how to accomplish the composite check, with warning and critical thresholds.

What I did do was write a Perl script that parses out the hosts.cfg file and creates a flat file listing all hosts that are in the hostgroup specified. I had some idea of writing a check that reads the flat file and, um, that's where I run out of ideas. What does it do then? Somehow ask Nagios for the status of all those hosts as of the last check? Or check them all itself, maybe by calling plug-ins in /usr/local/nagios/etc/libexec?
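One way to "ask Nagios" for the last known state is to read its status file rather than re-check anything. The sketch below assumes the status file is made of `hoststatus { ... }` blocks containing `host_name=` and `current_state=` lines (0 meaning UP), and that the flat file has one host name per line; the function name is made up for illustration, and the layout should be checked against your own status.dat:

```shell
#!/bin/sh
# count_down_hosts STATUS_FILE HOST_LIST_FILE
# Prints "<down> <total>" for the hosts named in HOST_LIST_FILE,
# based on the last state recorded in the Nagios status file.
# Assumes hoststatus blocks with host_name= before current_state=.
count_down_hosts() {
  awk '
    # First file (host list): remember the hosts we care about.
    NR == FNR { if ($0 != "") wanted[$0] = 1; next }

    # Second file (status): only look inside hoststatus blocks,
    # so servicestatus entries for the same host are ignored.
    /^hoststatus[ \t]*[{]/ { in_host = 1; name = ""; next }
    in_host && /^[ \t]*host_name=/ {
      sub(/^[ \t]*host_name=/, ""); name = $0; next
    }
    in_host && /^[ \t]*current_state=/ {
      sub(/^[ \t]*current_state=/, "")
      if (name in wanted) {
        total++
        if (($0 + 0) != 0) down++   # anything other than 0 counts as down
      }
      next
    }
    /^[ \t]*[}]/ { in_host = 0 }
    END { print down + 0, total + 0 }
  ' "$2" "$1"
}
```

On a source install the status file is typically /usr/local/nagios/var/status.dat, but that path is an assumption here; the `status_file` setting in nagios.cfg is the place to confirm it.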

I did look at the command API and wrote a few scripts to write to the Nagios command pipe-- I can do stuff like disable/enable notifications for all hosts in a hostgroup. But there's seemingly no way to use that interface to retrieve all members of a hostgroup for processing external to Nagios.
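The command pipe is write-only, but the object cache file that Nagios writes at startup can be read to get hostgroup membership. A sketch under the assumption that the cache contains `define hostgroup {` blocks with whitespace-separated `hostgroup_name` and comma-separated `members` lines; the function name is hypothetical and the layout is worth confirming against your own objects.cache:

```shell
#!/bin/sh
# hostgroup_members OBJECT_CACHE HOSTGROUP
# Prints one member host per line for HOSTGROUP.
# Assumes "define hostgroup {" blocks with "hostgroup_name <name>"
# and "members <host1,host2,...>" lines inside them.
hostgroup_members() {
  awk -v group="$2" '
    /^define hostgroup[ \t]*[{]/ { in_group = 1; name = ""; members = ""; next }
    in_group && $1 == "hostgroup_name" { name = $2; next }
    in_group && $1 == "members" { members = $2; next }
    in_group && /^[ \t]*[}]/ {
      # End of a hostgroup block: print members if it is the one we want.
      if (name == group) {
        n = split(members, list, ",")
        for (i = 1; i <= n; i++) print list[i]
      }
      in_group = 0
    }
  ' "$1"
}
```

Something like `hostgroup_members /usr/local/nagios/var/objects.cache internet > hosts.txt` (path assumed) would then replace a hand-maintained list, so the hostgroup stays the only place where membership is managed.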

There has to be an easier way.

Thanks in advance for any help.

Posted: **Wed Sep 17, 2014 3:27 pm**

The simple answer for me to write is to say, look at bischeck (http://www.bischeck.org/) which is a major add-on to Nagios that lets you do a lot of intelligent analysis.

Now I need to go re-read your post and come up with something more immediate.

Posted: **Wed Sep 17, 2014 3:34 pm**

This is a common problem with Nagios:

You have three things that you measure, and if any two of them are not working, then there is a problem. But so long as at least two of them are working, there is no problem, and you don't care which two it is.

You can make service checks that fan out sub-checks to check all the dependent services, count the results, and then warn/critical based on how many sub-checks warned or criticaled. That's one way.

You can use dependencies to say that failure is really only an option if multiple things are in a failure state, but this means you now assign meaning to which things must be in a failure state (as opposed to any two out of three).

The other option is to use event handlers that fire off when one check fails that go and check the status of the other checks. If the event handler finds that the other checks are okay, then it does nothing. If the other checks are also in a failed state (be careful of race conditions) then you can take action.

All in all, I'd recommend either bischeck from my previous post, or the fan approach listed first, here. Let me expand upon that a bit.

I'm going to GREATLY simplify things and say that you have four machines that you need to ping. If any two of them are working, you're good. If three or more are failed, then you're in a CRITICAL state. So you write a module called check_fan_ping (or whatever) that does this (in pseudocode):

Code:

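#!/bin/sh
# check_fan_ping: ping each host and go CRITICAL when more than two fail.
# The plugin path and ping thresholds below are examples; adjust for your install.
PLUGINS=/usr/local/nagios/libexec
count=0
for host in host1 host2 host3 host4; do
  # check_ping exits 0 for OK and non-zero for WARNING, CRITICAL or UNKNOWN
  if ! "$PLUGINS/check_ping" -H "$host" -w 100.0,20% -c 500.0,60% >/dev/null; then
    count=$((count + 1))
  fi
done

if [ "$count" -gt 2 ]; then
  echo "We have a failure! ($count hosts failed)"
  exit 2
else
  echo "We are okay. ($count hosts failed)"
  exit 0
fi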

Now run this check and it checks your other stuff, and only if you have more than two failures does this check actually fail. Awkward to maintain, but straightforward enough.
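The maintenance burden comes mostly from the hard-coded host list. A possible variation, sketched with a hypothetical function name, reads the hosts from a flat file (one per line) and takes the check command as arguments, with the host name appended last:

```shell
#!/bin/sh
# count_failed_hosts HOST_LIST_FILE CHECK_COMMAND [ARGS...]
# Runs CHECK_COMMAND once per host, with the host name appended as the
# final argument, and prints how many runs exited non-zero.
# The function name and calling convention are illustrative assumptions.
count_failed_hosts() {
  list=$1
  shift
  failed=0
  # The "|| [ -n ... ]" part handles a last line without a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    [ -n "$host" ] || continue            # skip blank lines
    # Redirect stdin so the check cannot swallow the rest of the list.
    "$@" "$host" >/dev/null 2>&1 </dev/null || failed=$((failed + 1))
  done < "$list"
  echo "$failed"
}
```

With check_ping this could be called as `count_failed_hosts hosts.txt /usr/local/nagios/libexec/check_ping -w 100.0,20% -c 500.0,60% -H`, so that `-H` ends up directly before each host name; the path and thresholds are examples only.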

Posted: **Wed Sep 17, 2014 4:59 pm**

Maybe I am oversimplifying this, but BPI might be what you want:

http://assets.nagios.com/downloads/nagi ... _Addon.pdf

Posted: **Wed Sep 17, 2014 5:05 pm**

Hey, cool. I never knew BPI was available for Core, as well. Thanks for the tip!

Posted: **Wed Sep 17, 2014 5:07 pm**

Yep, the Core version should function identically, or nearly identically, to the one in XI; they were written by the same gentleman! OP, let us know if either of these would be suitable to your needs.

Posted: **Thu Sep 18, 2014 2:26 pm**

How frustrating Nagios BPI is! I thought this instruction set was easy: http://assets.nagios.com/downloads/nagi ... _Addon.pdf

But I followed it explicitly and I can't get it to work. Well, I tried to follow it explicitly, but it doesn't explain anything about special flags that might be needed on the unzip command, so I didn't use any. I just stuck the zipfile into /usr/local/nagios/share and unzipped it.

The next instruction says to:

Code:

chmod +x set_bpi_perms.sh
./set_bpi_perms.sh

One small problem: There's no such shell script ANYWHERE on my system. I done find commands and it ain't noplace. As a SWAG (scientific wild-a%% guess) I set everything in nagiosbpi and below to be owned by nagios:nagios

Then they tell you to edit constants.conf. What constants.conf? Again, no such file anywhere.

I took a look at some of the php, and it doesn't look complete at all. I downloaded the zipfile from someplace else and did an MD5sum on both of them. Identical.

When I go to the web page http://server.company.org/nagios/nagiosbpi/ at first I got something about protected PHP source code. It tells me to download something from
http://www.sourceguardian.com/loaders/d ... s_m=x86_64

It says I have to add
extension=ixed.5.3.lin
to /etc/php.ini

Otherwise I get
PHP script '/usr/local/nagios/share/nagiosbpi/classes/BpGroup_class.php' is protected by SourceGuardian and requires a SourceGuardian loader 'ixed.5.3.lin' to be installed.

With the extension in place and the php.ini file modified as per instructions, I get a blank web page.

This really looks like it was intended for Nagios XI, even though I carefully followed the instructions for Nagios Core. Has anyone else got this to work?
TIA,
-T

Posted: **Thu Sep 18, 2014 2:57 pm**

We did some digging around and it looks like that script was removed. Can you run the following and check the web page? This will not fix the SourceGuardian stuff; that will have to be changed on our end:

Code:

chmod 755 /path/to/bpi.conf
chown apache:nagios /path/to/bpi.conf

Posted: **Thu Sep 18, 2014 5:19 pm**

In the short term, you may want to consider my "fan approach" that I wrote about earlier: a module called check_fan_ping (or whatever) that runs each ping itself and counts the failures.

You could set up a brute-force check pretty easily for now, and then expand into BPI later.

Posted: **Fri Sep 19, 2014 9:20 am**

All: I got a lot of good ideas from this group. Thanks very much to everyone who responded. Although I was all set to write a bunch of custom code in Perl and Bash to accomplish this, it looks like Nagios BPI will do what I need and be more maintainable.

tmcdonald: I have nagiosbpi working, although I have yet to write any checks against it, and it looks like a good solution to the problem. Thanks for the suggestion.

slansing:
I got it to work. No zip file I found anywhere on the web worked for me. I did a git clone and got a working v1.3.1 Nagios BPI. Here's what I did:

Code:

git clone git://git.code.sf.net/p/nagios/nagiosbpi nagios-nagiosbpi
cd nagios-nagiosbpi
cp -R nagiosbpi /usr/local/nagios/share
cd /usr/local/nagios/share/nagiosbpi
mkdir /usr/local/nagios/share/nagiosbpi/tmp
chmod o+rx config_functions functions images tmp
chmod o+rxw tmp

I edited /usr/local/nagios/share/nagiosbpi/constants.conf as below:

Code:

CONFIGFILE=  /usr/local/nagios/share/nagiosbpi/bpi.conf
CONFIGBACKUP=/usr/local/nagios/share/nagiosbpi/tmp/bpi.conf.backup

##optional xml output of BPI group data.  USE ABSOLUTE DIRECTORIES FOR ALL FILE LOCATIONS
XMLOUTPUT=/usr/local/nagios/share/nagiosbpi/tmp/bpi.xml

I did not re-enable source guardian-- I had turned it off by commenting out extension=ixed.5.3.lin in php.ini and restarting httpd. I reconfirmed it is disabled.

At some point during a prior attempt I had done chown -R nagios:nagios * in the nagiosbpi directory. This time around, all the files are still owned by root:root and everything works. In most of my Nagios directories, most of the files are owned by nagios:nagios.

-T

Check for multiple devices (composite check)?
