Interesting Situation...

Posted: Mon Feb 22, 2016 12:59 pm
by rkymtnhigh
Here is what is happening:

We have two Nagios servers, both external to our network, that run Perl scripts to check that our service is operational.

One is hosted with AWS, another with Digital Ocean.

From time to time, one install will show that the service is DOWN, but the other will remain UP.
It seems that both locations exhibit the same symptoms at random times.

We obviously need this service to let us know if we go DOWN, but we don't want to be falsely alerted (or woken up) if it is an issue isolated to ONE hosting provider.
We have reached out to both services, but they are unhelpful in identifying any issues on their end.

So for now, we have one install muted (the most recently problematic one) and only get alerts from the Nagios server that has gone longest without a false alarm.
However, that server eventually has issues too, and then we switch. This is not a great solution going forward, as we still get alerted when the issue appears to be isolated to ONE hosting provider.

My question is, does anyone know of a way to "link" the two Nagios installs to make them aware of each other, and possibly ONLY alert when BOTH service checks are down?

Thank you so much, this has been quite the headache for our Operations team!

RMH

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 1:43 pm
by rkennedy
You should be able to use an agent (NRPE) on both of the XI machines (one on DO, one on AWS). From there, you'll want to set up one or both machines to run check_nrpe.

Using check_nrpe you should be able to run a remote check from each machine, and then use BPI to group these checks. Here's an example of the checks which would run on both machines using check_http, and let's call your LAN nagios.com. This is only an outline; you'll still need to set up a check_http command with NRPE:

AWS:
check_http nagios.com
check_nrpe -H DO -c check_http

DO:
check_http nagios.com
check_nrpe -H AWS -c check_http

Now, using the BPI wizard you should be able to accomplish this. You can have both machines notify only if both services are down.

Will this work for you?
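
For illustration only, the "notify only if both are down" idea can also be sketched outside BPI as a small shell helper that combines the results of the local check and the remote check fetched through check_nrpe. This is a rough sketch, not how BPI works internally. It assumes the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the function name combined_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: combine the exit codes of two independent checks,
# e.g. the local check and the same check run on the other box via check_nrpe.
# Plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
combined_state() {
    local_rc=$1
    remote_rc=$2
    if [ "$local_rc" -eq 2 ] && [ "$remote_rc" -eq 2 ]; then
        # Both vantage points agree the service is down: real outage.
        echo 2
    elif [ "$local_rc" -ne 0 ] || [ "$remote_rc" -ne 0 ]; then
        # Only one side is unhappy: likely provider-local, so downgrade to WARNING.
        echo 1
    else
        echo 0
    fi
}
```

A wrapper plugin could run both checks, pass their exit codes to combined_state, and exit with the printed value, so that notifications set to fire on CRITICAL only go out when both providers see the failure.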

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 1:56 pm
by rkymtnhigh
Thank you! That does sound like it should work.

Just to clarify a couple of things: are you suggesting installing the Nagios client / NRPE on both CentOS Nagios boxes?

Something like this: http://ithelpblog.com/os/linux/redhat/c ... -6-3-rhel/

Then set a check on DO to check AWS's service availability? (And vice versa)

Then set up the Nagios boxes as hosts on the other Nagios boxes? Let's say our app check is check_app. Add a custom "check_remote_app" command that uses check_nrpe and runs check_app on the remote host?
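
As a rough sketch of what that could look like: the remote box exposes check_app through its NRPE config, and the local box defines check_remote_app on top of check_nrpe. The plugin path and the location of nrpe.cfg are assumptions based on a default source install, and in XI the command would normally be created through the web interface rather than by hand.

```
# On the remote Nagios box, in nrpe.cfg (the path to check_app is an assumption):
command[check_app]=/usr/local/nagios/libexec/check_app

# On the local Nagios box, as a command object definition:
define command {
    command_name    check_remote_app
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_app
}
```

A service using check_remote_app would then be attached to the host object that represents the other Nagios box.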

Once that is working, then use BPI wizard to somehow link the two? I don't have any experience using that tool, but can probably figure it out.

Thank you,

RMH

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 2:05 pm
by rkymtnhigh
Well, I didn't get too far. When trying to install NRPE and the client I get this:

--> Processing Conflict: nagiosxi-deps-5.2.3-1.noarch conflicts nagios-nrpe
--> Processing Conflict: nagiosxi-deps-5.2.3-1.noarch conflicts nrpe

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 2:34 pm
by rkennedy
You got it. Do you perhaps have NRPE installed already?

What command were you running to produce the conflict?

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 3:24 pm
by rkymtnhigh
It says NRPE is available, but not installed. Something along those lines.

I am running:

yum install nagios-nrpe nagios-devel

When I try to start the nrpe service, I get "unrecognized service".

Thank you.

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 4:48 pm
by rkennedy
Give this document a look for information on installing NRPE. https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Were you following a certain set of instructions that said to install those packages?

EDIT: ^ just saw that was in the link you posted. It may be because XI is running on the current machine.
EDIT2: Can you post the result of a yum repolist?

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 4:55 pm
by lmiltchev
Most probably NRPE is running under xinetd on these machines. Do you have the following file on either of these boxes - "/etc/xinetd.d/nrpe"? What is the output of the following commands?

service xinetd restart
netstat -an | grep 5666
/usr/local/nagios/libexec/check_nrpe -H localhost

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 5:22 pm
by rkennedy
@lmiltchev is right: NRPE is installed on the Nagios XI machine by default (I didn't realize this). If 'localhost' does not work, try 127.0.0.1.

You'll need to modify the /etc/xinetd.d/nrpe file on each machine, specifically the only_from line, so that each server accepts NRPE connections from the other.
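
As an example of the only_from change, a typical /etc/xinetd.d/nrpe might look roughly like the following. The address 203.0.113.10 is a placeholder for the other Nagios server's public IP, and the paths assume a default source install, so your file may differ.

```
service nrpe
{
    flags           = REUSE
    socket_type     = stream
    port            = 5666
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    # Space-separated list: keep localhost, add the other server (placeholder IP)
    only_from       = 127.0.0.1 203.0.113.10
}
```

After editing the file, restart xinetd (service xinetd restart) so the change takes effect, and make sure port 5666 is open between the two hosts.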

Re: Interesting Situation...

Posted: Mon Feb 22, 2016 6:06 pm
by rkymtnhigh
Looks like it is installed under xinetd.
I see the port listening, and it reports NRPE v2.14.

Now I just need to set up my checks and get everything configured. Will report back.

Thank you!