Interesting Situation...

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rkymtnhigh
Posts: 95
Joined: Tue May 12, 2015 11:53 am

Post by rkymtnhigh »

Here is what is happening:

We have two Nagios servers, both external to our network, that run Perl scripts to check that our service is operational.

One is hosted with AWS, another with Digital Ocean.

From time to time, one install will show that the service is DOWN, but the other will remain UP.
It seems that both locations exhibit the same symptoms at random times.

We obviously need this service to let us know if we go DOWN, but we don't want to be falsely alerted (or woken up) if it is an issue isolated to ONE hosting provider.
We have reached out to both services, but they are unhelpful in identifying any issues on their end.

So for now, we have one install muted (the most recently problematic install) and only get alerts from the "longest-running service uptime" Nagios server.
However, that server eventually has issues too, and then we switch. This is not a great solution going forward, as we still get alerted when the issue appears to be isolated to ONE hosting provider.

My question is, does anyone know of a way to "link" the 2 nagios installs to make them aware of each other, and possibly ONLY alert when BOTH service checks are down?

Thank you so much, this has been quite the headache for our Operations team!

RMH
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Post by rkennedy »

You should be able to use an agent (NRPE) on both of the XI machines (one on DO, one on AWS). From there, you'll want to set up one or both machines to run check_nrpe.

Using check_nrpe you should be able to run a remote check from each machine, and then use BPI to group these checks. Here's an example of the checks that would run on both machines using check_http, and let's call your LAN nagios.com. This is only an outline; you'll still need to set up a check_http command with NRPE:

AWS:
check_http nagios.com
check_nrpe -H DO -c check_http

DO:
check_http nagios.com
check_nrpe -H AWS -c check_http

Now, using the BPI wizard you should be able to accomplish this. You can have both machines notify only if both services are down.
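
If BPI is not an option, the same "only when both fail" rule can be approximated with a small wrapper plugin. The following is a minimal shell sketch under stated assumptions, not anything shipped with Nagios XI: combine_states is a hypothetical helper, check_app is the placeholder check name used later in this thread, and PEER stands for the other XI server's address. It maps a failure from one location to WARNING and a failure from both to CRITICAL, so notifications can be limited to CRITICAL.

```shell
#!/bin/sh
# Sketch only: combine a local check result with the same check run
# from the other hosting provider via check_nrpe.
# Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
# Any non-zero code is treated here as "failed from that location".

# combine_states LOCAL_RC REMOTE_RC
# Prints a status line and returns the combined plugin exit code.
combine_states() {
    local_rc=$1
    remote_rc=$2
    if [ "$local_rc" -eq 0 ] && [ "$remote_rc" -eq 0 ]; then
        echo "OK - service reachable from both locations"
        return 0
    elif [ "$local_rc" -eq 0 ] || [ "$remote_rc" -eq 0 ]; then
        echo "WARNING - service unreachable from one location only"
        return 1
    else
        echo "CRITICAL - service unreachable from both locations"
        return 2
    fi
}

# Example use inside a wrapper script (paths and names are assumptions):
#   /usr/local/nagios/libexec/check_app;                          local_rc=$?
#   /usr/local/nagios/libexec/check_nrpe -H "$PEER" -c check_app; remote_rc=$?
#   combine_states "$local_rc" "$remote_rc"; exit $?
```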

Will this work for you?
Former Nagios Employee

Post by rkymtnhigh »

Thank you! That does sound like it should work.

Just to clarify a couple of things: are you suggesting installing the Nagios client / NRPE on both CentOS Nagios boxes?

Something like this: http://ithelpblog.com/os/linux/redhat/c ... -6-3-rhel/

Then set a check on DO to check AWS's service availability? (And vice versa)

Then set up each Nagios box as a host on the other Nagios box? Let's say our app check is check_app. Would I add a custom "check_remote_app" command that uses check_nrpe to run check_app on the remote host?
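
As a rough illustration of that idea, the two pieces of configuration might look like the fragment below. This is a sketch under assumptions: the nrpe.cfg location and the plugin path are guesses based on a default source install, check_app and check_remote_app are the placeholder names from this post, and in XI the command definition would normally be created through the Core Config Manager rather than by hand.

```cfg
# On the remote XI server, in nrpe.cfg (assumed path: /usr/local/nagios/etc/nrpe.cfg):
# expose the existing plugin under the NRPE command name "check_app".
command[check_app]=/usr/local/nagios/libexec/check_app

# On the local XI server: a command that asks the remote NRPE daemon
# to run check_app. $USER1$ is the plugin directory macro and
# $HOSTADDRESS$ is the address of the host the service is attached to.
define command {
    command_name    check_remote_app
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_app
}
```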

Once that is working, then use BPI wizard to somehow link the two? I don't have any experience using that tool, but can probably figure it out.

Thank you,

RMH

Post by rkymtnhigh »

Well, I didn't get too far. When trying to install NRPE and the client I get this:

--> Processing Conflict: nagiosxi-deps-5.2.3-1.noarch conflicts nagios-nrpe
--> Processing Conflict: nagiosxi-deps-5.2.3-1.noarch conflicts nrpe

Post by rkennedy »

You got it. Do you perhaps have NRPE installed already?

What command were you running to produce the conflict?

Post by rkymtnhigh »

It says NRPE is available, but not installed. Something along those lines.

I am running the following command:

yum install nagios-nrpe nagios-devel

When I try to start the nrpe service, it reports "unrecognized service".

Thank you.

Post by rkennedy »

Give this document a look for information on installing NRPE. https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Were you following a certain set of instructions that said to install those packages?

EDIT: ^ just saw that was in the link you posted. It may be because XI is running on the current machine.
EDIT2: Can you post the result of a yum repolist?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

Most probably NRPE is running under xinetd on these machines. Do you have the following file on either of these boxes - "/etc/xinetd.d/nrpe"? What is the output of the following commands?

service xinetd restart
netstat -an | grep 5666
/usr/local/nagios/libexec/check_nrpe -H localhost

Post by rkennedy »

@lmiltchev is right - NRPE is installed on the Nagios XI machine by default (didn't realize this). If the word 'localhost' does not work, try 127.0.0.1.

You'll need to modify the /etc/xinetd.d/nrpe file on each machine, specifically the only_from line, so that each server accepts NRPE connections from the other.
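
One way to make that change is sketched below. allow_nrpe_peer is a hypothetical helper, not a Nagios tool, and 192.0.2.10 in the usage comment is a documentation address standing in for the other server's IP. It assumes the file already contains an only_from line, appends the peer address to it if missing, and keeps a .bak copy.

```shell
#!/bin/sh
# Sketch only: add the other XI server's IP to the only_from line
# of an xinetd service file.

# allow_nrpe_peer FILE PEER_IP
allow_nrpe_peer() {
    file=$1
    peer=$2
    # Do nothing if the peer is already listed on the only_from line.
    if grep -E '^[[:space:]]*only_from[[:space:]]*=' "$file" | grep -qwF -- "$peer"; then
        return 0
    fi
    # Append the peer address to the end of the only_from line,
    # saving the original file as FILE.bak.
    sed -i.bak -E "s/^([[:space:]]*only_from[[:space:]]*=.*)$/\1 $peer/" "$file"
}

# Example use (run as root on each server, then restart xinetd):
#   allow_nrpe_peer /etc/xinetd.d/nrpe 192.0.2.10
#   service xinetd restart
```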

Post by rkymtnhigh »

Looks like it is installed under xinetd.
I see that port listening, and it reports NRPE v2.14.

Now I just need to set up my checks and get everything configured. Will report back.

Thank you!