DNX configuration not working [RESOLVED]

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
egistics
Posts: 5
Joined: Thu Jun 21, 2012 8:10 am

DNX configuration not working [RESOLVED]

Post by egistics »

Hello all,

I have a Nagios XI installation with 72 hosts and 1220 services (mostly NRPE and check_nt, plus a good amount of SNMP monitoring of perfmon data). As we have added more checks, our Nagios XI machine has become unusable: the load averages around 15-17, and it can take as long as 2 or 3 minutes to load a graph. I chose DNX over ModGearman because installing it on XI looked simple. That has proven not to be the case, as I can't get it to work.

Before I started messing with DNX, the CPU usage on the Nagios XI box was ALWAYS red, with an average load of 15-17. Shortly after I started on DNX, the load dropped to 2-3 and stayed there for a day or two, so I initially thought it was working. Since then, however, the load has increased steadily, and while I do feel it's better (maybe it's just in my head?), there are still many times when the system is unusable. That, together with the error logs, the fact that the client machine has a load of 0 at all times, and top showing no activity related to Nagios or DNX, leads me to conclude there is a problem with the DNX installation. Is there a way to tell definitively whether DNX is working?

Here are the steps I have taken so far to troubleshoot the issue.

Installation Steps:
1. Updated to latest NagiosXI version
2. Installed DNX on the Nagios Core server (DNX client) using the instructions in Using_DNX.pdf
3. Installed DNX on the Nagios XI server (DNX master) using the instructions in Using_DNX.pdf
4. Noticed no drop in responsiveness and pondered how I could possibly screw up such a simple installation


Things of note/Troubleshooting steps thus far:
1. Both the NagiosXI server and the Nagios Core server have been rebooted
2. I have disabled IPtables on both machines and both machines are on the same subnet with no firewall between
3. I have reinstalled DNX on the client side by re-running the NagiosXI-DNX.sh script with the -c argument
4. On the client machine, even though I have set debug level to 3 the dnxplg.debug.log is never created.
   Permissions on the log directory: drwxr-xr-x 2 nagios nagios 4.0K Jun 30 16:07 log
5. The Nagios XI server (DNX master) does create a dnxplg.debug.log file and it indicates no resource is found, I have included a tail of that log below.
6. Before I started messing with DNX the CPU usage was ALWAYS red on the Nagios XI box and had an average load of 12-15.



<----supporting information ---->

DNX Server information
Nagios XI 2011R3.1
CentOS Release 6.2 (Final)
64-bit, Manual Install
No special configuration, no GUI, no Proxy and no SSL

DNX Client Information
Nagios Core
CentOS 5.7 (Final)
64-bit, Manual Install
No special configuration, no GUI, no Proxy and no SSL


Tail of dnxplg.debug.log on the main Nagios XI server
[Mon Jul 9 16:54:33.208 2012] Post failed: Resource was not found. Service check [56716] will execute locally: /usr/local/nagios/libexec/check_nrpe <check info removed for security>
[Mon Jul 9 16:54:33.217 2012] Post failed: Resource was not found. Service check [56717] will execute locally: /usr/local/nagios/libexec/check_nrpe <check info removed for security>
[Mon Jul 9 16:54:38.152 2012] Post failed: Resource was not found. Service check [56718] will execute locally: /usr/local/nagios/libexec/check_nt <check info removed for security>
[Mon Jul 9 16:54:38.154 2012] Post failed: Resource was not found. Service check [56719] will execute locally: /usr/local/nagios/libexec/check_nrpe <check info removed for security>
[Mon Jul 9 16:54:38.155 2012] Post failed: Resource was not found. Service check [56720] will execute locally: /usr/local/nagios/libexec/check_nrpe <check info removed for security>
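One rough way to answer "is DNX working?" is to watch how often the plugin falls back to local execution. The sketch below counts the "will execute locally" lines and groups them by plugin path. It relies only on the log format shown in the tail above; the helper names are hypothetical (not part of DNX), and the log path is passed as an argument because the thread doesn't give its full location on the XI server. A count that keeps climbing while the client stays idle would suggest checks are not reaching the client.

```shell
# Hypothetical helpers for reading a DNX plugin debug log (dnxplg.debug.log).
# They assume the "Post failed: ... will execute locally: <plugin> ..." format.

# Count checks the plugin could not dispatch and ran on the XI server instead.
# Usage: count_local_fallbacks /path/to/dnxplg.debug.log
count_local_fallbacks() (
    grep -c 'will execute locally' "$1"
)

# Break the fallbacks down by plugin path, most frequent first.
# Usage: fallbacks_by_plugin /path/to/dnxplg.debug.log
fallbacks_by_plugin() (
    sed -n 's/.*will execute locally: \([^ ]*\).*/\1/p' "$1" | sort | uniq -c | sort -rn
)
```

Running the count a few minutes apart gives a trend rather than a single snapshot.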

Tail of dnxcld.log on CLIENT
[Mon Jul 9 16:40:36.551 2012] -------- DNX Client Daemon Version 0.20.1 Startup --------
[Mon Jul 9 16:40:36.560 2012] Copyright (c) 2006-2010 Intellectual Reserve. All rights reserved.
[Mon Jul 9 16:40:36.560 2012] Configuration file: /usr/local/nagios/etc/dnxClient.cfg.
[Mon Jul 9 16:40:36.560 2012] Dispatcher: udp://10.100.17.17:12480.
[Mon Jul 9 16:40:36.560 2012] Collector: udp://10.100.17.17:12481.
[Mon Jul 9 16:40:36.560 2012] Agent: udp://10.100.17.7:12482.
[Mon Jul 9 16:40:36.560 2012] Debug logging enabled at level 3 to /usr/local/nagios/var/log/dnxcld.debug.log.
[Mon Jul 9 16:40:36.570 2012] Changed working directory to /usr/local/nagios/var/run/dnx
[Mon Jul 9 16:40:36.572 2012] Running as root; attempting to drop privileges...
Last edited by egistics on Tue Jul 10, 2012 6:48 pm, edited 1 time in total.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: DNX configuration not working

Post by mguthrie »

Do you have your plugins directory copied over to the DNX client machine? (/usr/local/nagios/libexec)
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: DNX configuration not working

Post by scottwilkerson »

Do you have /usr/local/nagios/libexec/check_nrpe on the client machine?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
egistics
Posts: 5
Joined: Thu Jun 21, 2012 8:10 am

Re: DNX configuration not working

Post by egistics »

I think you guys are on to something. I did check to verify that check_nrpe was in my libexec folder, but after your post I decided to give that more attention. Turns out check_nrpe was not executable.
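Because check_nrpe was present but not executable, a quick audit of the plugins your checks reference can catch this class of problem on a DNX client in one pass. A minimal sketch, assuming you pass the plugin names by hand; audit_plugins is a hypothetical helper, not part of DNX or Nagios:

```shell
# Report plugins that are missing or not executable in a plugin directory.
# Usage: audit_plugins /usr/local/nagios/libexec check_nrpe check_nt
# Exit status is 0 only if every named plugin exists and is executable.
audit_plugins() (
    dir=$1
    shift
    status=0
    for name in "$@"; do
        path="$dir/$name"
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        elif [ ! -x "$path" ]; then
            # -x is checked for the user running this function, so run it
            # as the account that executes the checks (e.g. nagios).
            echo "not executable: $path"
            status=1
        fi
    done
    exit "$status"
)
```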

I changed its permissions to 755 and tried running it manually, but I get the following error:

./check_nrpe: error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory
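The loader reports only the first library it fails to resolve, while ldd lists every shared library a binary depends on and marks each unresolved one with "not found". The hypothetical filter below pulls just those names out of ldd's output, so a single run shows everything that is still missing rather than one library at a time:

```shell
# Print the names of shared libraries that ldd reports as "not found".
# ldd prints unresolved entries as:  <tab>libname => not found
# Usage: ldd /usr/local/nagios/libexec/check_nrpe | missing_libs
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

An empty result means the loader can resolve every dependency of the binary.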
I had a look in the /usr/lib folder and I don't have a libssl.so file; the closest thing I have is libssl3.so. I did some googling, and someone suggested making a symlink from that file to the file check_nrpe is looking for. I wasn't sure whether that was a good idea. I can't find which package I need to install to get the libssl.so.10 library that check_nrpe is looking for. Any suggestions?

Thanks for the help.

-Brian
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: DNX configuration not working

Post by mguthrie »

Try:

yum install openssl openssl-devel
egistics
Posts: 5
Joined: Thu Jun 21, 2012 8:10 am

Re: DNX configuration not working

Post by egistics »

Thanks mguthrie, I did that. It installed openssl-devel and updated openssl.i686, but I still get the same error when running a check, and there is still no libssl.so.10 file in /usr/lib/.

./check_nrpe: error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: DNX configuration not working

Post by scottwilkerson »

Is this machine on the same architecture? i386 or x86_64
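Besides comparing the machines with uname -m, it can help to confirm the class of the plugin binary itself, since a plugin copied from another server may not match the host. The fifth byte of an ELF header (EI_CLASS, at offset 4) is 1 for 32-bit and 2 for 64-bit objects. The sketch below reads it with od; elf_class is a hypothetical helper name, and file(1) reports the same information where it is installed.

```shell
# Report whether an ELF binary is 32-bit or 64-bit by reading its header.
# Usage: elf_class /usr/local/nagios/libexec/check_nrpe
elf_class() (
    # Every ELF file starts with the four bytes 0x7f 'E' 'L' 'F'.
    if [ "$(head -c 4 "$1")" != "$(printf '\177ELF')" ]; then
        echo "not an ELF file: $1" >&2
        exit 1
    fi
    # Read one unsigned byte at offset 4 (EI_CLASS) as a decimal number.
    cls=$(od -An -t u1 -j 4 -N 1 "$1" | tr -d ' \n')
    case $cls in
        1) echo 32-bit ;;
        2) echo 64-bit ;;
        *) echo "unknown ELF class: $cls" >&2; exit 1 ;;
    esac
)
```

Comparing its output with uname -m (x86_64 versus i386/i686) shows whether the binary and the machine agree.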
egistics
Posts: 5
Joined: Thu Jun 21, 2012 8:10 am

Re: DNX configuration not working

Post by egistics »

Yes, they are both 64-bit. It appears that check_nrpe is looking for the library in /usr/lib64, not /usr/lib.

There is already a libssl.so symlink in the lib64 directory:

libssl.so -> ../../lib64/libssl.so.0.9.8e
I tried creating a symlink for libssl.so.10 in the lib64 directory like so, but it still didn't work; same error that libssl.so.10 can't be found.

lrwxrwxrwx  1 root root       28 Jul 10 15:34 libssl.so.10 -> ../../lib64/libssl.so.0.9.8e

This is the libssl file list in my /usr/lib64 directory:

-rw-r--r--  1 root root   569278 May 29 12:28 libssl.a
lrwxrwxrwx  1 root root       28 Jul 10 13:54 libssl.so -> ../../lib64/libssl.so.0.9.8e
lrwxrwxrwx  1 root root       28 Jul 10 15:34 libssl.so.10 -> ../../lib64/libssl.so.0.9.8e
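A relative symlink target is resolved against the directory that contains the link, not against the current directory, so ../../lib64/libssl.so.0.9.8e placed in /usr/lib64 points at /lib64/libssl.so.0.9.8e. The hypothetical helper below prints both the raw target (readlink) and the fully resolved path (readlink -f, as provided by GNU coreutils) to make that visible when debugging links like the ones listed above:

```shell
# Show a symlink's stored target and the absolute path it resolves to.
# Usage: show_link /usr/lib64/libssl.so.10
show_link() (
    # Raw target exactly as stored in the link (may be relative).
    printf '%s -> %s\n' "$1" "$(readlink "$1")"
    # Canonical path after following every link component.
    printf 'resolves to: %s\n' "$(readlink -f "$1")"
)
```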
I'm really at a loss for how to continue. I reinstalled the Nagios plugins, checked all the file permissions, etc. Any other suggestions?
egistics
Posts: 5
Joined: Thu Jun 21, 2012 8:10 am

Re: DNX configuration not working

Post by egistics »

I finally got it working. Thanks for everyone's help.

For posterity's sake, the solution was in the libssl symlink. I removed the old symlink libssl.so.10 -> ../../lib64/libssl.so.0.9.8e from /usr/lib64, went into the /lib64 directory directly, created the symlink there, and everything started working.
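A sketch of the fix above, under one stated assumption: the post doesn't name the new link's target, so this assumes it points at libssl.so.0.9.8e in the same /lib64 directory. make_compat_link is a hypothetical helper. Treat this as a workaround rather than a guaranteed fix: a library's soname normally changes when its ABI changes, and libssl.so.10 is the name used by the OpenSSL 1.0 packages on CentOS 6, while this CentOS 5 client has 0.9.8e. Building check_nrpe on the client itself would avoid the mismatch.

```shell
# Create a compatibility symlink for a missing soname inside a library dir.
# WORKAROUND ONLY: the link name and target may belong to different
# OpenSSL versions (see the caveat above).
# Usage (as root on the DNX client):
#   make_compat_link /lib64 libssl.so.0.9.8e libssl.so.10
make_compat_link() (
    dir=$1 target=$2 name=$3
    if [ ! -e "$dir/$target" ]; then
        echo "no such library: $dir/$target" >&2
        exit 1
    fi
    # A bare relative target means "the file next to the link in $dir".
    ln -sfn "$target" "$dir/$name"
)
```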

Thanks again for the help.

-Brian
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: DNX configuration not working [RESOLVED]

Post by scottwilkerson »

Thanks for posting the solution!