check_ldap fails with "Could not bind to the ldap-server"

Posted: Wed Jun 12, 2013 2:59 pm
by jwelch
General Info:
This one has me stumped. I used the LDAP Server wizard to create an ldaps check on 6 AD servers.
The checks all fail with "Could not bind to the ldap-server". I used the 'Test Check Command' button in XI and the check passes:
OUTPUT: LDAP OK - 0.050 seconds response time|time=0.049582s;;;0.000000

I cut and pasted the command line from the Test Check Command output and ran it manually, both as root and as the nagios user, from the user's home dir and from /usr/local/nagios/libexec. All passed. I've tried various parameters from check_ldap -h, but nothing works when the check is run by the scheduler, and I can't see any problems when checking the service manually. I don't see anything useful in the nagios.log file:
(xxxxxxx = my server fqdn)
[1371059561] SERVICE ALERT: xxxxxxx;LDAP Server;CRITICAL;SOFT;1;Could not bind to the ldap-server
[1371059621] SERVICE ALERT: xxxxxxx;LDAP Server;CRITICAL;SOFT;2;Could not bind to the ldap-server
[1371059681] SERVICE ALERT: xxxxxxx;LDAP Server;CRITICAL;SOFT;3;Could not bind to the ldap-server
[1371059741] SERVICE ALERT: xxxxxxx;LDAP Server;CRITICAL;SOFT;4;Could not bind to the ldap-server
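The bracketed numbers in these log lines are Unix epoch seconds. A minimal sketch for making them readable, assuming GNU date (for the -d @SECONDS syntax); the function name humanize_nagios_log is made up for illustration:

```shell
#!/bin/sh
# Rewrite the leading [epoch] stamp of each nagios.log line as UTC wall-clock time.
# humanize_nagios_log is a hypothetical helper name; "date -d @N" is GNU date syntax.
humanize_nagios_log() {
    while IFS= read -r line; do
        # Pass through anything that does not look like "[...] rest".
        case $line in
            \[*\]\ *) ;;
            *) printf '%s\n' "$line"; continue ;;
        esac
        epoch=${line#\[}        # drop the leading "["
        epoch=${epoch%%\]*}     # keep everything before the first "]"
        # Pass through lines whose bracketed part is not a plain number.
        case $epoch in
            ''|*[!0-9]*) printf '%s\n' "$line"; continue ;;
        esac
        rest=${line#*"] "}      # the text after "[epoch] "
        printf '[%s] %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$rest"
    done
}
```

Something like `humanize_nagios_log < /usr/local/nagios/var/nagios.log | grep 'LDAP Server'` would then show the alerts with UTC times, which makes the 60-second spacing between the soft retries above easier to see.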

Looking at strings in check_ldap there is a blurb about "This plugin must be either run as root or setuid root.", but I don't see any mention of that in the help output.

Someone else has seen a similar problem, but there was no resolution posted:
http://permalink.gmane.org/gmane.networ ... ugins/5190

The person I need to ask about whether the servers support ssl, starttls, both, etc. is out sick today, so I can't provide that information. However, from the command line I can use -S or -T, and -2 or -3, and it just works. Forcing the port to 389 with -p 389 does give the 'Could not bind to the ldap-server' error from the command line, but adding -p 636 to $ARG1$ in the XI config does not change the results from scheduled checks.
(replaced sensitive data with upper case letters)
[nagios@XXX libexec]$ /usr/local/nagios/libexec/check_ldap -H AAA -b "BBB" -D "CCC" -P "DDD" -2 -S

Version Info:
Nagios XI 2012R2.2 Copyright © 2008-2013 Nagios Enterprises, LLC.
check_ldap v1991 (nagios-plugins 1.4.13)

I've run out of things to check. Are there any logs that might shed some light on what is happening with these checks? (This is my first attempt to migrate ldap services to XI, so these are the only hosts using check_ldap currently.)

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Wed Jun 12, 2013 3:24 pm
by sreinhardt
Let's start from the beginning... with your check commands, what do you have that presently works from cli or test command in the webui, also how do you have the command defined in nagios that is failing?
I see the "/usr/local/nagios/libexec/check_ldap -H AAA -b "BBB" -D "CCC" -P "DDD" -2 -S" command which looks mostly fine. Your comments about 389 not working would suggest an ssl or tls configuration. However it would be best to confirm with the guy who is out today as well.

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Wed Jun 12, 2013 3:51 pm
by jwelch
The command I posted works from the test button in XI and manually as root or nagios, but not when run by the scheduler. (To test the nagios user, I did a 'su - nagios' as root.) That command was a cut and paste from the command test output in the webui. Usually I have the opposite problem: it works from the scheduler, but not from the command test, mostly due to problems with escaping special characters or user variable substitution. In this case I left the password in the config to test the command and will attempt to substitute the user variable once I get this straightened out.
NOTE: These are AD servers. The others are *nix.

I have 0 services that work from the scheduler using check_ldap.
This is the config (with upper case letters to hide sensitive information):

define service {
host_name XXX
service_description LDAP Server
use xiwizard_ldapserver_ldap_service
servicegroups YYYY
check_command check_xi_service_ldaps!-b "AAA" -D "BBB" -P "CCC" -2 -S -p 636!!!!!!!
max_check_attempts 5
check_interval 10
retry_interval 2
check_period xi_timeperiod_24x7
notification_interval 240
first_notification_delay 30
notification_period xi_timeperiod_24x7
contacts DDD
icon_image directory_services.png
_xiwizard ldapserver
register 1
}

Note: I manually added the -p 636 in an attempt to get the check to run in the scheduler, but no change in the results.



----

P.S. I didn't mean to imply that port 389 doesn't work. It just supports regular non-secure on those servers. I can change the port to 389 and remove the -S and it works fine from the command line. I just found it interesting that forcing the command to try ldaps on the ldap port gave the same error from the script and thought it might indicate that when the command was run by the scheduler, that for some reason it was using the wrong transport protocol.

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Thu Jun 13, 2013 10:59 am
by sreinhardt
Per the check_ldap documentation, using -S automatically forces port 636, which is likely why it failed when you tried 389. However, I find it very curious that it works both with test check (apache user) and as nagios or root, but not through the scheduler (nagios). You did exactly what I would have suggested to verify by running with su - nagios, and also pointed out our wonderful escaping issue in test check stuff. Certainly seems like you know what's going on!! With that said, this is a strange one. I suppose let's verify a few things just to be sure.

nmap -p 389 [hostname]
nmap -p 636 [hostname]

Both should return open and not filtered. Did you happen to have a chance to verify that these servers use ssl and not tls with your guy that was out yesterday? As for logs, have you checked in the /usr/local/nagios/var/nagios.log file? Especially if you tail -f the file and schedule an immediate check?

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Thu Jun 13, 2013 3:40 pm
by jwelch
nmap info:
---
[root@XXX init.d]# nmap -p 389 AAA

Starting Nmap 5.51 ( http://nmap.org ) at 2013-06-13 13:03 EDT
Failed to find device eth0 which was referenced in /proc/net/route
Nmap scan report for AAA (BBB)
Host is up (0.00077s latency).
PORT STATE SERVICE
389/tcp open ldap

Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds
[root@XXX init.d]# nmap -p 636 AAA

Starting Nmap 5.51 ( http://nmap.org ) at 2013-06-13 13:03 EDT
Failed to find device eth0 which was referenced in /proc/net/route
Nmap scan report for AAA (BBB)
Host is up (0.00069s latency).
PORT STATE SERVICE
636/tcp open ldapssl

Nmap done: 1 IP address (1 host up) scanned in 0.18 seconds
[root@XXX init.d]#
---

I read somewhere that -S forces port 636, but if you put a -p after the -S you could override the port. (but not if you put the -p before the -S....weird).

I did find one case online where someone saw the same symptoms with icinga, but he claimed that there was a problem with the nagios user's environment. Here's the interesting part:
"Thanks for the reply. I got this working. The problem was that the
environment was not properly passed to icinga from the startup script,
thus it didn't know where its home directory was. Without a home
directory, it couldn't find the necessary certs to bind to the
directory server. A quick export HOME=/var/icinga in the
/etc/sysconfig/icinga did it."
The whole entry is at http://sourceforge.net/mailarchive/foru ... splug-help

On my system the nagios user's home directory is /home/nagios.
I don't know what the HOME and PATH variables need to be set to when nagios starts. When I su to nagios, here are the variable values:

Code:

[nagios@nagios1 ~]$ echo $HOME
/home/nagios
[nagios@XXX ~]$ echo $PATH
/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/nagios/bin
[nagios@XXX ~]$ 
However, when running under the scheduler, I don't know what environment variables are set.
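One possible way to see what the running daemon actually has in its environment is to read it from /proc. This is only a sketch: it assumes a Linux system with /proc mounted, and proc_env is a made-up helper name.

```shell
#!/bin/sh
# proc_env is a hypothetical helper: print the environment a process was
# started with, one VAR=value per line. Requires Linux /proc and permission
# to read that process's environ file (same user or root).
proc_env() {
    tr '\0' '\n' < "/proc/$1/environ"
}
```

Run as root, something like `proc_env "$(pgrep -o -x nagios)"` should list the variables of the oldest process named nagios (that process name is an assumption about this install). It shows what the daemon was started with, which is the baseline that scheduled checks inherit.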

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Thu Jun 13, 2013 3:43 pm
by jwelch
P.S. The AD boxes are running Windows Server 2008 R2. Nothing special per the sysadmin. They support both SSL 2/3 and START/TLS 1/2.

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Thu Jun 13, 2013 3:51 pm
by jwelch
Another data point. If I clone the check, then change the command from check_ldaps to check_ldap and remove any -S, -T, -p parameters, it works.
Ok
LDAP OK - 0.017 seconds response time
So the symptoms are that the check fails if it's using START/TLS or SSL and the check is running via the scheduler. Unsecure checks to port 389 work both from the command line and via the scheduler.

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Thu Jun 13, 2013 4:31 pm
by jwelch
For giggles, I created a command called check_env that runs /usr/local/nagios/libexec/check_env

Code:

#!/bin/bash
# Nagios-style plugin that reports the environment it was launched with.

echo "ok - checking environment varibles"
echo "HOME = $HOME"
echo "PATH = $PATH"
echo "PWD  = $PWD"

From the WebUI (Test Command button):

Code:

Testing check from command line...

COMMAND: /usr/local/nagios/libexec/check_env
OUTPUT: ok - checking environment varibles
HOME = /root
PATH = /sbin:/usr/sbin:/bin:/usr/bin
PWD  = /usr/local/nagiosxi/html/includes/components/ccm
This was interesting in that it implies that the Test Command button runs the
check as root rather than as the nagios user.

From the scheduler:

Code:

Ok
ok - checking environment varibles
HOME = /home/nagios
PATH = /sbin:/usr/sbin:/bin:/usr/bin
PWD = /
In the terminal session, cd'ing to / and setting the path to match the above ($HOME was already set the same) had no effect on the manual check. It still passes.
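Changing directory and PATH in a login shell still leaves every other variable from that shell in place. A stricter reproduction, sketched below, starts the command from / with an otherwise empty environment using env -i. The helper name run_like_scheduler is made up, and the HOME and PATH values are simply the ones check_env reported under the scheduler.

```shell
#!/bin/sh
# run_like_scheduler is a hypothetical helper: it runs a command from / with
# ONLY the HOME and PATH values that check_env reported under the scheduler.
# env -i clears every other variable inherited from the interactive shell.
run_like_scheduler() {
    (
        cd / || exit 1
        exec env -i HOME=/home/nagios PATH=/sbin:/usr/sbin:/bin:/usr/bin "$@"
    )
}
```

As the nagios user this could be invoked as `run_like_scheduler /usr/local/nagios/libexec/check_ldap -H AAA -b "BBB" -D "CCC" -P "DDD" -2 -S`. If the check still passes this way, the difference between the manual run and the scheduled run probably lies somewhere other than HOME, PATH, or the working directory.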

I also tried 'chmod u+s check_ldap'.
-rwsr-xr-x. 1 root root 73626 Aug 15 2012 check_ldap

And scheduled an immediate check in the WebUI. No joy. This also had no effect on the problem.
I then set the permissions on check_ldap back to 755.

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Thu Jun 13, 2013 4:58 pm
by sreinhardt
Reading just past what you posted on the G+ list. It seems that $HOME should point to the path where your ssl/tls certificates are. Where are they located, and can you validate that the nagios user has permission to access them? Also, per his suggestion, try setting $HOME to that path in /etc/sysconfig/nagios (not there by default).

Re: check_ldap fails with "Could not bind to the ldap-server

Posted: Thu Jun 13, 2013 6:35 pm
by jwelch
My cert is in /etc/pki/tls/certs (private key in /etc/pki/tls/private)

I created /etc/sysconfig/nagios and inserted:
HOME=/etc/pki/tls/certs

and restarted nagios (/etc/init.d/nagios restart)
then ran the check from nagios (Schedule an immediate check).
No change. Still fails.

I changed /etc/sysconfig/nagios to:
PATH=$PATH:/etc/pki/tls/certs:/etc/pki/tls/private
and repeated the above. No change.
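One thing that may be worth ruling out: the quoted Icinga fix used "export HOME=...", while the lines above are plain assignments. Assuming the init script sources /etc/sysconfig/nagios at all (not verified here), a plain assignment to a variable that is not already exported stays in the init script's shell and never reaches the daemon. The sketch below shows the difference with two made-up variable names:

```shell
#!/bin/sh
# Show that a variable set in a sourced file reaches child processes only
# if it is exported. DEMO_PLAIN and DEMO_EXPORTED are made-up names.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
DEMO_PLAIN=plain
export DEMO_EXPORTED=exported
EOF
. "$tmp"          # source the file, as an init script would
rm -f "$tmp"

# A child process only sees the exported variable.
child_view=$(sh -c 'echo "plain=[$DEMO_PLAIN] exported=[$DEMO_EXPORTED]"')
echo "$child_view"
```

HOME and PATH are usually already exported, so a plain assignment to them normally does propagate. The init script may also reset them when it switches to the nagios user, so reading the running daemon's environment is the more reliable check.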

All the dirs and files are readable by everybody with the exception of the private key file
which is only readable by root. I believe this is normal.
Just to be sure, I changed the permissions to 444 on the private key file, restarted nagios,
scheduled a check. No change. Put the file back to 400.