Lose Access to LDAP > All Hosts/Services Down

warnox
Posts: 39
Joined: Thu Nov 20, 2014 5:22 am

Post by warnox »

Hi,

A few days ago the LDAP server that Nagios was configured to use (in /etc/httpd/conf.d/nagios.conf, for authenticating users to the web interface) was taken offline for a few hours. I'm trying to figure out why this caused Nagios to mark all hosts as down with a 'Socket timeout after 10 seconds' error.

My understanding was that this LDAP configuration is used only for authenticating users to the web interface and has nothing to do with the actual checks.

Thanks for any help.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by rkennedy »

This shouldn't have affected Nagios at all. I wonder whether some sort of loop exhausted resources on the server, which then caused issues with Nagios?

Are you using WMI checks at all? Do you have any log files available from that time you could share?
Former Nagios Employee
warnox
Posts: 39
Joined: Thu Nov 20, 2014 5:22 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by warnox »

That's what I thought too, as LDAP has nothing to do with Nagios executing host checks. The only other factor is that the LDAP server was also a DNS server, but the Nagios CentOS box definitely has a secondary DNS server configured, and that server remained online.

What logs are you looking for? Happy to post them.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by rkennedy »

Can you post your /usr/local/nagios/var/nagios.log file, and also your /var/log/httpd/error_log for us to take a look at? (paths may vary depending on your setup.)

This should be a good start.
warnox
Posts: 39
Joined: Thu Nov 20, 2014 5:22 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by warnox »

Is there anything in particular you're looking for? I ask because these files contain potentially sensitive data.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by rkennedy »

Any and all errors that may be related to LDAP. Feel free to PM the files over if you'd like to keep them private.
warnox
Posts: 39
Joined: Thu Nov 20, 2014 5:22 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by warnox »

Sorry for the delay. I've checked the nagios...log files for that date, and none of the lines contain "LDAP".

Below are the events from the error_log file for the date the issue occurred.

[Wed Jun 22 01:42:52.228303 2016] [mpm_prefork:notice] [pid 1395] AH00170: caught SIGWINCH, shutting down gracefully
[Wed Jun 22 01:42:53.364960 2016] [core:notice] [pid 26345] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Wed Jun 22 01:42:53.366936 2016] [suexec:notice] [pid 26345] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Wed Jun 22 01:42:53.429775 2016] [auth_digest:notice] [pid 26345] AH01757: generating secret for digest authentication ...
[Wed Jun 22 01:42:53.430819 2016] [lbmethod_heartbeat:notice] [pid 26345] AH02282: No slotmem from mod_heartmonitor
[Wed Jun 22 01:42:53.445078 2016] [mpm_prefork:notice] [pid 26345] AH00163: Apache/2.4.6 (CentOS) PHP/5.4.16 configured -- resuming normal operations
[Wed Jun 22 01:42:53.445109 2016] [core:notice] [pid 26345] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Mon Jun 27 03:22:02.031759 2016] [mpm_prefork:notice] [pid 26345] AH00171: Graceful restart requested, doing restart
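
For anyone repeating this check on their own error_log, here is a minimal Python sketch of the same search. It assumes the Apache 2.4 line layout seen in the excerpt above (`[timestamp] [module:level] [pid N] message`); the function names are made up for illustration, and the two sample lines are copied from the excerpt.

```python
import re

# Matches Apache 2.4 error_log lines such as:
# [Wed Jun 22 01:42:52.228303 2016] [mpm_prefork:notice] [pid 1395] AH00170: ...
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)(?::tid \d+)?\] "
    r"(?P<message>.*)$"
)


def parse_error_log(lines):
    """Yield one dict per line that matches the assumed error_log layout."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            yield match.groupdict()


def find_ldap_entries(lines):
    """Return entries whose module or message mentions LDAP (case-insensitive)."""
    return [
        entry
        for entry in parse_error_log(lines)
        if "ldap" in entry["module"].lower() or "ldap" in entry["message"].lower()
    ]


def levels_seen(lines):
    """Return the sorted set of log levels present in the parsed lines."""
    return sorted({entry["level"] for entry in parse_error_log(lines)})


# Two lines taken from the excerpt above.
SAMPLE = [
    "[Wed Jun 22 01:42:52.228303 2016] [mpm_prefork:notice] [pid 1395] "
    "AH00170: caught SIGWINCH, shutting down gracefully",
    "[Wed Jun 22 01:42:53.429775 2016] [auth_digest:notice] [pid 26345] "
    "AH01757: generating secret for digest authentication ...",
]

if __name__ == "__main__":
    print("LDAP-related entries:", find_ldap_entries(SAMPLE))
    print("Levels present:", levels_seen(SAMPLE))
```

If the excerpt is complete, a search like this finds only notice-level restart messages and nothing from an LDAP module, which suggests httpd itself logged no authentication failures that day.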
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by tgriep »

Is the server configured to use LDAP for the UNIX shell login?
Was the LDAP server powered off, or were only its services shut down?
My guess is that the DNS server was down and the Nagios server didn't start using the secondary DNS server.

Can you post the errors from the archived Nagios log file for that day so we can see them?
The archive files can be found here:

/usr/local/nagios/var/archives
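
When reading an archive file like this, it can help to list which hosts went down first. The following Python sketch assumes the usual Nagios Core log layout for host alerts (`[epoch] HOST ALERT: host;STATE;STATE_TYPE;attempt;output`). The host names and timestamps in the sample are hypothetical, not taken from the poster's log.

```python
import re

# Assumed layout of a Nagios Core host alert line:
# [1466560000] HOST ALERT: router-a;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
ALERT_RE = re.compile(
    r"^\[(?P<epoch>\d+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<state_type>[^;]+);"
    r"(?P<attempt>\d+);(?P<output>.*)$"
)


def host_alerts(lines):
    """Yield a dict for every HOST ALERT line; other log lines are skipped."""
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match:
            alert = match.groupdict()
            alert["epoch"] = int(alert["epoch"])
            alert["attempt"] = int(alert["attempt"])
            yield alert


def first_down_per_host(lines, needle="Socket timeout"):
    """Return (host, epoch) pairs for each host's first matching DOWN alert,
    ordered by time, so the earliest failures appear first."""
    first = {}
    for alert in host_alerts(lines):
        if alert["state"] == "DOWN" and needle in alert["output"]:
            first.setdefault(alert["host"], alert["epoch"])
    return sorted(first.items(), key=lambda item: item[1])


# Hypothetical sample lines for illustration only.
SAMPLE = [
    "[1466560000] HOST ALERT: router-a;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds",
    "[1466560030] HOST ALERT: web-1;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds",
    "[1466560060] HOST ALERT: router-a;DOWN;HARD;3;CRITICAL - Socket timeout after 10 seconds",
    "[1466560090] SERVICE ALERT: web-1;HTTP;CRITICAL;SOFT;1;Connection refused",
]

if __name__ == "__main__":
    for host, epoch in first_down_per_host(SAMPLE):
        print(epoch, host)
```

If a router or gateway shows up at the top of that list, the hosts that follow it are likely behind it rather than down themselves.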
warnox
Posts: 39
Joined: Thu Nov 20, 2014 5:22 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by warnox »

No, not using LDAP for UNIX shell login.

The LDAP server was intermittently unreachable, but the secondary DNS server would have been up the whole time. I had the same thought, but I can't see a reason for CentOS not to have used the secondary DNS server.
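
One thing worth checking here, offered as a hypothesis and not as a finding from this thread: the glibc resolver normally tries nameservers in the order listed and waits for a timeout on each before moving on (resolv.conf(5) gives defaults of `timeout:5` and `attempts:2`). Even with a healthy secondary, a dead primary can therefore add several seconds to every lookup. The Python sketch below parses resolv.conf-style text to estimate that delay. The function names and the sample addresses are made up for illustration.

```python
def parse_resolv_conf(text):
    """Extract nameservers plus the timeout/attempts/rotate options.

    Defaults follow resolv.conf(5): timeout 5 seconds, 2 attempts.
    """
    config = {"nameservers": [], "timeout": 5, "attempts": 2, "rotate": False}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            config["nameservers"].append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name in ("timeout", "attempts") and value.isdigit():
                    config[name] = int(value)
                elif name == "rotate":
                    config["rotate"] = True
    return config


def delay_if_primary_down(config):
    """Seconds a lookup waits on a dead first nameserver before trying the next.

    Returns None when there is no second nameserver to fall back to.
    With 'rotate' set, only some lookups start at the dead server.
    """
    if len(config["nameservers"]) < 2:
        return None
    return config["timeout"]


# Hypothetical resolv.conf contents (documentation-range addresses).
SAMPLE = """\
# primary is also the LDAP server in this made-up example
nameserver 192.0.2.10
nameserver 192.0.2.11
"""

if __name__ == "__main__":
    conf = parse_resolv_conf(SAMPLE)
    print("Nameservers:", conf["nameservers"])
    print("Delay per lookup if primary is down:", delay_if_primary_down(conf), "s")
```

With the defaults, a lookup through a dead primary would spend about 5 of a plugin's 10 seconds waiting on DNS, which matters only for checks that resolve host names instead of using IP addresses.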

PM sent with the log file.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Lose Access to LDAP > All Hosts/Services Down

Post by tgriep »

Thanks for the log file.
It looks like the Nagios system couldn't connect to a router, and then all of the hosts behind it started to time out because Nagios couldn't reach them while the router was down.
Later on, a second router went down at another site, causing the same issue.
From what I can see, this is all normal behavior.
You may want to set up a parent-child relationship so that, if this happens again, you will not get notifications for the hosts behind a router when the router itself is down.
Take a look at this document for more details.
https://assets.nagios.com/downloads/nag ... ility.html
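
To illustrate the idea behind parent-child relationships, here is a rough Python sketch of the reachability rule as the Nagios documentation describes it: a host that fails its check is DOWN if it has no parents or at least one parent is UP, and UNREACHABLE if all of its parents are DOWN or UNREACHABLE. This is a simplified model, not Nagios code; the host names are hypothetical, and in a real setup the relationship is declared with the `parents` directive in each host definition.

```python
def classify_hosts(parents, failed):
    """Classify hosts as UP, DOWN, or UNREACHABLE.

    parents: dict mapping host -> list of its parent hosts ([] means the host
             is directly reachable from the monitoring server).
    failed:  set of hosts whose own check failed.
    Assumes the parent graph has no cycles.
    """
    states = {}

    def state_of(host):
        if host in states:
            return states[host]
        if host not in failed:
            states[host] = "UP"
        else:
            host_parents = parents.get(host, [])
            # No parents, or at least one parent still UP: the host itself is the problem.
            if not host_parents or any(state_of(p) == "UP" for p in host_parents):
                states[host] = "DOWN"
            else:
                states[host] = "UNREACHABLE"
        return states[host]

    for host in set(parents) | set(failed):
        state_of(host)
    return states


# Hypothetical topology: two sites, each behind its own router.
PARENTS = {
    "router-a": [],
    "web-1": ["router-a"],
    "db-1": ["router-a"],
    "router-b": [],
    "web-2": ["router-b"],
}
FAILED = {"router-a", "web-1", "db-1"}

if __name__ == "__main__":
    for host, state in sorted(classify_hosts(PARENTS, FAILED).items()):
        print(f"{host}: {state}")
```

In this model only router-a is reported DOWN, while web-1 and db-1 become UNREACHABLE. Leaving `u` out of a host's `notification_options` should then keep notifications for unreachable hosts from being sent.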