problem with localhost/ nsswitch.conf concall request

Posted: Thu Sep 06, 2018 2:38 pm
by benhank
So we recently had an outage that caused the issue described here:

Code: Select all

https://support.nagios.com/forum/viewtopic.php?f=16&t=49889&start=10
As stated in that post, a vendor (boo) made a change in DNS that caused any server or software configured to point to "localhost" to become unreachable, because localhost no longer resolved to 127.0.0.1.

The solution to the problem was to modify the nsswitch.conf file so that host lookups consult /etc/hosts first and then DNS:

Code: Select all

 vi /etc/nsswitch.conf (change the hosts line, shown below, to "hosts: files dns")
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Valid entries include:
#
#       nisplus                 Use NIS+ (NIS version 3)
#       nis                     Use NIS (NIS version 2), also called YP
#       dns                     Use DNS (Domain Name Service)
#       files                   Use the local files
#       db                      Use the local database (.db) files
#       compat                  Use NIS on compat mode
#       hesiod                  Use Hesiod for user lookups
#       [NOTFOUND=return]       Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd:    db files nisplus nis
#shadow:    db files nisplus nis
#group:     db files nisplus nis

passwd:     files
shadow:     files
group:      files

#hosts:     db files nisplus nis dns
hosts:  dns files

# Example - obey only what nisplus tells us...
#services:   nisplus [NOTFOUND=return] files
#networks:   nisplus [NOTFOUND=return] files
#protocols:  nisplus [NOTFOUND=return] files
#rpc:        nisplus [NOTFOUND=return] files
#ethers:     nisplus [NOTFOUND=return] files
#netmasks:   nisplus [NOTFOUND=return] files

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files

netgroup:   nisplus

publickey:  nisplus

automount:  files nisplus
aliases:    files nisplus
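For anyone applying the same change across several boxes, here is a minimal sketch of a check that "files" comes before "dns" on the active hosts line. The helper names hosts_order and files_before_dns are made up for illustration, and only standard awk is assumed:

```shell
#!/bin/sh
# hosts_order (hypothetical helper): print the lookup sources on the active
# "hosts:" line of an nsswitch.conf-style file. Commented-out lines such as
# "#hosts: ..." are skipped because their first field is "#hosts:".
hosts_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' \
        "${1:-/etc/nsswitch.conf}"
}

# files_before_dns (hypothetical helper): succeed only when "files" is present
# on the hosts line and, if "dns" is also present, appears before it.
files_before_dns() {
    hosts_order "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "files" && !f) f = i
                if ($i == "dns" && !d) d = i
            }
        }
        END { exit !(f > 0 && (d == 0 || f < d)) }'
}
```

Running `files_before_dns && echo OK` against the listing above would print nothing, since its hosts line still reads "dns files".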
This has worked on every CentOS box we have except my XI production server.
When making this change on my XI prod server the following happens:

Code: Select all

pg_pconnect(): Unable to connect to PostgreSQL server: could not connect to server: Connection timed out
        Is the server running on host "localhost" and accepting
        TCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 699
<h3>Databse Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>PHP Warning:  pg_pconnect(): Unable to connect to PostgreSQL server: could not connect to server: Connection timed out
        Is the server running on host "localhost" and accepting
        TCP/IP connections on port 5432? in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres64.inc.php on line 699
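A timeout on "localhost" port 5432 would be consistent with localhost resolving to something other than a loopback address. A rough sketch for checking that follows. The helper names is_loopback and check_localhost are hypothetical; getent is the standard glibc tool, and its ahosts database goes through getaddrinfo(), so it follows whatever order nsswitch.conf specifies:

```shell
#!/bin/sh
# is_loopback (hypothetical helper): succeed if the address is an IPv4
# loopback (127.x.x.x) or the IPv6 loopback (::1).
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# check_localhost (hypothetical helper): list each distinct address that the
# system resolver returns for "localhost" and flag any that are not loopback.
check_localhost() {
    getent ahosts localhost | awk '{ print $1 }' | sort -u |
    while read -r addr; do
        if is_loopback "$addr"; then
            echo "ok   $addr"
        else
            echo "BAD  $addr"
        fi
    done
}
```

Calling check_localhost before and after editing nsswitch.conf shows whether the change alters what "localhost" resolves to on that box.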
Although my prod server is running with the incorrect configuration, I have to be able to change the nsswitch.conf file as listed above as part of a new policy to prevent this issue in the future.
The only difference on my prod server is that it has an offloaded MySQL database, which isn't affected when we make the change.
I am requesting a concall with my Nagios support so that we can, along with my Linux admin, figure out how to resolve this without taking down the prod server.
Thanks guys!

Re: problem with localhost/ nsswitch.conf concall request

Posted: Thu Sep 06, 2018 4:11 pm
by ssax
I know that there used to be a bug in the CentOS hostname configuration utility where it would remove the localhost entry from /etc/hosts when you ran the command. Does your XI server have localhost on 127.0.0.1 currently?

Code: Select all

[root@xig nagiosxi]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
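To check this without eyeballing the file, here is a small sketch that lists every address a hosts-format file assigns to the name "localhost". The function name localhost_addrs is made up for illustration:

```shell
#!/bin/sh
# localhost_addrs (hypothetical helper): print the address from each line of
# a hosts-format file that lists "localhost" as one of its names.
localhost_addrs() {
    awk '{
        sub(/#.*/, "")                      # ignore trailing comments
        for (i = 2; i <= NF; i++)
            if ($i == "localhost") { print $1; break }
    }' "${1:-/etc/hosts}"
}
```

Against the file shown above it prints 127.0.0.1 and ::1, one per line. Empty output means no line maps localhost at all, so with "files dns" in nsswitch.conf the lookup would fall through to DNS.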

Re: problem with localhost/ nsswitch.conf concall request

Posted: Fri Sep 07, 2018 4:14 pm
by benhank
hmmm...

No, mine don't look like dat mon... I have the hostnames of the server in there. I'll make the change on Monday and see what happens.

Re: problem with localhost/ nsswitch.conf concall request

Posted: Mon Sep 10, 2018 9:45 am
by benhank
Good news! I made the changes to the hosts files and it's all working properly! Thanks guys!

Re: problem with localhost/ nsswitch.conf concall request

Posted: Mon Sep 10, 2018 10:14 am
by ssax
Glad to hear it, are we okay to lock this and mark it as resolved?

Re: problem with localhost/ nsswitch.conf concall request

Posted: Mon Sep 10, 2018 10:28 am
by benhank
yep!