Temporary failure in name resolution in http services

Posted: Thu Jan 17, 2019 10:07 pm
by rtsupport
Hi Team,

We have migrated from Nagios XI 2014R2.7 to Nagios XI 5.5.5. Since then we are getting "Temporary failure in name resolution" errors while monitoring http services by hostname (the checks recover on their own after some time). We have also added the old DNS entries to resolv.conf on the new collector. Please suggest how to fix this issue.

Below is the command as defined in the GUI:

$USER1$/check_http -c 15 -t 15 -f follow -H $ARG1$ -s $ARG2$ -u $ARG3$

Error
Critical
Temporary failure in name resolution
HTTP CRITICAL - Unable to open TCP socket

Re: Temporary failure in name resolution in http services

Posted: Fri Jan 18, 2019 11:21 am
by cdienger
Are you able to resolve the hostname from the XI command line if you run nslookup? It would look like:

nslookup <hostname>

Re: Temporary failure in name resolution in http services

Posted: Fri Jan 18, 2019 7:45 pm
by rtsupport
Yes, we are able to get a reply from nslookup.
As said earlier, the http service also recovers after some time when checked by hostname. This error occurs only with http services.

Note: We are not facing this issue with our Nagios XI 2014R2.7.

Re: Temporary failure in name resolution in http services

Posted: Mon Jan 21, 2019 4:09 pm
by npolovenko
@rtsupport, Can you show me your service definition and all arguments? You can open the service check in the Core Config Manager and take a screenshot of the whole page. Usually, Nagios uses -H $HOSTADDRESS$ so I'd like to see if the arguments are set properly.
Also, the next time you see the resolution error, I suggest rerunning the nslookup command and running the nmap command against the HTTP server's IP address.

Re: Temporary failure in name resolution in http services

Posted: Tue Jan 22, 2019 2:08 am
by rtsupport
Please refer to the attached screenshot. I hope this is what you were requesting; let me know if I missed something.

Also, can you let me know the nmap command we have to run?

Re: Temporary failure in name resolution in http services

Posted: Tue Jan 22, 2019 3:14 pm
by npolovenko
@rtsupport, Run the following commands the next time you see the "Unable to open TCP socket" error, while the service check is still critical:
nmap xxxxxcorp.xerox.com
nslookup xxxxxcorp.xerox.com

Also, in your command common_check_httpd_port, please increase the timeout value from -t 15 to -t 40 and let me know if this fixes the issue.

Re: Temporary failure in name resolution in http services

Posted: Wed Jan 23, 2019 9:51 am
by rtsupport
Please refer to the screenshot attached in the PM I sent you, which has all the requested details. The service went into a critical state and recovered within a few seconds; however, we are not getting any error on the command line.

Re: Temporary failure in name resolution in http services

Posted: Wed Jan 23, 2019 11:50 am
by npolovenko
@rtsupport, The command in the console didn't catch the error, likely because the DNS/DHCP issue resolved on its own after a few seconds. The XI check was still critical because the next recheck wasn't due yet. To avoid false notifications, you could increase the number of check attempts before Nagios sends out a notification.
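
Since a manual nslookup can easily miss a failure that lasts only a few seconds, one option is to leave a small polling loop running on the XI server and log every failed lookup with a timestamp. The following is only a minimal sketch, not a Nagios tool: the function names and the RESOLVER variable are made up for illustration. It defaults to getent ahosts, which resolves through getaddrinfo() and the system resolver configuration; "Temporary failure in name resolution" is the standard getaddrinfo() message for a temporary lookup failure, whereas nslookup queries the DNS servers directly.

```shell
#!/bin/sh
# Hypothetical helper: poll name resolution and log each failure with a timestamp.
# RESOLVER is an illustrative override; by default lookups go through getaddrinfo().
: "${RESOLVER:=getent ahosts}"

# Return 0 if the name resolves, non-zero otherwise.
resolve_once() {
    $RESOLVER "$1" >/dev/null 2>&1
}

# Usage: watch_dns <hostname> <seconds between lookups> <number of lookups>
watch_dns() {
    host=$1
    interval=$2
    count=$3
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! resolve_once "$host"; then
            failures=$((failures + 1))
            echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED to resolve $host"
        fi
        i=$((i + 1))
        # Sleep between lookups, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    echo "$failures of $count lookups failed for $host"
}
```

For example, watch_dns xxxxxcorp.xerox.com 5 720 would poll every 5 seconds for about an hour. The timestamps of any failed lookups can then be compared with the times the service check went critical.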

Setting the XI server's IP address to static and disabling DHCP would be the next step to fix this problem.

Re: Temporary failure in name resolution in http services

Posted: Wed Jan 23, 2019 7:56 pm
by rtsupport
Hi Team,

We have changed the interval time, but the service then goes into a flapping state, and we will not be able to provide a real-time fix if an issue lasts 10-15 minutes, which will impact the business. It is also hard to change the interval time for every http/https service and host, as we have 300+ services and hosts configured as http.

The IP address of the XI server is static.

The question is why we are facing this issue (only with http services) in Nagios XI 5.5.5 and not in Nagios XI 2014R2.7. Can you please check and suggest a fix?

Re: Temporary failure in name resolution in http services

Posted: Thu Jan 24, 2019 4:28 pm
by npolovenko
@rtsupport, When you migrated from Nagios XI 2014R2.7 to Nagios XI 5.5.5, did you use the same physical server?
If not, are both servers on the same subnets? Check the contents of the /etc/hosts file on the original server and on the new server.
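
One way to compare the two copies of /etc/hosts is to strip comments and blank lines first, so that only real differences in the entries show up. This is a rough sketch under a few assumptions: the function names are made up, comments are assumed to start with #, and the old server's file is assumed to have been copied over to the new server beforehand.

```shell
#!/bin/sh
# Print only the active entries of a config file:
# drop comments, trailing whitespace and blank lines, then sort.
active_lines() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" | sort
}

# Show the entries that differ between two copies of the same file.
# Exit status 0 means the active entries are identical.
compare_conf() {
    old_tmp=$(mktemp)
    new_tmp=$(mktemp)
    active_lines "$1" >"$old_tmp"
    active_lines "$2" >"$new_tmp"
    status=0
    diff "$old_tmp" "$new_tmp" || status=$?
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}
```

For example, after copying the old server's file to /tmp/hosts.old (a placeholder path), compare_conf /tmp/hosts.old /etc/hosts lists the entries that exist on only one of the two servers. The same comparison may be worth running on /etc/resolv.conf.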

We can try compiling an older version of the check_http plugin and using it instead of the existing one. I still think this is more likely a networking issue than a Nagios issue, but that would be a good troubleshooting step.
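cd /tmp/
wget https://github.com/nagios-plugins/nagi ... 1.3.tar.gz
# extract the download (archive name assumed from the directory name below)
tar xzf nagios-plugins-2.1.3.tar.gz
cd nagios-plugins-2.1.3
./configure
make
cd plugins
mv check_http /usr/local/nagios/libexec/check_http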
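
Note that the final mv overwrites the check_http that shipped with XI 5.5.5, so there is nothing to roll back to if the older build makes no difference. A possible alternative for that last step, sketched here with a made-up function name and a .orig suffix chosen only for illustration, is to keep a copy of the stock binary first:

```shell
#!/bin/sh
# Hypothetical helper: install a newly built plugin but keep the previous
# binary as <plugin>.orig so the change can be undone.
# Usage: swap_plugin <new binary> <installed plugin path>
swap_plugin() {
    new_bin=$1
    installed=$2
    backup="$installed.orig"

    # Keep only the first backup, so repeated runs never overwrite the stock binary.
    if [ -e "$installed" ] && [ ! -e "$backup" ]; then
        cp -p "$installed" "$backup" || return 1
    fi
    cp -p "$new_bin" "$installed"
}
```

For example, swap_plugin /tmp/nagios-plugins-2.1.3/plugins/check_http /usr/local/nagios/libexec/check_http installs the older build, and copying check_http.orig back over check_http restores the original. It is worth checking afterwards that ownership and permissions match the other plugins in the directory.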