
UNKNOWN:failed to connect:connection refused

Posted: Wed Apr 05, 2017 3:11 am
by Kevin.c
Hello Nagios team,
After we shut down some of our hosts for maintenance, the status of all the related services changed to UNKNOWN, with the status information "UNKNOWN:failed to connect:connection refused".

I do not know why. Restarting the Nagios server fixed the problem. Do you know what the root cause of this issue is, and how we should handle maintenance the right way?

Thanks a lot!

Re: UNKNOWN:failed to connect:connection refused

Posted: Wed Apr 05, 2017 10:37 am
by mcapra
I assume NSClient++ is being used to monitor these Windows machines? That can take a bit to start up, particularly if the server has many responsibilities. While it's starting up, you can sometimes see those "Connection refused" messages since the agent is not yet running and listening.
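To check whether the agent is actually listening yet, a plain TCP connection attempt is enough to tell "connection refused" (the host answers, but nothing is listening on the port) apart from a listening agent. This is a minimal sketch assuming Python 3 on the Nagios server; the host name in the usage example is hypothetical, and port 5666 is only the usual NRPE default, so substitute whatever port your NSClient++ instance is configured to use.

```python
import socket
import time


def probe_agent(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # The host answered, but no process is accepting connections on this
        # port yet (for example, the agent service is still starting).
        return "connection refused"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        return f"error: {exc}"


def wait_for_agent(host: str, port: int, attempts: int = 10, delay: float = 6.0) -> bool:
    """Poll until the agent port accepts connections or the attempts run out."""
    for attempt in range(attempts):
        if probe_agent(host, port) == "listening":
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical host name; 5666 is the common NRPE default port.
    print(probe_agent("winsrv01.example.com", 5666))
```

Running this right after a host comes back from maintenance would show whether the UNKNOWN results line up with the window in which the agent is still starting.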

If you know some hosts (or services) are going to be down for maintenance, you could schedule downtime for those hosts:
https://assets.nagios.com/downloads/nag ... s%20XI.pdf
https://support.nagios.com/kb/article.php?id=544

This is the best way to prevent notifications of intentional "outages". The status in the Nagios XI GUI will still be "unknown", but you won't get emails about it.
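For hosts that are maintained on a regular schedule, downtime can also be submitted through the Nagios Core external command file instead of the GUI. The sketch below builds `SCHEDULE_HOST_DOWNTIME` and `SCHEDULE_HOST_SVC_DOWNTIME` entries (the latter covers every service on the host) for fixed downtime with no trigger. The command file path is a common default for a source install and is an assumption (check the `command_file` setting in your nagios.cfg); the host and author names in the usage example are hypothetical.

```python
import time
from typing import Iterable, List, Optional

# Common default location of the external command pipe (assumption: verify
# the command_file setting in your nagios.cfg).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def downtime_commands(host: str, start: int, end: int, author: str,
                      comment: str, now: Optional[int] = None) -> List[str]:
    """Build fixed-downtime external commands for a host and all its services."""
    if end <= start:
        raise ValueError("end must be after start")
    now = int(time.time()) if now is None else now
    duration = end - start
    # Field order: host;start;end;fixed;trigger_id;duration;author;comment
    fields = f"{host};{start};{end};1;0;{duration};{author};{comment}"
    return [
        f"[{now}] SCHEDULE_HOST_DOWNTIME;{fields}",
        f"[{now}] SCHEDULE_HOST_SVC_DOWNTIME;{fields}",
    ]


def submit(commands: Iterable[str], command_file: str = COMMAND_FILE) -> None:
    """Write each command as one line to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        for command in commands:
            pipe.write(command + "\n")


if __name__ == "__main__":
    begin = int(time.time())
    cmds = downtime_commands("winsrv01", begin, begin + 2 * 3600,
                             "kevin", "Planned maintenance")
    for line in cmds:
        print(line)
    # submit(cmds)  # run on the Nagios server itself to apply the downtime
```

Scheduling the downtime before the hosts go down is what suppresses the notifications; as noted above, the service status itself may still show as unknown while the agent is unreachable.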

Re: UNKNOWN:failed to connect:connection refused

Posted: Mon Apr 24, 2017 1:50 am
by Kevin.c
Hi,
After restarting the server, the problem was fixed, but now we get another error:
"(Service check timed out after 60.01 seconds)"

Why? Is the agent on that server not working?

Re: UNKNOWN:failed to connect:connection refused

Posted: Mon Apr 24, 2017 10:33 am
by mcapra
Can you send a system profile, either by attachment or PM? From the Nagios XI GUI, you can gather a system profile via Admin -> System Profile -> Download Profile.

Re: UNKNOWN:failed to connect:connection refused

Posted: Mon Apr 24, 2017 8:12 pm
by Kevin.c
Sure, the system profile is attached.

Mod Edit: Profile received and shared with techs.

Re: UNKNOWN:failed to connect:connection refused

Posted: Tue Apr 25, 2017 10:16 am
by mcapra
Can you share the nrpe.cfg from the remote machine producing these errors? Or, if the command check_teamcenter_perf is located in a separate file, share that file. Also please share the script that is associated with the command if possible.

It would seem as though the script associated with this command is exceeding the stock 60-second timeout in Nagios Core. Nagios Core kills checks that run for more than 60 seconds. This limit can be adjusted by modifying service_check_timeout in your main Nagios configuration file (nagios.cfg).
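To see how long the check script really takes, the timeout behaviour can be approximated outside Nagios. This sketch reads `service_check_timeout` from nagios.cfg contents (falling back to the stock 60 seconds when the directive is absent), can rewrite that value, and runs a command under the same limit. It is only an approximation of what Nagios Core does: the message mimics the one quoted earlier in this thread, and the sketch does not try to reproduce the state Nagios assigns to a timed-out check, returning `None` instead. The slow command in the usage example is a stand-in, not the real plugin.

```python
import re
import subprocess
import sys
import time
from typing import List, Optional, Tuple

# Stock value mentioned above; used when nagios.cfg does not set the directive.
DEFAULT_SERVICE_CHECK_TIMEOUT = 60
_DIRECTIVE = re.compile(
    r"^[ \t]*service_check_timeout[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def read_service_check_timeout(cfg_text: str) -> int:
    """Return service_check_timeout from nagios.cfg contents, or the default."""
    match = _DIRECTIVE.search(cfg_text)
    return int(match.group(1)) if match else DEFAULT_SERVICE_CHECK_TIMEOUT


def set_service_check_timeout(cfg_text: str, seconds: int) -> str:
    """Return cfg_text with service_check_timeout set to `seconds`."""
    line = f"service_check_timeout={seconds}"
    new_text, count = re.subn(r"^[ \t]*service_check_timeout[ \t]*=.*$", line,
                              cfg_text, flags=re.MULTILINE)
    if count == 0:
        # Directive absent: append it at the end of the file contents.
        new_text = cfg_text.rstrip("\n") + "\n" + line + "\n"
    return new_text


def run_with_timeout(argv: List[str], timeout: float) -> Tuple[Optional[int], str]:
    """Run a check command, killing it if it exceeds `timeout` seconds.

    Returns (exit code, first line of output), or (None, message) on timeout.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        elapsed = time.monotonic() - start
        return None, f"(Service check timed out after {elapsed:.2f} seconds)"
    lines = proc.stdout.splitlines()
    return proc.returncode, lines[0] if lines else ""


if __name__ == "__main__":
    limit = read_service_check_timeout("service_check_timeout=2\n")
    # Stand-in for a slow plugin: a Python process that sleeps past the limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, limit))
```

Replacing `slow` with the actual check command line from the Nagios server would show whether the script finishes within the configured limit before you decide to raise it.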

Re: UNKNOWN:failed to connect:connection refused

Posted: Mon May 01, 2017 3:31 am
by Kevin.c
Hi,
The problem happened again. It is really causing us trouble with this tool; please think about how to fix the issue.

Re: UNKNOWN:failed to connect:connection refused

Posted: Mon May 01, 2017 3:33 am
by Kevin.c
Please see the attachment.

Re: UNKNOWN:failed to connect:connection refused

Posted: Mon May 01, 2017 12:05 pm
by ssax
I notice that on some of your services for these checks, you have !!!!!! on the end of $ARG2$.

Try removing those extra !!!!!! from the $ARG2$ entry.

Do all of the ones that are failing have that extra stuff on there?

Thank you

Edit: What I mean is that it looks like you copied the $ARG2$ value from a config file and accidentally included the extra !!!!!! that Nagios adds to the end of them in the generated configs.
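Nagios Core passes the arguments of a check_command as one string separated by "!", so stray "!" characters copied into an argument field act as extra separators. The simplified parser below is a sketch, not Nagios's own code (it only treats a backslash-escaped "\!" as a literal character); it shows how trailing "!!!!!!" turn into empty arguments and how a value written after extra separators lands in a later $ARGn$ slot. The command strings in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def split_check_command(check_command: str) -> Tuple[str, List[str]]:
    """Split 'command!arg1!arg2...' into the command name and its arguments.

    A backslash-escaped bang (\\!) is kept as a literal '!' inside an argument.
    """
    fields: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(check_command):
        ch = check_command[i]
        if ch == "\\" and check_command[i + 1:i + 2] == "!":
            current.append("!")  # escaped bang: literal character, not a separator
            i += 2
            continue
        if ch == "!":
            fields.append("".join(current))  # unescaped bang ends the field
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields[0], fields[1:]


def arg_macros(check_command: str) -> Dict[str, str]:
    """Map each argument to the $ARGn$ macro it would fill."""
    _, args = split_check_command(check_command)
    return {f"$ARG{n}$": value for n, value in enumerate(args, start=1)}


if __name__ == "__main__":
    # Hypothetical command lines for illustration only.
    print(arg_macros("check_nrpe!check_teamcenter_perf!!!!!!"))
    print(arg_macros("check_nrpe!-w 80!!-c 90"))
```

Comparing the mapped $ARGn$ values of a failing service with those of a working one is a quick way to see whether extra "!" characters are shifting arguments.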

Re: UNKNOWN:failed to connect:connection refused

Posted: Tue May 02, 2017 1:47 am
by Kevin.c
Hello,
I do not think the !!!! is the problem; that is not the point. Some of our service commands do not have !!!!! in them, but they are still UNKNOWN now!