Nagios windows agent issue

Post by **padu_3891** » Mon Mar 12, 2018 10:18 am

Hello Team,

My Nagios server is a virtual machine. All of a sudden, "Unable to establish communication with Agent" alerts were triggered for the agent on 100 servers. I tried executing the check command manually: the first execution returned results, but the second execution (immediately after the first) failed with the error "Unable to establish communication with Agent", and the pattern kept repeating.

The issue persisted for three consecutive days; now everything is back to normal.

What could be the cause of the issue?

Is it related to the network?

We have checked the Nagios server load, CPU, etc., and all looks fine. We also checked with the network team, and they found no issues either.

Please let us know how we can find the cause of this issue so that we can take preventive action.

Thank you,
Padma Muthu

Post by **mcapra** » Tue Mar 13, 2018 9:05 am

I am going to assume the agent you are using is NSClient++. Please correct me if I am wrong.

Which plugin is being used on the Nagios Core side of things to reach out to NSClient++? What version of that plugin are you using?

Which version of NSClient++ is being used on your machines? Do you have a standard NSClient++ configuration these machines use and, if so, could you share it?

padu_3891 wrote: We have checked the Nagios server load, CPU, etc., and all looks fine.

Did you also check the Nagios Core machine's available file descriptors, open file limits, and available sockets?
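For anyone unsure how to take that snapshot, here is a minimal sketch, assuming a Linux host with `/proc` mounted. The function names `parse_file_nr`, `parse_sockstat`, and `snapshot` are made up for illustration and are not part of Nagios. It reads the per-process open file limit, the system-wide file handle counters, and the socket counters.

```python
import resource
from pathlib import Path


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated, unused, and maximum file handles."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def parse_sockstat(text):
    """Parse /proc/net/sockstat into {protocol: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, _, rest = line.partition(":")
        fields = rest.split()
        # Fields alternate between a counter name and its value,
        # for example "inuse 8 orphan 0 tw 0 alloc 10 mem 1".
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(fields[0::2], fields[1::2])
        }
    return stats


def snapshot():
    """Collect limits for the current process and counters for the whole system."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "file_nr": parse_file_nr(Path("/proc/sys/fs/file-nr").read_text()),
        "sockstat": parse_sockstat(Path("/proc/net/sockstat").read_text()),
    }
```

Calling `snapshot()` returns a plain dictionary that can be printed or logged. The open file limit is per process, so the numbers are only meaningful when the script runs as the same user as the Nagios daemon. A high `tw` (TIME_WAIT) count under `TCP`, or `allocated` approaching `max`, would be worth a closer look.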

Post by **kyang** » Tue Mar 13, 2018 12:05 pm

Thanks for the help @mcapra!

Post by **padu_3891** » Fri Mar 16, 2018 3:54 pm

mcapra wrote: I am going to assume the agent you are using is NSClient++.

You are correct.

mcapra wrote: Which plugin is being used on the Nagios Core side of things to reach out to NSClient++? What version of that plugin are you using?

check_nrpe, version 2.12.

mcapra wrote: Which version of NSClient++ is being used on your machines? Do you have a standard NSClient++ configuration these machines use and, if so, could you share it?

NSClient++ version 4.3.1. Yes, it is a standard configuration. Do you want me to share the nsclient.ini file?

mcapra wrote: Did you also check the Nagios Core machine's available file descriptors, open file limits, and available sockets?

Yes, everything is fine; no issues were found.

Post by **mcapra** » Mon Mar 19, 2018 10:15 am

Can you share the full historical nagios log that contains these ~100 or so failures? Typically the historical logs can be found here:

/usr/local/nagios/var/archives

I'd like to see the full log from a given day if possible, not just a handful of entries demonstrating the error message.
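To get a first impression before sharing the full log, the failures can be bucketed by hour. This is a rough sketch, assuming the usual Nagios Core log line shape of `[epoch-seconds] TYPE: details`. The helper name `failures_per_hour` is hypothetical, and the default search string is simply the message quoted earlier in this thread, so adjust it to whatever your log actually contains.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# A Nagios log line starts with a Unix timestamp in square brackets.
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")


def failures_per_hour(lines, needle="Unable to establish communication"):
    """Count log lines containing `needle`, grouped by UTC hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match or needle not in match.group(2):
            continue
        stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Passing an open archive file, for example `failures_per_hour(open(path))`, and printing `sorted(result.items())` shows whether the failures arrive in bursts or are spread evenly, which helps when comparing against the system log from the same period. Buckets are in UTC, not the server's local time.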

Which OS and version of that OS is this machine using? Which hypervisor is hosting the VM?

Also, if you happen to have a copy of your system's primary log file (/var/log/messages on CentOS/RHEL) from that same time period, that may be useful.

I'm fairly confident this is a system- or network-related issue rather than a failure of NSClient++ or the check_nrpe plugin specifically (I could be wrong). I've seen setups executing ~100 or so simultaneous check_nrpe calls to various agents (mostly NSClient++) without totally tanking. Besides that, given how check_nrpe functions, I don't think it would make sense for a few hundred agents to simultaneously stop responding unless there was some sort of network/system issue that prevented check_nrpe from correctly establishing a connection.
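One way to separate a connection problem from a plugin problem is to repeat a bare TCP connect against the agent's port, which mirrors the "works once, fails right after" pattern described in the first post. The sketch below is only that: it does not speak the NRPE protocol, the function name `probe` is hypothetical, and port 5666 is the usual NRPE default, which is an assumption about this particular setup.

```python
import socket
import time


def probe(host, port=5666, attempts=5, timeout=3.0, pause=0.0):
    """Open and close a TCP connection `attempts` times in a row.

    Returns a list of (succeeded, seconds_elapsed, error_text) tuples.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the host and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start, ""))
        except OSError as exc:
            # Covers refusals, timeouts, and name resolution failures.
            results.append((False, time.monotonic() - start, str(exc)))
        time.sleep(pause)
    return results
```

Running something like `probe("windows-host.example", attempts=10)` from the Nagios server (the host name is a placeholder) and looking for failures that appear only on back-to-back attempts would point toward the system or network layer rather than toward NSClient++ itself.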

Post by **tmcdonald** » Mon Mar 19, 2018 10:25 am

Thanks for the assist, @mcapra!

@padu_3891, let us know if you have further (related) questions.

Post by **padu_3891** » Wed Mar 21, 2018 7:35 am

@mcapra Thanks a lot for your suggestion. As you said, I found both issues. CPU utilisation on my server was high, which may be one cause.

I am going to increase the server resources for now, and I will let you know if I face more issues.

Just one more query.

Will running the Nagios server in a VMware environment cause any issues? Which would you suggest: a standalone machine or a virtual machine?

Post by **tmcdonald** » Wed Mar 21, 2018 10:11 am

Core can run equally well in a VM or on physical hardware. The differences in performance are minor for the most part, and really don't show themselves until an environment becomes quite large.

Nagios Support Forum
