Nagios windows agent issue

padu_3891
Posts: 50
Joined: Thu Sep 05, 2013 10:12 pm

Nagios windows agent issue

Post by padu_3891 »

Hello Team,

My Nagios server is a virtual machine. All of a sudden, "Unable to establish communication with Agent" alerts were triggered for 100 servers. I tried executing the command manually: the first execution returned results, but the second execution (immediately after the first) returned the error "Unable to establish communication with Agent", and the pattern kept repeating.

The issue persisted for 3 consecutive days; now everything is back to normal.

What could be the cause of the issue?

Is it related to the network?

We have checked the Nagios server load, CPU, etc., and all looks fine. We also checked with the network team, who found no issues either.

Please let us know how we can find the cause of this issue so that we can take preventive action.

Thank you,
Padma Muthu
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios windows agent issue

Post by mcapra »

I am going to assume the agent you are using is NSClient++. Please correct me if I am wrong.

Which plugin is being used on the Nagios Core side of things to reach out to NSClient++? What version of that plugin are you using?

Which version of NSClient++ is being used on your machines? Do you have a standard NSClient++ configuration these machines use and, if so, could you share it?
padu_3891 wrote: We have checked the Nagios server load, CPU, etc., and all looks fine.
Did you also check the Nagios Core machine's available file descriptors, open file limits, and available sockets?
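For anyone unsure how to check these, here is a minimal sketch, assuming a Linux host with `/proc` mounted. The function names are made up for illustration. Note that the limits shown are those of the process running the script, so it should be run as the same user that runs Nagios Core, and reading another process's `/proc/<pid>/fd` usually requires root.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def system_file_handles(path="/proc/sys/fs/file-nr"):
    """Return (allocated, unused, maximum) system-wide file handles."""
    with open(path) as fh:
        allocated, unused, maximum = (int(x) for x in fh.read().split())
    return allocated, unused, maximum


def socket_summary(path="/proc/net/sockstat"):
    """Parse /proc/net/sockstat into {"TCP": {"inuse": ..., "tw": ...}, ...}."""
    summary = {}
    with open(path) as fh:
        for line in fh:
            name, _, rest = line.partition(":")
            fields = rest.split()
            # Fields alternate between a key and an integer value.
            summary[name] = {k: int(v) for k, v in zip(fields[0::2], fields[1::2])}
    return summary


if __name__ == "__main__":
    print("open-file limits (soft, hard):", fd_limits())
    try:
        print("fds open in this process:", open_fd_count())
        print("system handles (allocated, unused, max):", system_file_handles())
        # "tw" under TCP is the number of sockets in TIME_WAIT.
        print("TCP sockets:", socket_summary().get("TCP"))
    except OSError as exc:
        print("could not read /proc:", exc)
```

A soft limit close to the number of open descriptors, or a large TIME_WAIT count during the outage window, would point towards resource exhaustion on the Nagios machine rather than a problem on the agents.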
Former Nagios employee
https://www.mcapra.com/
kyang

Re: Nagios windows agent issue

Post by kyang »

Thanks for the help @mcapra!
padu_3891
Posts: 50
Joined: Thu Sep 05, 2013 10:12 pm

Re: Nagios windows agent issue

Post by padu_3891 »

I am going to assume the agent you are using is NSClient++. You are correct.

Which plugin is being used on the Nagios Core side of things to reach out to NSClient++? What version of that plugin are you using?

check_nrpe, version 2.12


Which version of NSClient++ is being used on your machines? Do you have a standard NSClient++ configuration these machines use and, if so, could you share it?

NSClient++ version 4.3.1. Yes, it is a standard configuration. Do you want me to share the nsclient.ini file?

Did you also check the Nagios Core machine's available file descriptors, open file limits, and available sockets?

Yes, everything is fine; no issues were found.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios windows agent issue

Post by mcapra »

Can you share the full historical nagios log that contains these ~100 or so failures? Typically the historical logs can be found here:

/usr/local/nagios/var/archives
I'd like to see the full log from a given day if possible, not just a handful of entries demonstrating the error message.
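To get a quick overview before sharing the log, a rough sketch like the following can count matching entries per hour. It relies only on the fact that Nagios Core log lines begin with a Unix timestamp in square brackets. The search text is taken from the error quoted in this thread, and the script name and function name are hypothetical.

```python
import collections
import datetime
import re
import sys

# Nagios Core log lines start with a Unix timestamp in square brackets.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def failures_per_hour(lines, needle):
    """Count log lines containing `needle`, bucketed by hour (UTC)."""
    counts = collections.Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or needle not in match.group(2):
            continue
        stamp = datetime.datetime.fromtimestamp(
            int(match.group(1)), tz=datetime.timezone.utc
        )
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_failures.py <archived log file> [search text]
    needle = sys.argv[2] if len(sys.argv) > 2 else "Unable to establish communication"
    with open(sys.argv[1], errors="replace") as fh:
        for hour, count in sorted(failures_per_hour(fh, needle).items()):
            print(hour, count)
```

If the failures cluster in particular hours, those hours can be compared against system logs and hypervisor events from the same period.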

Which OS and version of that OS is this machine using? Which hypervisor is hosting the VM?

Also, if you happen to have a copy of your system's primary log file (/var/log/messages on CentOS/RHEL) from that same time period, that may be useful.

I'm fairly confident this is some sort of system/network related issue rather than a failure of NSClient++ or the check_nrpe plugin specifically (I could be wrong). I've seen setups executing ~100 or so simultaneous check_nrpe calls to various agents (mostly NSClient++) without totally tanking. Besides that, given how check_nrpe functions, I don't think it would make sense for a few hundred agents to simultaneously stop responding unless there was some sort of network/system issue that prevented check_nrpe from correctly establishing a connection.
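One way to separate a connection problem from a plugin problem is to open plain TCP connections to many agents at once, without check_nrpe involved. The following is a minimal sketch under some assumptions: the agents listen on 5666 (the default NRPE port; adjust it if your nsclient.ini uses another), the host names are passed on the command line, and the function names are made up for illustration. It tests only the TCP layer, not the NRPE or SSL handshake.

```python
import concurrent.futures
import socket
import sys

NRPE_PORT = 5666  # default NRPE port; an assumption about this setup


def can_connect(host, port=NRPE_PORT, timeout=5.0):
    """Return (host, True, "") if a TCP connection succeeds, else (host, False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return host, True, ""
    except OSError as exc:  # covers timeouts, refusals and DNS failures
        return host, False, str(exc)


def probe_all(hosts, port=NRPE_PORT, timeout=5.0, workers=100):
    """Probe all hosts concurrently, roughly like a burst of simultaneous checks."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda h: can_connect(h, port, timeout), hosts))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 probe_agents.py host1 host2 ...
    for host, ok, reason in probe_all(sys.argv[1:]):
        print(host, "OK" if ok else f"FAILED ({reason})")
```

If these plain connections also fail intermittently from the Nagios machine, the cause is more likely on the system or network side than in NSClient++ or check_nrpe.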
Former Nagios employee
https://www.mcapra.com/
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios windows agent issue

Post by tmcdonald »

Thanks for the assist, @mcapra!

@padu_3891, let us know if you have further (related) questions.
Former Nagios employee
padu_3891
Posts: 50
Joined: Thu Sep 05, 2013 10:12 pm

Re: Nagios windows agent issue

Post by padu_3891 »

@mcapra Thanks a lot for your suggestion. As you said, I found both issues. CPU utilisation on my server was high, which may be one cause.

I am going to increase the server resources for now and will let you know if I face more issues.

Just one more query.

Will running the Nagios server in a VMware environment cause any issues? Which would you suggest: a standalone machine or a virtual machine?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios windows agent issue

Post by tmcdonald »

Core can run equally well in a VM or on physical hardware. The differences in performance are minor for the most part, and really don't show themselves until an environment becomes quite large.
Former Nagios employee