Nagios windows agent issue

Post by padu_3891 » Mon Mar 12, 2018 10:18 am

Hello Team,

My Nagios server is a virtual machine. All of a sudden, "Unable to establish communication with Agent" alerts were triggered for the agents on 100 servers. When I ran the check command manually, the first execution returned results, but a second execution immediately afterwards failed with "Unable to establish communication with Agent", and the pattern kept repeating.

The issue persisted for three consecutive days; now everything is back to normal.

What could be the cause of the issue?

Is it related to the network?

We have checked the Nagios server load, CPU, etc., and all looks fine. We also checked with the network team, who found no issues either.

Please let us know how we can find the cause of this issue so that we can take preventive action.

Thank you,
Padma Muthu

Re: Nagios windows agent issue

Post by mcapra » Tue Mar 13, 2018 9:05 am

I am going to assume the agent you are using is NSClient++. Please correct me if I am wrong.

Which plugin is being used on the Nagios Core side of things to reach out to NSClient++? What version of that plugin are you using?

Which version of NSClient++ is being used on your machines? Do you have a standard NSClient++ configuration these machines use and, if so, could you share it?

padu_3891 wrote: We have checked the Nagios server load, CPU, etc., and all looks fine.

Did you also check the Nagios Core machine's available file descriptors, open file limits, and available sockets?
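
For anyone who wants to check those three things quickly, here is a rough sketch in Python. It assumes a Linux host with /proc mounted, and the helper names are made up for this example. Run it as the user that runs Nagios, because the open-file limit it reports belongs to the process running the script and is not necessarily the limit of the Nagios daemon itself.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, system maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def parse_sockstat(text):
    """Parse /proc/net/sockstat into {protocol: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Counters come in "key value" pairs, e.g. "inuse 12 orphan 0 tw 3".
        stats[name.strip()] = {k: int(v) for k, v in zip(fields[::2], fields[1::2])}
    return stats


def read_text(path):
    """Return the file contents, or None if the path does not exist."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return handle.read()


def report():
    """Print a short summary of descriptor and socket usage."""
    print("open-file limit (soft, hard):", open_file_limits())
    file_nr = read_text("/proc/sys/fs/file-nr")
    if file_nr is not None:
        print("system file handles:", parse_file_nr(file_nr))
    sockstat = read_text("/proc/net/sockstat")
    if sockstat is not None:
        tcp = parse_sockstat(sockstat).get("TCP", {})
        print("TCP sockets in use:", tcp.get("inuse"), "in TIME_WAIT:", tcp.get("tw"))


if __name__ == "__main__":
    report()
```

A single run only shows the current state. Since the failures were intermittent, it would be more useful to run something like this every minute or so while the problem is happening and compare the numbers against the limits.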

Re: Nagios windows agent issue

Post by kyang » Tue Mar 13, 2018 12:05 pm

Thanks for the help @mcapra!

Re: Nagios windows agent issue

Post by padu_3891 » Fri Mar 16, 2018 3:54 pm

mcapra wrote: I am going to assume the agent you are using is NSClient++.

You are correct.

mcapra wrote: Which plugin is being used on the Nagios Core side of things to reach out to NSClient++? What version of that plugin are you using?

check_nrpe, version 2.12.

mcapra wrote: Which version of NSClient++ is being used on your machines? Do you have a standard NSClient++ configuration these machines use and, if so, could you share it?

NSClient++ version 4.3.1. Yes, it is a standard configuration. Do you want me to share the nsclient.ini file?

mcapra wrote: Did you also check the Nagios Core machine's available file descriptors, open file limits, and available sockets?

Yes, everything is fine; no issues found.

Re: Nagios windows agent issue

Post by mcapra » Mon Mar 19, 2018 10:15 am

Can you share the full historical nagios log that contains these ~100 or so failures? Typically the historical logs can be found here:
/usr/local/nagios/var/archives


I'd like to see the full log from a given day if possible, not just a handful of entries demonstrating the error message.
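
If it helps to see when the failures clustered before posting the log, something along these lines could summarise an archived log by hour. This is only a sketch: it assumes each line starts with a Unix timestamp in square brackets, as Nagios Core logs do, and the function name and the search string passed in are choices made for this example.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timezone

# Nagios Core log lines start with a Unix timestamp in square brackets.
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")


def failures_per_hour(lines, needle):
    """Count log lines containing `needle`, grouped by hour (UTC)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or needle not in match.group(2):
            continue
        stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


if __name__ == "__main__":
    # Pass one or more archived log files as arguments.
    needle = "Unable to establish communication with Agent"
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for hour, count in sorted(failures_per_hour(handle, needle).items()):
                print(path, hour, count)
```

A sharp start and end at the same time across many hosts would point towards something on the Nagios server or the network path, whereas failures scattered at random would suggest looking at the individual agents.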

Which OS and version of that OS is this machine using? Which hypervisor is hosting the VM?

Also, if you happen to have a copy of your system's primary log file (/var/log/messages on CentOS/RHEL) from that same time period, that may be useful.

I'm fairly confident this is some sort of system/network related issue rather than a failure of NSClient++ or the check_nrpe plugin specifically (I could be wrong). I've seen setups executing ~100 or so simultaneous check_nrpe calls to various agents (mostly NSClient++) without totally tanking. Besides that, given how check_nrpe functions, I don't think it would make sense for a few hundred agents to simultaneously stop responding unless there was some sort of network/system issue that prevented check_nrpe from correctly establishing a connection.
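
One way to test that theory independently of check_nrpe is to open plain TCP connections to the agent port from the Nagios server several times in a row and see whether some of them fail. Below is a minimal sketch. It assumes the agents listen on 5666, the default NRPE port, and the function name is made up for this example. It only exercises TCP connection setup, so it says nothing about the SSL handshake or the NRPE exchange that follows.

```python
import socket
import sys
import time


def probe(host, port=5666, attempts=5, timeout=5.0, pause=1.0):
    """Open and close a TCP connection `attempts` times.

    Returns a list of (ok, seconds, error) tuples, one per attempt.
    """
    results = []
    for attempt in range(attempts):
        started = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                outcome = (True, time.monotonic() - started, None)
        except OSError as exc:
            # Covers refused connections, timeouts, and name resolution errors.
            outcome = (False, time.monotonic() - started, str(exc))
        results.append(outcome)
        if attempt < attempts - 1:
            time.sleep(pause)
    return results


if __name__ == "__main__":
    # Pass one or more agent hostnames or IP addresses as arguments.
    for host in sys.argv[1:]:
        for ok, seconds, error in probe(host):
            print(host, "ok" if ok else "FAILED", f"{seconds:.3f}s", error or "")
```

If the first connection succeeds and the next ones fail, as described in the original post, that would suggest the problem sits below check_nrpe, in the network or in the operating system on one of the two ends.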

Re: Nagios windows agent issue

Post by tmcdonald » Mon Mar 19, 2018 10:25 am

Thanks for the assist, @mcapra!

@padu_3891, let us know if you have further (related) questions.

Re: Nagios windows agent issue

Post by padu_3891 » Wed Mar 21, 2018 7:35 am

@mcapra, thanks a lot for your suggestions. As you said, I found both issues. CPU utilisation on my server was high, which may be one cause.

For now I am going to increase the server's resources, and I will let you know if I face more issues.

Just one more question.

Will hosting the Nagios server in a VMware environment cause any issues? Which would you suggest: a standalone machine or a virtual machine?

Re: Nagios windows agent issue

Post by tmcdonald » Wed Mar 21, 2018 10:11 am

Core can run equally well in a VM or on physical hardware. The differences in performance are minor for the most part, and really don't show themselves until an environment becomes quite large.