Experiencing timeouts on multiple servers - Nagios 3.x

jonescl2
Posts: 3
Joined: Wed Aug 17, 2016 12:30 pm

Experiencing timeouts on multiple servers - Nagios 3.x

Post by jonescl2 »

Hello...I'm trying to troubleshoot an issue we are experiencing across a few of our Nagios servers. We are seeing a high volume of service check and plugin timeouts on a couple dozen servers. Most of these servers are Windows 2012 R2. The technical teams have apparently eliminated any network issues and so far have not found anything on the servers themselves that would be causing this.
There are servers alongside the ones we are having issues with that reside in the same network and have the same settings but are functioning normally.
This issue just cropped up last week after some patching and OS upgrades. Patches have been rolled back and still no change.
Any information you may have would be appreciated. We also ran the same wmi queries from the command line (outside of groundwork), using the same credentials, and still see the issue. Most times we can get one response back normally but then subsequent attempts time out.
These servers were fine in Nagios up until a week ago. No changes have been made to Nagios.
Thank you
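
The pattern described above (first query answers, later ones time out) is easier to discuss with per-attempt timings. Here is a minimal sketch, assuming the check can be launched as an ordinary command from the Nagios host; the function name run_repeatedly and its defaults are illustrative and not part of any plugin.

```python
import subprocess
import sys
import time


def run_repeatedly(cmd, attempts=5, timeout=30.0, pause=1.0):
    """Run cmd several times; return (attempt, outcome, seconds) per run."""
    results = []
    for attempt in range(1, attempts + 1):
        start = time.monotonic()
        try:
            # cmd is a list: the plugin path followed by its arguments.
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout)
            outcome = f"exit {proc.returncode}"
        except subprocess.TimeoutExpired:
            # The child is killed once the timeout expires.
            outcome = "timeout"
        elapsed = round(time.monotonic() - start, 2)
        results.append((attempt, outcome, elapsed))
        if attempt < attempts:
            time.sleep(pause)
    return results
```

Calling run_repeatedly with the same plugin path and arguments the service check uses, once against a failing server and once against a healthy neighbour, gives a side-by-side record of which attempts time out and how long the successful ones take.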
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Experiencing timeouts on multiple servers - Nagios 3.x

Post by mcapra »

jonescl2 wrote: We also ran the same wmi queries from the command line (outside of groundwork), using the same credentials, and still see the issue. Most times we can get one response back normally but then subsequent attempts time out.
WMI queries are notoriously slow.

Can you share a bit more information about how you are monitoring these Windows machines:
  • Are you using an agent? If so, which agent and which version?
  • Which plugin are you using to execute your Nagios checks? check_nrpe, check_nt, check_wmi_plus.pl, etc
  • Can you share some sample host/service definitions of your Windows hosts?
Former Nagios employee
https://www.mcapra.com/
jonescl2
Posts: 3
Joined: Wed Aug 17, 2016 12:30 pm

Re: Experiencing timeouts on multiple servers - Nagios 3.x

Post by jonescl2 »

Hi...thanks for your reply. I'll share what I can.

Are you using an agent? If so, which agent and which version?
- Agentless. 95% of our environment is agentless. We are seeing the issue on maybe 1% of the servers so far.

Which plugin are you using to execute your Nagios checks? check_nrpe, check_nt, check_wmi_plus.pl, etc
- check_wmi_plus_domain.pl

Can you share some sample host/service definitions of your Windows hosts?
- here are the specs for our CPU check. this is consistent across all of our windows servers. let me know if you need anything else.
service name: win_cpu_wmi_do
check command: check_wmi_win_cpu_domain
command definition: $USER1$/check_wmi_plus_domain.pl -H $HOSTADDRESS$ -m checkcpu -D $HOSTALIAS$ -w $ARG1$ -c $ARG2$
usage: check_wmi_win_cpu_domain!ARG1!ARG2
command line: check_wmi_win_cpu_domain!90!95
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Experiencing timeouts on multiple servers - Nagios 3.x

Post by tacolover101 »

jonescl2 wrote: We also ran the same wmi queries from the command line (outside of groundwork)
You should probably contact GroundWork for support, as it may operate slightly differently.

If it's still happening on the CLI as you mention, though, then I generally think it's a Windows problem. It might be worth diving in with the wmic command to see if you can troubleshoot further.
jonescl2
Posts: 3
Joined: Wed Aug 17, 2016 12:30 pm

Re: Experiencing timeouts on multiple servers - Nagios 3.x

Post by jonescl2 »

tacolover101 wrote:
jonescl2 wrote: We also ran the same wmi queries from the command line (outside of groundwork)
You should probably contact GroundWork for support, as it may operate slightly differently.

If it's still happening on the CLI as you mention, though, then I generally think it's a Windows problem. It might be worth diving in with the wmic command to see if you can troubleshoot further.
Thank you...we've explored and tested WMI on the servers and everything checks out ok. The tech team was working with Microsoft on the Windows side.
We are leaning toward a network issue. Authentication and RPC ping work every time, but DCOM/RPC dynamic port connections stumble.

Thanks
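
One way to put numbers on "dynamic port connections stumble" is to open plain TCP connections repeatedly and count failures. The sketch below assumes you already know which port to test: 135 is the RPC endpoint mapper, while the dynamic port WMI is actually using has to be looked up on the server itself (by default Windows Server 2008 and later allocate these from 49152-65535). The names probe_port and summarize are illustrative only.

```python
import socket
import time


def probe_port(host, port, attempts=10, timeout=3.0, pause=0.5):
    """Try a TCP connect to host:port several times; return (ok, seconds) pairs."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Connect and close immediately; only the TCP handshake is tested,
            # not RPC or DCOM itself.
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            ok = False
        results.append((ok, round(time.monotonic() - start, 3)))
        time.sleep(pause)
    return results


def summarize(results):
    """Return a one-line success count for a list of probe results."""
    succeeded = sum(1 for ok, _ in results if ok)
    return f"{succeeded}/{len(results)} connections succeeded"
```

Running the probe from the Nagios host against port 135 and then against the dynamic port, for both a failing and a working server, would show whether the failures are specific to the dynamic range. A clean result here does not rule out problems higher up in the RPC/DCOM exchange.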
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Experiencing timeouts on multiple servers - Nagios 3.x

Post by tgriep »

If you are experiencing sporadic timeouts when using the plugin, you can increase the plugin's timeout by adding the -t option.
If you add

-t 60

to the command definition, that will increase the timeout to 60 seconds and may fix the issue.
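
As a sketch, the CPU check command quoted earlier in the thread would look like this with the option appended. The define command wrapper is an assumption about how the command is stored (GroundWork may manage it through its own interface), and it also assumes check_wmi_plus_domain.pl accepts -t the same way check_wmi_plus.pl does.

```
define command {
    command_name    check_wmi_win_cpu_domain
    command_line    $USER1$/check_wmi_plus_domain.pl -H $HOSTADDRESS$ -m checkcpu -D $HOSTALIAS$ -w $ARG1$ -c $ARG2$ -t 60
}
```

Nagios Core also enforces its own service_check_timeout in nagios.cfg, so the plugin's -t value only helps while it stays below that limit.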