Hello...I'm trying to troubleshoot an issue we are experiencing across a few of our Nagios servers. We are seeing a high volume of service check and plugin timeouts on a couple dozen servers. Most of these servers are Windows Server 2012 R2. The technical teams have apparently eliminated any network issues and so far have not found anything on the servers themselves that would be causing this.
There are servers alongside the ones we are having issues with that reside in the same network and have the same settings but are functioning normally.
This issue just cropped up last week after some patching and OS upgrades. Patches have been rolled back and still no change.
Any information you may have would be appreciated. We also ran the same wmi queries from the command line (outside of groundwork), using the same credentials, and still see the issue. Most times we can get one response back normally but then subsequent attempts time out.
These servers were fine in Nagios up until a week ago. No changes have been made to Nagios.
Thank you
Experiencing timeouts on multiple servers - Nagios 3.x
Re: Experiencing timeouts on multiple servers - Nagios 3.x
jonescl2 wrote: We also ran the same wmi queries from the command line (outside of groundwork), using the same credentials, and still see the issue. Most times we can get one response back normally but then subsequent attempts time out.
WMI queries are notoriously slow.
Can you share a bit more information about how you are monitoring these Windows machines:
- Are you using an agent? If so, which agent and which version?
- Which plugin are you using to execute your Nagios checks? check_nrpe, check_nt, check_wmi_plus.pl, etc
- Can you share some sample host/service definitions of your Windows hosts?
Former Nagios employee
https://www.mcapra.com/
Re: Experiencing timeouts on multiple servers - Nagios 3.x
Hi...thanks for your reply. I'll share what I can.
Are you using an agent? If so, which agent and which version?
- Agentless. 95% of our environment is agentless. We are seeing the issue on maybe 1% of the servers so far.
Which plugin are you using to execute your Nagios checks? check_nrpe, check_nt, check_wmi_plus.pl, etc
- check_wmi_plus_domain.pl
Can you share some sample host/service definitions of your Windows hosts?
- here are the specs for our CPU check. this is consistent across all of our windows servers. let me know if you need anything else.
service name: win_cpu_wmi_do
check command: check_wmi_win_cpu_domain
command definition: $USER1$/check_wmi_plus_domain.pl -H $HOSTADDRESS$ -m checkcpu -D $HOSTALIAS$ -w $ARG1$ -c $ARG2$
usage: check_wmi_win_cpu_domain!ARG1!ARG2
command line: check_wmi_win_cpu_domain!90!95
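For anyone following along, here is a rough Python sketch of how the definition above turns into the command that actually gets run. It is a simplified model of macro substitution, not how Nagios or GroundWork implements it. The `expand_command` helper and the values used for `$USER1$`, `$HOSTADDRESS$` and `$HOSTALIAS$` are made up for illustration.

```python
import re

def expand_command(command_line, check_command, macros):
    """Simplified model of Nagios macro expansion (hypothetical helper)."""
    # "name!arg1!arg2" -> command name plus positional $ARGn$ values
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg

    # Replace each $MACRO$ token; leave unknown macros untouched
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return name, re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


COMMAND_LINE = ("$USER1$/check_wmi_plus_domain.pl -H $HOSTADDRESS$ "
                "-m checkcpu -D $HOSTALIAS$ -w $ARG1$ -c $ARG2$")

# Assumed example values, not taken from the thread
EXAMPLE_MACROS = {
    "USER1": "/usr/local/nagios/libexec",
    "HOSTADDRESS": "192.0.2.10",
    "HOSTALIAS": "EXAMPLE",
}

if __name__ == "__main__":
    name, expanded = expand_command(
        COMMAND_LINE, "check_wmi_win_cpu_domain!90!95", EXAMPLE_MACROS)
    print(name)
    print(expanded)
```

Printing the expanded line is a handy way to get the exact command to paste into a shell when reproducing a timeout outside the monitoring server.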
- tacolover101
- Posts: 432
- Joined: Mon Apr 10, 2017 11:55 am
Re: Experiencing timeouts on multiple servers - Nagios 3.x
jonescl2 wrote: We also ran the same wmi queries from the command line (outside of groundwork)
you should probably contact groundwork for support, as how it operates could vary slightly.
if it's still happening on the CLI though as you mention, then i generally think it's a windows problem. might be worth diving in with the wmic command to see if you can troubleshoot further.
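One way to dig into the "first query works, later ones time out" pattern from the CLI is to run the same command several times in a row and log how long each attempt takes. The sketch below is only an illustration: `run_repeatedly` is a made-up helper, and the demo command is a harmless stand-in. Swap in your own wmic or plugin command line, which is not shown here because the exact arguments depend on your environment.

```python
import subprocess
import sys
import time

def run_repeatedly(cmd, attempts=5, timeout=30.0, pause=0.0):
    """Run cmd several times; return (attempt, status, seconds) tuples."""
    results = []
    for attempt in range(1, attempts + 1):
        start = time.monotonic()
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout)
            status = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
        except subprocess.TimeoutExpired:
            # The child is killed once the timeout elapses
            status = "timeout"
        results.append((attempt, status, round(time.monotonic() - start, 2)))
        time.sleep(pause)
    return results


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere (assumption);
    # replace with the real query command as a list of arguments.
    demo_cmd = [sys.executable, "-c", "print('ok')"]
    for attempt, status, seconds in run_repeatedly(demo_cmd, attempts=3, timeout=10):
        print(attempt, status, seconds)
```

Comparing the per-attempt timings between a good server and a bad one may show whether only the follow-up attempts are slow, and varying `pause` may show whether spacing the queries out makes a difference.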
Re: Experiencing timeouts on multiple servers - Nagios 3.x
tacolover101 wrote: you should probably contact groundwork for support as it could vary slightly how it's operating.
We also ran the same wmi queries from the command line (outside of groundwork)
if it's still happening on the CLI though as you mention, then i generally think it's a windows problem. might be worth diving in with the wmic command to see if you can troubleshoot further.
Thank you...we've explored and tested WMI on the servers and everything checks out ok. The tech team was working with Microsoft on the Windows side.
We are leaning toward a network issue. Authentication and RPC ping work every time, but DCOM/RPC dynamic port connections stumble.
Thanks
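For anyone wanting to compare the fixed RPC port against a dynamic port from the monitoring host, a plain TCP connect test repeated a few times can show which one stumbles. This is only a sketch. `probe_tcp` is a made-up helper, and it checks nothing beyond the TCP handshake, so it says nothing about authentication or DCOM itself. Port 135 is the RPC endpoint mapper. The dynamic port has to be found separately (for example from a packet capture), so the host and port values in the demo are placeholders.

```python
import socket
import time

def probe_tcp(host, port, attempts=5, timeout=3.0):
    """Try a TCP connect several times; return (outcome, seconds) tuples."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                outcome = "connected"
        except socket.timeout:
            outcome = "timeout"
        except OSError as exc:
            # Refused, unreachable, name resolution failure, etc.
            outcome = f"error: {exc.__class__.__name__}"
        results.append((outcome, round(time.monotonic() - start, 3)))
    return results


if __name__ == "__main__":
    # Placeholder target (assumption): replace with a real server and the
    # dynamic port observed for it.
    for port in (135,):
        print(port, probe_tcp("192.0.2.10", port, attempts=3, timeout=2.0))
```

If port 135 connects every time while the dynamic port intermittently times out, that would line up with the network theory and gives the network team something concrete to trace.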
Re: Experiencing timeouts on multiple servers - Nagios 3.x
If you are experiencing sporadic timeouts when using the plugin, you can increase the plugin's timeout with the -t option.
If you add
Code:
-t 60
to the command definition, that will increase the timeout to 60 seconds and may fix the issue.