A lot of Service check timed out.

This board serves as an open discussion and support collaboration point for Nagios XI. NOTE: Nagios XI customers should use the Customer Support forum to obtain expedited support.

Postby paulol » Thu Aug 22, 2019 3:08 pm

Hi guys,

I'm getting a lot of service check timeouts on multiple services.

I have 2 machines for Nagios XI:

1. Nagios XI 5.6.5
(CentOS 7.6) VM (Hyper-V)
20GB Memory
18 VCPUs
SSD

2. Nagios XI database
(CentOS 7.6, MySQL 5.7) VM (Hyper-V)
12GB Memory
8 VCPUs
SSD

I have already read the following documents, but neither of them fixed the problem:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
https://assets.nagios.com/downloads/nag ... Server.pdf
Attachments
Capture1.JPG
Service/Host amount
Capture.JPG
Timeout Errors
Capture3.JPG
Performance Graph
paulol
 
Posts: 148
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby mbellerue » Thu Aug 22, 2019 4:07 pm

What does the networking look like on your VMs and physical host?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
mbellerue
 
Posts: 608
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby paulol » Fri Aug 23, 2019 8:19 am

What does the networking look like on your VMs and physical host?

They are on the same physical host but on separate networks.

Nagios XI is on 10.0.74.xx
The MySQL DB is on 10.0.64.xx

They used to be on the same server (VM), and the problem happened in the same way. After reading https://assets.nagios.com/downloads/nag ... Server.pdf, I separated them into two VMs.
paulol
 
Posts: 148
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby bheden » Fri Aug 23, 2019 9:33 am

What happens if you run one of the plugins from the command line on the XI server?

Are you able to ping the hosts you're attempting to check from the XI server?
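
Running a plugin by hand several times and timing each run can show whether the slowness is in the plugin itself or elsewhere in the check pipeline. The following is a minimal sketch, not an official Nagios tool: the helper name `time_plugin`, the plugin path, and the target address are assumptions for illustration, so substitute one of the checks that is actually timing out.

```python
import subprocess
import time


def time_plugin(argv, timeout=60.0):
    """Run a plugin once and return (exit_code, seconds, first_output_line).

    exit_code is None if the plugin exceeded `timeout` or could not be started.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        lines = proc.stdout.strip().splitlines()
        code, first = proc.returncode, (lines[0] if lines else "")
    except subprocess.TimeoutExpired:
        code, first = None, "timed out"
    except OSError as exc:
        # For example, the plugin path does not exist on this machine.
        code, first = None, f"could not run: {exc}"
    return code, time.monotonic() - start, first


if __name__ == "__main__":
    # Hypothetical command: adjust the path, host, and thresholds to your setup.
    cmd = ["/usr/local/nagios/libexec/check_ping",
           "-H", "127.0.0.1", "-w", "100,20%", "-c", "500,60%"]
    for attempt in range(3):
        code, seconds, first = time_plugin(cmd)
        print(f"run {attempt}: exit={code} time={seconds:.2f}s output={first}")
```

If the manual runs finish quickly while scheduled checks still time out, that would point away from the plugin and toward load or scheduling on the XI server.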

Nagios Enterprises
Senior Developer
bheden
Product Development Manager
 
Posts: 178
Joined: Thu Feb 13, 2014 9:50 am
Location: Nagios Enterprises

Re: A lot of Service check timed out.

Postby mbellerue » Fri Aug 23, 2019 10:35 am

So they're on separate logical networks, but are all of the VMs using the same NIC to access the physical network? I'm wondering if the physical NIC on the host is being overloaded with requests. Do all of the VMs on the host communicate through the same NIC? Is it a 1 gigabit NIC, or a 10 gigabit NIC? Is there a team of NICs that they use to help deal with the load?

Also, have the number of timed out checks increased recently, or has this been a problem for some time?
mbellerue
 
Posts: 608
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby paulol » Fri Aug 23, 2019 12:12 pm

bheden, thanks for the reply.

Answering your questions:

What happens if you run one of the plugins from the command line on the XI server?
The plugins are executing normally, but sometimes I get "Service check timed out after 60.01 seconds".

Are you able to ping the hosts you're attempting to check from the XI server?
Yes, I am.
paulol
 
Posts: 148
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby paulol » Fri Aug 23, 2019 12:44 pm

mbellerue,

Are all of the VMs using the same NIC to access the physical network?
Yes, they are...

I'm wondering if the physical NIC on the host is being overloaded with requests.
I asked my coworker to verify whether the physical NIC is being overloaded. He said it is impossible for that machine to use 20 gigabit. He showed me the network load graph and it looked normal.

Is it a 1 gigabit NIC, or a 10 gigabit NIC?
The physical server has a NIC team with two 10 gigabit NICs, so the team has 20 gigabit of capacity.

Do all of the VMs on the host communicate through the same NIC?
Yes, through the same NIC team.

Let me explain the situation better.
We bought some new servers and we are moving our VM environment to the new ones.
Nagios was on CentOS 6.9, so I had to install it from scratch on CentOS 7.6: I formatted the server and installed Nagios on CentOS 7.6.

I have used Nagios XI for 3 years and have never seen this problem before.
Attachments
CounterPhysicalDisk.JPG
Historic Physical Disk Service
NIC_traffic.PNG
NIC Team Traffic
timeout.JPG
Timeout Errors
paulol
 
Posts: 148
Joined: Wed Jul 02, 2014 11:39 am

Re: A lot of Service check timed out.

Postby mbellerue » Fri Aug 23, 2019 1:00 pm

20 gigabit is definitely a lot of bandwidth. Could you PM me your system profile? I will take a look at it on this side and see if I can find out why these are timing out.

Could you also send the output of dmesg along with the system profile?
mbellerue
 
Posts: 608
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby mbellerue » Mon Aug 26, 2019 3:26 pm

Okay, we've looked over the profile a little bit, and we've come up with a few more questions.

First, could you also send me the database logs from the database server?

Next, we're seeing a lot of "SSL handshake failed" messages. We've tied a few of them to Windows hosts. Are you able to find out if this issue is only happening to Windows hosts? If it's only happening to Windows hosts, maybe an update to NSClient++ would help.

Those are the two big ones. We're going to continue to look at the profile to see if there's anything else to be found.
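
One way to answer the Windows-only question is to tally the failing checks by platform. The sketch below assumes you have already collected the failing checks as (host, service) pairs and know which hosts run Windows. The function name `timeouts_by_platform` and all sample host names are hypothetical.

```python
from collections import Counter


def timeouts_by_platform(timed_out_checks, windows_hosts):
    """Count timed-out checks per platform and per host.

    timed_out_checks: iterable of (host, service) pairs.
    windows_hosts: set of host names known to run Windows.
    """
    per_platform = Counter()
    per_host = Counter()
    for host, _service in timed_out_checks:
        platform = "windows" if host in windows_hosts else "other"
        per_platform[platform] += 1
        per_host[host] += 1
    return per_platform, per_host


if __name__ == "__main__":
    # Hypothetical sample data; replace with checks taken from your own event log.
    checks = [("win-app01", "CPU Usage"), ("win-app01", "Disk C:"), ("lnx-db01", "Load")]
    platforms, hosts = timeouts_by_platform(checks, windows_hosts={"win-app01"})
    print(dict(platforms))
    print(hosts.most_common(5))
```

A tally that falls almost entirely under "windows" would support the NSClient++ theory, while an even spread would suggest a cause on the XI server side.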
mbellerue
 
Posts: 608
Joined: Fri Jul 12, 2019 11:10 am

Re: A lot of Service check timed out.

Postby paulol » Fri Aug 30, 2019 8:33 am

We are already on the latest version of NSClient++ (5.2) on almost all servers. Those messages come from some servers that don't have NSClient installed yet.

I think it is something related to NRPE. I have set the NRPE command timeout to 50 seconds, and I see only "NRPE timeout error" in the Nagios event log.
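
To separate network or connection problems from slow command execution on the agent, it can help to time just the TCP connection to the NRPE port. This is a rough sketch that assumes the agents listen on the default NRPE port 5666. The helper name `tcp_connect_time` and the sample address are made up for illustration, and the code does not speak the NRPE protocol itself.

```python
import socket
import time


def tcp_connect_time(host, port=5666, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and connect timeouts.
        return None


if __name__ == "__main__":
    # Hypothetical agent address; replace with a host that reports NRPE timeouts.
    elapsed = tcp_connect_time("127.0.0.1")
    print("connect failed" if elapsed is None else f"connected in {elapsed:.3f}s")
```

Fast connects combined with 50-second NRPE timeouts would suggest the delay is in running the command on the agent rather than in reaching it.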

Support edit: Shared 1567171962-eventlog.pdf and mysqld.log with team.
paulol
 
Posts: 148
Joined: Wed Jul 02, 2014 11:39 am
