Hi guys,
I'm getting a lot of service check timeouts on multiple services...
I have 2 machines for Nagios XI:
1. Nagios XI 5.6.5
(CentOS 7.6) VM (Hyper-V)
20GB Memory
18 VCPUs
SSD
2. Nagios XI database
(CentOS 7.6, MySQL 5.7) VM (Hyper-V)
12GB Memory
8 VCPUs
SSD
I have already read the following documents, but neither of them fixed the problem...
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
https://assets.nagios.com/downloads/nag ... Server.pdf
A lot of Service check timed out.
Re: A lot of Service check timed out.
What does the networking look like on your VMs and physical host?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: A lot of Service check timed out.
What does the networking look like on your VMs and physical host?
They are on the same physical host but on separate networks.
Nagios XI is on 10.0.74.xx
The MySQL DB is on 10.0.64.xx
They were originally on the same server (VM), and the problem happened in the same way. After reading https://assets.nagios.com/downloads/nag ... Server.pdf, I separated them into two VM servers...
Re: A lot of Service check timed out.
What happens if you run one of the plugins from the command line on the XI server?
Are you able to ping the hosts you're attempting to check from the XI server?
Nagios Enterprises
Senior Developer
Re: A lot of Service check timed out.
So they're on separate logical networks, but are all of the VMs using the same NIC to access the physical network? I'm wondering if the physical NIC on the host is being overloaded with requests. Do all of the VMs on the host communicate through the same NIC? Is it a 1 gigabit NIC, or a 10 gigabit NIC? Is there a team of NICs that they use to help deal with the load?
Also, has the number of timed-out checks increased recently, or has this been a problem for some time?
Re: A lot of Service check timed out.
bheden, thanks for the reply...
Answering your questions...
What happens if you run one of the plugins from the command line on the XI server?
The plugins are executing normally, but sometimes I get "Service check timed out after 60.01 seconds".
Are you able to ping the hosts you're attempting to check from the XI server?
Yes, I am.
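For anyone trying to reproduce the intermittent timeout from the command line, here is a rough sketch (not part of the original exchange) that runs the same check several times and records how long each run takes. The function name `time_check` and the stand-in command are assumptions for illustration; the real plugin path and arguments are site-specific and have to be substituted.

```python
import subprocess
import sys
import time


def time_check(cmd, timeout=60.0):
    """Run one check command and report how long it took.

    Returns a dict with the elapsed seconds, whether the timeout was hit,
    the exit code (None on timeout), and the first line of output.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        timed_out, code, out = False, proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        timed_out, code, out = True, None, ""
    elapsed = time.monotonic() - start
    lines = out.strip().splitlines()
    first_line = lines[0] if lines else ""
    return {"elapsed": elapsed, "timed_out": timed_out,
            "exit_code": code, "output": first_line}


# Stand-in command so the sketch runs anywhere (hypothetical); swap in the
# real plugin command line as a list of arguments.
cmd = [sys.executable, "-c", "print('OK - stand-in check')"]
for attempt in range(1, 4):
    print(attempt, time_check(cmd, timeout=60.0))
```

If the elapsed time is usually small but occasionally approaches the 60 second limit, that points at something intermittent on the path to the host rather than at the plugin itself.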
Re: A lot of Service check timed out.
mbellerue,
Are all of the VMs using the same NIC to access the physical network?
Yes, they are...
I'm wondering if the physical NIC on the host is being overloaded with requests.
I asked my coworker to verify whether the physical NIC is being overloaded. He said it is impossible for that machine to use 20 gigabit... He showed me the network load graph and it was normal...
Is it a 1 gigabit NIC, or a 10 gigabit NIC?
The physical server has a NIC team with two 10 gigabit NICs, so the team has 20 gigabit of capacity.
Do all of the VMs on the host communicate through the same NIC?
Yes, through the same NIC team...
Let me explain in more detail...
We bought some new servers and are moving our VM environment to them.
Nagios was on CentOS 6.9, so I needed to install it from scratch on CentOS 7.6. I formatted the server and installed it on CentOS 7.6.
I have used Nagios XI for 3 years and have never seen this problem...
Re: A lot of Service check timed out.
20 gigabit is definitely a lot of bandwidth. Could you PM me your system profile? I will take a look at it on this side and see if I can find out why these are timing out.
Could you also send the output of dmesg along with the system profile?
Re: A lot of Service check timed out.
Okay, we've looked over the profile a little bit, and we've come up with a few more questions.
First, could you also send me the database logs from the database server?
Next, we're seeing a lot of "SSL handshake failed" messages. We've tied a few of them to Windows hosts. Are you able to find out if this issue is only happening to Windows hosts? If it's only happening to Windows hosts, maybe an update to NSClient++ would help.
Those are the two big ones. We're going to continue to look at the profile to see if there's anything else to be found.
Re: A lot of Service check timed out.
We are already on the latest version of NSClient++ (5.2) on almost all servers. Those messages come from some servers that don't have NSClient installed yet.
I think it is something related to NRPE. I have set the NRPE command timeout to 50 seconds, and I see only "NRPE timeout error" in the Nagios event log.
Support edit: Shared 1567171962-eventlog.pdf and mysqld.log with team.
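To check whether the timeouts cluster on particular hosts (for example only Windows hosts), the log can be tallied per host. The following is a rough sketch, assuming the log lines follow the Nagios Core `SERVICE ALERT` format (`host;service;state;state type;attempt;output`). The sample lines, host names, and the keywords used to detect a timeout are made up for illustration and are not taken from the shared log.

```python
import re
from collections import Counter

# Assumed line format (Nagios Core style):
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);[^;]+;[^;]+;[^;]+;(.*)$"
)

# Keywords treated as a timeout; an assumption, adjust to the real messages.
TIMEOUT_KEYWORDS = ("timed out", "timeout")


def count_timeouts(lines):
    """Count timeout-related service alerts per host."""
    per_host = Counter()
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue
        output = match.group(4).lower()
        if any(keyword in output for keyword in TIMEOUT_KEYWORDS):
            per_host[match.group(2)] += 1
    return per_host


# Hypothetical sample lines; real input would be read from the log file.
sample = [
    "[1567171962] SERVICE ALERT: win-app01;CPU Usage;CRITICAL;SOFT;1;(Service check timed out after 60.01 seconds)",
    "[1567171970] SERVICE ALERT: lnx-db01;Load;OK;HARD;1;OK - load average: 0.10, 0.08, 0.05",
    "[1567172031] SERVICE ALERT: win-app01;Memory Usage;CRITICAL;SOFT;1;(Service check timed out after 60.01 seconds)",
    "[1567172040] SERVICE ALERT: lnx-web02;Disk;CRITICAL;HARD;3;(Service check timed out after 60.01 seconds)",
]

for host, count in count_timeouts(sample).most_common():
    print(host, count)
```

Sorting by count puts the worst-affected hosts first, which makes it easy to see whether they share an OS, agent version, or network segment.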