Hello,
We see a lot of service checks on NCPA return "UNKNOWN: Execution exceeded timeout threshold of 60s" on Linux hosts when the "df" command hangs for any reason. The "df" command sometimes hangs when NFS mount points are not responding. During that time, the service checks flap between OK and UNKNOWN.
Why are all the services impacted by this? I would expect only a few, such as check_disk, to be affected.
Is there a flag or setting that controls this?
Thank you,
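The error text suggests that each check runs under a fixed time budget: when a plugin does not finish within the threshold, the result is reported as UNKNOWN instead of waiting indefinitely. A minimal Python sketch of that pattern, assuming a hypothetical `run_with_timeout` helper (this is not NCPA's code) and the standard Nagios plugin return codes:

```python
import subprocess
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_with_timeout(cmd, timeout):
    """Run a plugin command and return (state, output).

    Hypothetical helper: if the command does not finish within `timeout`
    seconds, it is killed and the check is reported as UNKNOWN.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return UNKNOWN, f"UNKNOWN: Execution exceeded timeout threshold of {timeout}s"
    # In this sketch, any exit code outside 0-3 is treated as UNKNOWN.
    valid = (OK, WARNING, CRITICAL, UNKNOWN)
    state = proc.returncode if proc.returncode in valid else UNKNOWN
    return state, proc.stdout.strip()


if __name__ == "__main__":
    # Demo: a stand-in "plugin" that sleeps past its 1-second budget.
    slow_plugin = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow_plugin, timeout=1))
```

One caveat: on timeout, `subprocess.run` kills the child and then waits for it to exit, so a process stuck in an uninterruptible NFS wait may not go away promptly.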
Execution exceeded timeout when DF command slow
Re: Execution exceeded timeout when DF command slow
Hi @vappukuttan,
If you have checks that take longer than normal, the recommended solution is to configure them as passive checks. With this approach, the checks run locally and send their results back to Nagios XI on a scheduled basis. This helps free up resources on the XI server side, and a lack of those resources is likely what is impacting the other checks.
If you have NCPA installed on the remote host, setting up passive checks is straightforward, and the guide below covers all the required steps. Otherwise, you can try increasing the timeouts, but we normally don't recommend raising the global timeout in Nagios Core past 60 seconds.
Using NCPA For Passive Checks
Reference:
https://assets.nagios.com/downloads/nag ... hecks.html
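For context on the passive approach: in passive mode, NCPA submits results to an NRDP endpoint on the Nagios XI side. A minimal sketch of a single service result in the NRDP XML check-result format, built with the standard library; `nrdp_service_result` is a hypothetical helper, and NCPA builds its own payload internally, so treat this as illustrative only:

```python
import xml.etree.ElementTree as ET


def nrdp_service_result(hostname, servicename, state, output):
    """Build an NRDP-style XML document for one passive service result.

    Hypothetical helper; `state` uses the plugin return codes
    (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
    """
    root = ET.Element("checkresults")
    result = ET.SubElement(root, "checkresult", type="service")
    ET.SubElement(result, "hostname").text = hostname
    ET.SubElement(result, "servicename").text = servicename
    ET.SubElement(result, "state").text = str(state)
    ET.SubElement(result, "output").text = output
    return ET.tostring(root, encoding="unicode")
```

The host and service names passed in are whatever is configured in Nagios XI; in a real setup the agent posts this document to NRDP together with its access token.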
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Execution exceeded timeout when DF command slow
Hello,
Thank you for your response. I don't think the issue is about active vs. passive checks. The active checks work fine all the time, except when the "df" command stalls on a Linux host. It typically stalls when an NFS device is mounted and not reachable at the time. In that scenario, all the checks (whether a simple memory check, a system load check, or anything else) start flapping between OK and UNKNOWN (timed out after 60s).
Hope that helps,
Re: Execution exceeded timeout when DF command slow
Is the df command slow because of NFS mount connectivity issues or because the system has load/IO wait/storage performance issues?
It's strange that even memory checks would fail, unless the disk check services you are calling cause the other checks to hang while they check disks on stale NFS mounts. Can you try disabling the disk check services and see if the other checks keep working?
What commands are you running to check the disks on the system from the services?
Are you checking through NRPE, NCPA, or something else?
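The suspicion above, one hung disk check holding up unrelated checks, is an instance of head-of-line blocking. A simplified Python model, assuming checks share a fixed-size worker pool; whether NCPA actually schedules checks this way is an assumption, not something confirmed in this thread, and `disk_check`/`memory_check` are hypothetical stand-ins:

```python
import concurrent.futures
import threading


def disk_check(nfs_back):
    # Simulates df/statvfs blocking on an unreachable NFS mount
    # until the (simulated) NFS server comes back.
    nfs_back.wait()
    return "OK: disk"


def memory_check():
    # A check that normally answers instantly.
    return "OK: memory"


def run_checks(workers, timeout):
    """Submit a hanging disk check, then a memory check, to a shared pool."""
    nfs_back = threading.Event()
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            "disk": pool.submit(disk_check, nfs_back),
            "memory": pool.submit(memory_check),
        }
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                results[name] = f"UNKNOWN: timed out after {timeout}s"
        nfs_back.set()  # "NFS recovers" so the pool can shut down cleanly
    return results
```

With `run_checks(workers=1, timeout=0.5)` the memory check times out too, because it is queued behind the hung disk check; with `workers=2` it returns OK and only the disk check times out.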
Re: Execution exceeded timeout when DF command slow
Sorry, I forgot to log back into the forum to check for updates.
- The df command becomes unresponsive because of NFS mount connectivity issues (this has nothing to do with Nagios).
- We use NCPA to run the checks.
- Disks were just an example; we use check_disk, the standard plugin. However, many of the standard plugins act up and time out.
To keep the NCPA checks from timing out, I have typically worked around it by removing the NFS mount until the NFS issue was resolved.
If it happens again, or if I can reproduce it, I will add more information to this ticket.
Thank you,
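Removing the NFS mount works because it keeps disk checks from touching the unreachable server. A less invasive variant is to decide which mount points to check from the mount table alone and skip network filesystem types. A minimal sketch, assuming input in `/proc/mounts` format; `local_mountpoints` and the set of network types are hypothetical, and the set is not exhaustive:

```python
import re

# Filesystem types treated as "network" in this sketch (assumed, not exhaustive).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs", "glusterfs", "ceph"}


def _unescape(field):
    # /proc/mounts encodes spaces, tabs, etc. as octal escapes such as \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def local_mountpoints(mounts_text):
    """Return mount points whose filesystem type is not a network type.

    Only parses mount-table text; it never stats the filesystems
    themselves, so an unreachable NFS server cannot block it.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mountpoint, fstype = _unescape(fields[1]), fields[2]
        if fstype not in NETWORK_FS_TYPES:
            points.append(mountpoint)
    return points
```

On a Linux host, pass it the contents of `/proc/mounts`. Pseudo-filesystems such as `proc` or `tmpfs` are still returned, so a real check would need to filter those as well.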
Re: Execution exceeded timeout when DF command slow
Hello,
Just wanted to say that I am still seeing this issue.
I also found an older GitHub issue describing a similar bug:
https://github.com/NagiosEnterprises/ncpa/issues/368
I will see if I can attach a screenshot of what I see in Nagios XI.
Thank you.
Re: Execution exceeded timeout when DF command slow
The NFS mount hang is a known issue. NCPA uses the psutil Python library; please see the comment at the bottom here:
https://github.com/giampaolo/psutil/issues/1931
The same underlying problem is why the df command hangs as well.
There is currently an open bug for this here:
https://github.com/NagiosEnterprises/ncpa/issues/757
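Both psutil's disk usage call and df end up issuing a `statvfs()`-style system call on each mount point, and against an unresponsive NFS server that call can block. A minimal sketch of bounding the wait from the caller's side with a daemon thread, assuming a Unix host; `statvfs_with_timeout` is a hypothetical helper, not part of psutil or NCPA. It does not cancel the stuck call: the worker thread stays blocked until the kernel returns, so repeated calls against a dead mount accumulate stuck threads.

```python
import os
import threading


def statvfs_with_timeout(path, timeout, stat_func=os.statvfs):
    """Call stat_func(path) in a daemon thread, giving up after `timeout` seconds.

    Raises TimeoutError if the call has not returned in time; the blocked
    thread is left running (it cannot be cancelled), but the caller is freed.
    """
    result = {}

    def worker():
        try:
            result["value"] = stat_func(path)
        except OSError as exc:
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError(f"statvfs({path!r}) did not return within {timeout}s")
    if "error" in result:
        raise result["error"]  # e.g. FileNotFoundError for a missing path
    return result["value"]
```

A check built on this could report UNKNOWN for the one mount point that timed out while still returning results for the others.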
Re: Execution exceeded timeout when DF command slow
Thank you for the update. I didn't realize this was still an issue. Is there a timeline for a resolution, or for when it will be worked on?
Thank you,
Vinod
Re: Execution exceeded timeout when DF command slow
I'm unable to give an ETA at this time.
Please add a comment to the bug report that you're also experiencing the issue on your system so development sees it:
https://github.com/NagiosEnterprises/ncpa/issues/757