Execution exceeded timeout when DF command slow
Posted: Thu Sep 16, 2021 5:33 am
by vappukuttan
Hello,
We see a lot of NCPA service checks return "UNKNOWN: Execution exceeded timeout threshold of 60s" on Linux hosts when the "df" command hangs for any reason. We sometimes have "df" hang when NFS mount points are not responding. During that time the service checks flap between OK and UNKNOWN.
Why do all the services get impacted by that? I would expect only a few, like check-disk or something, to be impacted.
Is there a flag or something that controls this?
Thank you,
Re: Execution exceeded timeout when DF command slow
Posted: Thu Sep 16, 2021 11:32 am
by benjaminsmith
Hi @vappukuttan,
If you have checks that take longer than normal, the recommended solution would be to configure the check as passive. Using this approach, the checks will run locally and send the results back to Nagios XI on a scheduled basis. This will help free up resources on the XI server side, which is likely the cause of the impact on the other checks.
If you have NCPA installed on the remote, setting up passive checks is straightforward, and the guide below covers all the steps required. Otherwise, you can try to increase the timeouts, but we normally don't recommend increasing the global timeout in Nagios Core past 60 seconds.
Using NCPA For Passive Checks
Reference:
https://assets.nagios.com/downloads/nag ... hecks.html
Re: Execution exceeded timeout when DF command slow
Posted: Sun Sep 19, 2021 10:17 am
by vappukuttan
Hello,
Thank you for your response. I don't think the issue is with active or passive checks. The active checks work fine all the time, except when the "df" command stalls on a Linux host for any reason. It typically stalls when an NFS device is mounted and is not reachable at the time. In that scenario, all the checks (whether it's a simple memory check, a system load check, or anything else) start flapping between OK and UNKNOWN (timed out after 60s).
Hope that helps,
Re: Execution exceeded timeout when DF command slow
Posted: Mon Sep 20, 2021 10:34 am
by ssax
Is the df command slow because of NFS mount connectivity issues or because the system has load/IO wait/storage performance issues?
It's strange that even memory checks would fail, unless the disk check services you are calling are causing the other checks to hang while they check the disks because of stale NFS mounts. Can you try disabling the disk check services and see if that allows the other checks to continue working?
What commands are you running to check the disks on the system from the services?
Are you checking through NRPE, NCPA, or something else?
Re: Execution exceeded timeout when DF command slow
Posted: Tue Sep 28, 2021 5:26 pm
by vappukuttan
Sorry, forgot to log back into the forum to check updates.
- The df command becomes unresponsive because of NFS mount connectivity issues (and this has nothing to do with Nagios).
- We use NCPA to run checks.
- Disks were just an example, but we use check_disk, the standard plugin. A lot of the standard plugins act up and time out, though.
To keep the NCPA checks from timing out, I have typically removed the NFS mount as a workaround until the NFS issue was resolved.
If it happens again, or if I can force it, I will add more information to this ticket.
Thank you,
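For anyone applying the same workaround, it helps to know which mounts are NFS before deciding what to remove. Below is a minimal Python sketch that picks the NFS entries out of /proc/mounts-style text. The function name nfs_mounts and the set of type names are illustrative assumptions, not part of NCPA or any Nagios plugin. It only parses text and never touches the mount points themselves, so it should not block on a stale mount.

```python
# Filesystem types treated as NFS (an assumption; extend if your hosts differ).
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mounts(mounts_text):
    """Return (source, mount_point) pairs for NFS entries.

    Expects text in the /proc/mounts format:
        source mount_point fstype options dump pass
    Note that /proc/mounts escapes spaces in paths as \\040; this sketch
    returns the paths as written and does not unescape them.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            found.append((fields[0], fields[1]))
    return found
```

On a Linux host it could be fed the contents of /proc/mounts, for example nfs_mounts(open("/proc/mounts").read()).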
Re: Execution exceeded timeout when DF command slow
Posted: Wed Sep 29, 2021 9:15 am
by benjaminsmith
Hi @vappukuttan,
Sounds good. We'll keep this open, and appreciate your update on the situation.
Re: Execution exceeded timeout when DF command slow
Posted: Sat Oct 23, 2021 9:48 pm
by vappukuttan
Hello,
Just wanted to say that I have continued to see this issue.
I also found an old article with a similar bug.
https://github.com/NagiosEnterprises/ncpa/issues/368
I will see if I can attach a screenshot of what I see within Nagios XI.
Thank you.
Re: Execution exceeded timeout when DF command slow
Posted: Mon Oct 25, 2021 11:59 am
by ssax
The NFS mount hang is a known issue. NCPA uses the psutil Python library; please see the comment at the bottom here:
https://github.com/giampaolo/psutil/issues/1931
It's also why the df command is hanging.
There is currently an open bug for this here:
https://github.com/NagiosEnterprises/ncpa/issues/757
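Until that bug is resolved, one way to query a mount from your own scripts without getting stuck is to make the filesystem call in a child process and give up after a timeout. The sketch below is only an illustration of that pattern, not how NCPA or psutil work internally. The function name disk_usage_with_timeout and the 5 second default are assumptions. It uses os.statvfs from the standard library, which is the kind of call that blocks on an unresponsive NFS mount.

```python
import json
import subprocess
import sys

# Child program: a single statvfs call, with the result printed as JSON.
_CHILD = (
    "import json, os, sys\n"
    "s = os.statvfs(sys.argv[1])\n"
    "print(json.dumps({'total': s.f_blocks * s.f_frsize,\n"
    "                  'free': s.f_bavail * s.f_frsize}))\n"
)


def disk_usage_with_timeout(path, timeout=5.0):
    """Return {'total': ..., 'free': ...} in bytes, or None on error or stall."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A child blocked on a hard NFS mount may not react to SIGKILL until
        # the server responds, so send the signal but do not wait for exit.
        proc.kill()
        proc.stdout.close()
        return None
    if proc.returncode != 0:
        return None
    return json.loads(out)
```

Popen is used instead of subprocess.run on purpose: run waits for the child to exit after killing it, and that wait can itself hang if the child is stuck in an uninterruptible NFS call. A stuck child may linger until the mount recovers, so this is better suited to occasional diagnostics than to a check that fires every minute.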
Re: Execution exceeded timeout when DF command slow
Posted: Tue Nov 02, 2021 12:39 pm
by vappukuttan
Thank you for the update. I didn't realize this was still an issue. Is there any timeline for a resolution, or for when it will be worked on?
Thank you,
Vinod
Re: Execution exceeded timeout when DF command slow
Posted: Wed Nov 03, 2021 12:55 pm
by ssax
I'm unable to give an ETA at this time.
Please add a comment to the bug report that you're also experiencing the issue on your system so development sees it:
https://github.com/NagiosEnterprises/ncpa/issues/757