Execution exceeded timeout when DF command slow

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Execution exceeded timeout when DF command slow

Post by vappukuttan »

Hello,

We see a lot of NCPA service checks return "UNKNOWN: Execution exceeded timeout threshold of 60s" on Linux hosts whenever the "df" command hangs. "df" sometimes hangs when NFS mount points are not responding. During that time the service checks flap between OK and UNKNOWN.

Why do all the services get impacted by that? I would expect only a few, such as the disk checks, to be affected.

Is there a flag or setting that controls this?

Thank you,
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Execution exceeded timeout when DF command slow

Post by benjaminsmith »

Hi @vappukuttan,

If you have checks that take longer than normal, the recommended solution would be to configure the check as passive. Using this approach the checks will run locally and send the results back to Nagios XI on a scheduled basis. This will help free up resources on the XI server-side, which is likely the cause of the impact on the other checks.

If you have NCPA installed on the remote host, setting up passive checks is straightforward, and the guide below covers all the steps required. Otherwise, you can try to increase the timeouts, but we normally don't recommend increasing the global timeout in Nagios Core past 60 seconds.

Using NCPA For Passive Checks

Reference:
https://assets.nagios.com/downloads/nag ... hecks.html
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Re: Execution exceeded timeout when DF command slow

Post by vappukuttan »

Hello,

Thank you for your response. I don't think the issue is with active versus passive checks. The active checks work fine all the time, except when the "df" command stalls on a Linux host. It typically stalls when an NFS device is mounted and is not reachable at the time. In that scenario, all the checks (whether a simple memory check, a system load check, or anything else) start flapping between OK and UNKNOWN (timed out after 60s).

Hope that helps,
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Execution exceeded timeout when DF command slow

Post by ssax »

Is the df command slow because of NFS mount connectivity issues or because the system has load/IO wait/storage performance issues?

It's strange that even memory checks would fail, unless the disk check services you are calling are hanging on stale NFS mounts and causing the other checks to hang with them. Can you try disabling the disk check services and see if that allows the other checks to continue working?

What commands are you running to check the disks on the system from the services?

Are you checking through NRPE, NCPA, or something else?
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Re: Execution exceeded timeout when DF command slow

Post by vappukuttan »

Sorry, forgot to log back into the forum to check updates.

- The df command becomes unresponsive because of NFS mount connectivity issues (this has nothing to do with Nagios).
- We use NCPA to run the checks.
- Disk checks were just an example; for those we use the standard check_disk plugin. Many of the other standard plugins also act up and time out.

To keep the NCPA checks from timing out, I have typically worked around it by removing the NFS mount until the NFS issue was resolved.
If it happens again, or if I can force it, I will add more information to this ticket.
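
As a rough sketch of how one might find the unresponsive mount before removing it, the Python below reads mount entries in /proc/mounts format and probes each NFS mount point with `stat -f` under a deadline. The function names, the 5-second default, and the choice of `stat -f` as the probe are illustrative assumptions; none of this is part of NCPA or Nagios XI.

```python
import subprocess

# Filesystem types treated as NFS (assumption: adjust for your environment).
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mount_points(mounts_text):
    """Return the mount points of NFS entries in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            points.append(fields[1])
    return points


def exit_code_within(argv, timeout_s):
    """Run argv; return its exit code, or None if it is still running
    after timeout_s seconds."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked on a dead NFS server may not
        # exit until the server answers, so we do not wait for it here.
        proc.kill()
        return None


def unresponsive_nfs_mounts(mounts_text, timeout_s=5.0):
    """Return NFS mount points whose `stat -f` probe did not finish in time."""
    return [
        point
        for point in nfs_mount_points(mounts_text)
        if exit_code_within(["stat", "-f", point], timeout_s) is None
    ]
```

On a Linux host this could be called as `unresponsive_nfs_mounts(open("/proc/mounts").read())`. Mount points containing spaces are escaped in /proc/mounts and are not decoded here, so such entries would be probed under the wrong path and never reported.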

Thank you,
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Execution exceeded timeout when DF command slow

Post by benjaminsmith »

Hi @vappukuttan,

Sounds good. We'll keep this open, and appreciate your update on the situation.
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Re: Execution exceeded timeout when DF command slow

Post by vappukuttan »

Hello,

I just wanted to say that I have continued to see this issue.

I also found an old article with a similar bug.
https://github.com/NagiosEnterprises/ncpa/issues/368

I will see if I can attach a screenshot of what I see within Nagios XI.

Thank you.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Execution exceeded timeout when DF command slow

Post by ssax »

The NFS mount hang is a known issue. NCPA uses the psutil Python library; please see the comment at the bottom of this issue:

https://github.com/giampaolo/psutil/issues/1931

The same unresponsive mount is why the df command hangs as well.
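
To illustrate the kind of call that blocks: on POSIX systems a disk usage lookup comes down to a statvfs call on the mount point, and that call may not return while the NFS server is unreachable. The sketch below uses only the standard library and is not NCPA's or psutil's actual code. It shows one way to put a deadline around such a call by running it in a daemon thread. The names `call_with_deadline`, `disk_usage_or_none`, and `TIMED_OUT` are hypothetical.

```python
import os
import threading

# Sentinel returned when the guarded call did not finish in time.
TIMED_OUT = object()


def call_with_deadline(func, args=(), timeout_s=5.0):
    """Run func(*args) in a daemon thread and wait at most timeout_s seconds.

    Returns the function's result, re-raises its exception, or returns
    TIMED_OUT if the call is still blocked when the deadline passes.
    """
    box = {}

    def worker():
        try:
            box["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            box["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The thread stays blocked in the background; being a daemon thread,
        # it does not prevent the interpreter from exiting.
        return TIMED_OUT
    if "error" in box:
        raise box["error"]
    return box["value"]


def disk_usage_or_none(path, timeout_s=5.0):
    """Return total/free bytes for path, or None if statvfs did not return."""
    st = call_with_deadline(os.statvfs, (path,), timeout_s)
    if st is TIMED_OUT:
        return None
    return {
        "total": st.f_blocks * st.f_frsize,
        "free": st.f_bavail * st.f_frsize,
    }
```

The trade-off is that every timed-out call leaves a blocked thread behind until the mount recovers, so a long-running agent using this approach would need to cap how many such probes it starts.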

There is currently an open bug for this here:

https://github.com/NagiosEnterprises/ncpa/issues/757
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Re: Execution exceeded timeout when DF command slow

Post by vappukuttan »

Thank you for the update. I didn't realize this was still an issue. Is there any timeline for when this issue will be worked on or resolved?

Thank you,
Vinod
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Execution exceeded timeout when DF command slow

Post by ssax »

I'm unable to give an ETA at this time.

Please add a comment to the bug report that you're also experiencing the issue on your system so development sees it:

https://github.com/NagiosEnterprises/ncpa/issues/757