A lot of time out errors with NCPA
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
A lot of time out errors with NCPA
We're currently replacing all of our NRPE checks with NCPA checks, and for some reason many of our Solaris servers are getting a lot of timeout errors. There have been 79 timeout state changes in the past 24 hours. Is there anything that can be done from the Nagios server side to remediate this?
Re: A lot of time out errors with NCPA
Hello @snapon_admin
Thanks for reaching out. Let's start by increasing the timeout in the config, then bounce the ncpa_listener and ncpa_passive services by restarting them.
https://support.nagios.com/kb/article/n ... s-872.html
Let us know how things look.
Thanks,
Perry
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: A lot of time out errors with NCPA
The NCPA timeout is currently 90 seconds. Is it a good idea to increase it beyond that point? We have a fairly busy server and I don't know how harmful it'd be to have several dozen checks waiting 120+ seconds for results.
Re: A lot of time out errors with NCPA
It would be fine to increase it, but you are right that it will have an impact on your system if all of the checks are taking that long.
What I recommend is to:
- Set a low timeout, such as 30 or 60 seconds, on all of the commands (whether this is possible depends on the plugin)
- Use one-off services when you need a long timeout (meaning separate commands for the specific services that require long timeouts)
- Convert really long-running checks (ones that take minutes) to passive services so they do not impact the other checks
- Make sure the host_check_timeout and service_check_timeout in your /usr/local/nagios/etc/nagios.cfg are longer than your highest timeout
You can also use these to get a better idea of what long running checks you have:
https://exchange.nagios.org/directory/A ... er/details
Or from the CLI:
https://exchange.nagios.org/directory/P ... me/details
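As a rough illustration of the last recommendation in the list, here is a minimal Python sketch that checks whether host_check_timeout and service_check_timeout are longer than your highest plugin timeout. The function names, the sample config text, and the 90-second figure used in the example run are assumptions for illustration only. It is not an official Nagios tool, and it only looks at simple key=value lines.

```python
def parse_nagios_cfg(text):
    """Collect key=value directives from nagios.cfg-style text.

    Blank lines and lines starting with '#' are skipped. If a key
    appears more than once, the last value wins.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def timeout_problems(settings, highest_plugin_timeout):
    """Return a list of messages for timeouts that are missing or too short."""
    problems = []
    for name in ("host_check_timeout", "service_check_timeout"):
        raw = settings.get(name)
        if raw is None:
            problems.append(f"{name} is not set")
        elif int(raw) <= highest_plugin_timeout:
            problems.append(
                f"{name}={raw} is not longer than the highest "
                f"plugin timeout ({highest_plugin_timeout}s)"
            )
    return problems


# Hypothetical sample content; in practice you would read the text of
# /usr/local/nagios/etc/nagios.cfg instead.
SAMPLE_CFG = """
# timeouts (sample values, not recommendations)
host_check_timeout=30
service_check_timeout=60
"""

if __name__ == "__main__":
    for message in timeout_problems(parse_nagios_cfg(SAMPLE_CFG), 90):
        print(message)
```

To run it against a real server, replace SAMPLE_CFG with the contents of your own nagios.cfg and pass in the longest timeout you have set on any command.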
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: A lot of time out errors with NCPA
We're talking about a LOT of checks here, so this level of granularity might be... difficult to achieve. As an example, when this happens, literally EVERY check on a specific server times out, and on at least one of these servers that means 53 checks all timing out at once. We never got these timeouts with NRPE, so I'm just not sure why NCPA is having this issue. Would it be possible to set up a remote session so someone could take a better look at what I'm seeing and figure out a solution to this that isn't more of a band-aid?
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: A lot of time out errors with NCPA
I think I'm going to open a ticket for this. It's become a fairly critical issue and I need a better response on it, so I'm hoping a ticket will help with that.