Hi,
I upgraded the NRPE agent on the Nagios server from 2.15 to 2.16RC2 to handle larger service check output. Since then I am getting service check timeouts on a lot of AIX servers running NRPE V2.12 and HP-UX servers running NRPE V2.0.
I posted a similar thread for HP-UX V2.0 and was told that V2.0 is really old, but the NRPE V2.12 hosts are also getting service check timeouts.
On both AIX and HP-UX we are using precompiled binaries. Thanks in advance.
Service check timeouts after upgrading to NRPE 2.16RC2
-
mdusanapudi
- Posts: 35
- Joined: Tue Mar 15, 2016 1:31 pm
Re: Service check timeouts after upgrading to NRPE 2.16RC2
Are these systems at different sites? It could be taking a while for your plugins to run.
Can you try appending -t 120 and running the checks over the CLI?
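For anyone following along, here is a minimal sketch of what "appending -t 120 and running over the CLI" could look like when wrapped in a small timing helper. The plugin path is an assumption (a common source-install location), the helper names are hypothetical, and the host and command you pass in are placeholders for your own values. Only the check_nrpe options -H, -t and -c are used.

```python
import subprocess
import time

# Assumption: typical source-install location; adjust for your system.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def build_check_nrpe_cmd(host, command, timeout=120, plugin=CHECK_NRPE):
    """Build the argv list for check_nrpe with an explicit -t timeout."""
    return [plugin, "-H", host, "-t", str(timeout), "-c", command]


def run_timed(argv, hard_limit):
    """Run argv and return (return_code, first_output, elapsed_seconds).

    return_code is None if the binary is missing or the hard limit
    (enforced by Python, separate from check_nrpe's own -t) is exceeded.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=hard_limit
        )
        rc, out = proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        rc, out = None, "hard limit exceeded"
    except FileNotFoundError:
        rc, out = None, "plugin not found"
    return rc, out, time.monotonic() - start
```

Calling run_timed(build_check_nrpe_cmd(host, "check_load"), hard_limit=130) for a few of the affected hosts gives the elapsed time per check, which helps tell a slow plugin apart from a check that never returns at all.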
Former Nagios Employee
-
mdusanapudi
- Posts: 35
- Joined: Tue Mar 15, 2016 1:31 pm
Re: Service check timeouts after upgrading to NRPE 2.16RC2
If that were the case, the same should happen while using NRPE V2.15 on my Nagios server, but we are not seeing any issues while using it.
Moreover, as soon as I switched back from V2.16RC2 to V2.15, all the timeouts disappeared.
Re: Service check timeouts after upgrading to NRPE 2.16RC2
As I asked previously, are these systems at different sites?
I imagine that since the data limit has gone up in 2.16RC there is possibly much more data to send depending on what plugin you're using. When you're using 2.15 you're still not getting all of your data.
Please try to extend the timeout to a higher value, and let us know if that helps on 2.16RC.
Former Nagios Employee
-
mdusanapudi
- Posts: 35
- Joined: Tue Mar 15, 2016 1:31 pm
Re: Service check timeouts after upgrading to NRPE 2.16RC2
I tried increasing the timeout to 120 seconds, but I am still not getting the check results.
Re: Service check timeouts after upgrading to NRPE 2.16RC2
Can you show us the full input and output, as well as the plugin you're attempting to use with NRPE?
Former Nagios Employee
-
mdusanapudi
- Posts: 35
- Joined: Tue Mar 15, 2016 1:31 pm
Re: Service check timeouts after upgrading to NRPE 2.16RC2
Because the upgrade was generating a large volume of events, I switched back to NRPE V2.15 on the XI server.
Adding to the last update: we are also getting timeouts for Linux servers, which are running V2.15. When tested from the command line, the checks work fine with a 30-second timeout option.
All of these are basic checks such as CPU, load, memory, etc.
Also, the load on the Nagios server is hitting huge values such as 100, so I can't afford to do an upgrade until I know the root cause of this issue.
Any help is appreciated.
-
jfrickson
Re: Service check timeouts after upgrading to NRPE 2.16RC2
mdusanapudi wrote:
Since we are getting large volume of events because of the upgrade I switched back to NRPE V2.15 on the XI server.
Adding to the last update we are getting time outs for Linux servers as well which are running V2.15. When tested from command line it just works fine with 30 sec timeout option.
All these checks are basic checks like cpu, load, memory etc.
Also Load on the nagios server hitting huge values like 100 etc., so I can't afford to do an upgrade until I know the root cause of this issue.
Any help is appreciated.

So, the timeouts are not occurring because of NRPE 2.16 on the remote machines, but because of the load on the XI server? Did you happen to notice which process(es) had high CPU usage? I'm assuming it's one or more check_nrpe processes. Are they all using up the CPU, or is it just one, or a couple of them? If only one or a couple, what's the command line?
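One rough way to answer the "which check_nrpe is using the CPU, and what is its command line" question is to filter process listings for check_nrpe and sort by CPU. The sketch below assumes output in the shape produced by `ps -eo pid,pcpu,args` on the XI server; the function name is hypothetical and the sample rows are invented purely for illustration.

```python
def top_check_nrpe(ps_output, min_cpu=0.0):
    """Return (cpu_percent, pid, command_line) tuples for check_nrpe
    processes, highest CPU first. Expects a header line followed by
    rows of: PID %CPU ARGS."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)          # pid, %cpu, full command line
        if len(parts) < 3 or "check_nrpe" not in parts[2]:
            continue
        try:
            pid, cpu = int(parts[0]), float(parts[1])
        except ValueError:
            continue                         # ignore malformed rows
        if cpu >= min_cpu:
            rows.append((cpu, pid, parts[2]))
    return sorted(rows, reverse=True)


# Invented sample data, not taken from the poster's system.
SAMPLE = """\
  PID %CPU COMMAND
 1201  0.3 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
 4410 42.0 /usr/local/nagios/libexec/check_nrpe -H hostA -t 30 -c check_load
 4411  1.5 /usr/local/nagios/libexec/check_nrpe -H hostB -t 30 -c check_mem
"""
```

On a live server the text could come from something like subprocess.run(["ps", "-eo", "pid,pcpu,args"], capture_output=True, text=True).stdout. If only one or two entries stand out, their command lines are the ones worth posting here.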
-
mdusanapudi
- Posts: 35
- Joined: Tue Mar 15, 2016 1:31 pm
Re: Service check timeouts after upgrading to NRPE 2.16RC2
Sorry for my late response.
Below are my findings on the timeouts:
1. All checks that use NRPE (Linux, AIX, HP-UX) are gradually timing out, including:
CPU Utilization
Memory
Processes
Filesystem
Paging
Disk etc...
2. I don't see high load on the server; it is actually the same as when NRPE 2.15 was running.
3. All the checks mentioned above run smoothly when tested from the CLI, as well as with the service check test in the GUI.
4. NRPE was compiled with the command ./configure --enable-command-args
5. All the lines you mentioned in your previous response were commented out.
Let me know if you need any other information.
Re: Service check timeouts after upgrading to NRPE 2.16RC2
Can you show us the service definition of one of the "failing" checks on the Nagios XI server (and the command definition on the remote box)?
Also, show the actual command run from the command line along with the output of it.