Service check timeouts after upgrading to NRPE 2.16RC2

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mdusanapudi
Posts: 35
Joined: Tue Mar 15, 2016 1:31 pm

Post by mdusanapudi »

Hi,
I upgraded the NRPE agent on the Nagios server from 2.15 to 2.16RC2 to handle larger service check output. Since then, I am getting service check timeouts on a lot of AIX servers running NRPE v2.12 and HP-UX servers running NRPE v2.0.
I posted a similar thread about HP-UX NRPE v2.0 but was told that v2.0 is really old. However, the servers running NRPE v2.12 are also getting service check timeouts.

On both AIX and HP-UX we are using precompiled binaries. Thanks in advance.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by rkennedy »

Are these systems at different sites? It could be taking a while for your plugins to run.

Can you try appending -t 120 to the check_nrpe command and running the checks from the CLI?
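A minimal Python sketch of that CLI test, assuming check_nrpe lives at /usr/local/nagios/libexec/check_nrpe (the usual Nagios XI location; adjust if yours differs). It runs one check with an explicit -t timeout, times it, and maps the exit code to the standard Nagios plugin states. The helper names and the host/command names are hypothetical.

```python
import shlex
import subprocess
import time

# Assumed location of check_nrpe on a Nagios XI server; adjust if different.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(host, command, timeout=120, args=None):
    """Build a check_nrpe argument list with an explicit -t timeout (seconds)."""
    cmd = [CHECK_NRPE, "-H", host, "-t", str(timeout), "-c", command]
    if args:
        # -a passes arguments to the remote command; the remote NRPE daemon
        # only accepts them if built with --enable-command-args and
        # dont_blame_nrpe=1 is set in its nrpe.cfg.
        cmd += ["-a", *args]
    return cmd


def run_check(host, command, timeout=120, args=None):
    """Run one check; return (state, elapsed seconds, plugin output)."""
    cmd = build_command(host, command, timeout, args)
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    state = STATES.get(proc.returncode, f"UNKNOWN({proc.returncode})")
    return state, elapsed, proc.stdout.strip()


if __name__ == "__main__":
    # Hypothetical host and command names: print the command line to run.
    print(shlex.join(build_command("aix-host01", "check_load")))
```

Calling run_check("aix-host01", "check_load") as the nagios user on the XI server shows whether a single CLI run is itself slow, or whether the timeouts only appear when the checks are scheduled.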
Former Nagios Employee
mdusanapudi

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by mdusanapudi »

If that were the case, the same thing should happen with NRPE v2.15 on my Nagios server, but we are not seeing any issues with it.
Moreover, as soon as I switched back from v2.16RC2 to v2.15, all the timeouts disappeared.
rkennedy

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by rkennedy »

As I asked previously, are these systems at different sites?

I imagine that, since the data limit has gone up in 2.16RC, there may be much more data to send, depending on which plugin you're using. With 2.15, you're still not getting all of your data.

Please try to extend the timeout to a higher value, and let us know if that helps on 2.16RC.
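To check whether a particular plugin's output is large enough to be affected by the higher limit, a small sketch like this can measure it. It assumes NRPE 2.x responses carry roughly 1 KB (1024 bytes) of plugin output; the exact limit in 2.16RC is not assumed here. The function names are hypothetical.

```python
# Assumption: NRPE 2.x (such as 2.15) carries roughly 1 KB (1024 bytes)
# of plugin output per response; anything beyond that is cut off.
NRPE_2X_LIMIT = 1024


def output_size(output: str) -> int:
    """Size of the plugin output in bytes when encoded as UTF-8."""
    return len(output.encode("utf-8"))


def likely_truncated_on_2x(output: str, limit: int = NRPE_2X_LIMIT) -> bool:
    """True if the output is probably too large for an NRPE 2.x response."""
    return output_size(output) >= limit


if __name__ == "__main__":
    # Hypothetical plugin output; paste real output from a failing check.
    sample = "OK - load average: 0.10, 0.20, 0.30|load1=0.1;5;10;0"
    print(output_size(sample), likely_truncated_on_2x(sample))
```

If the basic CPU, load, or memory checks all report well under the limit, the larger 2.16RC buffer is unlikely to be adding much data for them.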
mdusanapudi

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by mdusanapudi »

I tried increasing the timeout to 120 seconds, but still got no check results.
rkennedy

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by rkennedy »

Can you show us the full input and output, as well as the plugin you're attempting to use with NRPE?
mdusanapudi

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by mdusanapudi »

Since we were getting a large volume of events because of the upgrade, I switched back to NRPE v2.15 on the XI server.
To add to my last update: we are also getting timeouts on Linux servers running v2.15. When tested from the command line, the checks work fine with a 30-second timeout option.


All of these are basic checks such as CPU, load, and memory.

Also, the load on the Nagios server is hitting huge values, such as 100, so I can't afford to upgrade until I know the root cause of this issue.


Any help is appreciated.
jfrickson

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by jfrickson »

So, the timeouts are not occurring because of NRPE 2.16 on the remote machines, but because of the load on the XI server? Did you happen to notice which process(es) had high CPU usage? I'm assuming it's one or more check_nrpe processes. Are they all using up the CPU, or is it just one or a couple of them? If only one or a couple, what's the command line?
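One way to answer those questions is to list the check_nrpe processes sorted by CPU usage. This sketch assumes a ps that accepts -eo pid,pcpu,args (as procps on Linux does); parse_ps and top_checks are hypothetical helper names.

```python
import subprocess


def parse_ps(text, name="check_nrpe"):
    """Parse `ps -eo pid,pcpu,args` output and return (pid, cpu_percent, args)
    tuples for processes whose command line contains `name`, highest CPU first."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the ps header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, pcpu, args = parts
        if name in args:
            rows.append((int(pid), float(pcpu), args))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def top_checks(name="check_nrpe"):
    """Run ps on the local host and return matching processes by CPU usage."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, name)
```

Calling top_checks() a few times while the load is high shows whether many check_nrpe processes each use a little CPU or one or two are spinning, along with their full command lines.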
mdusanapudi

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by mdusanapudi »

Sorry for the late response.
Below are my findings on the timeouts:

1. All checks that use NRPE (Linux, AIX, HP-UX) are gradually timing out, across all check types:
   CPU Utilization
   Memory
   Processes
   Filesystem
   Paging
   Disk, etc.

2. I don't see high load on the server; it's actually the same as when NRPE 2.15 was running.


3. All the checks mentioned above run smoothly when tested from the CLI, as well as with the service check test in the GUI.

4. NRPE was compiled with the command ./configure --enable-command-args

5. All the lines you mentioned in your previous response were commented out.
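To confirm that the timeouts build up gradually rather than all at once, timeout alerts can be counted per hour from the Nagios log. This sketch assumes the nagios.log line format "[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output" and that timeouts appear in the output as something like "CHECK_NRPE: Socket timeout ..." or "Service check timed out ..."; hours are bucketed in UTC, and timeouts_per_hour is a hypothetical name.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed nagios.log format:
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);(.*)$"
)

# Assumed phrases that mark a timeout in the plugin output.
TIMEOUT_MARKERS = ("timeout", "timed out")


def timeouts_per_hour(lines):
    """Count timeout SERVICE ALERTs per UTC hour, returned in time order."""
    counts = Counter()
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if not m:
            continue
        output = m.group(7).lower()
        if not any(marker in output for marker in TIMEOUT_MARKERS):
            continue
        hour = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
        counts[hour.strftime("%Y-%m-%d %H:00")] += 1
    return dict(sorted(counts.items()))
```

Feeding it the open log file (commonly /usr/local/nagios/var/nagios.log on an XI server, an assumed path) gives hourly counts that can be compared across the periods when 2.15 and 2.16RC2 were installed.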


Let me know if you need any other information.

lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Service check timeouts after upgrading to NRPE 2.16RC2

Post by lmiltchev »

Can you show us the service definition of one of the "failing" checks on the Nagios XI server (and the command definition on the remote box)?
Also, show the actual command run from the command line along with the output of it.
Be sure to check out our Knowledgebase for helpful articles and solutions!