I've noticed latency spikes on some of our hosts, with rtmax jumping as high as 150ms. After investigating, it looks like the Nagios XI server itself is causing these "spikes". Ping times from other servers are quite stable, but a ping session run from the Nagios XI server at the same time shows fluctuating ping times. I suspect this is due to load. Adding more CPU and RAM does seem to help, but it still isn't solid.
So my question is: how reliable is the rta and rtmax data from host checks? Is it derived from what the Nagios XI server itself measures, and so affected by the load on the XI server? Has anyone else had this issue? Just trying to get more reliable data.
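For anyone wanting to compare these numbers side by side: rta and rtmax arrive in the check's performance data, and they are measured from whichever machine runs the check plugin. Below is a minimal sketch of pulling them out, assuming the standard Nagios perfdata layout (`label=value[UOM];warn;crit;min;max`) and labels without spaces. The function name and the sample output string are made up for illustration and were not captured from this system.

```python
import re

def parse_perfdata(plugin_output):
    """Return {label: (value, unit)} for the perfdata after the '|' separator.

    Assumes labels contain no spaces (quoted labels are not handled).
    """
    _, _, perf = plugin_output.partition("|")
    metrics = {}
    for token in perf.split():
        label, sep, rest = token.partition("=")
        if not sep:
            continue
        # The first ';'-separated field is the value plus its unit, e.g. "1.204ms"
        raw = rest.split(";")[0]
        match = re.match(r"-?\d+(?:\.\d+)?", raw)
        if match is None:
            continue
        metrics[label] = (float(match.group()), raw[match.end():])
    return metrics

# Hypothetical ping-style check output, for illustration only
sample = (
    "OK - 192.0.2.10: rta 1.204ms, lost 0%"
    "|rta=1.204ms;200.000;500.000;0; pl=0%;40;80;; "
    "rtmax=150.312ms;;;; rtmin=0.611ms;;;;"
)
metrics = parse_perfdata(sample)
print(metrics["rta"], metrics["rtmax"])
```

Running the same parser over output collected on a second machine gives a like-for-like comparison of rta and rtmax between vantage points.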
Host Checks / Latency
Re: Host Checks / Latency
Well, if the XI server is under pretty heavy load, then you could see increased delay in responses, so I feel this all lines up.
To help us get a better idea - what kind of resources did you have before / after, and how many host / service checks are you running?
Former Nagios Employee
Re: Host Checks / Latency
This instance has one Gearman worker. The server had 2 CPUs and 8GB of RAM before; it is now on 4 CPUs with 16GB of RAM. According to CCM, we have 221 hosts and 4176 services. Not too much.
We have another instance with about 1500 hosts and 24065 services that's using 6 CPUs with 20GB of RAM, but that one also has 3 Mod-Gearman workers. I am seeing some latency there too, but it is much lower. I guess the best way to handle this is to spin up an additional Gearman worker node and have it handle only host checks to get better latency results.
Re: Host Checks / Latency
Not much at all.
Gearman may be the solution. When the system only had 2 CPUs/8GB, were you noticing any throttling anywhere (CPU/RAM)? Feel free to PM over a profile, and I can take a look to see if anything stands out that would cause this. I may not be able to find much though, since you've already added the resources.
Another option to throw at you is to use check_by_ssh or check_nrpe as agents. Run these on a 'cloud' service, and you'd be able to get additional information about the latency from an external source.
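To make use of an external vantage point, one approach is to collect round-trip samples for the same host from both the XI server and the outside probe, then compare them. Here is a rough sketch, assuming the RTT samples (in milliseconds) have already been gathered. The function name, the sample values, and the "3x the median" spike rule are illustrative assumptions, not anything Nagios does internally.

```python
from statistics import mean, median

def summarize(samples_ms, spike_factor=3.0):
    """Summarize RTT samples: average (like rta), maximum (like rtmax), spike count.

    A sample counts as a spike when it exceeds spike_factor times the median;
    this threshold is an arbitrary choice for illustration.
    """
    med = median(samples_ms)
    spikes = [s for s in samples_ms if s > spike_factor * med]
    return {
        "rta": round(mean(samples_ms), 3),
        "rtmax": max(samples_ms),
        "median": med,
        "spikes": len(spikes),
    }

# Hypothetical samples for the same target host from two vantage points
xi_server = [1.1, 1.3, 1.2, 150.0, 1.2, 1.4, 48.0, 1.3]
external = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.5, 1.2]

xi = summarize(xi_server)
ext = summarize(external)

# Spikes seen only from the XI server point at the monitoring box, not the network
local_jitter = xi["spikes"] > 0 and ext["spikes"] == 0
print(xi, ext, local_jitter)
```

Note how a single slow reply sets rtmax on its own, while rta moves far less. That is why rtmax is the first number to look noisy when the machine running the checks is busy.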
Former Nagios Employee