Hi,
Currently we have two Nagios environments, both running Nagios XI 5.6.6 on CentOS 7.
Each server has 8 CPUs and 32 GB of RAM. Each environment monitors 1100 servers with 5000 to 6000 checks.
They are configured in HA with DRBD, with no RAM disk, and SQL is not offloaded.
Currently a lot of service checks use a one-minute polling interval.
We are planning to roll out new monitoring templates with 25 services per host.
Another 1000 servers will be added, so there will be around 2000 servers with around 50k services.
Can you please clarify the points below?
1) How can we check and confirm whether Nagios is struggling with a high service check load?
2) Is it advisable to have one third of the service checks on one-minute polling (currently CPU, Memory, Disk, and Swap all have 1-minute polling)?
3) Can the existing setup cope when the monitored servers and service checks double in number, to 2000-plus servers and more than 50k services?
4) How far can we push the current setup before adding new instances?
5) Is it recommended to have 1-minute polling for performance metrics such as CPU, Memory, Swap, and Disk?
Thanks
Nagios Sizing and Polling Interval
Re: Nagios Sizing and Polling Interval
1) How to check and confirm if nagios is struggling with high servicecheck load.
The clearest indicator of problems is the output of "ipcs -a": if you have thousands of items in the queue on a regular basis, that's one sign. Also look at "top": if the load average is high and free memory is low, that's another indication. And if you look in /var/log/messages and see:
ndo2db: Error: queue recv error.
ndo2db: Error: max retries exceeded sending message to queue.
ndo2db: Warning: queue send error, retrying...
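The three checks described above (queue depth, load average, ndo2db messages in the log) can be gathered in one pass. The following is only a rough sketch, not an official Nagios tool: it uses "ipcs -q" (message queues only) instead of "ipcs -a", assumes the usual six-column Linux layout of that output, and the function names and the 1000-message threshold are made up for illustration.

```python
import os
import re
import subprocess

# Matches the ndo2db queue messages quoted above.
NDO2DB_QUEUE_MSG = re.compile(r"ndo2db: (?:Error|Warning): .*queue")


def total_queued_messages(ipcs_output):
    """Sum the 'messages' column over all queues in `ipcs -q` output."""
    total = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows look like: key msqid owner perms used-bytes messages
        if len(fields) == 6 and fields[0].startswith("0x"):
            total += int(fields[5])
    return total


def count_ndo2db_queue_errors(log_lines):
    """Count log lines that contain an ndo2db queue error or warning."""
    return sum(1 for line in log_lines if NDO2DB_QUEUE_MSG.search(line))


def report(log_path="/var/log/messages", queue_threshold=1000):
    """Print queue depth, load average, and ndo2db error count.

    queue_threshold is a hypothetical cutoff for "thousands of items".
    Reading the log usually requires root.
    """
    ipcs = subprocess.run(["ipcs", "-q"], capture_output=True,
                          text=True, check=True)
    queued = total_queued_messages(ipcs.stdout)
    load1, load5, load15 = os.getloadavg()
    with open(log_path, errors="replace") as log:
        errors = count_ndo2db_queue_errors(log)

    print(f"queued messages: {queued}")
    print(f"load average (1/5/15 min): {load1:.2f} {load5:.2f} {load15:.2f}")
    print(f"ndo2db queue messages in {log_path}: {errors}")
    if queued >= queue_threshold or errors > 0:
        print("ndo2db appears to be falling behind")
```

Calling report() on the XI server every few minutes (for example from cron) shows whether the backlog is a constant condition or only an occasional spike.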
2) Is it advisable to have one third of service checks with one minute polling ( current CPU, Memory DISK Swap all have 1 min polling) . ?
3) Can the existing setup handle, when the monitored servers and service checks doubles in number, 2000 plus servers and above 50K services ?
...
4) How far can we push the current setup before adding new instances ?
You might be running pretty close to some limits already. I definitely suggest setting up a ramdisk, making sure your other disks are as fast as they can be, and setting up mod-gearman. Here are some links to documents for those:
https://assets.nagios.com/downloads/nag ... ios_XI.pdf
https://assets.nagios.com/downloads/nag ... giosXI.pdf
5) Is it recommended to have 1 min polling for performance metrics such as CPU, Memory, Swap and Disk ?
Not in your case. With as much stuff as you're monitoring, I suggest saving the 1-minute interval for only your most critical machines and checks, and have the rest at five minutes.
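To put rough numbers on that advice, the average check rate can be estimated from the interval mix. This is a back-of-the-envelope sketch only: the function name is made up, the 50,000 figure comes from the planned rollout in the question, and it ignores retries, host checks, and how long each check takes to run.

```python
def checks_per_second(total_services, fast_fraction,
                      fast_interval_s=60, slow_interval_s=300):
    """Average service checks per second for a two-tier interval mix.

    fast_fraction is the share of services on the short interval;
    everything else is assumed to run on the long interval.
    """
    fast = total_services * fast_fraction
    slow = total_services - fast
    return fast / fast_interval_s + slow / slow_interval_s


if __name__ == "__main__":
    total = 50_000  # planned service count from the question
    for label, fraction in [("all at 1 min", 1.0),
                            ("one third at 1 min", 1 / 3),
                            ("none at 1 min", 0.0)]:
        rate = checks_per_second(total, fraction)
        print(f"{label}: {rate:.1f} checks/s")
```

With 50,000 services, moving from everything at one minute to one third at one minute cuts the average rate by more than half, and each further service moved to five minutes costs a fifth of what it did before.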
I hope that helps you get started! Let us know if you have more questions.
--Jeffrey
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!