Nagios Sizing and Polling Interval
Posted: Mon Apr 20, 2020 3:38 am
Hi,
Currently we have two Nagios environments, both running Nagios XI 5.6.6 on CentOS 7.
Each server has 8 CPUs and 32 GB RAM. Each environment is monitoring 1100 servers with 5000 to 6000 checks.
Configured in HA with DRBD, no RAMDISK and SQL is not offloaded.
Currently a lot of the service checks run with a one-minute polling interval.
We are planning to roll out new monitoring templates with 25 services per host.
Another 1000 servers will be added, so each environment will have around 2000 servers with roughly 50K services.
Can you please clarify the points below?
1) How can we check and confirm whether Nagios is struggling with a high service check load?
2) Is it advisable to have one third of the service checks on a one-minute polling interval? (Currently the CPU, memory, disk and swap checks all poll every minute.)
3) Can the existing setup cope when the number of monitored servers and service checks doubles, to 2000+ servers and more than 50K services?
4) How far can we push the current setup before adding new instances?
5) Is one-minute polling recommended for performance metrics such as CPU, memory, swap and disk?
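For reference, here is a rough back-of-the-envelope sketch of the check rate the planned setup would generate. The host and service counts come from the figures above. The 5-minute interval for the remaining two thirds of the checks is only an assumption for illustration, and the function name is hypothetical.

```python
def checks_per_second(total_services, fast_fraction, fast_interval_s, slow_interval_s):
    """Average number of service checks Nagios must execute per second."""
    fast = total_services * fast_fraction   # checks on the short interval
    slow = total_services - fast            # everything else
    return fast / fast_interval_s + slow / slow_interval_s

hosts = 2000                # planned server count per environment
services_per_host = 25      # from the new monitoring templates
total_services = hosts * services_per_host

# One third at 1 minute, the rest at an ASSUMED 5 minutes.
mixed_rate = checks_per_second(total_services, 1 / 3, 60, 300)

# Worst case: every service polled every minute.
all_fast_rate = checks_per_second(total_services, 1.0, 60, 300)

print(f"total services: {total_services}")
print(f"mixed intervals: {mixed_rate:.1f} checks/s")
print(f"all at 1 minute: {all_fast_rate:.1f} checks/s")
```

This only estimates how many checks are scheduled. It says nothing about how long each check takes to run or how much disk and database I/O it causes, so it is a starting point for the sizing question rather than an answer.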
Thanks