Nagios Server Performance

MSPk
Posts: 317
Joined: Fri Aug 24, 2012 12:03 am

Nagios Server Performance

Post by MSPk »

Hi,

We are implementing a new Nagios instance for a client and have so far configured close to 2500 service checks in addition to 420+ hosts.

Our server configuration:

Red Hat Enterprise Linux 5 (virtual server)
Processor Qty: 4
Processor Speed: 2.66 GHz
Total Memory: 8 GB
HD Capacity: 128 GB, HD Config: "/" 46 GB + "/apps" 64 GB

We are monitoring about 110 servers using NSClient to check local parameters such as CPU, memory, services and page file, and about 300 websites for DNS resolution, HTTP and HTTPS ports, and ping. Polling intervals range from 5 to 15 minutes.

My CPU load is hovering around 3-4 for the 1, 5 and 15 minute averages. However, when I look at the monitoring info, service check latency is more than 2 seconds at the moment. Please advise whether this is within acceptable limits for a small environment like this.

Also, I can't currently see the Monitoring Engine Event Queue graph on the admin page. Is there anywhere we can download this graph and also get the historical data for "Monitoring Engine Check Statistics" and "Monitoring Engine Performance"?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios Server Performance

Post by abrist »

MSPk wrote:My CPU load is hovering around 3-4 for the 1, 5 and 15 minute averages. However, when I look at the monitoring info, service check latency is more than 2 seconds at the moment. Please advise whether this is within acceptable limits for a small environment like this.
With 4 CPUs, a load average of 3-4 is *acceptable*, as it will not cause CPU wait, but it is definitely higher than expected for the number of checks you have set up. What are the intervals on these checks? How many checks run every 5 minutes?
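A rough way to read those load figures is to divide the load average by the number of CPUs: a value at or below about 1.0 per core suggests runnable processes are not queuing for CPU. The snippet below is only a sketch of that arithmetic; the helper name `load_per_core` is made up for illustration, and the 1.0 rule of thumb is an assumption rather than a Nagios-defined threshold.

```python
import os


def load_per_core(load_average: float, cpu_count: int) -> float:
    """Return the load average divided by the number of CPUs (hypothetical helper)."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Figures from this thread: a load of 3-4 on a 4-CPU virtual server.
for load in (3.0, 4.0):
    print(f"load {load:.1f} on 4 CPUs -> {load_per_core(load, 4):.2f} per core")

# On the monitoring server itself the live values could be read like this.
# os.getloadavg() is only available on Unix-like systems.
try:
    one_minute, five_minute, fifteen_minute = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"live 1-minute load per core: {load_per_core(one_minute, cores):.2f}")
except (AttributeError, OSError):
    print("load average is not available on this platform")
```

A sustained value well above 1.0 per core would be the point at which checks are likely to start waiting for CPU time.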
MSPk wrote:Also, I can't currently see the Monitoring Engine Event Queue graph on the admin page. Is there anywhere we can download this graph and also get the historical data for "Monitoring Engine Check Statistics" and "Monitoring Engine Performance"?
The event queue graph requires internet access, as it relies on Google's Chart API. If you have internet access but it is still not working, it may be proxy related, or the security policies in your environment may not trust chart.googleapis.com.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
MSPk
Posts: 317
Joined: Fri Aug 24, 2012 12:03 am

Re: Nagios Server Performance

Post by MSPk »

There are 1650 service checks and 300 host checks with the polling interval set to 5 minutes, and about 900 service checks with a 15-minute polling interval.
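For reference, these counts can be turned into an expected check rate. The sketch below assumes a steady state in which every check runs exactly once per interval (no retries or on-demand checks); the function name `checks_per_window` is made up for illustration.

```python
def checks_per_window(check_groups, window_minutes=5):
    """Expected number of checks in one window.

    check_groups is a list of (count, interval_minutes) pairs.
    """
    return sum(count * window_minutes / interval for count, interval in check_groups)


# Counts and intervals as stated in this post.
groups = [
    (1650, 5),   # service checks on a 5-minute interval
    (300, 5),    # host checks on a 5-minute interval
    (900, 15),   # service checks on a 15-minute interval
]

per_5_min = checks_per_window(groups)
per_second = per_5_min / (5 * 60)
print(f"{per_5_min:.0f} checks per 5 minutes, about {per_second:.1f} per second")
```

Under those assumptions this works out to 2250 checks per 5 minutes, or 7.5 checks per second.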
MSPk
Posts: 317
Joined: Fri Aug 24, 2012 12:03 am

Re: Nagios Server Performance

Post by MSPk »

From the monitoring engine statistics page, the average number of checks in 5 minutes ranges from 1800 to 2100.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios Server Performance

Post by abrist »

With those numbers, your 4-core system is just about on par with our suggested hardware requirements doc:
http://assets.nagios.com/downloads/nagi ... ements.pdf
MSPk
Posts: 317
Joined: Fri Aug 24, 2012 12:03 am

Re: Nagios Server Performance

Post by MSPk »

I agree that the hardware recommendation matches 2.5k services + 250 host checks, but something looks out of shape. We recently implemented monitoring for around 300+ servers and 100 network elements, and even now the load on that setup is always around 1-1.5. For the current setup I see the CPU load going up to 7-8 at times, and it reaches 14-17 whenever config changes are made on the system. Are there any checks we can run to confirm that everything is indeed fine?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios Server Performance

Post by abrist »

Check the performance page: Home --> Performance. How many service and host checks run every 5 minutes? You will see the load spike when you apply config, as that will use maximum resources.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Nagios Server Performance

Post by Box293 »

Here is a suggestion that you may consider (if you haven't already).

Sometimes it's very easy to run through all the wizards and leave the default check interval at 5 minutes. What can occur is that a lot of checks are scheduled to run at the same time, and at other times the XI host goes quiet.

Perhaps look at setting some checks to run at different intervals. For example, some checks you may still want to occur frequently, so instead of 5 minutes you could use 4 minutes or 6 minutes. This spreads the scheduling of these checks so that they are less likely to all run at once.

Also, there may be some checks that don't really need to be performed every five minutes. Perhaps DNS resolution is something that only needs to be checked every 19 minutes, for example.
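To illustrate the idea of mixing intervals, here is a toy model in Python. It assumes every check first fires at minute 0 and then repeats exactly on its interval, which is a deliberate simplification: the real scheduler applies its own spreading, so treat the numbers as an illustration only. The function name `checks_per_minute` and the 300-check workload are made up for the example.

```python
from collections import Counter


def checks_per_minute(check_groups, horizon_minutes=60):
    """Count checks firing in each minute, assuming all checks start at minute 0.

    check_groups is a list of (count, interval_minutes) pairs.
    """
    per_minute = Counter()
    for count, interval in check_groups:
        for minute in range(0, horizon_minutes, interval):
            per_minute[minute] += count
    return per_minute


# 300 checks all on a 5-minute interval versus the same 300 split across 4, 5 and 6 minutes.
uniform = checks_per_minute([(300, 5)])
mixed = checks_per_minute([(100, 4), (100, 5), (100, 6)])

for name, schedule in (("all at 5 min", uniform), ("4/5/6 min mix", mixed)):
    busy_minutes = len(schedule)
    median_burst = sorted(schedule.values())[busy_minutes // 2]
    print(f"{name}: {busy_minutes} busy minutes per hour, "
          f"median {median_burst} checks per busy minute")
```

In this model the mixed schedule spreads the work over more minutes of the hour with smaller typical bursts, although the intervals still line up whenever a common multiple is reached.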

I also use Nagios XI for auditing, to ensure the standards we have enforced remain in place, and for data gathering purposes. I schedule a lot of these checks to run every 1200 minutes (20 hours).

I hope this is of some help to you.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios Server Performance

Post by abrist »

Box293 wrote:Perhaps look at setting some checks to run at different intervals.
This. Some checks like disk space checks, among others, do not need to be run every 5 minutes. Choose your intervals wisely if you are on a large install.
MSPk
Posts: 317
Joined: Fri Aug 24, 2012 12:03 am

Re: Nagios Server Performance

Post by MSPk »

Yes, I have already fine-tuned the polling intervals and the thresholds for the different zones (dev being checked every 15 minutes and prod every 5 minutes). I believe there is some issue with the PHP files (dashboards) and the MySQL instance. Every time we log in, the load on the initial screen is around 3-4; once we start using the dashboards and try to extract reports, that is when the CPU load spikes erratically. mysqld takes 90-100% CPU whenever we click on the notifications or try to extract the notifications report, and it just crashes without showing any info.