Hi Team,
We have been working on a Nagios XI project for the last 2 months, and we have the Enterprise Edition. We have now configured 1300+ hosts and 7000+ active services, and at this scale we are facing a lot of performance issues. The "Monitoring Engine Event Queue" keeps stopping on its own. Whenever we check "Server Statistics", it shows I/O wait above 15%, CPU user = 30%, system = 30%, and idle below 45%. Host and service check latency is also too high (more than 60%).
We plan to monitor 4000 devices with 40,000 services on this single Nagios XI server, with the database offloaded to another system.
We are configuring all services with a 5-minute check interval.
Details of the VMs where we installed Nagios XI and the offloaded MySQL database.
Each machine has:
32 GB RAM
16 CPU with 2 cores each
200 GB HDD
1. "Monitoring Engine Event Queue" stops automatically
2. CPU usage is more than 40%
3. I/O wait goes above 15%
4. Host and service check latency is too high
5. The application is not performing well; it takes too long to open reports and other tabs.
Sometimes CPU idle is less than 15%.
Please advise on all of the above points and give suggestions for 4000 devices and 40,000 checks (15,000 active checks and 25,000 passive checks).
Please check the attached screenshots.
Thanks in advance.
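For a sense of scale, the figures in this post can be turned into a steady check rate. This is only a rough sketch: it assumes checks are spread evenly across the interval, it ignores passive checks (which the engine does not schedule), and the function name is made up for illustration.

```python
def checks_per_second(active_checks: int, interval_minutes: float) -> float:
    """Average number of active checks the engine must start each second."""
    return active_checks / (interval_minutes * 60)


# Figures taken from the post: 7000 services today, 15,000 planned, 5-minute interval.
current = checks_per_second(7000, 5)
planned = checks_per_second(15000, 5)
print(f"current: {current:.1f}/s, planned: {planned:.1f}/s")
```

The planned active load is roughly double the current one, so any latency seen today would likely get worse without changes.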
NagiosXI performance issue
System:
Nagios XI Version : 2012R2.2 | PHP Version: 5.3.3
Offloaded MySQL DB on another virtual machine
16 CPU with 2 cores each | 32 GB RAM | 1 TB HDD
CentOS-6.3 |Total = 4,000 hosts| 40,000 services.
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: NagiosXI performance issue
Can you run the following command on your Nagios server, then stop it and copy all of the text from your SSH session into a reply here, or take a screenshot?
Code:
top
How often are you checking your active services? Do you have freshness checking enabled?
Re: NagiosXI performance issue
Hi Team,
Please find attached the output of the "top" command. We maintain a 5-minute check interval for all service checks.
Please also find the attachment for freshness, which is disabled for hosts and enabled for services.
Please also let us know whether there is an issue with IOPS, because iowait sometimes goes to 18% or more. How many IOPS are required for the above configuration and service checks?
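The iowait figure quoted above can be cross-checked directly on a Linux host from two samples of the first line of `/proc/stat`. The sketch below is illustrative only: the helper names are invented, and it measures the share of CPU time spent waiting on I/O, not IOPS.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters (first line of /proc/stat) as ints."""
    with open(path) as f:
        fields = f.readline().split()
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples.

    Field order on Linux: user, nice, system, idle, iowait, irq, softirq, steal.
    Only these first eight fields are summed, because the guest counters that
    follow are already included in user and nice.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0
```

Calling `read_cpu_times()` twice a few seconds apart and passing both samples to `iowait_percent` gives a figure comparable to the `wa` value reported by top.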
slansing
Re: NagiosXI performance issue
Do you know what your load average was before you stopped top? It looks like your load is probably spiking. Do you have freshness checking on your individual service checks? It is defined in nagios.cfg by default, but that does not mean you are using it on your services. How often are you checking your network devices, and do they specifically have freshness checking turned on in their configs? It looks like MRTG is taking up the bulk of the load on your CPU.
Re: NagiosXI performance issue
Hi Team,
For network devices, the check interval is 6 minutes. Nearly 350 network devices are configured, with nearly 4000 active service checks polling every 6 minutes. The load average rises suddenly to values like 36, 25, 23 (1 min, 5 min, and 15 min) and then drops back, showing occasional spikes, and host and service check latency reaches about "60" on average.
We started experiencing performance issues after configuring the network devices, and this may be due to MRTG.
Please also explain what exactly the freshness interval is, where we need to enable this parameter, and what its value should be.
I have gone through some Nagios docs explaining "rrdcache" and the use of a RAM disk. Would this be useful in our environment?
Thanks
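One way to read the load averages quoted above is to divide them by the core count. The sketch below assumes the "16 CPU with 2 cores each" from the first post means 32 cores in total; the function name is hypothetical.

```python
def load_per_core(load_averages, cores):
    """Divide each load average by the core count.

    On Linux, values above 1.0 mean more runnable or I/O-blocked tasks
    than there are cores to serve them.
    """
    return [round(load / cores, 2) for load in load_averages]


# Figures from the post: load 36, 25, 23 on an assumed 32 cores.
print(load_per_core([36, 25, 23], 32))
```

On the server itself, `os.getloadavg()` and `os.cpu_count()` from the standard library can supply live values in place of the hard-coded ones.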
slansing
Re: NagiosXI performance issue
A RAM disk would help, and using mod_gearman would also decrease your load. I was checking to see whether you had freshness enabled on your services. You do not want to enable it at this time: it would check your services much more rapidly, and it is fairly dangerous to use on active checks.
http://assets.nagios.com/downloads/nagi ... ios_XI.pdf
Re: NagiosXI performance issue
Hi Team,
Thanks for replying. We need some more clarification on a few points:
How many pollers (mod_gearman) do we need to install for distributed monitoring?
What hardware configuration is required for 4000 hosts with 40,000 checks? We have already offloaded the DB.
We are configuring 15,000 active checks for network devices and 25,000 passive checks.
slansing
Re: NagiosXI performance issue
Is your offloaded DB on a shared network drive for storage? That can cause latency as well. Hardware-wise, you would want between 4 and 8 cores, 16 GB of memory or more, and an ample amount of HDD space for logging and performance data. You can start with the basic mod_gearman setup and always add more workers later on. I would start by using it on your local Nagios XI server, and then move to remote workers once you have it all set up.
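As a starting point for "how many workers", a back-of-the-envelope estimate is the check arrival rate multiplied by the average check duration (Little's law). This is a sketch under stated assumptions, not a sizing rule from the mod_gearman documentation: the 2-second average duration and the headroom factor are guesses, and `workers_needed` is an illustrative name that counts concurrent check slots, not servers.

```python
import math


def workers_needed(active_checks, interval_seconds, avg_check_seconds, headroom=2.0):
    """Estimate concurrent check slots: arrival rate x average duration,
    multiplied by a safety factor for bursts and slow checks."""
    rate = active_checks / interval_seconds      # checks started per second
    concurrent = rate * avg_check_seconds        # Little's law: L = lambda * W
    return math.ceil(concurrent * headroom)


# 15,000 active checks every 5 minutes; the 2 s average duration is an assumption.
print(workers_needed(15000, 300, 2.0))
```

Measuring the real average check execution time on the server and substituting it for the guessed value would make the estimate more meaningful.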
Re: NagiosXI performance issue
Hi team,
Our offloaded DB is on another VM; all machines are in a VMware virtual environment. Could that be causing latency?
How many IOPS are required for Nagios XI and the offloaded DB in a virtual environment for this many hosts and services?
The "Monitoring Engine Event Queue" keeps stopping automatically, and every time we need to start it manually. What is the reason behind this?
slansing
Re: NagiosXI performance issue
Well, an offloaded DB would not cause your load to spike, at least not that much. Have you integrated mod_gearman yet? It should have a large impact.