NagiosXI performance issue
-
sreinhardt
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: NagiosXI performance issue
What would you like us to verify, specifically? Asking us to verify graphs does not give us a good idea of what kind of information you are looking for or trying to validate.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: NagiosXI performance issue
Hi Team,
As you know, we are facing an issue where applying configuration takes hours. These are the graphs we obtained from the VMware infrastructure team for CPU, memory, and disk utilization. Can you please check the graphs for the Nagios XI machine and verify the IOPS and latency? Is this level of utilization causing the issue, or is it acceptable?
We are using shared storage on VMware for the Nagios XI server and the offloaded DB, so our concern is whether this issue is coming from the hardware side or the application side.
Can you please check the graphs again?
System:
Nagios XI Version : 2012R2.2 | PHP Version: 5.3.3
Offloaded MySQL DB on another virtual machine
16 CPU with 2 cores each | 32 GB RAM | 1 TB HDD
CentOS-6.3 |Total = 4,000 hosts| 40,000 services.
-
sreinhardt
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: NagiosXI performance issue
These graphs cover an extended period of time. They do not give us a good indication of anything specific, other than that the load is increasing over that period, presumably due to added checks, as would be expected. If you have done nothing in the way of performance enhancements, and this virtual host, SAN, and offloaded MySQL DB are sharing the same resources, especially with other systems, I suppose this would be somewhat expected. However, as stated previously or in your other thread, your php.ini specifies a max timeout of around 20 minutes, so I do not know of a way that applying configuration could take hours. As previously requested, what kind of latency are you seeing to your DB and to the flat files where the configs and performance data are stored?
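For the flat-file side of the latency question, one rough probe is to time a handful of small synchronous writes in the directory of interest. The sketch below is only an illustration: the helper name `probe_write_latency` is hypothetical, it assumes GNU dd (for `oflag=dsync`), and it says nothing about latency to the offloaded DB.

```shell
# Hypothetical helper: time small synchronous writes in a directory.
# Assumes GNU dd (oflag=dsync), as shipped with CentOS.
probe_write_latency() {
    dir="$1"
    count="${2:-100}"
    probe_file="$dir/.latency_probe.$$"
    # Each 4 KiB block is flushed to storage before the next one is written,
    # so the elapsed time dd reports, divided by $count, approximates the
    # latency of a single small write on that filesystem.
    dd if=/dev/zero of="$probe_file" bs=4k count="$count" oflag=dsync 2>&1 | tail -n 1
    rm -f "$probe_file"
}

# Example call (the directory is an assumption, adjust to your install):
# probe_write_latency /usr/local/nagios/var 100
```

Running it against both the shared storage and a local disk gives a quick comparison, but tools such as iostat from the sysstat package give a fuller picture under real load.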
Re: NagiosXI performance issue
Hi Team,
Tomorrow we are migrating our entire Nagios environment to a new SAN storage box that is dedicated to Nagios XI and the offloaded DB, with 1000 IOPS. We suspect the issues are due to the limited number of IOPS, which in turn causes high CPU utilization.
We will update you on the performance once we have migrated to the new SAN box.
Thanks for your posts.
Re: NagiosXI performance issue
Great. I am highly interested in your findings and results.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: NagiosXI performance issue
Hi Team,
We got the new storage, and iowait is now below 2%, so iowait is no longer an issue, but the load on the server is still high. So we configured 2 mod_gearman workers: one on the local Nagios server handling network devices, and one on a remote worker handling VMware devices. The load has dropped and now shows 10 for the 1-minute average.
But in the monitoring engine, the average service check latency is above 100 and sometimes goes higher. Because of this, the graphs show spikes; after restarting Nagios, it works fine. Please let me know a permanent solution for this.
In the process_perfdata.cfg file I increased the timeout value to "60", but npcd.log shows entries like:
[10-29-2013 11:33:12] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026468.perfdata.service'
[10-29-2013 11:36:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:36:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026648.perfdata.service'
[10-29-2013 11:37:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:37:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026678.perfdata.service'
We also verified the RAM disk configuration, and everything looks fine as per the document, but the problem persists.
In process_perfdata.log:
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Timeout after 60 secs. ***
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Please check your npcd.cfg
Please let me know about the above errors. I am also attaching my "nagios.cfg" and the mod_gearman NEB and worker files for your reference. Let me know if any changes are needed in the config files.
Thanks in advance.
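To see whether the process_perfdata.pl timeouts are occasional or constant, it can help to count them per hour. This is a minimal sketch that assumes the log line format quoted above (timestamp in the first 19 characters); the function name `timeouts_per_hour` is made up for illustration, and the log path must be supplied by you.

```shell
# Hypothetical helper: count "Timeout after" entries per hour in a
# process_perfdata log whose lines start with "YYYY-MM-DD HH:MM:SS".
timeouts_per_hour() {
    # substr($0, 1, 13) keeps "YYYY-MM-DD HH", i.e. the date and the hour.
    awk '/TIMEOUT: Timeout after/ { hits[substr($0, 1, 13)]++ }
         END { for (hour in hits) print hour, hits[hour] }' "$1" | sort
}

# Example call (replace with the actual location of your log):
# timeouts_per_hour /path/to/process_perfdata.log
```

A steady count every hour suggests the perfdata files are consistently too large to process within the timeout, while isolated bursts point more toward temporary load.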
Re: NagiosXI performance issue
We have 2000 devices and 14000 services.
Of the 2000 devices:
ESXi: 400 devices with 4000 services, check interval 10 min
Remaining devices: check interval 5 min
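As a rough sanity check on those numbers, the average check rate the workers must sustain can be worked out from the intervals. The sketch below assumes the remaining 10000 services (14000 minus the 4000 ESXi services) all run every 5 minutes; that split is an inference from the post, not something stated outright, and `checks_per_second` is a hypothetical helper.

```shell
# Hypothetical helper: average service checks per second for two groups.
# Arguments: count1 interval1_seconds count2 interval2_seconds
checks_per_second() {
    awk -v c1="$1" -v i1="$2" -v c2="$3" -v i2="$4" \
        'BEGIN { printf "%.1f\n", c1 / i1 + c2 / i2 }'
}

# 4000 ESXi services every 600 s, plus an assumed 10000 services every 300 s.
checks_per_second 4000 600 10000 300   # prints 40.0
```

So the environment needs to complete roughly 40 service checks per second on average, before host checks and retries are counted. If the workers cannot keep up with that rate, check latency will keep climbing until Nagios is restarted.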
Re: NagiosXI performance issue
If the perfdata spool has become too large, npcd will time out on the directory stat(). What is the output of:
Code:
ls /usr/local/nagios/var/spool/checkresults | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/xidpe | wc -l
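If a spool directory has grown very large, `ls` itself can be slow because it reads and sorts every entry before printing. A possible alternative is sketched below with `find`, which does not sort. The helper name `count_entries` is hypothetical, the RAM disk paths are taken from the npcd.log lines earlier in the thread, and `-mindepth`/`-maxdepth` assume GNU find. Unlike plain `ls`, this also counts dotfiles.

```shell
# Hypothetical helper: count the entries directly inside a directory.
count_entries() {
    # find streams entries unsorted, so it stays usable on huge directories.
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Spool location assumed from the RAM disk paths seen in npcd.log.
for name in checkresults perfdata xidpe; do
    dir="/var/nagiosramdisk/spool/$name"
    if [ -d "$dir" ]; then
        printf '%s %s\n' "$dir" "$(count_entries "$dir")"
    fi
done
```

Running this a few times, a minute or so apart, shows whether the backlog is shrinking, stable, or growing.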
Re: NagiosXI performance issue
Hi team,
Please find below the output of the following commands:
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata | wc -l
ls /var/nagiosramdisk/spool/checkresults | wc -l
Also, please review our configuration files posted earlier in this thread, which include my main config file and the mod_gearman files. All hosts and services are configured with a 5-minute interval.
Let us know if anything needs to change.
Thanks in advance.
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: NagiosXI performance issue
Is that worker configuration from a remote worker on a different host? In either case, I believe you will want to set "services=yes" and "hosts=yes".
What are the permissions on:
Code:
/var/nagiosramdisk/spool/checkresults
And what is the output of:
Code:
cat /usr/local/nagios/etc/nagios.cfg | grep check_result
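Both answers can be gathered in one pass. The following is a sketch, not an official procedure: the helper names are hypothetical and `stat -c` assumes GNU coreutils. The point of the check is that the directory named by `check_result_path` in nagios.cfg should match the RAM disk location and be writable by the user Nagios runs as.

```shell
# Hypothetical helper: print owner:group, octal mode, and path (GNU stat).
show_perms() {
    stat -c '%U:%G %a %n' "$1"
}

# Hypothetical helper: list every check_result-related setting in a config.
show_check_result_settings() {
    grep 'check_result' "$1"
}

# Paths as used earlier in this thread; skipped if they do not exist.
if [ -d /var/nagiosramdisk/spool/checkresults ]; then
    show_perms /var/nagiosramdisk/spool/checkresults
fi
if [ -f /usr/local/nagios/etc/nagios.cfg ]; then
    show_check_result_settings /usr/local/nagios/etc/nagios.cfg
fi
```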