NagiosXI performance issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: NagiosXI performance issue

Post by sreinhardt »

What would you like us to verify, specifically? Asking us to verify graphs does not give us a good idea of what information you are looking for or trying to validate.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
anoop
Posts: 95
Joined: Tue Jun 25, 2013 1:22 am

Re: NagiosXI performance issue

Post by anoop »

Hi Team,

As you know, we are facing an issue where Apply Configuration takes hours. These are the graphs we got from the VMware infrastructure team for CPU, memory, and disk utilization. Can you please check the Nagios XI machine graphs and verify the IOPS and latency? Is this level of utilization causing the issue, or is it acceptable?

We are using shared storage on VMware for the Nagios XI server and the offloaded DB, so we want to verify whether this issue comes from the hardware side or the application side.

Can you please check the graphs again?
System:
Nagios XI Version : 2012R2.2 | PHP Version: 5.3.3
Offloaded MySQL DB on another virtual machine
16 CPU with 2 cores each | 32 GB RAM | 1 TB HDD
CentOS-6.3 |Total = 4,000 hosts| 40,000 services.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: NagiosXI performance issue

Post by sreinhardt »

These graphs cover an extended period of time. They do not give us a good indication of anything specific, other than that the load is increasing over that period, presumably due to added checks, as would be expected. If you have done nothing in the way of performance enhancements, and this virtual host, SAN, and offloaded MySQL DB share the same resources, especially with other systems, I suppose this would be somewhat expected. However, as stated previously or in your other thread, your php.ini specifies a max timeout of around 20 minutes, so I do not know of a way that applying configuration could take hours. As previously requested, what kind of latency are you seeing to your DB and to the flat files where the configs and performance data are stored?
anoop
Posts: 95
Joined: Tue Jun 25, 2013 1:22 am

Re: NagiosXI performance issue

Post by anoop »

Hi Team,

Tomorrow we are migrating our entire Nagios environment to a new SAN storage box dedicated to Nagios XI and the offloaded DB, with 1000 IOPS. We suspect the issues are due to the limited number of IOPS, which in turn causes high CPU utilization.

We will update you on the performance once we have migrated to the new SAN box.

Thanks for your posts.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: NagiosXI performance issue

Post by abrist »

Great. I am highly interested in your findings and results.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
anoop
Posts: 95
Joined: Tue Jun 25, 2013 1:22 am

Re: NagiosXI performance issue

Post by anoop »

Hi Team,

We got the new storage, and iowait is now below 2%, so iowait is no longer an issue, but the load on the server is still high. So we configured two mod_gearman workers: one on the local Nagios server handling network devices, and one remote worker handling VMware devices. The load has dropped and now shows 10 for the 1-minute average.

But in the monitoring engine, the average service check latency is above 100 and sometimes goes higher. Because of this, the graphs show spikes; after restarting Nagios, it works fine. Please let me know a permanent solution for this.

In the process_perfdata.cfg file, I increased the timeout value to "60", but the npcd.log file shows:

[10-29-2013 11:33:12] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026468.perfdata.service'
[10-29-2013 11:36:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:36:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026648.perfdata.service'
[10-29-2013 11:37:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:37:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026678.perfdata.service'


I also verified the RAM disk configuration, and everything looks fine as per the document, but the problem persists.

In process_perfdata.log:

2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Timeout after 60 secs. ***
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Please check your npcd.cfg

Please advise on the above errors. I am also attaching my "nagios.cfg" and the mod_gearman NEB and worker files for your reference. Let me know if any changes are needed in the config files on my end.

Thanks in Advance
anoop
Posts: 95
Joined: Tue Jun 25, 2013 1:22 am

Re: NagiosXI performance issue

Post by anoop »

We have 2000 devices and 14000 services.

Of the 2000 devices:

ESXi: 400 devices and 4000 services, check interval 10 min
Remaining devices: check interval 5 min
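
As a rough sanity check on these figures, the scheduled service check rate can be estimated from the counts and intervals above. This is only a sketch under stated assumptions: it reads the post as 4000 ESXi services at 10 minutes and the remaining 10000 services at 5 minutes, it ignores host checks and retries, and `checks_per_second` is a hypothetical helper name rather than anything shipped with Nagios XI.

```shell
#!/bin/sh
# Estimate how many service checks per second a group of services generates.
# $1 = number of services, $2 = check interval in minutes.
checks_per_second() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.2f\n", n / (m * 60) }'
}

# Assumption: 4000 ESXi services at 10 min, 14000 - 4000 = 10000 others at 5 min.
esxi=$(checks_per_second 4000 10)
other=$(checks_per_second 10000 5)
total=$(awk -v a="$esxi" -v b="$other" 'BEGIN { printf "%.2f\n", a + b }')

printf 'ESXi services:  %s checks/sec\n' "$esxi"
printf 'Other services: %s checks/sec\n' "$other"
printf 'Total:          %s checks/sec\n' "$total"
```

The total is the sustained rate the engine and the mod_gearman workers together would need to keep up with for check latency to stay flat.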
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: NagiosXI performance issue

Post by abrist »

If the perfdata spool has become too large, npcd will time out on the directory stat(). What is the output of:

Code: Select all

ls /usr/local/nagios/var/spool/checkresults | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/xidpe | wc -l
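
If a spool directory has grown very large, `ls` itself can be slow because it sorts the listing before printing it. A minimal sketch of the same count using `find`, which does not sort, is shown here. It assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU find on CentOS does), and `count_entries` is a hypothetical helper name. Unlike plain `ls`, it also counts hidden files. On a RAM disk setup, the paths would be the ones under /var/nagiosramdisk/spool instead.

```shell
#!/bin/sh
# Count the entries directly inside a directory without sorting them.
count_entries() {
    # -mindepth 1 skips the directory itself; -maxdepth 1 avoids descending.
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

for dir in /usr/local/nagios/var/spool/checkresults \
           /usr/local/nagios/var/spool/perfdata \
           /usr/local/nagios/var/spool/xidpe; do
    if [ -d "$dir" ]; then
        printf '%s: %s\n' "$dir" "$(count_entries "$dir")"
    else
        printf '%s: not found\n' "$dir"
    fi
done
```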
anoop
Posts: 95
Joined: Tue Jun 25, 2013 1:22 am

Re: NagiosXI performance issue

Post by anoop »

Hi team,

Please find below the output of the following commands:

ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata | wc -l
ls /var/nagiosramdisk/spool/checkresults | wc -l

Also, please verify the configuration files we posted earlier in this thread: the main config file and the mod_gearman files. All hosts and services are configured with a 5-minute interval.

Let us know if anything needs to change.

Thanks in advance,..
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: NagiosXI performance issue

Post by slansing »

Is that worker configuration from a remote worker on a different host? In either case, I believe you will want to set "services=yes" and "hosts=yes".

What are the permissions on:

Code: Select all

/var/nagiosramdisk/spool/checkresults
And what is the output of:

Code: Select all

grep check_result /usr/local/nagios/etc/nagios.cfg
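
Both questions can be answered in one pass. The following is a sketch, not an official procedure: it prints the owner, group, and mode of the spool directory, then only the uncommented check_result lines from the config. The two paths are the ones quoted in this thread, and `active_check_result_settings` is a hypothetical helper name.

```shell
#!/bin/sh
# Print only the active (uncommented) lines mentioning check_result.
active_check_result_settings() {
    grep -v '^[[:space:]]*#' "$1" | grep check_result
}

cfg=/usr/local/nagios/etc/nagios.cfg
spool=/var/nagiosramdisk/spool/checkresults

# -d lists the directory entry itself rather than its contents.
if [ -d "$spool" ]; then
    ls -ld "$spool"
else
    printf '%s: not found\n' "$spool"
fi

if [ -r "$cfg" ]; then
    active_check_result_settings "$cfg"
else
    printf '%s: not readable\n' "$cfg"
fi
```

Filtering out commented lines avoids mistaking a disabled default for the value Nagios is actually using.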
Locked