NagiosXI performance issue

sreinhardt · Post by **sreinhardt** » Fri Oct 18, 2013 2:45 pm

What would you like us to verify, specifically? Asking us to verify graphs does not give us a good idea of what information you are looking for or trying to validate.

anoop · Post by **anoop** » Fri Oct 18, 2013 6:46 pm

Hi Team,

As you know, we are facing an issue where Apply Configuration takes hours. These are the graphs we obtained from the VMware infrastructure team for CPU, memory, and disk utilization. Can you please check the Nagios XI machine graphs and verify the IOPS and latency? Is this level of utilization causing the issue, or is it acceptable?

We are using shared storage on VMware for the Nagios XI server and the offloaded DB. Our concern is whether this issue comes from the hardware side or the application side.

Can you please check the graphs again?

sreinhardt · Post by **sreinhardt** » Mon Oct 21, 2013 11:20 am

These graphs cover an extended period of time. They do not give us a good indication of anything specific, other than that the load is increasing over that period, presumably due to added checks, as would be expected. If you have done nothing in the way of performance enhancements, and this virtual host, SAN, and offloaded MySQL DB share the same resources, especially with other systems, I suppose this would be somewhat expected. However, as stated previously or in your other thread, your php.ini specifies a max timeout of around 20 minutes, so I do not know of a way that applying configuration could take hours. As previously requested, what kind of latency are you seeing to your DB and to the flat files where the configs and performance data are stored?

anoop · Post by **anoop** » Mon Oct 21, 2013 11:56 am

Hi Team,

Tomorrow we are migrating our entire Nagios environment to a new SAN storage box dedicated to Nagios XI and the offloaded DB, with 1000 IOPS. We suspect the issues are due to the limited number of IOPS, which in turn causes high CPU utilization.

We will update you once we have migrated to the new SAN box and let you know how it performs.

Thanks for your posts.

abrist · Post by **abrist** » Mon Oct 21, 2013 12:46 pm

Great. I am highly interested in your findings and results.

anoop · Post by **anoop** » Tue Oct 29, 2013 8:51 am

Hi Team,

We got the new storage, and iowait is now below 2%, so iowait is no longer an issue, but load on the server is still high. So we configured 2 Mod-Gearman workers: one on the local Nagios server handling network devices, and one remote worker handling VMware devices. Load has dropped and the 1-minute load average now shows 10.

But in the monitoring engine, the average service check latency is above 100 and sometimes goes beyond that. Because of this, the graphs show spikes; after restarting Nagios, it works fine. Please let me know a permanent solution for this.

In the process_perfdata.cfg file, I increased the timeout value to "60", but the npcd.log file shows errors like:

[10-29-2013 11:33:12] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026468.perfdata.service'
[10-29-2013 11:36:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:36:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026648.perfdata.service'
[10-29-2013 11:37:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:37:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026678.perfdata.service'

We also verified the RAM disk configuration. Everything looks fine as per the document, but the problem persists.

In process_perfdata.log:

2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Timeout after 60 secs. ***
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Please check your npcd.cfg

Please advise on the above errors. I am also attaching my "nagios.cfg" and the Mod-Gearman NEB and worker files for your reference. Let me know if I need to change anything in the config files.

Thanks in advance.

anoop · Post by **anoop** » Tue Oct 29, 2013 12:34 pm

We have 2000 devices and 14000 services.

Of the 2000 devices:

400 ESXi hosts with 4000 services, check interval 10 min
Remaining devices, check interval 5 min
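
For a rough sense of scale, the figures above can be turned into an average service check rate. This is only a back-of-the-envelope sketch: it assumes the 4000 ESXi services are part of the 14000 total (leaving 10000 at 5 minutes), ignores host checks and retries, and `checks_per_second` is a hypothetical helper name.

```shell
#!/bin/sh
# checks_per_second COUNT INTERVAL_MINUTES
# Average checks per second if COUNT services each run once per interval.
checks_per_second() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.2f\n", n / (m * 60) }'
}

# Assumption: 4000 of the 14000 services are the ESXi ones (10 min interval),
# and the remaining 10000 run every 5 minutes.
echo "ESXi services:      $(checks_per_second 4000 10) checks/s"
echo "Remaining services: $(checks_per_second 10000 5) checks/s"

# For comparison: every service on a 5 minute interval.
echo "All at 5 minutes:   $(checks_per_second 14000 5) checks/s"
```

Under those assumptions the two groups add up to about 40 service checks per second, which is the sustained rate the workers and the perfdata pipeline have to keep up with.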

abrist · Post by **abrist** » Tue Oct 29, 2013 3:36 pm

If the perfdata spool has become too large, npcd will time out on directory stat(). What is the output of:

ls /usr/local/nagios/var/spool/checkresults | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/xidpe | wc -l
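
The same counts can be gathered in one pass with a small loop. This is a sketch, not part of the original request: `count_entries` is a hypothetical helper, and the base path is an assumption that should be changed if the spool lives elsewhere (the npcd.log excerpt earlier in the thread points at /var/nagiosramdisk/spool). It uses `find` rather than `ls`, so unlike `ls | wc -l` it also counts dotfiles.

```shell
#!/bin/sh
# Print the number of entries directly inside a directory,
# or "missing" if the directory does not exist.
count_entries() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 0
    fi
    # -mindepth 1 skips the directory itself; -maxdepth 1 does not recurse.
    # tr strips the padding some wc implementations add.
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Assumption: default spool location; override SPOOL_BASE for a RAM disk setup.
SPOOL_BASE=${SPOOL_BASE:-/usr/local/nagios/var/spool}

for name in checkresults perfdata xidpe; do
    printf '%s: %s\n' "$SPOOL_BASE/$name" "$(count_entries "$SPOOL_BASE/$name")"
done
```

For example, `SPOOL_BASE=/var/nagiosramdisk/spool sh count_spool.sh` (the script name is arbitrary) would report on the RAM disk directories instead.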

anoop · Post by **anoop** » Wed Oct 30, 2013 11:10 am

Hi Team,

Please find below the output of the following commands:

ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata | wc -l
ls /var/nagiosramdisk/spool/checkresults | wc -l

Also, please verify the configuration files we posted earlier in this thread: the main config file and the Mod-Gearman files. All hosts and services are configured with a 5-minute interval.

Let us know if anything needs to change.

Thanks in advance.

slansing · Post by **slansing** » Wed Oct 30, 2013 4:50 pm

Is that worker configuration from a remote worker on a different host? In either case, I believe you will want to set "services=yes" and "hosts=yes".

What are the permissions on:

/var/nagiosramdisk/spool/checkresults

And what is the output of:

cat /usr/local/nagios/etc/nagios.cfg | grep check_result
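
Both pieces of information can be collected with something like the following. It is a minimal sketch using the paths mentioned above; `check_result_settings` is a hypothetical helper name, and filtering out commented lines is an addition that the plain `grep` above does not do.

```shell
#!/bin/sh
# Print the active (non-commented) check_result directives from a config file.
check_result_settings() {
    cfg=$1
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    # Drop comment lines first so only directives in effect are shown.
    grep -v '^[[:space:]]*#' "$cfg" | grep 'check_result'
}

# Owner, group and mode of the directory itself (-d), not of its contents.
if [ -d /var/nagiosramdisk/spool/checkresults ]; then
    ls -ld /var/nagiosramdisk/spool/checkresults
fi

check_result_settings /usr/local/nagios/etc/nagios.cfg || true
```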

Nagios Support Forum
