Re: NagiosXI performance issue
Posted: Fri Oct 18, 2013 2:45 pm
by sreinhardt
What would you like us to verify, specifically? Asking us to verify graphs does not give us a good idea of what information you are looking for or trying to validate.
Re: NagiosXI performance issue
Posted: Fri Oct 18, 2013 6:46 pm
by anoop
Hi Team,
As you know, we are facing an issue where applying configuration takes hours. These are the graphs we got from the VMware infrastructure team for CPU, memory, and disk utilization. Can you please check the Nagios XI machine graphs and verify the IOPS and latency? Is this level of utilization causing the issue, or is it acceptable?
We are using shared storage on VMware for the Nagios XI server and the offloaded DB, so we want to verify whether this issue is coming from the hardware side or the application side.
Can you please check the graphs again?
Re: NagiosXI performance issue
Posted: Mon Oct 21, 2013 11:20 am
by sreinhardt
These graphs cover an extended period of time. They do not give us a good indication of anything specific, other than that the load is increasing over that period, presumably due to added checks, as would be expected. If you have done nothing in the way of performance enhancements, and this virtual host, SAN, and offloaded MySQL DB are sharing the same resources, especially with other systems, I suppose this would be somewhat expected. However, as stated previously or in your other thread, your php.ini specifies a max timeout of around 20 minutes, so I do not know of a way that applying configuration could take hours. As previously requested, what kind of latency are you seeing to your DB and to the flat files where the configs and performance data are stored?
Re: NagiosXI performance issue
Posted: Mon Oct 21, 2013 11:56 am
by anoop
Hi Team,
Tomorrow we are migrating our entire Nagios environment to a new SAN storage box with 1000 IOPS, dedicated to Nagios XI and the offloaded DB. We are guessing the issues are due to the limited number of IOPS, which in turn causes high CPU utilization.
We will update you once we have migrated to the new SAN box and let you know how it performs.
Thanks for your posts.
Re: NagiosXI performance issue
Posted: Mon Oct 21, 2013 12:46 pm
by abrist
Great. I am highly interested in your findings and results.
Re: NagiosXI performance issue
Posted: Tue Oct 29, 2013 8:51 am
by anoop
Hi Team,
We got new storage, and iowait is now below 2%, so there are no more iowait issues, but the load on the server is still high. So we configured two mod_gearman workers: one on the local Nagios server handling network devices, and one remote worker handling VMware devices. The load has dropped and now shows 10 for the 1-minute average.
But in the monitoring engine, the average service check latency is above 100 and sometimes goes higher. Because of this, the graphs show spikes; after restarting Nagios, it works fine. Please let me know the permanent solution for this.
In the process_perfdata.cfg file, I increased the timeout value to "60", but the npcd.log file shows:
[10-29-2013 11:33:12] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026468.perfdata.service'
[10-29-2013 11:36:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:36:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026648.perfdata.service'
[10-29-2013 11:37:57] NPCD: ERROR: Executed command exits with return code '7'
[10-29-2013 11:37:57] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1383026678.perfdata.service'
We verified the RAM disk configuration and everything looks fine as per the document, but the problem still persists.
In process_perfdata.log:
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Timeout after 60 secs. ***
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-10-29 11:36:57 [3196] [0] *** TIMEOUT: Please check your npcd.cfg
Please let me know about the above errors. I am also attaching my "nagios.cfg" and the mod_gearman NEB and worker files for your reference. Let me know if any changes need to be made to the config files on my end.
Thanks in advance.
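To see how often process_perfdata.pl actually hits the timeout, the log can be summarized per hour. The following is only a minimal sketch: the function name `count_perfdata_timeouts` is made up for illustration, and the log path has to be passed in because the thread does not say where process_perfdata.log lives on this system. It assumes the log lines look like the "Timeout after 60 secs" entries quoted above.

```shell
# Count "Timeout after N secs" entries per hour in a process_perfdata log.
# Usage: count_perfdata_timeouts /path/to/process_perfdata.log
# (function name and path are illustrative, not part of Nagios or PNP)
count_perfdata_timeouts() {
    # Field 1 is the date, field 2 the time (HH:MM:SS); group by date + hour.
    grep 'TIMEOUT: Timeout after' "$1" |
        awk '{ split($2, t, ":"); n[$1 " " t[1] ":00"]++ }
             END { for (k in n) print k, n[k] }' |
        sort
}
```

Each output line is a date, an hour, and the number of timeouts in that hour. Only the "Timeout after" line is counted, since each timeout also writes two follow-up lines.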
Re: NagiosXI performance issue
Posted: Tue Oct 29, 2013 12:34 pm
by anoop
We have 2000 devices and 14000 services.
Of the 2000 devices:
ESXi: 400 devices and 4000 services, check interval 10 min
Remaining devices: check interval 5 min
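As a rough back-of-the-envelope check, these figures can be turned into a steady-state service check rate. The sketch below assumes the 4000 ESXi services run every 10 minutes and that the other 10000 services (14000 minus 4000, which is an inference, since the post does not state that number) run every 5 minutes. Host checks are ignored, and `checks_per_minute` is just an illustrative helper.

```shell
# checks_per_minute SERVICES INTERVAL_MINUTES
# Integer division; exact here because the numbers divide evenly.
checks_per_minute() {
    echo $(( $1 / $2 ))
}

esxi=$(checks_per_minute 4000 10)    # ESXi services, 10 min interval
other=$(checks_per_minute 10000 5)   # assumed remaining services, 5 min interval
total=$(( esxi + other ))
echo "ESXi: $esxi/min, other: $other/min, total: $total/min"
```

Under those assumptions the workers have to sustain 2400 service checks per minute, or 40 per second, before any retries or host checks are counted.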
Re: NagiosXI performance issue
Posted: Tue Oct 29, 2013 3:36 pm
by abrist
If the perfdata spool has become too large, npcd will time out on the directory stat(). What is the output of:
Code:
ls /usr/local/nagios/var/spool/checkresults | wc -l
ls /usr/local/nagios/var/spool/perfdata | wc -l
ls /usr/local/nagios/var/spool/xidpe | wc -l
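If a spool directory holds a very large number of files, `ls` itself can be slow because it sorts the listing first. A possible alternative, sketched here with a made-up helper name `count_entries`, is to count with `find`, which does not sort. It assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU and BSD find both do). Note that the npcd log lines quoted earlier in the thread point at /var/nagiosramdisk/spool, so the directories to count may live there rather than under /usr/local/nagios/var/spool.

```shell
# Count the entries directly inside a directory without sorting them.
# Unlike plain ls, this also counts dotfiles.
count_entries() {
    # -mindepth 1 skips the directory itself; -maxdepth 1 prevents recursion.
    find "$1" -mindepth 1 -maxdepth 1 | wc -l
}
```

For example, `count_entries /var/nagiosramdisk/spool/perfdata` would print the number of pending perfdata files, if that is where the spool is mounted.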
Re: NagiosXI performance issue
Posted: Wed Oct 30, 2013 11:10 am
by anoop
Hi team,
Please find the output of the following commands below:
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata | wc -l
ls /var/nagiosramdisk/spool/checkresults | wc -l
Please also verify our configuration files posted earlier in this thread: my main config file and the mod_gearman files. All hosts and services are configured with a 5 minute interval.
Let us know if anything needs to change.
Thanks in advance.
Re: NagiosXI performance issue
Posted: Wed Oct 30, 2013 4:50 pm
by slansing
Is that worker configuration from a remote worker on a different host? In either case, I believe you will want to set "services=yes" and "hosts=yes".
What are the permissions on:
Code:
/var/nagiosramdisk/spool/checkresults
And what is the output of:
Code:
cat /usr/local/nagios/etc/nagios.cfg | grep check_result
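Both questions can be answered in one pass. This is only a sketch with illustrative helper names (`show_perms`, `show_check_result_settings`); it assumes GNU coreutils `stat`, whose `-c` format option is not available on BSD systems, and it filters out commented lines so that only active settings are shown.

```shell
# Print mode, owner:group, and name of a path (GNU coreutils stat).
show_perms() {
    stat -c '%A %U:%G %n' "$1"
}

# Print the active (non-comment) check_result settings from a config file.
show_check_result_settings() {
    grep -v '^[[:space:]]*#' "$1" | grep 'check_result'
}
```

Run as `show_perms /var/nagiosramdisk/spool/checkresults` and `show_check_result_settings /usr/local/nagios/etc/nagios.cfg`, the first shows who can write to the check results directory and the second shows which path Nagios is configured to read results from, so the two can be compared.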