Problems with Host Performance Graphs and Bandwidth

Posted: Fri Jan 18, 2013 5:24 am
by David.adder
Hi,

We have always had problems with the Host Performance Graphs: they sometimes contain blank gaps. This happens in the host graphs but not in the ping service graphs.

We also monitor many routers and firewalls, and a lot of information is missing from the Bandwidth Graphs. They are not drawn correctly and have many gaps too: the graphs are never continuous. There may be 5 minutes of data, followed by 15-20 minutes with no graph.
Captura.JPG
I've been trying to improve the performance of our Nagios XI server using some of the Nagios XI manuals, such as "Using_rrdcached_with_Nagios_XI" and "Maximizing_XI_Performance", but we still get the same result.

Has anybody had similar problems, and does anyone know how to fix this?

Thank you!

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Fri Jan 18, 2013 10:44 am
by abrist
What version of XI are you running?

Could you post a tail of the following logs?

tail /usr/local/nagios/var/perfdata.log
tail /usr/local/nagios/var/npcd.log

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Sat Jan 19, 2013 9:19 am
by David.adder
The version is 2011R3.3

The tail of those logs is this:
Captura1.JPG
Captura.JPG

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Mon Jan 21, 2013 11:53 am
by abrist
It looks like you are hitting the max load and timeout settings for performance data processing:

In the file: /usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:

TIMEOUT = 5
to:

TIMEOUT = 10
In the file: /usr/local/nagios/etc/pnp/npcd.cfg
Change:

load_threshold = 10.0
to:

load_threshold = 30.0
Restart npcd:

service npcd restart
Wait 15 minutes after making the changes, then recheck the logs and verify that perfdata is being recorded as expected.
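
If you would rather script the two edits than make them by hand, something like the following could work. It is only a minimal sketch, assuming each setting sits on its own uncommented line in the form `KEY = value`. The function name `set_cfg_value` is hypothetical and is not part of Nagios XI or PNP.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI or PNP): set "KEY = VALUE" in a
# config file, keeping a .bak copy of the original next to it.
# Assumes the key and value contain no "|" or other sed metacharacters.
set_cfg_value() {
    file=$1
    key=$2
    value=$3
    # Keep the original so the change can be reverted.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite only an uncommented line that starts with the key; commented
    # lines such as "# load_threshold = ..." are left untouched.
    sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file.bak" > "$file"
    # Print the resulting line; a non-zero status means the key was not found.
    grep "^${key} = " "$file"
}
```

It would be called once per setting, for example `set_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold 30.0`, with the npcd restart done afterwards as described above.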

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Mon Jan 21, 2013 1:54 pm
by David.adder
I made those changes and restarted npcd:

npcd_max_threads = 5

# sleep_time - how many seconds should npcd wait between dirscans
#
# sleep_time = 15 (default)

sleep_time = 15


# EXPERIMENTAL
#
# use_load_threshold - enables/disables load watching
#
# use_load_threshold = <0 / 1> (default: 0)
#

#use_load_threshold = 0


# EXPERIMENTAL
#
# load_threshold - npcd won't start new threads
# if your system load is over this threshold
#
# load_threshold = <float value> (default: 10.0)
#
# Hint: Do not use "," as decimal delimeter
#

load_threshold = 30.0

#
# Config File for process_perfdata.pl
#
# $Id: process_perfdata.cfg-sample.in 520 2008-09-16 12:50:10Z pitchfork $
#
# process_perfdata.pl Timout
#
TIMEOUT = 10
#
# Use RRDs Perl Module
#
USE_RRDs = 1
#
#
#
RRDPATH = /usr/local/nagios/share/perfdata
#
#
#
RRDTOOL = /usr/bin/rrdtool
#
#
#
CFG_DIR = /usr/local/nagios/etc/pnp
#
#
#
RRD_HEARTBEAT = 8460
#
#
#
RRA_CFG = /usr/local/nagios/etc/pnp/rra.cfg
#
#
#
RRA_STEP = 60

But I still get this:

tail /usr/local/nagios/var/perfdata.log
2013-01-21 19:42:19 [5341] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-21 19:42:19 [5341] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793701-PID-5341 deleted
2013-01-21 19:42:19 [5341] [0] *** Timeout while processing Host: "NY-ARIES" Service: "my_mem_check"
2013-01-21 19:42:19 [5341] [0] *** process_perfdata.pl terminated on signal ALRM
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Timeout after 10 secs. ***
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-21 19:42:19 [5344] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793715-PID-5344 deleted
2013-01-21 19:42:19 [5344] [0] *** Timeout while processing Host: "NYC-RLDCEX" Service: "my_mem_check"
2013-01-21 19:42:19 [5344] [0] *** process_perfdata.pl terminated on signal ALRM

tail /usr/local/nagios/var/npcd.log
[01-21-2013 19:40:18] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:40:18] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//host-perfdata.1358793595'
[01-21-2013 19:40:46] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:40:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793625'
[01-21-2013 19:41:52] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:41:52] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793687'
[01-21-2013 19:42:19] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:42:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793701'
[01-21-2013 19:42:19] NPCD: ERROR: Executed command exits with return code '7'
[01-21-2013 19:42:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358793715'

I've just upgraded Nagios to 2012R1.4, and since then I don't get any bandwidth graphs at all...
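
Since `tail` only shows the last ten lines, it may help to tally the whole log and see whether the timeouts cluster on particular checks. A minimal sketch, assuming every timeout is logged in the `Timeout while processing Host: "..." Service: "..."` form seen in the excerpt above; the function name `timeouts_by_service` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count "Timeout while processing" entries per
# host/service pair in a perfdata.log, most frequent first.
# Assumes host and service names contain no spaces or double quotes.
timeouts_by_service() {
    log=$1
    # Keep only the timeout lines and reduce each to "host service".
    sed -n 's/.*Timeout while processing Host: "\([^"]*\)" Service: "\([^"]*\)".*/\1 \2/p' "$log" |
        sort | uniq -c | sort -rn |
        # Normalise the padding that uniq -c puts in front of the count.
        awk '{print $1, $2, $3}'
}
```

Running `timeouts_by_service /usr/local/nagios/var/perfdata.log` prints one `count host service` line per pair. If the same few services dominate the list, those checks are worth a closer look before raising the timeout any further.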

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Mon Jan 21, 2013 5:21 pm
by scottwilkerson
Can you post the output of:

ll /usr/local/nagios/var/spool/perfdata|wc -l

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Tue Jan 22, 2013 2:57 am
by David.adder
ll /usr/local/nagios/var/spool/perfdata|wc -l
3
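
A note on reading that number: if `ll` is the usual alias for `ls -l`, the output starts with a `total` line, so a result of 3 would correspond to two entries in the spool directory. A minimal sketch of a count that skips that line and ignores subdirectories, assuming a `find` that supports `-maxdepth` (GNU and BSD versions do); the function name `spool_backlog` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count the regular files waiting in a spool directory.
# Unlike "ls -l | wc -l", this does not count the "total" line or any
# subdirectories. Assumes file names contain no newlines.
spool_backlog() {
    find "$1" -maxdepth 1 -type f | wc -l | awk '{print $1}'
}
```

For example, `spool_backlog /usr/local/nagios/var/spool/perfdata` prints just the number of files waiting to be processed. A value that stays small suggests npcd is keeping up with the spool.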

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Tue Jan 22, 2013 8:25 am
by scottwilkerson
Are you still seeing new errors in the log?

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Tue Jan 22, 2013 10:02 am
by David.adder
Yes, still the same:

tail /usr/local/nagios/var/perfdata.log
2013-01-22 15:40:59 [28824] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-22 15:40:59 [28824] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865633-PID-28824 deleted
2013-01-22 15:40:59 [28824] [0] *** Timeout while processing Host: "SRVADDASDC01001" Service: "VMware_Storage_Array_Datastore_DSPISDC01006PRODSAS72_Usage"
2013-01-22 15:40:59 [28824] [0] *** process_perfdata.pl terminated on signal ALRM
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Timeout after 10 secs. ***
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: Please check your npcd.cfg
2013-01-22 15:55:35 [466] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1358866503-PID-466 deleted
2013-01-22 15:55:35 [466] [0] *** Timeout while processing Host: "www.ryanlabs.com" Service: "DNS_IP_Match"
2013-01-22 15:55:35 [466] [0] *** process_perfdata.pl terminated on signal ALRM

tail /usr/local/nagios/var/npcd.log
[01-22-2013 15:30:22] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:30:22] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865003'
[01-22-2013 15:36:37] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:36:37] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865378'
[01-22-2013 15:36:37] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:36:37] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865363'
[01-22-2013 15:40:59] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:40:59] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358865633'
[01-22-2013 15:55:35] NPCD: ERROR: Executed command exits with return code '7'
[01-22-2013 15:55:35] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//service-perfdata.1358866503'

Re: Problems with Host Performance Graphs and Bandwidth

Posted: Tue Jan 22, 2013 10:23 am
by mguthrie
Just to make sure we rule it out, can you also run these commands and post the output:

ll /usr/local/nagios/var/spool/xdpe|wc -l

ll /usr/local/nagios/var