Perfdata Filling up Disk

Posted: Tue Dec 04, 2018 4:23 am
by domsch1988
Hello everyone,

I haven't posted for some time, and now I'm back with a problem we have been facing for about a week. The disk on our Nagios XI server is filling up with performance data (at something like 2 GB per minute or more). First, here is what I have for you:

/usr/local/nagios/var/npcd.log is filled with:

Code:

[12-04-2018 10:12:19] NPCD: ERROR: Executed command exits with return code '7'
[12-04-2018 10:12:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1543914244.perfdata.host'

/usr/local/nagios/var/perfdata.log has tons of:

Code:

2018-12-04 10:12:19 [63357] [0] *** TIMEOUT: Please check your npcd.cfg
2018-12-04 10:12:19 [63357] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1543914275.perfdata.host-PID-63357 deleted
2018-12-04 10:12:19 [63357] [0] *** Timeout while processing Host: "al-infoweb" Service: "_HOST_"
2018-12-04 10:12:19 [63357] [0] *** process_perfdata.pl terminated on signal ALRM
2018-12-04 10:12:19 [63355] [0] *** TIMEOUT: Timeout after 120 secs. ***
2018-12-04 10:12:19 [63355] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2018-12-04 10:12:19 [63355] [0] *** TIMEOUT: Please check your npcd.cfg
2018-12-04 10:12:19 [63355] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1543914289.perfdata.host-PID-63355 deleted
2018-12-04 10:12:19 [63355] [0] *** Timeout while processing Host: "DimagTS" Service: "_HOST_"
2018-12-04 10:12:19 [63355] [0] *** process_perfdata.pl terminated on signal ALRM

TIMEOUT in process_perfdata.cfg was initially "5"; for testing I set it to "120" to see if that changed anything. No dice. To use our perfdata with Grafana, I had set up pnp4nagios. In the meantime I have undone my changes to the "process perfdata" commands. I had added

Code:

&& cp /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host /usr/local/pnp4nagios/var/spool/graphios

to the end of them.

I'm at a total loss. I checked all permissions. Everything is run as user nagios. All permissions should be good. If you need anything else, feel free to ask. Any help here would be highly appreciated!

For completeness, I'm including the relevant configs:

npcd.cfg

Code:

# NPCD.cfg - sample configuration file for PNPs NPCD
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as 
# published by the Free Software Foundation;
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
#
# $Id: npcd.cfg-sample.in 440 2008-04-24 09:08:20Z hendrikb $

# Privilege Options

user = nagios
group = nagios


# Logging Options

# log_type - define if you want your logs to 
# 'syslog' or to a 'file'
#
# log_type = <value>
#

log_type = file
#log_type = syslog


# log_file - define a path to your logfile
# needed if 'log_type'='file'
#
# log_file = </path/to/logpath/filename>
#

log_file = /usr/local/nagios/var/npcd.log


# max_logfile_size - defines the maximum filesize (bytes)
# before the logfile will rotated.
#
# max_logfile_size = <value> (default 10Mbyte)
#

max_logfile_size = 10485760


# log_level - how much should we log?
#
# log_level = <integer value>
#
#  0 = No logging - except errors
#  1 = Small logging - some few more output
#  2 = More Logging (actual ALL logs)
# -1 = DEBUG Mode - ALL Logging and slower processing
#

log_level = 0


# NEEDED OPTIONS
# 
# perfdata_spool_dir - where we can find the 
# performance data files
#
# perfdata_spool_dir = </path/to/directory/>
#

perfdata_spool_dir = /usr/local/nagios/var/spool/perfdata/


# Execute following command for each found file
# in 'perfdata_spool_dir'
#
# perfdata_file_run_cmd = </path/to/command>
# 
# Must be executable by user/group from above
#
# perfdata_file_run_cmd = </path/to/filename>
#

perfdata_file_run_cmd = /usr/local/nagios/libexec/process_perfdata.pl


# perfdata_file_run_cmd_args (optional) 
#
# If you wish, you can apply more arguments to the
# perfdata_file_run_cmd
#
# Hint:
# NPCD will create a command line like this:
# '<perfdata_file_runc_cmd> <perfdata_file_runc_cmd_args> <filename_from_spool_dir>'
#

perfdata_file_run_cmd_args = -b


# npcd_max_threads - define how many parallel threads we 
# should start

npcd_max_threads = 8

# sleep_time - how many seconds should npcd wait between dirscans
#
# sleep_time = 15 (default)

sleep_time = 15


# EXPERIMENTAL
#
# use_load_threshold - enables/disables load watching
#
# use_load_threshold = <0 / 1> (default: 0)
#

#use_load_threshold = 0


# EXPERIMENTAL
#
# load_threshold - npcd won't start new threads
# if your system load is over this threshold
#
# load_threshold = <float value> (default: 10.0)
#
# Hint: Do not use "," as decimal delimeter
#

load_threshold = 20.0


# location of your pid file

#pid_file=/var/run/npcd.pid
pid_file=/usr/local/nagiosxi/var/subsys/npcd.pid


# We have to end with a newline

process_perfdata.cfg

Code:

#
# Config File for process_perfdata.pl
#
# $Id: process_perfdata.cfg-sample.in 520 2008-09-16 12:50:10Z pitchfork $
#
# process_perfdata.pl Timout 
#
TIMEOUT = 120
#
# Use RRDs Perl Module
#
USE_RRDs = 1 
#
# 
#
RRDPATH = /usr/local/nagios/share/perfdata
#
#
#
RRDTOOL = /bin/rrdtool
#
#
#
CFG_DIR = /usr/local/nagios/etc/pnp
#
#
#
RRD_HEARTBEAT = 8460 
#
#
#
RRA_CFG = /usr/local/nagios/etc/pnp/rra.cfg
#
#
#
RRA_STEP = 60
#
#
#
LOG_FILE = /usr/local/nagios/var/perfdata.log
#
# Loglevel 0=silent 1=normal 2=debug
#
LOG_LEVEL = 0
#
# XML encoding
# The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.
# http://www.php.net/xml-parser-create
XML_ENC = UTF-8
#
# EXPERIMENTAL rrdcached Support
# Use only with rrdtool svn revision 1511+
#
# RRD_DAEMON_OPTS = unix:/tmp/rrdcached.sock

nagios.cfg

Code:

# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=1


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg


# PNP settings - bulk mode with NCPD
process_performance_data=0
# service performance data
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$\tLONGSERVICEOUTPUT::$LONGSERVICEOUTPUT$\tGRAPHITEPREFIX::$HOSTGROUPALIAS$\tGRAPHITEPOSTFIX::$SERVICEDESC$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$\tLONGHOSTOUTPUT::$LONGHOSTOUTPUT$\tGRAPHITEPREFIX::$_HOSTGRAPHITEPREFIX$\tGRAPHITEPOSTFIX::$_HOSTGRAPHITEPOSTFIX$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=1
check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_reaper_frequency=3
check_service_freshness=1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=10
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/var/run/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=10
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/usr/local/nagios/var/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=90
service_freshness_check_interval=10
service_inter_check_delay_method=s
service_interleave_factor=s
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
use_aggressive_host_checking=0
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
# 
# ###### Auto-generated Graphios configs #######
#process_performance_data=1
#service_perfdata_file=/var/spool/nagios/graphios/service-perfdata
#service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tGRAPHITEPREFIX::$_SERVICEGRAPHITEPREFIX$\tGRAPHITEPOSTFIX::$_SERVICEGRAPHITEPOSTFIX$\tMETRICTYPE::$_SERVICEMETRICTYPE$
#service_perfdata_file_mode=a
#service_perfdata_file_processing_interval=15
#service_perfdata_file_processing_command=graphios_perf_service
#host_perfdata_file=/var/spool/nagios/graphios/host-perfdata
#host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tGRAPHITEPREFIX::$_HOSTGRAPHITEPREFIX$\tGRAPHITEPOSTFIX::$_HOSTGRAPHITEPOSTFIX$\tMETRICTYPE::$_HOSTMETRICTYPE$
#host_perfdata_file_mode=a
#host_perfdata_file_processing_interval=15
#host_perfdata_file_processing_command=graphios_perf_host
#host_perfdata_file_template=\tGRAPHITEPREFIX::$_HOSTGRAPHITEPREFIX$\tGRAPHITEPOSTFIX::$_HOSTGRAPHITEPOSTFIX$

#service_perfdata_file_template=\tGRAPHITEPREFIX::$_SERVICEGRAPHITEPREFIX$\tGRAPHITEPOSTFIX::$_SERVICEGRAPHITEPOSTFIX$


Re: Perfdata Filling up Disk

Posted: Tue Dec 04, 2018 8:28 am
by domsch1988
Ok, I can answer this myself now. First off, I obviously didn't post the configs that mattered in the end.

The simple cause was that the command that processes host performance data was using "cp" instead of "mv". Nagios appends to the host perfdata file (host_perfdata_file_mode=a), and since "cp" never removes it, the file kept growing, up to 1.8G. On every cycle the full performance data history was copied and processed again.

I changed that to "mv", and now file sizes are back down to normal levels and everything is working as intended.
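
To illustrate the difference, here is a minimal shell sketch that runs in a throwaway temp directory. It is not the actual Nagios XI command definition: the file names, the three-cycle loop and the simulate function are made up for the demo. It only assumes what the config above shows, namely that results are appended to the perfdata file and that the file is handed off to a spool directory every cycle.

```shell
#!/bin/sh
# Demo only: hypothetical paths in a temp dir, not the real Nagios XI layout.
set -eu

work=$(mktemp -d)
perf="$work/host-perfdata"   # stands in for host_perfdata_file
spool="$work/spool"          # stands in for the spool directory
mkdir -p "$spool"

# simulate <cp|mv>: run three processing cycles, then print how many
# lines the last spooled file contains.
simulate() {
    handoff=$1
    rm -f "$perf" "$spool"/*
    for cycle in 1 2 3; do
        # Nagios appends new results to the perfdata file (file mode "a").
        printf 'result from cycle %s\n' "$cycle" >> "$perf"
        # Hand the file off to the spool directory with cp or mv.
        "$handoff" "$perf" "$spool/$cycle.perfdata.host"
    done
    wc -l < "$spool/3.perfdata.host" | tr -d ' '
}

cp_lines=$(simulate cp)
mv_lines=$(simulate mv)

echo "cp: last spooled file has $cp_lines line(s)"
echo "mv: last spooled file has $mv_lines line(s)"

rm -rf "$work"
```

With "cp" the source file is never emptied, so every spooled file contains all earlier results again (3 lines after 3 cycles). With "mv" each spooled file holds only the results of its own cycle (1 line).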

Re: Perfdata Filling up Disk

Posted: Tue Dec 04, 2018 12:15 pm
by scottwilkerson
Glad you were able to solve the issue.

Locking thread