RAMDISK full
Posted: Sun Dec 06, 2015 9:18 pm
by Fred Kroeger
I've had my ramdisk fill up 3 times in a 9-day window now, so it looks like I've got some strange problem.
The ramdisk is configured for 100MB
df shows it as full
Code: Select all
tmpfs 100M 100M 0 100% /var/nagiosramdisk
When I checked the ramdisk, only a few files were in the file system.
lsof revealed that I had a couple of deleted files that were still open. The two files add up to the 100MB; I'm not sure why they are so large. Normally the ramdisk sits at around 11MB used with 89MB free.
Code: Select all
# lsof | grep deleted | grep nagiosramdisk
nagios 17129 nagios 14w REG 0,17 13864348 261619983 /var/nagiosramdisk/spool/perfdata/1449446259.perfdata.host-PID-18981 (deleted)
nagios 17129 nagios 15w REG 0,17 86723351 261625465 /var/nagiosramdisk/spool/perfdata/1449446259.perfdata.service-PID-18982 (deleted)
ps -ef | grep 17129
nagios 17129 17117 0 09:57 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Restarting the Nagios service cleared those deleted files
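For anyone hitting the same symptom: space held by deleted-but-open files does not appear in ls or du, only in df and lsof. Below is a minimal sketch for totalling that space from lsof output. The function name sum_deleted_bytes is made up for illustration, and it assumes the column layout shown above (size in the 7th field, the path just before the "(deleted)" marker, no spaces in the path).

```shell
# Hypothetical helper: total the bytes held by deleted-but-open files
# under a given path prefix, reading lsof output on stdin.
# Assumes the column layout shown in the post: SIZE/OFF is field 7 and
# the file name is the field just before the "(deleted)" marker.
sum_deleted_bytes() {
    awk -v prefix="$1" '
        $NF == "(deleted)" && index($(NF-1), prefix) == 1 { total += $7 }
        END { printf "%.0f\n", total }
    '
}
```

Usage would be something like `lsof | sum_deleted_bytes /var/nagiosramdisk`. For the two files listed above this gives 100587699 bytes (about 96 MiB), which accounts for nearly all of the 100M tmpfs.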
The perfdata log file showed the following errors
Code: Select all
tail /usr/local/nagios/var/perfdata.log
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1449446259.perfdata.host-PID-18981 deleted
2015-12-07 09:57:50 [18981] [0] *** process_perfdata.pl terminated on signal ALRM
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1449446259.perfdata.service-PID-18982 deleted
2015-12-07 09:57:50 [18982] [0] *** process_perfdata.pl terminated on signal ALRM
The process_perfdata.cfg file looks like the standard one used at all my other Nagios instances. Also, everything works normally except for 3 occasions in the last 9 days.
Load average is usually < 1, so there is no problem with perfdata files being moved out.
Is there anything else I need to check to resolve this problem?
thanks... Fred
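To see how often the timeout described above actually fires, the perfdata log can be tallied by day. This is only a sketch: the function name count_timeouts_per_day is hypothetical, and it assumes the log format shown in the post (date as the first field, one "TIMEOUT: Timeout after" line per timed-out process_perfdata.pl run).

```shell
# Hypothetical helper: count process_perfdata.pl timeouts per day,
# reading perfdata.log lines on stdin. Each timed-out run logs one
# "TIMEOUT: Timeout after" line; the date is the first field.
count_timeouts_per_day() {
    awk 'index($0, "TIMEOUT: Timeout after") > 0 { n[$1]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

It could be run as `count_timeouts_per_day < /usr/local/nagios/var/perfdata.log`. Days with a non-zero count can then be compared against the times the ramdisk filled up.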
Re: RAMDISK full
Posted: Mon Dec 07, 2015 10:55 am
by lmiltchev
Enable debugging in the "/usr/local/nagios/etc/pnp/npcd.cfg" and "/usr/local/nagios/etc/pnp/process_perfdata.cfg" files (if you haven't done so already), and restart npcd.
Next, run the following commands and show the output in code wraps:
Code: Select all
/usr/local/nagios/bin/nagiostats|grep "Total Hosts"|awk '{ print $3 }';/usr/local/nagios/bin/nagiostats|grep "Total Services"|awk '{ print $3 }'
df -h
tail -100 /usr/local/nagios/var/perfdata.log
tail -100 /usr/local/nagios/var/npcd.log
uptime
ps -ef | grep perf
Re: RAMDISK full
Posted: Tue Dec 08, 2015 6:57 pm
by Fred Kroeger
Requested details follow
Code: Select all
# service npcd restart
NPCD Stopped.
DEBUG: Config File = /usr/local/nagios/etc/pnp/npcd.cfg
CONFIG_OPT_LOGTYPE = file
CONFIG_OPT_LOGFILE = /usr/local/nagios/var/npcd.log
CONFIG_OPT_LOGFILESIZE = 10485760
CONFIG_OPT_LOGLEVEL = -1
CONFIG_OPT_SCANDIR = /var/nagiosramdisk/spool/perfdata/
CONFIG_OPT_RUNCMD = /usr/local/nagios/libexec/process_perfdata.pl
CONFIG_OPT_RUNCMD_ARG = -b
CONFIG_OPT_MAXTHREADS = 5
CONFIG_OPT_LOAD = 20.0
CONFIG_OPT_USER = nagios
CONFIG_OPT_GROUP = nagios
CONFIG_OPT_PIDFILE = /usr/local/nagiosxi/var/subsys/npcd.pid
CONFIG_OPT_SLEEPTIME = 15
CONFIG_OPT_IDENTMYSELF = (null)
---------------------------
DEBUG: load_threshold is enabled - ('20.000000')
NPCD started.
Code: Select all
# /usr/local/nagios/bin/nagiostats|grep "Total Hosts"|awk '{ print $3 }';/usr/local/nagios/bin/nagiostats|grep "Total Services"|awk '{ print $3 }'
742
3761
Code: Select all
# tail -100 /usr/local/nagios/var/perfdata.log
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X1/Memory_Used_-_Wintel.rrd 1449617851:15:0:6
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X1/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 194
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X2/ Memory_Used_-_Wintel ('physical memory %'=20%;90;91 'physical memory'=837.262M;3685.996;3726.951;0;4095.551 'virtual memory %'=0%;90;91 'virtual memory'=386.754M;7549747.087;7633633.166;0;8388607.875 'paged bytes %'=16%;90;91 'paged bytes'=808.258M;4319.596;4367.591;0;4799.551)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X3/Memory_Used_-_Wintel.rrd 1449617851:20:837.262:0:386.754:16:808.258
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X3/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 195
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X4/ Disk_Usage_-_root_-_UNIX (/=5460MB;9340;9635;0;9832)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_disk (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_disk.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_disk.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X4/Disk_Usage_-_root_-_UNIX.rrd 1449617851:5460
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X4/Disk_Usage_-_root_-_UNIX.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 196
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X5/ Memory_Used_-_Wintel ('physical memory'=20%;90;91; 'virtual memory'=0%;90;91; 'paged bytes'=13%;90;91;)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X5/Memory_Used_-_Wintel.rrd 1449617851:20:0:13
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X5/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 197
2015-12-09 09:37:40 [2590] [2] No Perfdata. Skipping line 197
2015-12-09 09:37:40 [2590] [2] Processing Line 198
2015-12-09 09:37:40 [2590] [2] No Perfdata. Skipping line 198
2015-12-09 09:37:40 [2590] [2] Processing Line 199
2015-12-09 09:37:40 [2590] [2] No Perfdata. Skipping line 199
2015-12-09 09:37:40 [2590] [2] Processing Line 200
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X6/ Memory_Used_-_Wintel ('physical memory'=61%;90;91; 'virtual memory'=0%;90;91; 'paged bytes'=34%;90;91;)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X6/Memory_Used_-_Wintel.rrd 1449617851:61:0:34
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X6/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 201
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X7/ Load_-_UNIX_-_Dev_Test (load1=0.010;35.000;50.000;0; load5=0.020;40.000;55.000;0; load15=0.000;45.000;60.000;0;)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_load (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_load.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_load.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_load (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_load.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_load.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_load (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_load.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_load.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X7/Load_-_UNIX_-_Dev_Test.rrd 1449617851:0.010:0.020:0.000
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X7/Load_-_UNIX_-_Dev_Test.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 202
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X8/ check_snmp_module_1_status (Module=2)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_snmp (/usr/local/nagios/etc/pnp/check_commands/check_snmp.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_snmp.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X8/check_snmp_module_1_status.rrd 1449617851:2
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X8/check_snmp_module_1_status.rrd updated
2015-12-09 09:37:40 [2590] [1] 202 Lines processed
2015-12-09 09:37:40 [2590] [1] /var/nagiosramdisk/spool/perfdata//1449617851.perfdata.service-PID-2590 deleted
2015-12-09 09:37:40 [2590] [1] PNP exiting (runtime 0.583946s) ...
Code: Select all
# tail -100 /usr/local/nagios/var/npcd.log
[12-09-2015 09:37:23] NPCD: ThreadCounter 0/5 File is 1449617835.perfdata.host
[12-09-2015 09:37:23] NPCD: Regular File: 1449617835.perfdata.host
[12-09-2015 09:37:23] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:37:23] NPCD: Processing file 1449617835.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617835.perfdata.host
[12-09-2015 09:37:23] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:23] NPCD: Processing file '1449617835.perfdata.host'
[12-09-2015 09:37:23] NPCD: ThreadCounter 1/5 File is 1449617835.perfdata.service
[12-09-2015 09:37:23] NPCD: Regular File: 1449617835.perfdata.service
[12-09-2015 09:37:23] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:37:23] NPCD: Processing file 1449617835.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617835.perfdata.service
[12-09-2015 09:37:23] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:37:23] NPCD: Processing file '1449617835.perfdata.service'
[12-09-2015 09:37:25] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:37:40] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: ThreadCounter 0/5 File is 1449617851.perfdata.host
[12-09-2015 09:37:40] NPCD: Regular File: 1449617851.perfdata.host
[12-09-2015 09:37:40] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:37:40] NPCD: Processing file 1449617851.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617851.perfdata.host
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: Processing file '1449617851.perfdata.host'
[12-09-2015 09:37:40] NPCD: ThreadCounter 1/5 File is 1449617851.perfdata.service
[12-09-2015 09:37:40] NPCD: Regular File: 1449617851.perfdata.service
[12-09-2015 09:37:40] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:37:40] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:37:40] NPCD: Processing file 1449617851.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617851.perfdata.service
[12-09-2015 09:37:40] NPCD: Processing file '1449617851.perfdata.service'
[12-09-2015 09:37:42] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:37:57] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: ThreadCounter 0/5 File is 1449617865.perfdata.host
[12-09-2015 09:37:57] NPCD: Regular File: 1449617865.perfdata.host
[12-09-2015 09:37:57] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: Processing file 1449617865.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617865.perfdata.host
[12-09-2015 09:37:57] NPCD: ThreadCounter 1/5 File is 1449617865.perfdata.service
[12-09-2015 09:37:57] NPCD: Processing file '1449617865.perfdata.host'
[12-09-2015 09:37:57] NPCD: Regular File: 1449617865.perfdata.service
[12-09-2015 09:37:57] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:37:57] NPCD: Processing file 1449617865.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617865.perfdata.service
[12-09-2015 09:37:57] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:37:57] NPCD: Processing file '1449617865.perfdata.service'
[12-09-2015 09:38:00] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:38:15] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: ThreadCounter 0/5 File is 1449617880.perfdata.host
[12-09-2015 09:38:15] NPCD: Regular File: 1449617880.perfdata.host
[12-09-2015 09:38:15] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: Processing file 1449617880.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617880.perfdata.host
[12-09-2015 09:38:15] NPCD: ThreadCounter 1/5 File is 1449617881.perfdata.service
[12-09-2015 09:38:15] NPCD: Processing file '1449617880.perfdata.host'
[12-09-2015 09:38:15] NPCD: Regular File: 1449617881.perfdata.service
[12-09-2015 09:38:15] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:38:15] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:38:15] NPCD: Processing file 1449617881.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617881.perfdata.service
[12-09-2015 09:38:15] NPCD: Processing file '1449617881.perfdata.service'
[12-09-2015 09:38:17] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:38:32] NPCD: Found 6 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 0/5 File is 1449617895.perfdata.host
[12-09-2015 09:38:32] NPCD: Regular File: 1449617895.perfdata.host
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: Processing file 1449617895.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617895.perfdata.host
[12-09-2015 09:38:32] NPCD: ThreadCounter 1/5 File is 1449617895.perfdata.service
[12-09-2015 09:38:32] NPCD: Processing file '1449617895.perfdata.host'
[12-09-2015 09:38:32] NPCD: Regular File: 1449617895.perfdata.service
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 2/5 File is 1449617911.perfdata.host
[12-09-2015 09:38:32] NPCD: Regular File: 1449617911.perfdata.host
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 2
[12-09-2015 09:38:32] NPCD: Processing file 1449617911.perfdata.host with ID 140374899160832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617911.perfdata.host
[12-09-2015 09:38:32] NPCD: Processing file 1449617895.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617895.perfdata.service
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: Processing file '1449617895.perfdata.service'
[12-09-2015 09:38:32] NPCD: ThreadCounter 3/5 File is 1449617911.perfdata.service
[12-09-2015 09:38:32] NPCD: Processing file '1449617911.perfdata.host'
[12-09-2015 09:38:32] NPCD: Regular File: 1449617911.perfdata.service
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 3
[12-09-2015 09:38:32] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[12-09-2015 09:38:32] NPCD: Processing file 1449617911.perfdata.service with ID 140374888670976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617911.perfdata.service
[12-09-2015 09:38:32] NPCD: Processing file '1449617911.perfdata.service'
Code: Select all
# uptime
09:39:07 up 27 days, 22:13, 1 user, load average: 0.31, 0.21, 0.18
Code: Select all
# ps -ef | grep perf
nagios 16625 16616 0 09:39 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios 16630 16625 0 09:39 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
root 19447 28876 0 09:39 pts/0 00:00:00 grep perf
BTW - I don't think I've ever seen the host & service perfdata files get as large as the ones that couldn't be deleted.
This is a snapshot of their normal sizes before they get processed.
Code: Select all
# ls -l /var/nagiosramdisk/spool/perfdata
total 96
-rw-r--r-- 1 nagios nagios 10361 Dec 9 09:50 1449618645.perfdata.host
-rw-r--r-- 1 nagios nagios 82775 Dec 9 09:50 1449618645.perfdata.service
Re: RAMDISK full
Posted: Wed Dec 09, 2015 2:09 pm
by rkennedy
Is increasing your ramdisk an option? After looking at our ram disk document I believe yours could be increased to 500M as recommended.
https://assets.nagios.com/downloads/nag ... giosXI.pdf
Can you also post the result of the following command? -
Re: RAMDISK full
Posted: Tue Dec 15, 2015 6:58 pm
by Fred Kroeger
Ahhh.... you've updated the RAMDisk doco! I have a standard Nagios installation based on the old doco that recommended 50M, which I didn't think was adequate, so I have always configured them for 100M.
I'll review the doco and change the sizes accordingly.
Code: Select all
top - 09:54:02 up 34 days, 22:28, 1 user, load average: 0.00, 0.03, 0.06
Tasks: 285 total, 4 running, 280 sleeping, 0 stopped, 1 zombie
Cpu(s): 10.9%us, 4.7%sy, 0.0%ni, 83.4%id, 0.6%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 3923728k total, 2644752k used, 1278976k free, 140748k buffers
Swap: 2064380k total, 189196k used, 1875184k free, 1618492k cached
BTW - the load average is pretty low because I use a Mod-Gearman worker
Regards Fred
Re: RAMDISK full
Posted: Wed Dec 16, 2015 10:33 am
by rkennedy
That is a pretty low load!
Let us know if upgrading your ramdisk works to resolve this!
Re: RAMDISK full
Posted: Wed Dec 16, 2015 7:54 pm
by Fred Kroeger
Yes load is impressive! It's amazing what difference adding a Mod-Gearman worker can make.
I have increased the RAMDisk to 500MB, so will wait & see if the problem re-occurs.
Code: Select all
top - 10:52:11 up 15:38, 1 user, load average: 0.37, 0.67, 0.84
Tasks: 170 total, 1 running, 168 sleeping, 0 stopped, 1 zombie
Cpu(s): 5.0%us, 2.4%sy, 0.0%ni, 91.7%id, 0.7%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 3923728k total, 3073248k used, 850480k free, 155772k buffers
Swap: 2064380k total, 6820k used, 2057560k free, 2096352k cached
Code: Select all
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
27G 8.7G 17G 34% /
tmpfs 1.9G 1.0M 1.9G 1% /dev/shm
/dev/mapper/VolGroup-lv_app
50G 6.3G 41G 14% /usr/local
/dev/sda1 477M 66M 386M 15% /boot
tmpfs 500M 11M 490M 3% /var/nagiosramdisk
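Since the ramdisk normally sits at a few percent used, its usage is easy to watch for early warning. Here is a small sketch, assuming `df -P` output (POSIX format, one filesystem per line, unlike the wrapped output above) and a mount point without spaces. The function name mount_use_pct is hypothetical.

```shell
# Hypothetical helper: print the Use% (without the % sign) for one mount
# point, reading `df -P` output on stdin. With -P each filesystem stays
# on a single line, so the mount point is the last field and the
# capacity is the field before it.
mount_use_pct() {
    awk -v mp="$1" '$NF == mp { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}
```

For example, `df -P | mount_use_pct /var/nagiosramdisk` should print 3 for the state shown above, and the result could be compared against a threshold of your choosing.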
Re: RAMDISK full
Posted: Thu Dec 17, 2015 10:59 am
by hsmith
Let us know if it comes back. Thanks!
Re: RAMDISK full
Posted: Fri Jan 08, 2016 1:26 am
by Fred Kroeger
Happened again, on a different server this time. As this is my main server, I had already increased its ramdisk to 500MB after the previous discussions.
Same issue as before: there were two open perfdata files (host & service) that were showing as deleted (using the lsof command). It was the Nagios process that was still holding them open.
The quick fix is to stop/start the nagios service, which closes the files and completes the deletion. However, this is definitely something that has been introduced since NagiosXI 5.2, as I have not experienced this problem with the older versions and have now seen it on different servers.
regards... Fred
Re: RAMDISK full
Posted: Fri Jan 08, 2016 11:28 am
by rkennedy
Odd - I'd like to gather a bit more information.
Just to clarify, this occurred on a machine that wasn't increased to a 500M ramdisk, right?
What is the result of top|head -5 on this other machine?
How many hosts / service checks are running on it at this point?