RAMDISK full

Posted: Sun Dec 06, 2015 9:18 pm
by Fred Kroeger
I've had my ramdisk fill up 3 times in a 9-day window now, so it looks like I've got some strange problem.
The ramdisk is configured for 100MB.
df shows it as full:

Code: Select all

tmpfs                 100M  100M     0 100% /var/nagiosramdisk

When I checked the ramdisk, only a few files were in the file system.

lsof revealed that I had a couple of deleted files that were still open. The two files add up to the 100MB - not sure why they are so large. Normally the ramdisk sits at around 11MB used with 89MB free.

Code: Select all

# lsof | grep deleted | grep nagiosramdisk
nagios    17129   nagios   14w      REG               0,17  13864348  261619983 /var/nagiosramdisk/spool/perfdata/1449446259.perfdata.host-PID-18981 (deleted)
nagios    17129   nagios   15w      REG               0,17  86723351  261625465 /var/nagiosramdisk/spool/perfdata/1449446259.perfdata.service-PID-18982 (deleted)

# ps -ef | grep 17129
nagios   17129 17117  0 09:57 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
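
A deleted file keeps occupying tmpfs space until the last process holding it open closes it, which is why df can report the ramdisk as full while a directory listing shows almost nothing. As a rough sketch, the helper below totals the size column of lsof lines like the two above. The function name is made up for illustration, and it assumes the size is the seventh whitespace-separated field, as in the output shown here; other lsof versions or options may lay the columns out differently.

```shell
# Hypothetical helper: sum the SIZE/OFF column (field 7 in the output above)
# of deleted-but-still-open files under a given mount point.
# Reads lsof output on stdin; prints the total in bytes.
sum_deleted_bytes() {
    mount_point=$1
    awk -v mp="$mount_point" '
        index($0, mp) && /\(deleted\)$/ { total += $7 }
        END { printf "%d\n", total }
    '
}

# Example use on a live system (not run here):
#   lsof | sum_deleted_bytes /var/nagiosramdisk
```

For the two lines above this gives 13864348 + 86723351 = 100587699 bytes, roughly 96 MiB, which accounts for nearly all of the 100M ramdisk.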

Restarting the Nagios service cleared those deleted files.
The perfdata log file showed the following errors:

Code: Select all

tail  /usr/local/nagios/var/perfdata.log
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-12-07 09:57:50 [18981] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1449446259.perfdata.host-PID-18981 deleted
2015-12-07 09:57:50 [18981] [0] *** process_perfdata.pl terminated on signal ALRM
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-12-07 09:57:50 [18982] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//1449446259.perfdata.service-PID-18982 deleted
2015-12-07 09:57:50 [18982] [0] *** process_perfdata.pl terminated on signal ALRM
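
The file names in these TIMEOUT entries (ending in host-PID-18981 and service-PID-18982) match the two deleted files that lsof showed the nagios process still holding open. As a small sketch for cross-checking that on other occurrences, the helper below pulls the spool file path out of each "TIMEOUT: <path> deleted" line. The function name is hypothetical, and the pattern is based only on the log lines shown in this thread.

```shell
# Hypothetical helper: read perfdata.log text on stdin and print the path of
# every spool file that process_perfdata.pl reported deleting after a timeout.
timed_out_files() {
    sed -n 's|.*\*\*\* TIMEOUT: \(/.*\) deleted$|\1|p'
}

# Example use on a live system (not run here):
#   timed_out_files < /usr/local/nagios/var/perfdata.log
```

The printed paths can then be compared by eye with the names in the lsof output.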

The process_perfdata.cfg file looks like the standard one used on all my other Nagios instances. Also, everything works normally except for 3 occasions in the last 9 days.
Load average is usually < 1, so there is no problem with perfdata files being moved out.
Is there anything else I need to check to resolve this problem?

thanks... Fred

Re: RAMDISK full

Posted: Mon Dec 07, 2015 10:55 am
by lmiltchev
Enable debugging in the "/usr/local/nagios/etc/pnp/npcd.cfg" and "/usr/local/nagios/etc/pnp/process_perfdata.cfg" files (if you haven't done it already), and restart npcd:

Code: Select all

service npcd restart

Next, run the following commands and show the output in code wraps:

Code: Select all

/usr/local/nagios/bin/nagiostats|grep "Total Hosts"|awk '{ print $3 }';/usr/local/nagios/bin/nagiostats|grep "Total Services"|awk '{ print $3 }'
df -h
tail -100 /usr/local/nagios/var/perfdata.log
tail -100 /usr/local/nagios/var/npcd.log
uptime
ps -ef | grep perf
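
The first command above runs nagiostats twice. A one-pass variant is sketched below; it assumes, as the awk '{ print $3 }' calls above imply, that the counts are the third field on the "Total Hosts" and "Total Services" lines. The function name is made up for illustration.

```shell
# Hypothetical helper: read nagiostats output on stdin and print both totals
# on one labelled line, regardless of the order the lines appear in.
host_service_totals() {
    awk '
        /Total Hosts/    { hosts = $3 }
        /Total Services/ { services = $3 }
        END { print "hosts=" hosts, "services=" services }
    '
}

# Example use on a live system (not run here):
#   /usr/local/nagios/bin/nagiostats | host_service_totals
```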

Re: RAMDISK full

Posted: Tue Dec 08, 2015 6:57 pm
by Fred Kroeger
Requested details follow

Code: Select all

# service npcd restart
NPCD Stopped.
DEBUG: Config File = /usr/local/nagios/etc/pnp/npcd.cfg
CONFIG_OPT_LOGTYPE = file
CONFIG_OPT_LOGFILE = /usr/local/nagios/var/npcd.log
CONFIG_OPT_LOGFILESIZE = 10485760
CONFIG_OPT_LOGLEVEL = -1
CONFIG_OPT_SCANDIR = /var/nagiosramdisk/spool/perfdata/
CONFIG_OPT_RUNCMD = /usr/local/nagios/libexec/process_perfdata.pl
CONFIG_OPT_RUNCMD_ARG = -b
CONFIG_OPT_MAXTHREADS = 5
CONFIG_OPT_LOAD = 20.0
CONFIG_OPT_USER = nagios
CONFIG_OPT_GROUP = nagios
CONFIG_OPT_PIDFILE = /usr/local/nagiosxi/var/subsys/npcd.pid
CONFIG_OPT_SLEEPTIME = 15
CONFIG_OPT_IDENTMYSELF = (null)
---------------------------
DEBUG: load_threshold is enabled - ('20.000000')
NPCD started.

Code: Select all

# /usr/local/nagios/bin/nagiostats|grep "Total Hosts"|awk '{ print $3 }';/usr/local/nagios/bin/nagiostats|grep "Total Services"|awk '{ print $3 }'
742
3761

Code: Select all

# tail -100 /usr/local/nagios/var/perfdata.log
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X1/Memory_Used_-_Wintel.rrd 1449617851:15:0:6
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X1/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 194
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X2/ Memory_Used_-_Wintel ('physical memory %'=20%;90;91 'physical memory'=837.262M;3685.996;3726.951;0;4095.551 'virtual memory %'=0%;90;91 'virtual memory'=386.754M;7549747.087;7633633.166;0;8388607.875 'paged bytes %'=16%;90;91 'paged bytes'=808.258M;4319.596;4367.591;0;4799.551)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X3/Memory_Used_-_Wintel.rrd 1449617851:20:837.262:0:386.754:16:808.258
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X3/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 195
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X4/ Disk_Usage_-_root_-_UNIX (/=5460MB;9340;9635;0;9832)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_disk (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_disk.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_disk.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X4/Disk_Usage_-_root_-_UNIX.rrd 1449617851:5460
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X4/Disk_Usage_-_root_-_UNIX.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 196
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X5/ Memory_Used_-_Wintel ('physical memory'=20%;90;91; 'virtual memory'=0%;90;91; 'paged bytes'=13%;90;91;)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X5/Memory_Used_-_Wintel.rrd 1449617851:20:0:13
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X5/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 197
2015-12-09 09:37:40 [2590] [2] No Perfdata. Skipping line 197
2015-12-09 09:37:40 [2590] [2] Processing Line 198
2015-12-09 09:37:40 [2590] [2] No Perfdata. Skipping line 198
2015-12-09 09:37:40 [2590] [2] Processing Line 199
2015-12-09 09:37:40 [2590] [2] No Perfdata. Skipping line 199
2015-12-09 09:37:40 [2590] [2] Processing Line 200
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X6/ Memory_Used_-_Wintel ('physical memory'=61%;90;91; 'virtual memory'=0%;90;91; 'paged bytes'=34%;90;91;)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_nrpe.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X6/Memory_Used_-_Wintel.rrd 1449617851:61:0:34
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X6/Memory_Used_-_Wintel.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 201
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X7/ Load_-_UNIX_-_Dev_Test (load1=0.010;35.000;50.000;0; load5=0.020;40.000;55.000;0; load15=0.000;45.000;60.000;0;)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_load (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_load.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_load.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_load (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_load.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_load.php
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_by_ssh_load (/usr/local/nagios/etc/pnp/check_commands/check_by_ssh_load.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_by_ssh_load.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X7/Load_-_UNIX_-_Dev_Test.rrd 1449617851:0.010:0.020:0.000
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X7/Load_-_UNIX_-_Dev_Test.rrd updated
2015-12-09 09:37:40 [2590] [2] Processing Line 202
2015-12-09 09:37:40 [2590] [2] Datatype set to 'SERVICEPERFDATA'
2015-12-09 09:37:40 [2590] [1] Found Performance Data for X8/ check_snmp_module_1_status (Module=2)
2015-12-09 09:37:40 [2590] [2] No Custom Template found for check_snmp (/usr/local/nagios/etc/pnp/check_commands/check_snmp.cfg)
2015-12-09 09:37:40 [2590] [2] RRD Datatype is GAUGE
2015-12-09 09:37:40 [2590] [2] Template is check_snmp.php
2015-12-09 09:37:40 [2590] [2] data2rrd called
2015-12-09 09:37:40 [2590] [2] RRDs::update /usr/local/nagios/share/perfdata/X8/check_snmp_module_1_status.rrd 1449617851:2
2015-12-09 09:37:40 [2590] [2] /usr/local/nagios/share/perfdata/X8/check_snmp_module_1_status.rrd updated
2015-12-09 09:37:40 [2590] [1] 202 Lines processed
2015-12-09 09:37:40 [2590] [1] /var/nagiosramdisk/spool/perfdata//1449617851.perfdata.service-PID-2590 deleted
2015-12-09 09:37:40 [2590] [1] PNP exiting (runtime 0.583946s) ...

Code: Select all

# tail -100 /usr/local/nagios/var/npcd.log
[12-09-2015 09:37:23] NPCD: ThreadCounter 0/5 File is 1449617835.perfdata.host
[12-09-2015 09:37:23] NPCD: Regular File: 1449617835.perfdata.host
[12-09-2015 09:37:23] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:37:23] NPCD: Processing file 1449617835.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617835.perfdata.host
[12-09-2015 09:37:23] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:23] NPCD: Processing file '1449617835.perfdata.host'
[12-09-2015 09:37:23] NPCD: ThreadCounter 1/5 File is 1449617835.perfdata.service
[12-09-2015 09:37:23] NPCD: Regular File: 1449617835.perfdata.service
[12-09-2015 09:37:23] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:37:23] NPCD: Processing file 1449617835.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617835.perfdata.service
[12-09-2015 09:37:23] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:37:23] NPCD: Processing file '1449617835.perfdata.service'
[12-09-2015 09:37:25] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:37:40] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: ThreadCounter 0/5 File is 1449617851.perfdata.host
[12-09-2015 09:37:40] NPCD: Regular File: 1449617851.perfdata.host
[12-09-2015 09:37:40] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:37:40] NPCD: Processing file 1449617851.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617851.perfdata.host
[12-09-2015 09:37:40] NPCD: DEBUG: load 0.160000/20.000000
[12-09-2015 09:37:40] NPCD: Processing file '1449617851.perfdata.host'
[12-09-2015 09:37:40] NPCD: ThreadCounter 1/5 File is 1449617851.perfdata.service
[12-09-2015 09:37:40] NPCD: Regular File: 1449617851.perfdata.service
[12-09-2015 09:37:40] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:37:40] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:37:40] NPCD: Processing file 1449617851.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617851.perfdata.service
[12-09-2015 09:37:40] NPCD: Processing file '1449617851.perfdata.service'
[12-09-2015 09:37:42] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:37:57] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: ThreadCounter 0/5 File is 1449617865.perfdata.host
[12-09-2015 09:37:57] NPCD: Regular File: 1449617865.perfdata.host
[12-09-2015 09:37:57] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:37:57] NPCD: DEBUG: load 0.130000/20.000000
[12-09-2015 09:37:57] NPCD: Processing file 1449617865.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617865.perfdata.host
[12-09-2015 09:37:57] NPCD: ThreadCounter 1/5 File is 1449617865.perfdata.service
[12-09-2015 09:37:57] NPCD: Processing file '1449617865.perfdata.host'
[12-09-2015 09:37:57] NPCD: Regular File: 1449617865.perfdata.service
[12-09-2015 09:37:57] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:37:57] NPCD: Processing file 1449617865.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617865.perfdata.service
[12-09-2015 09:37:57] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:37:57] NPCD: Processing file '1449617865.perfdata.service'
[12-09-2015 09:38:00] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:38:15] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: ThreadCounter 0/5 File is 1449617880.perfdata.host
[12-09-2015 09:38:15] NPCD: Regular File: 1449617880.perfdata.host
[12-09-2015 09:38:15] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:38:15] NPCD: DEBUG: load 0.150000/20.000000
[12-09-2015 09:38:15] NPCD: Processing file 1449617880.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617880.perfdata.host
[12-09-2015 09:38:15] NPCD: ThreadCounter 1/5 File is 1449617881.perfdata.service
[12-09-2015 09:38:15] NPCD: Processing file '1449617880.perfdata.host'
[12-09-2015 09:38:15] NPCD: Regular File: 1449617881.perfdata.service
[12-09-2015 09:38:15] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:38:15] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-09-2015 09:38:15] NPCD: Processing file 1449617881.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617881.perfdata.service
[12-09-2015 09:38:15] NPCD: Processing file '1449617881.perfdata.service'
[12-09-2015 09:38:17] NPCD: No more files to process... waiting for 15 seconds
[12-09-2015 09:38:32] NPCD: Found 6 files in /var/nagiosramdisk/spool/perfdata/
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 0/5 File is .
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 0/5 File is ..
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 0/5 File is 1449617895.perfdata.host
[12-09-2015 09:38:32] NPCD: Regular File: 1449617895.perfdata.host
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 0
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: Processing file 1449617895.perfdata.host with ID 140374991832832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617895.perfdata.host
[12-09-2015 09:38:32] NPCD: ThreadCounter 1/5 File is 1449617895.perfdata.service
[12-09-2015 09:38:32] NPCD: Processing file '1449617895.perfdata.host'
[12-09-2015 09:38:32] NPCD: Regular File: 1449617895.perfdata.service
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 1
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: ThreadCounter 2/5 File is 1449617911.perfdata.host
[12-09-2015 09:38:32] NPCD: Regular File: 1449617911.perfdata.host
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 2
[12-09-2015 09:38:32] NPCD: Processing file 1449617911.perfdata.host with ID 140374899160832 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617911.perfdata.host
[12-09-2015 09:38:32] NPCD: Processing file 1449617895.perfdata.service with ID 140374981342976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617895.perfdata.service
[12-09-2015 09:38:32] NPCD: DEBUG: load 0.120000/20.000000
[12-09-2015 09:38:32] NPCD: Processing file '1449617895.perfdata.service'
[12-09-2015 09:38:32] NPCD: ThreadCounter 3/5 File is 1449617911.perfdata.service
[12-09-2015 09:38:32] NPCD: Processing file '1449617911.perfdata.host'
[12-09-2015 09:38:32] NPCD: Regular File: 1449617911.perfdata.service
[12-09-2015 09:38:32] NPCD: A thread was started on thread_counter = 3
[12-09-2015 09:38:32] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[12-09-2015 09:38:32] NPCD: Processing file 1449617911.perfdata.service with ID 140374888670976 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1449617911.perfdata.service
[12-09-2015 09:38:32] NPCD: Processing file '1449617911.perfdata.service'

Code: Select all

# uptime
 09:39:07 up 27 days, 22:13,  1 user,  load average: 0.31, 0.21, 0.18

Code: Select all

# ps -ef | grep perf
nagios   16625 16616  0 09:39 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios   16630 16625  0 09:39 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
root     19447 28876  0 09:39 pts/0    00:00:00 grep perf

BTW - I don't think I've ever seen the host & service perfdata files get as large as the ones that couldn't be deleted.
This is a snapshot of their normal sizes before they get processed.

Code: Select all

# ls -l /var/nagiosramdisk/spool/perfdata
total 96
-rw-r--r-- 1 nagios nagios 10361 Dec  9 09:50 1449618645.perfdata.host
-rw-r--r-- 1 nagios nagios 82775 Dec  9 09:50 1449618645.perfdata.service

Re: RAMDISK full

Posted: Wed Dec 09, 2015 2:09 pm
by rkennedy
Is increasing your ramdisk an option? After looking at our ram disk document I believe yours could be increased to 500M as recommended.
https://assets.nagios.com/downloads/nag ... giosXI.pdf

Can you also post the result of the following command? -

Code: Select all

top|head -5

Re: RAMDISK full

Posted: Tue Dec 15, 2015 6:58 pm
by Fred Kroeger
Ahhh.... you've updated the RAMDisk doco! I have a standard Nagios installation based on the old doco that recommended 50M, which I didn't think was adequate, so I have always configured them for 100M.
I'll review the doco and change the sizes accordingly.

Code: Select all

top - 09:54:02 up 34 days, 22:28,  1 user,  load average: 0.00, 0.03, 0.06
Tasks: 285 total,   4 running, 280 sleeping,   0 stopped,   1 zombie
Cpu(s): 10.9%us,  4.7%sy,  0.0%ni, 83.4%id,  0.6%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:   3923728k total,  2644752k used,  1278976k free,   140748k buffers
Swap:  2064380k total,   189196k used,  1875184k free,  1618492k cached

BTW - the load average is pretty low because I use a Mod-Gearman worker.

Regards Fred

Re: RAMDISK full

Posted: Wed Dec 16, 2015 10:33 am
by rkennedy
That is a pretty low load!

Let us know if upgrading your ramdisk works to resolve this!

Re: RAMDISK full

Posted: Wed Dec 16, 2015 7:54 pm
by Fred Kroeger
Yes load is impressive! It's amazing what difference adding a Mod-Gearman worker can make.
I have increased the RAMDisk to 500MB - so will wait & see if the problem recurs.

Code: Select all

top - 10:52:11 up 15:38,  1 user,  load average: 0.37, 0.67, 0.84
Tasks: 170 total,   1 running, 168 sleeping,   0 stopped,   1 zombie
Cpu(s):  5.0%us,  2.4%sy,  0.0%ni, 91.7%id,  0.7%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3923728k total,  3073248k used,   850480k free,   155772k buffers
Swap:  2064380k total,     6820k used,  2057560k free,  2096352k cached

Code: Select all

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       27G  8.7G   17G  34% /
tmpfs                 1.9G  1.0M  1.9G   1% /dev/shm
/dev/mapper/VolGroup-lv_app
                       50G  6.3G   41G  14% /usr/local
/dev/sda1             477M   66M  386M  15% /boot
tmpfs                 500M   11M  490M   3% /var/nagiosramdisk
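
To notice the ramdisk filling up before it reaches 100%, the Use% figure can be pulled out of df and compared against a threshold. The sketch below is only an illustration: the function name and the 80% warning level in the usage comment are assumptions, not something taken from the Nagios XI documentation. It expects `df -P` style input, because plain `df -h` wraps long device names onto a second line, as in the listing above.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the Use% value
# (without the % sign) for the given mount point.
mount_use_pct() {
    awk -v mp="$1" '$NF == mp { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# Example use on a live system (not run here):
#   pct=$(df -P /var/nagiosramdisk | mount_use_pct /var/nagiosramdisk)
#   [ "$pct" -ge 80 ] && echo "ramdisk at ${pct}%"
```

Note that this only reports how full the mount is; finding out which deleted files are holding the space still needs lsof.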

Re: RAMDISK full

Posted: Thu Dec 17, 2015 10:59 am
by hsmith
Let us know if it comes back. Thanks!

Re: RAMDISK full

Posted: Fri Jan 08, 2016 1:26 am
by Fred Kroeger
Happened again, on a different server this time. As this is my main server, I had already increased its ramdisk to 500MB after the previous discussions.
Same issue as before: there were two open perfdata files (host & service) that were showing as deleted (using the lsof command). It was the Nagios process that was still holding them open.
The quick fix is to stop/start the nagios service, which closes the files and completes the deletion. However, this looks like something that has been introduced since NagiosXI 5.2, as I have not experienced this problem with the older versions and have now seen it on different servers.

regards... Fred

Re: RAMDISK full

Posted: Fri Jan 08, 2016 11:28 am
by rkennedy
Odd - I'd like to gather a bit more information.

Just to clarify, this occurred on a machine that wasn't increased to a 500M ramdisk, right?

What is the result of top|head -5 on this other machine?

How many hosts / service checks are running on it at this point?