Re: Nagios ramdisk full and no performance graphs
Posted: Thu Nov 11, 2021 1:12 pm
by hbouma
I understand that with Carbon Black disabled the Performance Data is coming across now. What is the current status of the RAMDISK?
The Carbon Black service is shut down and Nagios RamDisk shows active. However, the host-perfdata and service-perfdata are growing until the mount is 100% full.
Code: Select all
-rw-r--r-- 1 nagios nagios 104M Nov 11 12:47 host-perfdata
-rw-r--r-- 1 nagios nagios 2.6M Nov 11 08:46 objects.cache
-rw-r--r-- 1 nagios nagios 395M Nov 11 12:47 service-perfdata
Some of the checks are not displaying Performance Data, is this consistent or random?
Up until the attempt to restart Nagios Ramdisk at 11:30ish on 11/08, some of the performance graphs started working for a while, but then stopped working again.
[Attachment: 2021-11-11 12_52_42-Nagios XI and 11 more pages - Work - Microsoft Edge.png]
Please verify that the logging config is correct in '/usr/local/nagios/etc/pnp/process_perfdata.cfg' and find out where it is writing to:
Logging looks correct.
Code: Select all
LOG_FILE = /usr/local/nagios/var/perfdata.log
#
# Loglevel 0=silent 1=normal 2=debug
#
LOG_LEVEL = 2
Re: Nagios ramdisk full and no performance graphs
Posted: Fri Nov 12, 2021 9:50 am
by pbroste
Hello @hbouma,
Thanks for following up with the details. You pointed out that the following files keep growing until the mount is full. That would indicate that the "move" (mv) is either unable to move them or is not running at all. I wonder if they are stuck?
The Carbon Black service is shut down and Nagios RamDisk shows active. However, the host-perfdata and service-perfdata are growing until the mount is 100% full.
Code: Select all
-rw-r--r-- 1 nagios nagios 104M Nov 11 12:47 host-perfdata
-rw-r--r-- 1 nagios nagios 2.6M Nov 11 08:46 objects.cache
-rw-r--r-- 1 nagios nagios 395M Nov 11 12:47 service-perfdata
Please run the mv commands below manually, check the output of each command, and then run 'ls' on the destination directories to see whether the files were moved.
Code: Select all
/bin/mv -v /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
/bin/mv -v /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
/bin/mv -v /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
/bin/mv -v /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
* -v for command feedback
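One caveat when running these by hand: $TIMET$ is a Nagios macro (the current Unix timestamp), not a shell variable. Typed into a shell, $TIMET expands to an empty string and the trailing $ is kept literally, so the target ends up named '$.perfdata.host'. A minimal sketch of substituting the timestamp manually, assuming a date command that supports +%s; spool_name is a hypothetical helper, not part of Nagios XI:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): builds the file name that the
# Nagios macro "$TIMET$.perfdata.<kind>" would produce, using the current
# epoch time from date(1) in place of the macro.
spool_name() {
    kind="$1"    # "host" or "service"
    printf '%s.perfdata.%s\n' "$(date +%s)" "$kind"
}

# Example (commented out so the sketch has no side effects):
# /bin/mv -v /var/nagiosramdisk/host-perfdata \
#     "/var/nagiosramdisk/spool/xidpe/$(spool_name host)"
```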
Please let us know what you see on your end so we can determine the next steps,
Perry
Re: Nagios ramdisk full and no performance graphs
Posted: Fri Nov 12, 2021 10:18 am
by hbouma
Code: Select all
10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
‘/var/nagiosramdisk/host-perfdata’ -> ‘/var/nagiosramdisk/spool/xidpe/$.perfdata.host’
10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
/bin/mv: cannot stat ‘/usr/local/nagios/var/host-perfdata’: No such file or directory
10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
/bin/mv: cannot stat ‘/var/nagiosramdisk/host-perfdata’: No such file or directory
10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
/bin/mv: cannot stat ‘/usr/local/nagios/var/host-perfdata’: No such file or directory
10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
‘/var/nagiosramdisk/service-perfdata’ -> ‘/var/nagiosramdisk/spool/xidpe/$.perfdata.service’
10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
‘/usr/local/nagios/var/service-perfdata’ -> ‘/usr/local/nagios/var/spool/perfdata/service-perfdata.$’
10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
/bin/mv: cannot stat ‘/var/nagiosramdisk/service-perfdata’: No such file or directory
10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
/bin/mv: cannot stat ‘/usr/local/nagios/var/service-perfdata’: No such file or directory
The drive is still 100% full
tmpfs 500M 500M 0 100% /var/nagiosramdisk
It appears some of the files are still in use by our Nagios program
lsof | egrep "deleted|COMMAND" | grep nagios
nagios 3123 nagios 24w REG 0,42 108400640 61187 /var/nagiosramdisk/host-perfdata (deleted)
nagios 3123 nagios 38w REG 0,42 413179904 61189 /var/nagiosramdisk/spool/perfdata/$.perfdata.service-PID-6080 (deleted)
ps -ef | grep 3123
nagios 3123 3007 0 09:30 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
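The lsof output above accounts for the full mount: a deleted file keeps its space allocated until the process holding it open closes it. A rough sketch that totals the bytes held by deleted files, assuming the lsof column layout shown above (size in the 7th field); deleted_bytes is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: reads lsof output on stdin and sums the SIZE/OFF
# column (7th field, as in the listing above) for entries marked "(deleted)".
deleted_bytes() {
    awk '$NF == "(deleted)" { total += $7 } END { printf "%.0f\n", total }'
}

# Example usage on a live system (needs lsof and usually root):
# lsof | grep /var/nagiosramdisk | deleted_bytes
```

Fed the two lines above, this prints 521580544, roughly 497 MiB of the 500M tmpfs. That space should be released once PID 3123 (the Nagios daemon) closes those files, for example when Nagios is restarted.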
Re: Nagios ramdisk full and no performance graphs
Posted: Fri Nov 12, 2021 5:50 pm
by ssax
I don't see rrdcached running but I see you have it configured.
First, I'd free up some space in your ramdisk so that things can process properly.
If you really want to use rrdcached, follow this guide again:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Make sure rrdcached is enabled and running:
Code: Select all
systemctl status rrdcached
systemctl restart rrdcached
systemctl enable rrdcached
Then edit this file:
Code: Select all
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Make sure your load_threshold is using a decimal, then restart npcd.
NOTE: I've seen some customers have issues with rrdcached. To disable it entirely, edit this file:
Code: Select all
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Comment out this line at the bottom:
Code: Select all
#RRD_DAEMON_OPTS = unix:/var/rrdtool/rrdcached/rrdcached.sock
If all that doesn't fix it, please PM me a FRESH copy of your profile so I can see the latest logs/configs.
Re: Nagios ramdisk full and no performance graphs
Posted: Tue Nov 16, 2021 8:41 am
by hbouma
After installing rrdcached, the nagiosramdisk mount is no longer full. However, the performance graphs are still not updating. I am sending you the new profile via PM.
ssax wrote:
Then edit this file:
Code: Select all
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Make sure your load_threshold is using a decimal:
My file has no load_threshold variable at all.
Code: Select all
#
# Config File for process_perfdata.pl
#
# $Id: process_perfdata.cfg-sample.in 520 2008-09-16 12:50:10Z pitchfork $
#
# process_perfdata.pl Timout
#
TIMEOUT = 15
#
# Use RRDs Perl Module
#
USE_RRDs = 1
#
#
#
RRDPATH = /usr/local/nagios/share/perfdata
#
#
#
RRDTOOL = /bin/rrdtool
#
#
#
CFG_DIR = /usr/local/nagios/etc/pnp
#
#
#
RRD_HEARTBEAT = 8460
#
#
#
RRA_CFG = /usr/local/nagios/etc/pnp/rra.cfg
#
#
#
RRA_STEP = 60
#
#
#
LOG_FILE = /usr/local/nagios/var/perfdata.log
#
# Loglevel 0=silent 1=normal 2=debug
#
LOG_LEVEL = 2
#
# XML encoding
# The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.
# http://www.php.net/xml-parser-create
XML_ENC = UTF-8
#
# EXPERIMENTAL rrdcached Support
# Use only with rrdtool svn revision 1511+
#
# RRD_DAEMON_OPTS = unix:/tmp/rrdcached.sock
RRD_DAEMON_OPTS = unix:/var/rrdtool/rrdcached/rrdcached.sock
RRD_DAEMON_OPTS = unix:/var/rrdtool/rrdcached/rrdcached.sock
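The listing above sets RRD_DAEMON_OPTS twice. A small sketch for spotting repeated keys in a KEY = VALUE config like this one; duplicate_keys is a hypothetical helper, and whether the repeated key causes any harm here is not established in this thread:

```shell
#!/bin/sh
# Hypothetical helper: prints each key that is assigned more than once in a
# "KEY = VALUE" style config file, ignoring comment lines and blank lines.
duplicate_keys() {
    awk -F= '
        /^[ \t]*#/ || !/=/ { next }          # skip comments and non-assignments
        {
            key = $1
            gsub(/[ \t]/, "", key)           # trim whitespace around the key
            if (++seen[key] == 2) print key  # report once, on the second hit
        }
    ' "$1"
}

# Example: duplicate_keys /usr/local/nagios/etc/pnp/process_perfdata.cfg
```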
Re: Nagios ramdisk full and no performance graphs
Posted: Tue Nov 16, 2021 5:11 pm
by ssax
Sorry, wrong file, this one:
Code: Select all
/usr/local/nagios/etc/pnp/npcd.cfg
I see it using 100.0 now in the logs though so you must've updated it.
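To double-check the value on disk, a minimal sketch, assuming npcd.cfg contains a line of the form 'load_threshold = <value>'; has_decimal_threshold is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file contains an uncommented
# load_threshold line whose value is written with a decimal point (e.g. 100.0).
has_decimal_threshold() {
    grep -Eq '^[[:space:]]*load_threshold[[:space:]]*=[[:space:]]*[0-9]+\.[0-9]+' "$1"
}

# Example:
# has_decimal_threshold /usr/local/nagios/etc/pnp/npcd.cfg && echo "decimal value set"
```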
Code: Select all
2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: Timeout after 15 secs. ***
2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: Please check your npcd.cfg
2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//$.perfdata.service-PID-6080 deleted
2021-11-12 10:10:49 [6080] [0] *** Timeout while processing Host: "preprod-XXXXXXXX" Service: "CCCCCCC"
In this file:
Code: Select all
/usr/local/nagios/etc/pnp/process_perfdata.cfg
Set your TIMEOUT to 40 (no need to restart anything).
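The change can also be scripted. A minimal sketch, assuming the file has a single uncommented 'TIMEOUT = N' line as in the listing earlier in this thread; set_timeout is a hypothetical helper, and backing up the file first is advisable:

```shell
#!/bin/sh
# Hypothetical helper: rewrites the uncommented TIMEOUT line in a
# process_perfdata.cfg-style file. It writes to a temp file first, then copies
# the result over the original so the original ownership and mode are kept.
set_timeout() {
    cfg="$1"; secs="$2"
    tmp=$(mktemp) || return 1
    sed "s/^TIMEOUT[[:space:]]*=.*/TIMEOUT = $secs/" "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example: set_timeout /usr/local/nagios/etc/pnp/process_perfdata.cfg 40
```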
What is the output of these commands:
Code: Select all
ls -ld /var/nagiosramdisk
ls -lh /var/nagiosramdisk/
ls -lh /usr/local/nagios/var
Re: Nagios ramdisk full and no performance graphs
Posted: Thu Nov 18, 2021 9:52 am
by hbouma
The TIMEOUT is updated to 40.
ssax wrote:
What is the output of these commands:
Code: Select all
ls -ld /var/nagiosramdisk
ls -lh /var/nagiosramdisk/
ls -lh /usr/local/nagios/var
Code: Select all
$ ls -ld /var/nagiosramdisk
drwxrwxrwt 4 nagios nagios 160 Nov 18 09:49 /var/nagiosramdisk
09:49 AM SERVER root [~]
$ ls -lh /var/nagiosramdisk/
total 500M
-rw-r--r-- 1 nagios nagios 104M Nov 18 09:49 host-perfdata
-rw-r--r-- 1 nagios nagios 2.6M Nov 18 04:35 objects.cache
-rw-r--r-- 1 nagios nagios 391M Nov 18 09:49 service-perfdata
drwxr-xr-x 6 nagios nagios 120 Nov 16 07:46 spool
-rw-r--r-- 1 nagios nagios 3.5M Nov 18 09:49 status.dat
drwxr-xr-x 3 nagios nagios 60 Nov 18 04:35 tmp
09:49 AM SERVER root [~]
$ ls -lh /usr/local/nagios/var
total 8.3M
drwxrwxr-x. 2 nagios nagios 44K Nov 18 00:00 archives
-rw-rw-rw- 1 root root 123 Sep 22 2020 Nagios.host.java.config.ser
-rw-r--r-- 1 nagios nagios 505K Nov 18 09:43 nagios.log
-rw------- 1 nagios nagios 0 Jun 5 2019 nagios.tmp5qro9N
-rw-rw-r-- 1 nagios nagios 462K Oct 17 2019 nagios.tmpWKNETC
-rw-r--r-- 1 nagios nagios 1.7K Apr 12 2021 ndomod.tmp
-rw-r--r--. 1 nagios nagios 19K Nov 16 07:56 npcd.log
-rw-r--r--. 1 nagios nagios 32K Mar 29 2019 objects.cache
-rw-rw-r-- 1 nagios nagios 3.8M Nov 12 10:10 perfdata.log
-rw-rw-rw- 1 root root 483 Sep 22 2020 profile.csv
-rw------- 1 nagios nagios 3.5M Nov 18 09:35 retention.dat
drwxrws---. 2 nagios nagcmd 4.0K Nov 18 04:35 rw
drwxr-xr-x. 5 nagios nagios 4.0K Mar 29 2019 spool
drwxr-xr-x. 2 nagios nagios 4.0K Nov 12 10:10 stats
Oddly, our /var/nagiosramdisk started filling up again. I validated that the Carbon Black instance is not running and that both RamDisk and rrdcached are running.
I have restarted the ramdisk service. Oddly, every time the ramdisk is restarted, the /var/nagiosramdisk/spool/xidpe folder is missing and I have to recreate it manually.
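Until the cause is found, the manual step can be wrapped in a small script. A sketch, assuming the directory names from the example ramdisk.service output posted later in this thread; ensure_ramdisk_dirs is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: recreates the ramdisk sub-directories under a base
# path. The names are the ones in the ramdisk.service mkdir line shown later
# in this thread.
ensure_ramdisk_dirs() {
    base="$1"
    mkdir -p -m 775 \
        "$base/tmp" \
        "$base/spool/checkresults" \
        "$base/spool/xidpe" \
        "$base/spool/perfdata"
}

# Example (as root), followed by fixing ownership:
# ensure_ramdisk_dirs /var/nagiosramdisk && chown -R nagios:nagios /var/nagiosramdisk
```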
Re: Nagios ramdisk full and no performance graphs
Posted: Thu Nov 18, 2021 6:47 pm
by ssax
Please PM any of these files you have:
Code: Select all
/etc/sysconfig/ramdisk
/etc/sysconfig/nagios
/usr/lib/systemd/system/ramdisk.service
/usr/lib/systemd/system/nagios.service
/etc/init.d/nagios
/var/nagiosramdisk/objects.cache
/var/nagiosramdisk/status.dat
Re: Nagios ramdisk full and no performance graphs
Posted: Fri Nov 19, 2021 9:39 am
by hbouma
The following do not exist:
/etc/sysconfig/ramdisk
/etc/sysconfig/nagios
/usr/lib/systemd/system/nagios.service
The others are being sent over PM.
Re: Nagios ramdisk full and no performance graphs
Posted: Mon Nov 22, 2021 11:39 am
by pbroste
Hello @hbouma,
I am following up on behalf of @ssax, as he is out of the office this week. While catching up on this thread, I see that @ssax requested the following info. We do not have access to @ssax's private message inbox and will not be able to review anything sent there.
ssax wrote:Please PM any of these files you have:
Code: Select all
/etc/sysconfig/ramdisk
/etc/sysconfig/nagios
/usr/lib/systemd/system/ramdisk.service
/usr/lib/systemd/system/nagios.service
/etc/init.d/nagios
/var/nagiosramdisk/objects.cache
/var/nagiosramdisk/status.dat
Looking back through the thread, we see you stated that the '/var/nagiosramdisk/spool/xidpe' directory is not created when 'ramdisk.service' is started. We want to look into this further by checking the service status output after a restart.
First restart 'ramdisk.service', then take a look at the 'systemctl' status to see whether the '/var/nagiosramdisk/spool/xidpe' folder was created or missed:
Code: Select all
systemctl -l status ramdisk --no-pager
Results should look similar to this example:
systemctl -l status ramdisk --no-pager
● ramdisk.service - Ramdisk
Loaded: loaded (/usr/lib/systemd/system/ramdisk.service; enabled; vendor preset: disabled)
Active: active (exited) since Mon 2021-11-22 10:15:05 CST; 17min ago
Process: 2049696 ExecStart=/usr/bin/chown -R nagios:nagios /var/nagiosramdisk (code=exited, status=0/SUCCESS)
Process: 2049694 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk /var/nagiosramdisk/tmp /var/nagiosramdisk/spool /var/nagiosramdisk/spool/checkresults /var/nagiosramdisk/spool/xidpe /var/nagiosramdisk/spool/perfdata (code=exited, status=0/SUCCESS)
Process: 2049692 ExecStartPre=/usr/bin/mount -t tmpfs -o size=100m tmpfs /var/nagiosramdisk (code=exited, status=0/SUCCESS)
Process: 2049690 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk /var/nagiosramdisk/tmp /var/nagiosramdisk/spool /var/nagiosramdisk/spool/checkresults /var/nagiosramdisk/spool/xidpe /var/nagiosramdisk/spool/perfdata (code=exited, status=0/SUCCESS)
Main PID: 2049696 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 101104)
Memory: 0B
CGroup: /system.slice/ramdisk.service
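In this example the mkdir step appears twice, and judging by the PIDs (2049690, 2049692, 2049694) it runs once before and once after the tmpfs mount. Directories created before a mount are hidden by it, so only the second mkdir populates the ramdisk. A sketch for checking that a unit file has an xidpe mkdir after the mount line; mkdir_after_mount is a hypothetical helper, and the unit file is assumed to use the same ExecStartPre commands as the output above:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if, in the given unit file, a line that
# creates spool/xidpe with mkdir appears after the line that mounts tmpfs.
mkdir_after_mount() {
    awk '
        /mount/ && /tmpfs/            { mounted = 1; next }
        mounted && /mkdir/ && /xidpe/ { found = 1 }
        END                           { exit found ? 0 : 1 }
    ' "$1"
}

# Example:
# mkdir_after_mount /usr/lib/systemd/system/ramdisk.service && echo "mkdir follows mount"
```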
Also, let us know how 'rrdcached.service' looks as well.
Code: Select all
systemctl -l status rrdcached --no-pager
Thanks,
Perry