Nagios ramdisk full and no performance graphs

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

I understand that with Carbon Black disabled the Performance Data is coming across now. What is the current status of the RAMDISK?
The Carbon Black service is shut down and Nagios RamDisk shows active. However, the host-perfdata and service-perfdata are growing until the mount is 100% full.

Code: Select all

-rw-r--r--   1 nagios nagios 104M Nov 11 12:47 host-perfdata
-rw-r--r--   1 nagios nagios 2.6M Nov 11 08:46 objects.cache
-rw-r--r--   1 nagios nagios 395M Nov 11 12:47 service-perfdata
Some of the checks are not displaying Performance Data. Is this consistent or random?
Up until the attempt to restart Nagios Ramdisk at around 11:30 on 11/08, some of the performance graphs started working for a while, but then stopped working again.
[Screenshot attached: Nagios XI, 2021-11-11 12:52]
Please verify that the logging config in '/usr/local/nagios/etc/pnp/process_perfdata.cfg' is correct and find out where it is writing to:
Logging looks correct.

Code: Select all

LOG_FILE = /usr/local/nagios/var/perfdata.log
#
# Loglevel 0=silent 1=normal 2=debug
#
LOG_LEVEL = 2
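To find out where process_perfdata.pl is writing without reading through the whole file, a small lookup helps. This is a minimal sketch assuming bash and GNU sed; `pnp_cfg_value` is a hypothetical helper name, and it assumes the `KEY = value` layout shown above, skipping commented lines and returning the last match if a key appears more than once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a PNP-style "KEY = value" config.
# Lines starting with '#' are skipped; if KEY appears more than once, the last match wins.
pnp_cfg_value() {
  local key="$1"
  local cfg="${2:-/usr/local/nagios/etc/pnp/process_perfdata.cfg}"
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1/p" "$cfg" | tail -n 1
}
```

For example, `tail -n 50 "$(pnp_cfg_value LOG_FILE)"` shows the most recent entries of whatever log file the config points at.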
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios ramdisk full and no performance graphs

Post by pbroste »

Hello @hbouma

Thanks for following up with the details. You pointed out that the following files keep filling up until the mount is full. That would indicate that the move (mv) is either unable to move them or is not running at all. I wonder if they are stuck.
The Carbon Black service is shut down and Nagios RamDisk shows active. However, the host-perfdata and service-perfdata are growing until the mount is 100% full.

Code: Select all

-rw-r--r--   1 nagios nagios 104M Nov 11 12:47 host-perfdata
-rw-r--r--   1 nagios nagios 2.6M Nov 11 08:46 objects.cache
-rw-r--r--   1 nagios nagios 395M Nov 11 12:47 service-perfdata
Please run the mv commands manually, check the output of each command, and then run 'ls' on the directories to see whether the files were moved.

Code: Select all

/bin/mv -v /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
/bin/mv -v /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
/bin/mv -v /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
/bin/mv -v /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
*The -v option makes mv print each move it performs, for command feedback.
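Note that $TIMET$ is a Nagios macro (the current time in seconds since the Unix epoch) and is only substituted when Nagios itself runs the command. Typed into a shell, $TIMET is treated as an unset variable that expands to an empty string and the trailing $ stays literal, so the target ends up with a name like $.perfdata.host. The sketch below is a hypothetical helper, assuming bash and the ramdisk spool layout above, that fills in the timestamp with `date +%s` instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a ramdisk perfdata file into the xidpe spool the way the
# Nagios command does, substituting $TIMET$ (epoch seconds) by hand with date +%s.
move_perfdata() {
  local kind="$1"                           # "host" or "service"
  local ramdisk="${2:-/var/nagiosramdisk}"
  case "$kind" in
    host|service) ;;
    *) echo "usage: move_perfdata host|service [ramdisk_dir]" >&2; return 2 ;;
  esac
  local ts
  ts="$(date +%s)"                          # stands in for the $TIMET$ macro
  mv -v "$ramdisk/${kind}-perfdata" "$ramdisk/spool/xidpe/${ts}.perfdata.${kind}"
}
```

Running `move_perfdata host` and `move_perfdata service`, then `ls -l /var/nagiosramdisk/spool/xidpe`, shows whether the files arrived with a proper timestamp in their names.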

Please let us know what you see on your end so we can determine the next steps,
Perry
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

Code: Select all

10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
‘/var/nagiosramdisk/host-perfdata’ -> ‘/var/nagiosramdisk/spool/xidpe/$.perfdata.host’
10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
/bin/mv: cannot stat ‘/usr/local/nagios/var/host-perfdata’: No such file or directory
10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
/bin/mv: cannot stat ‘/var/nagiosramdisk/host-perfdata’: No such file or directory
10:08 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
/bin/mv: cannot stat ‘/usr/local/nagios/var/host-perfdata’: No such file or directory


10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
‘/var/nagiosramdisk/service-perfdata’ -> ‘/var/nagiosramdisk/spool/xidpe/$.perfdata.service’
10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
‘/usr/local/nagios/var/service-perfdata’ -> ‘/usr/local/nagios/var/spool/perfdata/service-perfdata.$’
10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
/bin/mv: cannot stat ‘/var/nagiosramdisk/service-perfdata’: No such file or directory
10:09 AM Server root [/var/nagiosramdisk]
$ /bin/mv -v /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
/bin/mv: cannot stat ‘/usr/local/nagios/var/service-perfdata’: No such file or directory
The drive is still 100% full:
tmpfs 500M 500M 0 100% /var/nagiosramdisk
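Because the problem here is the tmpfs mount creeping up to 100%, a quick threshold check can catch it before it fills. This is a minimal sketch assuming bash and GNU coreutils df (for --output); `ramdisk_usage` and its 90% default are illustrative, not Nagios XI settings, and the return codes loosely follow the Nagios plugin convention (0 OK, 1 WARNING, 3 UNKNOWN).

```shell
#!/usr/bin/env bash
# Hypothetical check: warn when a mount point (default: the XI ramdisk) exceeds a usage limit.
# Requires GNU df for --output=pcent.
ramdisk_usage() {
  local mnt="${1:-/var/nagiosramdisk}"
  local limit="${2:-90}"                    # percent; illustrative default
  local pct
  pct="$(df --output=pcent "$mnt" 2>/dev/null | tail -n 1 | tr -dc '0-9')" || true
  if [ -z "$pct" ]; then
    echo "UNKNOWN: could not read usage for $mnt" >&2
    return 3
  fi
  if [ "$pct" -ge "$limit" ]; then
    echo "WARNING: $mnt is ${pct}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: $mnt is ${pct}% full"
}
```

For example, `ramdisk_usage /var/nagiosramdisk 80` prints an OK or WARNING line and returns 0 or 1 accordingly.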

It appears some of the deleted files are still held open by our Nagios process:
lsof | egrep "deleted|COMMAND" | grep nagios
nagios 3123 nagios 24w REG 0,42 108400640 61187 /var/nagiosramdisk/host-perfdata (deleted)
nagios 3123 nagios 38w REG 0,42 413179904 61189 /var/nagiosramdisk/spool/perfdata/$.perfdata.service-PID-6080 (deleted)

ps -ef | grep 3123
nagios 3123 3007 0 09:30 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
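That lsof output explains why the mount stays full: on Linux, the space used by a deleted file is not released until every process holding it open closes it, so the nagios daemon (PID 3123 here) keeps those blocks allocated until it closes the files or exits. As an alternative to filtering lsof, here is a minimal sketch that reads /proc directly; `list_deleted_open` is a hypothetical name, and it assumes Linux and root privileges to see other users' processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under DIR that were deleted but are still held open.
# Prints "PID SIZE_BYTES TARGET" per open descriptor. Linux /proc only; run as root
# to see descriptors owned by other users (e.g. the nagios daemon).
list_deleted_open() {
  local dir="${1:-/var/nagiosramdisk}"
  local fd pid size
  { find /proc/[0-9]*/fd -lname "${dir}/* (deleted)" 2>/dev/null || true; } |
    while read -r fd; do
      pid="${fd#/proc/}"
      pid="${pid%%/*}"
      # The descriptor may be closed between find and stat; skip it if so.
      size="$(stat -L -c %s "$fd" 2>/dev/null)" || continue
      printf '%s %s %s\n' "$pid" "$size" "$(readlink "$fd")"
    done
}
```

If lsof is installed, `lsof +L1` is a shorter way to list open files that have been unlinked.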
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios ramdisk full and no performance graphs

Post by ssax »

I don't see rrdcached running, but I see you have it configured.

First, I'd free up some space in your ramdisk so that things can process properly.

If you really want to use rrdcached, follow this guide again:

https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Make sure rrdcached is enabled and running:

Code: Select all

systemctl status rrdcached
systemctl restart rrdcached
systemctl enable rrdcached
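Besides the service being up, the socket named in RRD_DAEMON_OPTS has to exist, since that is where updates are sent when the option is set. The following minimal sketch checks that, assuming bash and only the unix: socket form used in this thread; `check_rrdcached_socket` is a hypothetical helper, and using the last uncommented RRD_DAEMON_OPTS line when the key is duplicated is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm the rrdcached socket referenced in process_perfdata.cfg exists.
# Only the "unix:/path" form is handled; the last uncommented RRD_DAEMON_OPTS line is used.
check_rrdcached_socket() {
  local cfg="${1:-/usr/local/nagios/etc/pnp/process_perfdata.cfg}"
  local opts sock
  opts="$(grep -E '^[[:space:]]*RRD_DAEMON_OPTS[[:space:]]*=' "$cfg" | tail -n 1)" || true
  if [ -z "$opts" ]; then
    echo "rrdcached not configured (no active RRD_DAEMON_OPTS in $cfg)"
    return 0
  fi
  case "$opts" in
    *unix:*) ;;
    *) echo "RRD_DAEMON_OPTS is not a unix: socket; not checked: $opts"; return 0 ;;
  esac
  sock="${opts#*unix:}"
  sock="${sock%%[[:space:]]*}"              # drop anything after the path
  if [ -S "$sock" ]; then
    echo "OK: $sock exists"
  else
    echo "MISSING: $sock (is rrdcached running?)" >&2
    return 1
  fi
}
```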
Then edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Make sure your load_threshold is using a decimal:

Code: Select all

load_threshold = 60.0
Then restart npcd:

Code: Select all

systemctl restart npcd

NOTE: I've seen some customers have issues with rrdcached. To disable it entirely, edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Comment out this line at the bottom:

Code: Select all

#RRD_DAEMON_OPTS = unix:/var/rrdtool/rrdcached/rrdcached.sock
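If RRD_DAEMON_OPTS appears more than once (as it does in the process_perfdata.cfg posted later in this thread), every active copy needs to be commented out. A minimal sketch assuming bash and GNU sed; `disable_rrdcached_in_pnp` is a hypothetical name, and it keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out every active RRD_DAEMON_OPTS line so
# process_perfdata.pl stops using rrdcached. Keeps a .bak copy (GNU sed -i.bak).
disable_rrdcached_in_pnp() {
  local cfg="${1:-/usr/local/nagios/etc/pnp/process_perfdata.cfg}"
  sed -i.bak -E 's/^([[:space:]]*RRD_DAEMON_OPTS[[:space:]]*=)/#\1/' "$cfg"
  # Show what is left; every remaining line should start with '#'.
  grep -n 'RRD_DAEMON_OPTS' "$cfg" || true
}
```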

If all that doesn't fix it, please PM me a FRESH copy of your profile so I can see the latest logs/configs.
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

After installing rrdcached, the nagiosramdisk mount is no longer full. However, the performance graphs are still not updating. I am sending you the new profile via PM.
ssax wrote: Then edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Make sure your load_threshold is using a decimal:

Code: Select all

load_threshold = 60.0
My file has no load_threshold variable at all.

Code: Select all

#
# Config File for process_perfdata.pl
#
# $Id: process_perfdata.cfg-sample.in 520 2008-09-16 12:50:10Z pitchfork $
#
# process_perfdata.pl Timeout
#
TIMEOUT = 15
#
# Use RRDs Perl Module
#
USE_RRDs = 1
#
#
#
RRDPATH = /usr/local/nagios/share/perfdata
#
#
#
RRDTOOL = /bin/rrdtool
#
#
#
CFG_DIR = /usr/local/nagios/etc/pnp
#
#
#
RRD_HEARTBEAT = 8460
#
#
#
RRA_CFG = /usr/local/nagios/etc/pnp/rra.cfg
#
#
#
RRA_STEP = 60
#
#
#
LOG_FILE = /usr/local/nagios/var/perfdata.log
#
# Loglevel 0=silent 1=normal 2=debug
#
LOG_LEVEL = 2
#
# XML encoding
# The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.
# http://www.php.net/xml-parser-create
XML_ENC = UTF-8
#
# EXPERIMENTAL rrdcached Support
# Use only with rrdtool svn revision 1511+
#
# RRD_DAEMON_OPTS = unix:/tmp/rrdcached.sock
RRD_DAEMON_OPTS = unix:/var/rrdtool/rrdcached/rrdcached.sock
RRD_DAEMON_OPTS = unix:/var/rrdtool/rrdcached/rrdcached.sock
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios ramdisk full and no performance graphs

Post by ssax »

Sorry, wrong file, this one:

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
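To confirm the npcd.cfg setting, the sketch below extracts load_threshold and flags a value written without a decimal point. It is a minimal, hypothetical check (`check_load_threshold` is not part of Nagios XI) assuming bash, GNU sed and the `key = value` layout; it does not edit the file or restart npcd.

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether load_threshold in npcd.cfg is written with a decimal.
check_load_threshold() {
  local cfg="${1:-/usr/local/nagios/etc/pnp/npcd.cfg}"
  local val
  val="$(sed -n -E 's/^[[:space:]]*load_threshold[[:space:]]*=[[:space:]]*([^[:space:]#]+).*/\1/p' "$cfg" | tail -n 1)" || true
  case "$val" in
    "")  echo "load_threshold not set in $cfg"; return 1 ;;
    *.*) echo "OK: load_threshold = $val" ;;
    *)   echo "load_threshold = $val has no decimal point; e.g. ${val}.0"; return 1 ;;
  esac
}
```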
I see it using 100.0 in the logs now, though, so you must have updated it.

Code: Select all

2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: Timeout after 15 secs. ***
2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: Please check your npcd.cfg
2021-11-12 10:10:49 [6080] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//$.perfdata.service-PID-6080 deleted
2021-11-12 10:10:49 [6080] [0] *** Timeout while processing Host: "preprod-XXXXXXXX" Service: "CCCCCCC"
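To see whether these timeouts hit a few checks or everything, the "Timeout while processing" lines can be tallied per host and service. A minimal sketch assuming bash, the log line format shown above and the LOG_FILE path from process_perfdata.cfg; `count_perfdata_timeouts` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count process_perfdata.pl timeouts per host/service,
# based on the "Timeout while processing Host: ... Service: ..." log lines.
count_perfdata_timeouts() {
  local log="${1:-/usr/local/nagios/var/perfdata.log}"
  grep -o 'Timeout while processing Host: "[^"]*" Service: "[^"]*"' "$log" |
    sed -E 's/^.*Host: "([^"]*)" Service: "([^"]*)"$/\1 | \2/' |
    sort | uniq -c | sort -rn
}
```

The most frequent host/service pairs are printed first, each prefixed with its count.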
In this file:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Set your TIMEOUT to 40 (no need to restart anything).
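If you would rather script that edit, here is a minimal sketch using GNU sed that rewrites the TIMEOUT line and keeps a .bak copy. `set_perfdata_timeout` is a hypothetical name, and it assumes the `TIMEOUT = N` layout from the process_perfdata.cfg posted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set TIMEOUT (whole seconds) in process_perfdata.cfg, keeping a .bak copy.
set_perfdata_timeout() {
  local secs="$1"
  local cfg="${2:-/usr/local/nagios/etc/pnp/process_perfdata.cfg}"
  case "$secs" in
    ''|*[!0-9]*) echo "usage: set_perfdata_timeout SECONDS [cfg]" >&2; return 2 ;;
  esac
  sed -i.bak -E "s/^[[:space:]]*TIMEOUT[[:space:]]*=.*/TIMEOUT = ${secs}/" "$cfg"
  grep -E '^TIMEOUT[[:space:]]*=' "$cfg"    # show the result
}
```

For example, `set_perfdata_timeout 40` should print `TIMEOUT = 40`.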

What is the output of these commands:

Code: Select all

ls -ld /var/nagiosramdisk
ls -lh /var/nagiosramdisk/
ls -lh /usr/local/nagios/var
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

The TIMEOUT is updated to 40.
ssax wrote:
What is the output of these commands:

Code: Select all

ls -ld /var/nagiosramdisk
ls -lh /var/nagiosramdisk/
ls -lh /usr/local/nagios/var

Code: Select all

$ ls -ld /var/nagiosramdisk
drwxrwxrwt 4 nagios nagios 160 Nov 18 09:49 /var/nagiosramdisk
09:49 AM SERVER root [~]
$ ls -lh /var/nagiosramdisk/
total 500M
-rw-r--r-- 1 nagios nagios 104M Nov 18 09:49 host-perfdata
-rw-r--r-- 1 nagios nagios 2.6M Nov 18 04:35 objects.cache
-rw-r--r-- 1 nagios nagios 391M Nov 18 09:49 service-perfdata
drwxr-xr-x 6 nagios nagios  120 Nov 16 07:46 spool
-rw-r--r-- 1 nagios nagios 3.5M Nov 18 09:49 status.dat
drwxr-xr-x 3 nagios nagios   60 Nov 18 04:35 tmp
09:49 AM SERVER root [~]
$ ls -lh /usr/local/nagios/var
total 8.3M
drwxrwxr-x. 2 nagios nagios  44K Nov 18 00:00 archives
-rw-rw-rw-  1 root   root    123 Sep 22  2020 Nagios.host.java.config.ser
-rw-r--r--  1 nagios nagios 505K Nov 18 09:43 nagios.log
-rw-------  1 nagios nagios    0 Jun  5  2019 nagios.tmp5qro9N
-rw-rw-r--  1 nagios nagios 462K Oct 17  2019 nagios.tmpWKNETC
-rw-r--r--  1 nagios nagios 1.7K Apr 12  2021 ndomod.tmp
-rw-r--r--. 1 nagios nagios  19K Nov 16 07:56 npcd.log
-rw-r--r--. 1 nagios nagios  32K Mar 29  2019 objects.cache
-rw-rw-r--  1 nagios nagios 3.8M Nov 12 10:10 perfdata.log
-rw-rw-rw-  1 root   root    483 Sep 22  2020 profile.csv
-rw-------  1 nagios nagios 3.5M Nov 18 09:35 retention.dat
drwxrws---. 2 nagios nagcmd 4.0K Nov 18 04:35 rw
drwxr-xr-x. 5 nagios nagios 4.0K Mar 29  2019 spool
drwxr-xr-x. 2 nagios nagios 4.0K Nov 12 10:10 stats
Oddly, /var/nagiosramdisk has started filling up again. I verified that the Carbon Black instance is not running, and that both the ramdisk and rrdcached services are running.

I have restarted the ramdisk service. Oddly, every time the ramdisk is restarted, the /var/nagiosramdisk/spool/xidpe folder is missing and I have to recreate it manually.
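Until the cause is found, the manual step can at least be scripted. This is a minimal sketch assuming bash; `ensure_ramdisk_dirs` is a hypothetical helper, and the directory list, the 775 mode and the recursive chown are copied from the ExecStartPre/ExecStart lines of the example ramdisk.service status output in this thread, so check them against your own unit file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recreate the spool directories the XI ramdisk expects.
# Directory list and mode taken from the example ramdisk.service in this thread.
ensure_ramdisk_dirs() {
  local root="${1:-/var/nagiosramdisk}"
  local owner="${2-nagios:nagios}"          # pass "" to skip chown (e.g. when not root)
  local d
  for d in tmp spool spool/checkresults spool/xidpe spool/perfdata; do
    if [ ! -d "$root/$d" ]; then
      mkdir -p -m 775 "$root/$d"
      echo "created $root/$d"
    fi
  done
  if [ -n "$owner" ]; then
    chown -R "$owner" "$root"
  fi
}
```

Running it as root with no arguments only creates what is missing and prints one line per created directory.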
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios ramdisk full and no performance graphs

Post by ssax »

Please PM any of these files you have:

Code: Select all

/etc/sysconfig/ramdisk
/etc/sysconfig/nagios
/usr/lib/systemd/system/ramdisk.service
/usr/lib/systemd/system/nagios.service
/etc/init.d/nagios
/var/nagiosramdisk/objects.cache
/var/nagiosramdisk/status.dat
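Since some of these paths may not exist on a given system, a small helper can collect whichever files are present into one archive and report the rest. A minimal sketch assuming bash and GNU tar; `bundle_existing` and any archive path you pass it are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive whichever of the given files exist and report the missing ones.
# Usage: bundle_existing OUT.tar.gz FILE...
bundle_existing() {
  local out="$1"; shift
  local f
  local present=()
  for f in "$@"; do
    if [ -e "$f" ]; then
      present+=("$f")
    else
      echo "missing: $f" >&2
    fi
  done
  if [ "${#present[@]}" -eq 0 ]; then
    echo "nothing to archive" >&2
    return 1
  fi
  tar -czf "$out" "${present[@]}"
}
```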
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Nagios ramdisk full and no performance graphs

Post by hbouma »

The following do not exist:

/etc/sysconfig/ramdisk
/etc/sysconfig/nagios
/usr/lib/systemd/system/nagios.service


The others are being sent over PM.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios ramdisk full and no performance graphs

Post by pbroste »

Hello @hbouma

I am following up on behalf of @ssax, as he is out of the office this week. While catching up on this thread, I see that @ssax requested the following info. We do not have access to @ssax's private message inbox, so we will not be able to review those files.
ssax wrote:Please PM any of these files you have:

Code: Select all

/etc/sysconfig/ramdisk
/etc/sysconfig/nagios
/usr/lib/systemd/system/ramdisk.service
/usr/lib/systemd/system/nagios.service
/etc/init.d/nagios
/var/nagiosramdisk/objects.cache
/var/nagiosramdisk/status.dat
Looking back through the thread, we see that you stated that the '/var/nagiosramdisk/spool/xidpe' directory is not created when 'ramdisk.service' is started. We would like to look into this further by checking the service's status and journal output after a restart.

First restart the 'ramdisk.service' by:

Code: Select all

systemctl restart ramdisk.service
Then check the 'systemctl' status output to see whether the '/var/nagiosramdisk/spool/xidpe' folder is created or missing:

Code: Select all

systemctl -l status ramdisk --no-pager
Results should look similar to this example:
systemctl -l status ramdisk --no-pager
● ramdisk.service - Ramdisk
Loaded: loaded (/usr/lib/systemd/system/ramdisk.service; enabled; vendor preset: disabled)
Active: active (exited) since Mon 2021-11-22 10:15:05 CST; 17min ago
Process: 2049696 ExecStart=/usr/bin/chown -R nagios:nagios /var/nagiosramdisk (code=exited, status=0/SUCCESS)
Process: 2049694 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk /var/nagiosramdisk/tmp /var/nagiosramdisk/spool /var/nagiosramdisk/spool/checkresults /var/nagiosramdisk/spool/xidpe /var/nagiosramdisk/spool/perfdata (code=exited, status=0/SUCCESS)
Process: 2049692 ExecStartPre=/usr/bin/mount -t tmpfs -o size=100m tmpfs /var/nagiosramdisk (code=exited, status=0/SUCCESS)
Process: 2049690 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk /var/nagiosramdisk/tmp /var/nagiosramdisk/spool /var/nagiosramdisk/spool/checkresults /var/nagiosramdisk/spool/xidpe /var/nagiosramdisk/spool/perfdata (code=exited, status=0/SUCCESS)
Main PID: 2049696 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 101104)
Memory: 0B
CGroup: /system.slice/ramdisk.service
Also, let us know how 'rrdcached.service' looks:

Code: Select all

systemctl -l status rrdcached --no-pager
Thanks,
Perry