Nagios RAMDISK 100% Two Weekends In A Row
Posted: Sat Jun 27, 2020 10:36 am
by luczynj
Hello support!
Our 1.5GB Nagios RAM disk, /var/nagiosramdisk, has hit 100% capacity two weekends in a row now on both our primary and backup systems.
How can I determine what caused it?
How can I prevent it happening again?
Can we increase the size of it?
Running 5.7.1 on our backup server
Running 5.6.5 on our primary server
Help! Please! You're our only hope.
Regards,
JL
Re: Nagios RAMDISK 100% Two Weekends In A Row
Posted: Mon Jun 29, 2020 12:05 pm
by cdienger
It sounds like it is getting into a state where it isn't processing performance data. How full is it normally? I would check out the npcd.log and perfdata.log under /usr/local/nagios/var/ for any errors.
Is there a specific folder in /var/nagiosramdisk that is taking up the space?
I've attached a flowchart showing the steps taken to process perfdata to help identify where things may be breaking down.
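To answer the question about which folder is taking up the space, a sketch along these lines may help. It assumes the RAM disk is mounted at /var/nagiosramdisk as described in this thread; the RAMDISK variable and the ramdisk_usage function name are only illustrative.

```shell
#!/bin/sh
# Sketch: list directories under the RAM disk by size, largest first.
# RAMDISK defaults to the path from this thread; override it if yours differs.
RAMDISK="${RAMDISK:-/var/nagiosramdisk}"

ramdisk_usage() {
    # $1 = directory to inspect
    # du -k reports sizes in KiB, -x stays on the same filesystem
    du -kx "$1" 2>/dev/null | sort -rn | head -n 20
}

if [ -d "$RAMDISK" ]; then
    ramdisk_usage "$RAMDISK"
fi
```

The first column is the size in KiB, so the directories near the top of the output are the ones to look at first.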
Re: Nagios RAMDISK 100% Two Weekends In A Row
Posted: Wed Jul 01, 2020 9:10 am
by luczynj
Hello,
Thanks for that. At what point are the files located in /var/nagiosramdisk/spool/checkresults processed?
They seem to be created by this script, /usr/local/nrdp/server/plugins/nagioscorepassivecheck/nagioscorepassivecheck.inc.php, upon receipt of the NRDP data/messages.
When and how are these files processed and cleaned up? Is it possible to reduce the amount of time between the cleanup jobs for the files in /var/nagiosramdisk?
-rwxrwx--- 1 nagios nagios 378 Jul 1 13:37 cDVdg7U
-rw-r--r-- 1 nagios nagios 0 Jul 1 13:37 cDVdg7U.ok
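One way to tell whether result files like these are being picked up or are lingering is to count the ones older than a few minutes. This is only a sketch: the 5-minute threshold is an assumption chosen for illustration, not a Nagios default, and count_stale is a made-up helper name.

```shell
#!/bin/sh
# Sketch: count check result files that have sat in the spool for a while.
CHECKRESULTS="${CHECKRESULTS:-/var/nagiosramdisk/spool/checkresults}"
MAX_AGE_MIN="${MAX_AGE_MIN:-5}"   # assumed threshold, not a Nagios setting

count_stale() {
    # $1 = directory, $2 = age in minutes
    # Counts regular files last modified more than $2 minutes ago.
    find "$1" -maxdepth 1 -type f -mmin +"$2" 2>/dev/null | wc -l | tr -d ' '
}

if [ -d "$CHECKRESULTS" ]; then
    echo "files older than ${MAX_AGE_MIN} min: $(count_stale "$CHECKRESULTS" "$MAX_AGE_MIN")"
fi
```

A count that keeps growing between runs would suggest the files are arriving faster than they are being processed.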
We know which NRDP services are causing this. They are new services we would like to monitor, but it appears the perfdata can't keep up with it, so once I disabled it, the system would settle down.
What is the role of objects.cache?
Can we increase the size of the RAM disk, or is 1.5GB the limit?
Regards,
JL
Re: Nagios RAMDISK 100% Two Weekends In A Row
Posted: Thu Jul 02, 2020 11:13 am
by benjaminsmith
Hi,
I'd like to get a copy of a current system profile, as it's likely that files are spooling while waiting for performance data processing, and that's why the RAMDISK is filling up. Also, can you let me know which NRDP services are causing the issue so I can check the logs?
Most of these backend processes run as cron jobs, so make sure the cron service is running on the server.
The objects.cache file contains all of the object configuration data for Nagios, and is updated upon a restart of the Nagios process.
Yes, you can increase the size of the RAMDISK. Directions are on page 6 of the following guide (change RAMDISK_SIZE=new-value):
Utilizing a RAM Disk in Nagios XI
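Before and after changing the size, the mounted capacity and current usage can be confirmed with df. The sketch below assumes the /var/nagiosramdisk mount point from this thread; the 80% warning threshold and the usage_pct helper are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: report RAM disk size and warn when usage crosses a threshold.
RAMDISK="${RAMDISK:-/var/nagiosramdisk}"
WARN_PCT="${WARN_PCT:-80}"   # assumed threshold for illustration

usage_pct() {
    # $1 = path on the filesystem to check
    # df -P prints one POSIX-format line per filesystem; field 5 is "NN%".
    df -P "$1" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

if [ -d "$RAMDISK" ]; then
    df -h "$RAMDISK"
    pct=$(usage_pct "$RAMDISK")
    if [ "$pct" -ge "$WARN_PCT" ]; then
        echo "WARNING: $RAMDISK is ${pct}% full"
    fi
fi
```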
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and share in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.
Re: Nagios RAMDISK 100% Two Weekends In A Row
Posted: Thu Jul 02, 2020 3:15 pm
by luczynj
Hi Ben,
See attached profile.zip.
I don't have the command systemctl on my server, but cron is running. Which cron job cleans up the /var/nagiosramdisk?
The services that are causing this issue are all sent using NRDP for hosts:
GMV_TES_AMS
GMV_TES_DAL
GMV_TES_LGS
GMV_TES_SIN
They have ASR or ERLANG or MHT at the end of the service name. Here are some samples.
GMV_TES_LGS.cfg: service_description SPVU3ZI-SPVU3ZO ASR
GMV_TES_LGS.cfg: service_description SPVUKRI-SPVUKRO ASR
GMV_TES_LGS.cfg: service_description SPVUKZI-SPVUKZO ASR
GMV_TES_LGS.cfg: service_description SWGEAMI-SWGEAMO ASR
GMV_TES_AMS.cfg: service_description SPVLUZI-SPVLUZO ERLANG
GMV_TES_AMS.cfg: service_description SPVNLRI-SPVNLRO ERLANG
GMV_TES_AMS.cfg: service_description SPVNLZI-SPVNLZO ERLANG
GMV_TES_AMS.cfg: service_description SPVSARI-SPVSARO ERLANG
GMV_TES_SIN.cfg: service_description VZNMIAI-VZNMIAO MHT
GMV_TES_SIN.cfg: service_description VZNNUTI-VZNNUTO MHT
GMV_TES_SIN.cfg: service_description WRTCCHI-WRTCCHO MHT
We wanted to send the NRDP stats every 15 minutes and have just changed it to every 30 minutes, and the problem is still happening.
We're on 5.7.1 on both our servers.
Thank you.
Regards,
JL
Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
Re: Nagios RAMDISK 100% Two Weekends In A Row
Posted: Mon Jul 06, 2020 10:50 am
by lmiltchev
It is possible that the perfdata files in the "/var/nagiosramdisk/spool/checkresults/" directory are not being processed because of permission issues. Run the following command to check the permissions on the "checkresults" directory:
Code:
ls -lad /var/nagiosramdisk/spool/checkresults/
If it is not "writable" by group, run the following command to fix the permissions:
Code:
chmod -R 770 /var/nagiosramdisk/spool/checkresults
Then restart Apache.
Can you run the following commands and show the output?
Code:
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
grep nag /etc/group
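Since the disk fills up over a weekend, it may also help to record these counts over time rather than once. The following is a rough sketch, assuming the three spool directories named above; the SPOOL variable and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: print one timestamped line with the entry count of each spool directory.
SPOOL="${SPOOL:-/var/nagiosramdisk/spool}"

count_entries() {
    # $1 = directory; prints the number of entries, or 0 if it does not exist
    if [ -d "$1" ]; then
        find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
    else
        echo 0
    fi
}

spool_snapshot() {
    # $1 = spool base directory
    printf '%s' "$(date '+%Y-%m-%d %H:%M:%S')"
    for d in xidpe perfdata checkresults; do
        printf ' %s=%s' "$d" "$(count_entries "$1/$d")"
    done
    printf '\n'
}

spool_snapshot "$SPOOL"
```

Run every few minutes from cron with the output appended to a log file, this would show which of the three directories starts growing first and at what time.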