Nagios RAMDISK 100% Two Weekends In A Row

luczynj
Posts: 88
Joined: Wed Dec 03, 2014 6:47 pm

Nagios RAMDISK 100% Two Weekends In A Row

Post by luczynj »

Hello support!

Our 1.5GB Nagios RAM disk at /var/nagiosramdisk has hit 100% capacity two weekends in a row now, on both our primary and backup systems.

How can I determine what caused it?
How can I prevent it happening again?
Can we increase the size of it?

Running 5.7.1 on our backup server
Running 5.6.5 on our primary server

Help! Please! You're our only hope.

Regards,
JL
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios RAMDISK 100% Two Weekends In A Row

Post by cdienger »

It sounds like it is getting into a state where it isn't processing performance data. How full is it normally? I would check out the npcd.log and perfdata.log under /usr/local/nagios/var/ for any errors.

Is there a specific folder in /var/nagiosramdisk that is taking up the space?
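
A per-directory size listing is usually enough to answer that. The following is a minimal sketch, assuming `du` and `sort` are available on the server; the function name `ramdisk_usage` is made up for illustration and is not part of Nagios XI:

```shell
# List each entry directly under a directory with its size in KiB, largest first.
# ramdisk_usage is a hypothetical helper name, not a Nagios XI tool.
ramdisk_usage() {
    dir="${1:-/var/nagiosramdisk}"
    # -s: one total per argument, -k: sizes in 1024-byte blocks
    du -sk "$dir"/* 2>/dev/null | sort -rn
}

# Only run against the real RAM disk if it exists on this machine.
if [ -d /var/nagiosramdisk ]; then
    ramdisk_usage /var/nagiosramdisk
fi
```

The first line of output is the largest entry, which should point at the spool directory that is backing up.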

I've attached a flowchart showing the steps taken to process perfdata to help identify where things may be breaking down.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
luczynj
Posts: 88
Joined: Wed Dec 03, 2014 6:47 pm

Re: Nagios RAMDISK 100% Two Weekends In A Row

Post by luczynj »

Hello,

Thanks for that. At what point are the files located in /var/nagiosramdisk/spool/checkresults processed?

They seem to be created by this script, /usr/local/nrdp/server/plugins/nagioscorepassivecheck/nagioscorepassivecheck.inc.php, upon receipt of the NRDP data/messages.

When and how are these files processed and cleaned up? Is it possible to reduce the time between the cleanup jobs for the files in /var/nagiosramdisk?

-rwxrwx--- 1 nagios nagios 378 Jul 1 13:37 cDVdg7U
-rw-r--r-- 1 nagios nagios 0 Jul 1 13:37 cDVdg7U.ok

We know which NRDP services are causing this. They are new services we would like to monitor, but it appears that perfdata processing can't keep up with them; once I disabled them, the system settled down.

What is the role of objects.cache?

Can we increase the size of the RAM disk, or is 1.5GB the limit?

Regards,
JL
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios RAMDISK 100% Two Weekends In A Row

Post by benjaminsmith »

Hi,

I'd like to get a copy of a current system profile, as it's likely that files are spooling up while they wait for performance data processing, and that's why the RAMDISK is filling up. Also, can you let me know which NRDP services are causing the issue, so I can check the logs?

Most of these backend processes run as cron jobs, so make sure the cron service is running on the server.

Code: Select all

systemctl status crond
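
On older init systems where systemctl is not installed, an equivalent check may be needed. Here is a minimal sketch, assuming the service is named crond as in the command above; `cron_status_cmd` is a hypothetical helper name, and it only prints the command to run rather than running it:

```shell
# Print a command suitable for checking the cron daemon on this host.
# Assumes the service is named "crond", as in the systemctl example above.
cron_status_cmd() {
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl status crond"
    elif command -v service >/dev/null 2>&1; then
        echo "service crond status"
    else
        # Last resort: look for a process named exactly "crond"
        echo "pgrep -x crond"
    fi
}

cron_status_cmd
```
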
The objects.cache file contains all of the object configuration data for Nagios, and is updated upon a restart of the Nagios process.

Yes, you can increase the size of the RAMDISK. Directions are on page 6 of the following guide (change RAMDISK_SIZE=new-value).

Utilizing a RAM Disk in Nagios XI
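
Before and after resizing, it may help to confirm how full the mount actually is. This is a rough sketch that relies on the POSIX output format of `df -P`; `usage_pct` is a hypothetical helper name, and the mount point is the one given earlier in this thread:

```shell
# Print the used percentage (digits only) of the filesystem holding a path.
# usage_pct is a hypothetical helper; df -P prints one line per filesystem.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

mount_point=/var/nagiosramdisk
if [ -d "$mount_point" ]; then
    pct=$(usage_pct "$mount_point")
    echo "$mount_point is ${pct}% full"
fi
```
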

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.

Be sure to check out our Knowledgebase for helpful articles and solutions!
luczynj
Posts: 88
Joined: Wed Dec 03, 2014 6:47 pm

Re: Nagios RAMDISK 100% Two Weekends In A Row

Post by luczynj »

Hi Ben,

See attached profile.zip.

I don't have the command systemctl on my server, but cron is running. Which cron job cleans up the /var/nagiosramdisk?

The services that are causing this issue are all sent using NRDP for hosts:
GMV_TES_AMS
GMV_TES_DAL
GMV_TES_LGS
GMV_TES_SIN

They have ASR or ERLANG or MHT at the end of the service name. Here are some samples.

GMV_TES_LGS.cfg: service_description SPVU3ZI-SPVU3ZO ASR
GMV_TES_LGS.cfg: service_description SPVUKRI-SPVUKRO ASR
GMV_TES_LGS.cfg: service_description SPVUKZI-SPVUKZO ASR
GMV_TES_LGS.cfg: service_description SWGEAMI-SWGEAMO ASR

GMV_TES_AMS.cfg: service_description SPVLUZI-SPVLUZO ERLANG
GMV_TES_AMS.cfg: service_description SPVNLRI-SPVNLRO ERLANG
GMV_TES_AMS.cfg: service_description SPVNLZI-SPVNLZO ERLANG
GMV_TES_AMS.cfg: service_description SPVSARI-SPVSARO ERLANG

GMV_TES_SIN.cfg: service_description VZNMIAI-VZNMIAO MHT
GMV_TES_SIN.cfg: service_description VZNNUTI-VZNNUTO MHT
GMV_TES_SIN.cfg: service_description WRTCCHI-WRTCCHO MHT

We wanted to send the NRDP stats every 15 minutes. We have just changed that to every 30 minutes, but the problem is still happening.

We're on 5.7.1 on both our servers.

Thank you.

Regards,
JL

Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Nagios RAMDISK 100% Two Weekends In A Row

Post by lmiltchev »

It is possible that the perfdata files in the "/var/nagiosramdisk/spool/checkresults/" directory are not being processed because of permission issues. Run the following command to check the permissions on the "checkresults" directory:

Code: Select all

ls -lad /var/nagiosramdisk/spool/checkresults/
If it is not "writable" by group, run the following command to fix the permissions:

Code: Select all

chmod -R 770 /var/nagiosramdisk/spool/checkresults
and restart Apache:

Code: Select all

service httpd restart
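
The permission check above can also be scripted so the result does not have to be read by eye. This is a minimal sketch that parses the mode string printed by `ls -ld`; `group_writable` is a hypothetical helper name, and the script only reports the state without changing anything:

```shell
# Succeed if the directory's group write bit is set.
# group_writable is a hypothetical helper that parses the mode string from ls -ld.
group_writable() {
    mode=$(ls -ld "$1" | cut -c1-10)   # e.g. drwxrwx---
    # Character 6 of the mode string is the group write flag.
    [ "$(printf '%s' "$mode" | cut -c6)" = "w" ]
}

dir=/var/nagiosramdisk/spool/checkresults
if [ -d "$dir" ]; then
    if group_writable "$dir"; then
        echo "$dir is group-writable"
    else
        echo "$dir is NOT group-writable; see the chmod step above"
    fi
fi
```
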
Can you run the following commands and show the output?

Code: Select all

ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
grep nag /etc/group
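
A raw file count does not show whether files are actually stuck, so counting only older files may be more telling. A minimal sketch, assuming a `find` that supports `-mmin` (GNU and BSD versions do); `count_stale` is a hypothetical helper name and the 5-minute threshold is an arbitrary example:

```shell
# Count regular files in a directory older than a given number of minutes.
# count_stale is a hypothetical helper; the threshold is passed as the second argument.
count_stale() {
    find "$1" -type f -mmin +"$2" 2>/dev/null | wc -l | tr -d ' '
}

for sub in xidpe perfdata checkresults; do
    d="/var/nagiosramdisk/spool/$sub"
    if [ -d "$d" ]; then
        echo "$sub: $(count_stale "$d" 5) files older than 5 minutes"
    fi
done
```

A count that keeps growing between runs suggests that directory is where processing has stalled.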