Ramdisk full!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Ramdisk full!

Post by BanditBBS »

My XI is messed up because the ramdisk is full. There are tons of files in the spool/checkresults folder. I restarted the processes and those files were all processed, but more keep coming in, and the spool/xidpe folder also has many files in it. Running 'df' still shows 100% utilization.

Should I wait and see what happens after a few more minutes?
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Ramdisk full!

Post by abrist »

Maybe? Sounds like npcd/bulk processing may have hung. Is the number of files in the perfdata/xidpe folders decreasing?
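One way to answer that question is to sample the file count twice and compare. The following is only a minimal sketch: the helper names (`count_files`, `classify_trend`, `spool_trend`) are hypothetical, and the ramdisk path in the usage comment is an assumption that should be adjusted to the actual mount point.

```shell
#!/bin/sh
# Count regular files directly inside a directory (prints 0 if it is missing).
count_files() {
    find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Compare two counts and name the trend.
classify_trend() {
    if [ "$2" -lt "$1" ]; then
        echo decreasing
    elif [ "$2" -gt "$1" ]; then
        echo increasing
    else
        echo steady
    fi
}

# Sample a directory twice, `interval` seconds apart (default 30),
# and print "first last trend".
spool_trend() {
    dir=$1
    interval=${2:-30}
    first=$(count_files "$dir")
    sleep "$interval"
    last=$(count_files "$dir")
    echo "$first $last $(classify_trend "$first" "$last")"
}

# Hypothetical usage (path is an assumption, not confirmed for every install):
# spool_trend /var/nagiosramdisk/spool/xidpe 30
```

If the trend stays "increasing" over several samples, the backlog is not draining and the logs are the next place to look.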
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: Ramdisk full!

Post by BanditBBS »

abrist wrote: Maybe? Sounds like npcd/bulk processing may have hung. Is the number of files in the perfdata/xidpe folders decreasing?
Looks like the number of files in the perfdata folder is growing: 13,000+ now. Most new files are coming in at 0 size.

Re: Ramdisk full!

Post by BanditBBS »

Sorry, perfdata is empty; it's the xidpe folder that is growing.

Re: Ramdisk full!

Post by abrist »

Is npcd running? What do you see in the npcd/perfdata logs?

Code: Select all

service npcd status
tail -25 /usr/local/nagios/var/perfdata.log
tail -25 /usr/local/nagios/var/npcd.log

Re: Ramdisk full!

Post by BanditBBS »

Code: Select all

NPCD running (pid 2560).

Code: Select all

[root@svwdcnagios02 xidpe]# tail -25 /usr/local/nagios/var/perfdata.log
2013-11-14 13:06:25 [28732] [0] *** process_perfdata.pl terminated on signal ALRM
2013-11-22 14:34:47 [19201] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-11-22 14:34:47 [19201] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-11-22 14:34:47 [19201] [0] *** TIMEOUT: Please check your npcd.cfg
2013-11-22 14:34:47 [19201] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//service-perfdata.1385148866-PID-19201 deleted
2013-11-22 14:34:47 [19201] [0] *** Timeout while processing Host: "WDCAE-PVSP02P.AEO.AE.COM" Service: "All_Drives"
2013-11-22 14:34:47 [19201] [0] *** process_perfdata.pl terminated on signal ALRM
2013-11-22 15:01:11 [16018] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-11-22 15:01:11 [16018] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-11-22 15:01:11 [16018] [0] *** TIMEOUT: Please check your npcd.cfg
2013-11-22 15:01:11 [16018] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//service-perfdata.1385150446-PID-16018 deleted
2013-11-22 15:01:11 [16018] [0] *** Timeout while processing Host: "WDCAE-PWFM11P.AEO.AE.COM" Service: "All_Drives"
2013-11-22 15:01:11 [16018] [0] *** process_perfdata.pl terminated on signal ALRM
2013-11-22 16:28:49 [23559] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-11-22 16:28:49 [23559] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-11-22 16:28:49 [23559] [0] *** TIMEOUT: Please check your npcd.cfg
2013-11-22 16:28:49 [23559] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//service-perfdata.1385155719-PID-23559 deleted
2013-11-22 16:28:49 [23559] [0] *** Timeout while processing Host: "WDCAE-PJCC01V.AEO.AE.COM" Service: "All_Drives"
2013-11-22 16:28:49 [23559] [0] *** process_perfdata.pl terminated on signal ALRM
2013-12-01 08:09:38 [19092] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-12-01 08:09:38 [19092] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-12-01 08:09:38 [19092] [0] *** TIMEOUT: Please check your npcd.cfg
2013-12-01 08:09:38 [19092] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//host-perfdata.1385903364-PID-19092 deleted
2013-12-01 08:09:38 [19092] [0] *** Timeout while processing Host: "rp000002" Service: "_HOST_"
2013-12-01 08:09:38 [19092] [0] *** process_perfdata.pl terminated on signal ALRM
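To see which checks are hitting the timeout most often, the "Timeout while processing" lines can be tallied per service. This is a rough sketch that assumes the perfdata.log line format shown above; the function name `timeouts_by_service` is made up for illustration.

```shell
#!/bin/sh
# Tally "Timeout while processing" entries per service name.
# Splitting on double quotes gives: $2 = host, $4 = service.
timeouts_by_service() {
    awk -F'"' '/Timeout while processing Host:/ { n[$4]++ }
        END { for (s in n) printf "%d %s\n", n[s], s }' "$1" |
        sort -k1,1nr -k2
}

# Usage (log path taken from the commands earlier in this thread):
# timeouts_by_service /usr/local/nagios/var/perfdata.log
```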

Code: Select all

[12-01-2013 08:09:38] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//host-perfdata.1385903364'
[12-01-2013 08:11:38] NPCD: WARN: MAX load reached: load 10.250000/10.000000 at i=0
[12-01-2013 08:11:53] NPCD: WARN: MAX load reached: load 10.570000/10.000000 at i=1
[12-01-2013 08:26:38] NPCD: WARN: MAX load reached: load 10.680000/10.000000 at i=0
[12-01-2013 08:26:53] NPCD: WARN: MAX load reached: load 11.660000/10.000000 at i=1
[12-01-2013 08:41:38] NPCD: WARN: MAX load reached: load 10.920000/10.000000 at i=0
[12-01-2013 08:41:53] NPCD: WARN: MAX load reached: load 10.850000/10.000000 at i=1
[12-01-2013 09:26:23] NPCD: WARN: MAX load reached: load 10.220000/10.000000 at i=0
[12-01-2013 09:26:38] NPCD: WARN: MAX load reached: load 11.270000/10.000000 at i=1
[12-01-2013 11:40:38] NPCD: WARN: MAX load reached: load 10.640000/10.000000 at i=0
[12-01-2013 11:40:53] NPCD: WARN: MAX load reached: load 10.220000/10.000000 at i=1
[12-01-2013 15:55:09] NPCD: WARN: MAX load reached: load 10.050000/10.000000 at i=0
[12-01-2013 17:09:09] NPCD: WARN: MAX load reached: load 10.870000/10.000000 at i=0
[12-01-2013 17:09:24] NPCD: WARN: MAX load reached: load 10.150000/10.000000 at i=1
[12-02-2013 16:08:10] NPCD: WARN: MAX load reached: load 66.470000/10.000000 at i=0
[12-02-2013 16:08:25] NPCD: WARN: MAX load reached: load 51.820000/10.000000 at i=1
[12-02-2013 16:08:40] NPCD: WARN: MAX load reached: load 40.490000/10.000000 at i=1
[12-02-2013 16:08:55] NPCD: WARN: MAX load reached: load 31.520000/10.000000 at i=1
[12-02-2013 16:09:10] NPCD: WARN: MAX load reached: load 65.980000/10.000000 at i=1
[12-02-2013 16:09:25] NPCD: WARN: MAX load reached: load 51.440000/10.000000 at i=1
[12-02-2013 16:09:40] NPCD: WARN: MAX load reached: load 40.040000/10.000000 at i=1
[12-02-2013 16:09:55] NPCD: WARN: MAX load reached: load 31.170000/10.000000 at i=1
[12-02-2013 16:10:10] NPCD: WARN: MAX load reached: load 41.770000/10.000000 at i=1
[12-02-2013 16:10:25] NPCD: WARN: MAX load reached: load 32.600000/10.000000 at i=1
[12-02-2013 16:10:40] NPCD: WARN: MAX load reached: load 25.380000/10.000000 at i=1
[12-02-2013 16:10:55] NPCD: WARN: MAX load reached: load 19.960000/10.000000 at i=1
[12-02-2013 16:11:10] NPCD: WARN: MAX load reached: load 84.260000/10.000000 at i=1
[12-02-2013 16:11:25] NPCD: WARN: MAX load reached: load 65.730000/10.000000 at i=1
[12-02-2013 16:11:40] NPCD: WARN: MAX load reached: load 51.240000/10.000000 at i=1
[12-02-2013 16:11:55] NPCD: WARN: MAX load reached: load 39.960000/10.000000 at i=1
[12-02-2013 16:12:10] NPCD: WARN: MAX load reached: load 69.250000/10.000000 at i=1
[12-02-2013 16:12:25] NPCD: WARN: MAX load reached: load 53.910000/10.000000 at i=1
[12-02-2013 16:12:40] NPCD: WARN: MAX load reached: load 42.110000/10.000000 at i=1
[12-02-2013 16:12:55] NPCD: WARN: MAX load reached: load 32.790000/10.000000 at i=1
[12-02-2013 16:13:10] NPCD: WARN: MAX load reached: load 44.370000/10.000000 at i=1
[12-02-2013 16:13:25] NPCD: WARN: MAX load reached: load 34.610000/10.000000 at i=1
I see those max load warnings going back to 11-07-13.
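For anyone wanting to see how far back the warnings go and how bad they got, a per-day summary of the npcd log may help. This sketch assumes one log entry per line in the "MAX load reached" format pasted above; `peak_load_per_day` is a hypothetical helper name.

```shell
#!/bin/sh
# For each day, print: date, number of "MAX load reached" warnings, peak load.
peak_load_per_day() {
    awk '/MAX load reached/ {
        day = substr($1, 2)             # "[12-02-2013" -> "12-02-2013"
        split($(NF - 2), parts, "/")    # "66.470000/10.000000" -> load / threshold
        load = parts[1] + 0
        if (!(day in peak) || load > peak[day]) peak[day] = load
        count[day]++
    }
    END { for (d in peak) printf "%s %d %.2f\n", d, count[d], peak[d] }' "$1" | sort
}

# Usage (log path taken from the commands earlier in this thread):
# peak_load_per_day /usr/local/nagios/var/npcd.log
```

The plain `sort` orders the MM-DD-YYYY dates as text, which is fine within a single year but will not sort correctly across a year boundary.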

Re: Ramdisk full!

Post by abrist »

I am sure you have seen this procedure before, but here it is:
Edit:

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:

Code: Select all

TIMEOUT = 5
To:

Code: Select all

TIMEOUT = 20
Also edit this file:

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Change:

Code: Select all

load_threshold = 10.0
To:

Code: Select all

load_threshold = 30.0
Now restart npcd:

Code: Select all

service npcd stop
killall -9 npcd
service npcd start
Afterwards, check the file count in xidpe a few times and make sure it is decreasing. If not, check your logs again.
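The two config edits above could also be scripted. The following is a sketch only, not an official procedure: `set_cfg_value` is a hypothetical helper that assumes the setting already exists in the file as an uncommented `key = value` line, and it saves a `.bak` copy before rewriting.

```shell
#!/bin/sh
# Replace the value of an existing "key = value" line in a config file.
# A backup is written to FILE.bak first; commented-out lines are left alone.
set_cfg_value() {
    file=$1
    key=$2
    value=$3
    cp -p "$file" "$file.bak" || return 1
    sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file.bak" > "$file"
}

# Usage with the files and values from the steps above:
# set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg TIMEOUT 20
# set_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold 30.0
```

npcd still needs to be restarted afterwards, as in the steps above, for the new values to take effect.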

Re: Ramdisk full!

Post by BanditBBS »

Files are still increasing, and this is the only new output in the one log:

Code: Select all

[12-02-2013 16:21:45] NPCD: npcd Daemon (0.4.14) started with PID=5154
[12-02-2013 16:21:45] NPCD: Please have a look at 'npcd -V' to get license information
[12-02-2013 16:21:45] NPCD: HINT: load_threshold is enabled - ('3330.000000')

Re: Ramdisk full!

Post by BanditBBS »

The process state in Nagios has shown stopped since this morning, and I can't get it to start.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Ramdisk full!

Post by lmiltchev »

Do you really have "load_threshold = 30.0" in the npcd.cfg (as abrist suggested)?

Code: Select all

grep "load_threshold =" /usr/local/nagios/etc/pnp/npcd.cfg
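Since the npcd startup line earlier in the thread reports '3330.000000' rather than 30, it may be worth comparing what is in the config with what the daemon says it loaded. A small sketch, assuming the log and config formats shown in this thread; both function names are hypothetical.

```shell
#!/bin/sh
# Last load_threshold value npcd reported at startup, e.g. 3330.000000
logged_threshold() {
    sed -n "s/.*load_threshold is enabled - ('\([0-9.]*\)').*/\1/p" "$1" | tail -n 1
}

# First uncommented "load_threshold = X" value in npcd.cfg
configured_threshold() {
    sed -n 's/^[[:space:]]*load_threshold[[:space:]]*=[[:space:]]*\([0-9.]*\).*/\1/p' "$1" | head -n 1
}

# Usage (paths taken from the commands in this thread):
# echo "config: $(configured_threshold /usr/local/nagios/etc/pnp/npcd.cfg)"
# echo "logged: $(logged_threshold /usr/local/nagios/var/npcd.log)"
```

If the two numbers disagree after a restart, the config line itself is the likely culprit (for example, a stray edit to the value).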
Be sure to check out our Knowledgebase for helpful articles and solutions!