
Ram disk and config errors

Posted: Wed Oct 08, 2014 11:51 am
by jbennett
I'm struggling with an issue with my ramdisk filling up.

Currently I use templates for my hosts and services. I have assigned alert contact groups in those templates rather than in each host and service.

This was working fine until yesterday, when I needed to test creating a second contact group and assigning it to only a certain set of hosts, along with all services on those hosts. I removed the contact groups from the templates, then saved and applied the changes. I then used the bulk modification tool to add contact groups to select hosts and all of their services. I had contact group A assigned to one set of hosts and contact group B assigned to another set of hosts.

I tested it out and it appears to work fine.

I then needed to revert the test, so I decided to just add contact group A to all hosts and remove contact group B from its assigned hosts. Keeping in mind that we have about 1,500 hosts and 22,000 service checks, this appears to have caused the DB to crash. I also ran out of inodes in /var. I repaired the DB and cleared out the inodes (used up by all of the check results that backed up, I suppose).

I then started Nagios back up and found that the ramdisk was filling up. When I ran the check config script, I saw the following:

Code: Select all

Warning: Host 'xxxx' has no default contacts or contactgroups defined!
Yet, I check the host and the corresponding template that's assigned to it and I see my contact groups.

What am I missing?

I'm on Nagios XI 2014R1.5.

EDIT: I also have the following when I check configs:

Code: Select all

Warning: failure_prediction_enabled is obsoleted and no longer has any effect in host type objects (config file '/usr/local/nagios/etc/hosttemplates.cfg', starting at line 40) 
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in host type objects (config file '/usr/local/nagios/etc/hosttemplates.cfg', starting at line 285) 
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in host type objects (config file '/usr/local/nagios/etc/hosttemplates.cfg', starting at line 314) 
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 64) 
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 103) 
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 368) 
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 401) 
Read object config files okay... 

Re: Ram disk and config errors

Posted: Wed Oct 08, 2014 12:19 pm
by lmiltchev
Run the following commands and show us the output:

Code: Select all

uptime
service npcd status
df -h
df -i
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
Note: Modify the path to "nagiosramdisk" if it is in a different location.
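
For spool directories that may hold hundreds of thousands of files, plain `ls` can be slow because it sorts its output by default. A minimal sketch of the same three counts using `find`, assuming the ramdisk lives at /var/nagiosramdisk (the `RAMDISK` variable and the `count_entries` helper are names made up for this example):

```shell
#!/bin/sh
# count_entries DIR: print the number of entries directly inside DIR.
# find does not sort, so it copes better with very large directories.
# (Hypothetical helper, not part of Nagios XI.)
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Assumed default location; override with RAMDISK=/other/path if needed.
RAMDISK="${RAMDISK:-/var/nagiosramdisk}"

for sub in xidpe perfdata checkresults; do
    dir="$RAMDISK/spool/$sub"
    if [ -d "$dir" ]; then
        printf '%s: %s\n' "$dir" "$(count_entries "$dir")"
    else
        printf '%s: not found\n' "$dir"
    fi
done
```

File names containing newlines would be counted more than once, which should not matter for Nagios spool files.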

Re: Ram disk and config errors

Posted: Wed Oct 08, 2014 12:24 pm
by jbennett

Code: Select all

[xxx]# uptime
 12:21:21 up  1:24,  1 user,  load average: 2.24, 2.02, 1.64
[xxx]# /etc/init.d/npcd status
NPCD running (pid 3469).
[xxx]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00_ROOT
                       48G   36G  8.9G  81% /
/dev/mapper/VolGroup00-LogVol00
                      3.0G  255M  2.6G   9% /tmp
/dev/mapper/VolGroup00-LogVol00_VAR
                      5.7G  4.7G  750M  87% /var
/dev/hda1             190M   40M  141M  23% /boot
tmpfs                 5.9G     0  5.9G   0% /dev/shm
tmpfs                 125M  125M     0 100% /var/nagiosramdisk
10.100.3.220:/kickstart
                      190G  130G   51G  73% /kickstart
[xxx]# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00_ROOT
                     12799776  218098 12581678    2% /
/dev/mapper/VolGroup00-LogVol00
                      793600    6192  787408    1% /tmp
/dev/mapper/VolGroup00-LogVol00_VAR
                     1540096   90296 1449800    6% /var
/dev/hda1              50200      50   50150    1% /boot
tmpfs                1538707       1 1538706    1% /dev/shm
tmpfs                1538707  801816  736891   53% /var/nagiosramdisk
xx.xx.xx.xx:/kickstart
                     51216384  252260 50964124    1% /kickstart
[xxx]# ls /var/nagiosramdisk/spool/xidpe | wc -l
0
[xxx]# ls /var/nagiosramdisk/spool/perfdata/ | wc -l
78
[xxx]# ls /var/nagiosramdisk/spool/checkresults/ | wc -l
806626

Re: Ram disk and config errors

Posted: Wed Oct 08, 2014 1:51 pm
by jbennett
From another thread I ran the following:

Code: Select all

[xxx checkresults]$ sudo tail -25 /usr/local/nagios/var/perfdata.log
2014-10-08 13:01:21 [20827] [0] *** process_perfdata.pl terminated on signal ALRM
2014-10-08 13:01:53 [21268] [0] *** TIMEOUT: Timeout after 15 secs. ***
2014-10-08 13:01:53 [21268] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-10-08 13:01:53 [21268] [0] *** TIMEOUT: Please check your npcd.cfg
2014-10-08 13:01:53 [21268] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791283-PID-21268 deleted
2014-10-08 13:01:53 [21268] [0] *** Timeout while processing Host: "yyy" Service: "zzz"
2014-10-08 13:01:53 [21268] [0] *** process_perfdata.pl terminated on signal ALRM
2014-10-08 13:01:54 [21267] [0] *** TIMEOUT: Timeout after 15 secs. ***
2014-10-08 13:01:54 [21267] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-10-08 13:01:54 [21267] [0] *** TIMEOUT: Please check your npcd.cfg
2014-10-08 13:01:54 [21267] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791267-PID-21267 deleted
2014-10-08 13:01:54 [21267] [0] *** Timeout while processing Host: "yyy" Service: "zzz"
2014-10-08 13:01:54 [21267] [0] *** process_perfdata.pl terminated on signal ALRM
2014-10-08 13:02:25 [21637] [0] *** TIMEOUT: Timeout after 15 secs. ***
2014-10-08 13:02:25 [21637] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-10-08 13:02:25 [21637] [0] *** TIMEOUT: Please check your npcd.cfg
2014-10-08 13:02:25 [21637] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791297-PID-21637 deleted
2014-10-08 13:02:25 [21637] [0] *** Timeout while processing Host: "yyy" Service: "zzz"
2014-10-08 13:02:25 [21637] [0] *** process_perfdata.pl terminated on signal ALRM
2014-10-08 13:04:30 [23012] [0] *** TIMEOUT: Timeout after 15 secs. ***
2014-10-08 13:04:30 [23012] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2014-10-08 13:04:30 [23012] [0] *** TIMEOUT: Please check your npcd.cfg
2014-10-08 13:04:30 [23012] [0] *** TIMEOUT: /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791447-PID-23012 deleted
2014-10-08 13:04:30 [23012] [0] *** Timeout while processing Host: "yyy" Service: "zzz"
2014-10-08 13:04:30 [23012] [0] *** process_perfdata.pl terminated on signal ALRM
[xxx checkresults]$ tail -25 /usr/local/nagios/var/npcd.log
[10-08-2014 12:46:54] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412790369'
[10-08-2014 12:46:54] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 12:46:54] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//host-perfdata.1412790369'
[10-08-2014 12:49:15] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 12:49:15] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412790518'
[10-08-2014 12:49:45] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 12:49:45] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412790532'
[10-08-2014 12:51:42] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 12:51:42] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412790668'
[10-08-2014 12:52:54] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 12:52:54] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412790742'
[10-08-2014 12:55:25] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 12:55:25] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412790894'
[10-08-2014 13:01:21] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 13:01:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791253'
[10-08-2014 13:01:21] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 13:01:21] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791237'
[10-08-2014 13:01:53] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 13:01:53] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791283'
[10-08-2014 13:01:54] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 13:01:54] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791267'
[10-08-2014 13:02:25] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 13:02:25] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791297'
[10-08-2014 13:04:30] NPCD: ERROR: Executed command exits with return code '7'
[10-08-2014 13:04:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//service-perfdata.1412791447'
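
To see whether the timeouts in perfdata.log cluster on a few hosts or services, the "Timeout while processing" lines can be tallied. This is only a sketch based on the log format shown above; `timeouts_by_service` is a hypothetical helper name and the log path is taken from the post.

```shell
#!/bin/sh
# timeouts_by_service: read a perfdata.log on stdin and print a count of
# "Timeout while processing" lines per host/service pair, highest first.
# (Hypothetical helper; assumes the log format shown in this thread.)
timeouts_by_service() {
    sed -n 's|.*Timeout while processing Host: "\([^"]*\)" Service: "\([^"]*\)".*|\1 / \2|p' \
        | sort | uniq -c | sort -rn
}

# Path as used earlier in the thread; adjust if your install differs.
LOG="${LOG:-/usr/local/nagios/var/perfdata.log}"
if [ -r "$LOG" ]; then
    timeouts_by_service < "$LOG"
fi
```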

Re: Ram disk and config errors

Posted: Wed Oct 08, 2014 3:25 pm
by jbennett
I think I may have found the answer.

A day ago I added sbin to my $PATH, as suggested here: http://support.nagios.com/forum/viewtop ... 16&t=29427

I started retracing all of my steps and removed :/sbin from my profile's path and lo and behold, my ramdisk isn't filling up any more.

However, I still have the warnings in my configs.

Re: Ram disk and config errors

Posted: Wed Oct 08, 2014 5:08 pm
by Box293
Please post the warnings that are still appearing.

Re: Ram disk and config errors

Posted: Thu Oct 09, 2014 7:44 am
by jbennett

Code: Select all

Warning: failure_prediction_enabled is obsoleted and no longer has any effect in host type objects (config file '/usr/local/nagios/etc/hosttemplates.cfg', starting at line 40)
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in host type objects (config file '/usr/local/nagios/etc/hosttemplates.cfg', starting at line 285)
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in host type objects (config file '/usr/local/nagios/etc/hosttemplates.cfg', starting at line 314)
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 64)
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 103)
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 368)
Warning: failure_prediction_enabled is obsoleted and no longer has any effect in service type objects (config file '/usr/local/nagios/etc/servicetemplates.cfg', starting at line 401)

Re: Ram disk and config errors

Posted: Thu Oct 09, 2014 4:54 pm
by Box293
You have some configs in your templates that need "failure_prediction_enabled" removed from them.
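
One way to check whether the directive is actually present in the written config files is to search for it directly. This is a sketch that assumes the two file paths from the warnings above; `find_obsolete` is a made-up helper name. Note that the line number in each warning is probably where the enclosing definition starts, so the directive itself would be a few lines further down.

```shell
#!/bin/sh
# find_obsolete DIRECTIVE FILE...: print file:line:text for every line
# that sets DIRECTIVE. (Hypothetical helper, not part of Nagios XI.)
find_obsolete() {
    directive="$1"
    shift
    grep -n -H -E "^[[:space:]]*${directive}[[:space:]]" "$@"
}

# Paths taken from the warning messages; adjust if your install differs.
ETC="${ETC:-/usr/local/nagios/etc}"
find_obsolete failure_prediction_enabled \
    "$ETC/hosttemplates.cfg" "$ETC/servicetemplates.cfg" \
    || echo "no matches (or files not readable)"
```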

Re: Ram disk and config errors

Posted: Fri Oct 10, 2014 10:43 am
by jbennett
ok - I see where that config was removed from definitions as part of core 4.0.

I'm checking host templates and I don't see this anywhere.

Where do I have to remove it?

Re: Ram disk and config errors

Posted: Fri Oct 10, 2014 3:51 pm
by Box293
You're right ... I was talking to a dev and these should have been removed when you upgraded to 2014, so you won't be able to do this in the CCM.

However try this:

Go into CCM
Tools > Write Config Files
Click the Write Button
It will show an output of all the files it creates
Click the Verify button
The output should end with "Total Errors: 0"
Quick Tools > Apply Configuration
Click the Apply Configuration button

Now do these warning messages still appear?
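
To answer that from the command line, the verification output can be piped through a small counter. This is a sketch only: the binary and config paths below are assumptions based on the /usr/local/nagios layout seen in this thread, and `count_directive_warnings` is a hypothetical helper.

```shell
#!/bin/sh
# count_directive_warnings DIRECTIVE: read config-check output on stdin
# and print how many "Warning:" lines mention DIRECTIVE.
# (Hypothetical helper, not part of Nagios XI.)
count_directive_warnings() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c "^Warning: $1 " || true
}

# Assumed locations; adjust to match your install.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

if [ -x "$NAGIOS_BIN" ]; then
    "$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1 \
        | count_directive_warnings failure_prediction_enabled
fi
```

A result of 0 would mean the obsolete-directive warnings are gone after the rewrite.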