
Re: NagiosXI Zombie process troubles

Posted: Fri Aug 18, 2017 4:45 pm
by tgriep
The nagiosramdisk is full as well. Here is how to fix that.
Run the following as root:

Code:

service nagios stop
service npcd stop
service crond stop
umount /var/nagiosramdisk/
service nagios start
service npcd start
service crond start 
You can truncate the /usr/local/nagiosxi/var/sysstat.log file to free up the space.
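
For reference, truncating the log in place might look like the sketch below. The function name truncate_log is made up for illustration; the path in the usage comment is the one mentioned above. Truncating instead of deleting keeps the file's inode, owner and permissions, so whatever writes to it can carry on without a restart.

```shell
#!/bin/sh
# Hypothetical helper: empty a log file in place without deleting it.
truncate_log() {
    log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo "not a regular file: $log_file" >&2
        return 1
    fi
    # Redirecting the no-op command ':' into the file sets its size to 0
    # while keeping the same inode, owner and permissions.
    : > "$log_file"
}

# Usage (as root):
#   truncate_log /usr/local/nagiosxi/var/sysstat.log
```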

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 9:06 am
by ejmorrow
This was done. Made no difference.

Eric

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 10:06 am
by tgriep
Can you run the following commands as root on the Nagios server and post the output?

Code:

df -h
df -i
ls -l /var/nagiosramdisk/spool/
ls -l /var/nagiosramdisk/spool/checkresults/
Thanks

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 10:39 am
by ejmorrow
Well it's filled up again. Nagios is not processing checkresults so this is expected behavior.

Eric

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 12:01 pm
by tgriep
Can you run the following as root and post the output?

Code:

df -h
df -i
ls -lR /var/nagiosramdisk
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
We need this information to further troubleshoot this issue.
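
If a spool directory holds a very large number of files, ls can take a long time because it sorts its output before printing. A rough alternative is to count with find, which does not sort. This is only a sketch: the function name count_files is made up for illustration, and it assumes a find that supports -maxdepth (GNU find does).

```shell
#!/bin/sh
# Hypothetical helper: count regular files directly inside a directory.
# find streams entries unsorted, so it copes better than ls with huge directories.
count_files() {
    dir="$1"
    # -maxdepth 1 keeps find from descending into subdirectories.
    find "$dir" -maxdepth 1 -type f | wc -l
}

# Usage (as root):
#   count_files /var/nagiosramdisk/spool/checkresults
```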

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 12:22 pm
by Pres-Gas
Hello! I am ejmorrow's co-worker, putting "another pair of eyes" on this. Our biggest issue at the moment is that nagios will not start at all because of object config file errors. Once I can get it started again, I hope to start from square one.

Here is what we currently get when running "service nagios start":

Code:

[root@esnagxiprd01 storebacknagiosxi]# service nagios start
Starting nagios:
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Error: Unexpected EOF in file '/usr/local/nagios/etc/services/esappj64.uits.iupui.edu.cfg' on line 283 - check for a missing closing bracket.
Error: Failed to locate check_period 'xi_timeperiod_24x7' for host 'absappp1.uits.iupui.edu'!
Error: Could not register host (config file '/usr/local/nagios/etc/hosts/absappp1.uits.iupui.edu.cfg', starting on line 16)
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.
I then attempted to restore a configuration snapshot, and the CCM hangs at "Waiting for configuration verification.(..........)".

Once we can get nagios to at least start, I would then run down what we are seeing.

Thanks!

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 1:10 pm
by tgriep
Try this procedure to see if you can fix the configuration error that is keeping the nagios service from starting.
Go to the Core Config Manager.
Under "Tools", click "Write Config Files". In newer versions of XI, the menu item is called "Config File Management".
Click the "Write" button, then the "Delete" button, then the "Write" button again, and then the "Verify" button.
If you get any errors, resolve them and repeat "Delete", "Write", "Verify" until all of the errors are resolved.
Once all of the errors are resolved, click the Apply Configuration link and then the "Apply Configuration" button.

After this, the nagios service should be running.
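
The verify step can also be run from a shell, which helps when the web interface is unresponsive. The sketch below wraps Nagios Core's built-in config check (the -v option). It assumes the default install paths /usr/local/nagios/bin/nagios and /usr/local/nagios/etc/nagios.cfg; the function name and the NAGIOS_BIN and NAGIOS_CFG override variables are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the Nagios Core config check (nagios -v).
# NAGIOS_BIN and NAGIOS_CFG are illustrative overrides; the defaults assume
# a standard install under /usr/local/nagios.
verify_nagios_config() {
    nagios_bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    nagios_cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    # nagios -v exits non-zero when the object configuration has errors.
    if "$nagios_bin" -v "$nagios_cfg"; then
        echo "config OK"
    else
        echo "config has errors" >&2
        return 1
    fi
}

# Usage (as root):
#   verify_nagios_config && service nagios start
```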

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 1:46 pm
by Pres-Gas
I just tried doing that. Once the files were written out, the notification to apply configuration appeared. It is hanging on "Waiting for configuration verification.(..........)".

Nagios still does not start, and it is complaining about the same files:

Code:

[root@esnagxiprd01 perfdata]# service nagios start
Starting nagios:
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Error: Unexpected EOF in file '/usr/local/nagios/etc/services/esappj64.uits.iupui.edu.cfg' on line 283 - check for a missing closing bracket.
Error: Invalid max_check_attempts value for host 'localhost'
Error: Could not register host (config file '/usr/local/nagios/etc/hosts/localhost.cfg', starting on line 16)
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

[root@esnagxiprd01 perfdata]# service nagios status
No lock file found in /usr/local/nagios/var/nagios.lock
What is our next step? We have the following config snapshots available to us if we have to attempt a fallback:

Code:

2017-08-22 10:53:32	Config Error	1503413612.tar.gz
2017-08-21 13:06:23	Config Error	1503335183.tar.gz
2017-08-21 10:24:12	Config Ok	1503325452.tar.gz
2017-08-18 16:41:15	Config Ok	1503088875.tar.gz
2017-08-18 11:44:07	Config Ok	1503071047.tar.gz
2017-08-17 17:13:14	Config Ok	1503004394.tar.gz
2017-08-17 17:04:31	Config Ok	1503003871.tar.gz
2017-08-17 11:24:17	Config Ok	1502983457.tar.gz
2017-08-17 10:48:31	Config Ok	1502981311.tar.gz
2017-08-17 10:21:37	Config Ok	1502979697.tar.gz
2017-08-16 13:54:04	Config Ok	1502906044.tar.gz
2017-08-16 12:38:50	Config Ok	1502901530.tar.gz ***
2017-08-16 12:20:54	Config Error	1502900454.tar.gz
2017-08-14 16:06:16	Config Error	1502741176.tar.gz
2017-08-14 10:46:27	Config Error	1502721987.tar.gz
2017-08-11 16:31:04	Config Error	1502483464.tar.gz
2017-08-10 10:05:29	Config Error	1502373929.tar.gz
2017-06-20 10:26:33	Config Error	1497968793.tar.gz
2017-06-20 10:24:25	Config Error	1497968665.tar.gz
2017-06-20 10:19:23	Config Error	1497968363.tar.gz
Thanks!

Re: NagiosXI Zombie process troubles

Posted: Tue Aug 22, 2017 1:49 pm
by scottwilkerson
You were asked for this earlier, but we never saw the output. Please provide:

Code:

df -h
df -i
ls -lR /var/nagiosramdisk
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
A missing } is often caused by running out of disk space.
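
To find which object files were cut off, one rough check is to compare the number of opening and closing braces in each .cfg file. The sketch below assumes braces only appear as definition delimiters (a brace inside a command argument or a comment would give a false result). The function name braces_balanced is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a file has as many '{' as '}'.
braces_balanced() {
    cfg_file="$1"
    # tr -cd deletes every character except the one listed;
    # wc -c then counts what is left. $(( )) strips any padding from wc.
    opens=$(( $(tr -cd '{' < "$cfg_file" | wc -c) ))
    closes=$(( $(tr -cd '}' < "$cfg_file" | wc -c) ))
    [ "$opens" -eq "$closes" ]
}

# Usage: list service definitions that look truncated.
#   for f in /usr/local/nagios/etc/services/*.cfg; do
#       braces_balanced "$f" || echo "unbalanced: $f"
#   done
```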

Re: NagiosXI Zombie process troubles

Posted: Wed Aug 23, 2017 9:30 am
by ejmorrow
The ls -lR for /var/nagiosramdisk is a 441MB file that I can't attach here.

[df -h output]
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroupDR-LogVolRoot
7.8G 2.8G 4.7G 37% /
tmpfs 32G 0 32G 0% /dev/shm
/dev/sda1 954M 134M 771M 15% /boot
/dev/mapper/VolGroupDR-LogVolHome
2.9G 7.7M 2.8G 1% /home
/dev/mapper/VolGroupDR-LogVolOpt
5.8G 947M 4.6G 17% /opt
/dev/mapper/VolGroupDR-LogVolTmp
976M 212M 714M 23% /tmp
/dev/mapper/VolGroupDR-LogVolUsr
3.9G 2.3G 1.5G 61% /usr
/dev/mapper/VolGroupDR-LogVolVar
4.9G 4.3G 407M 92% /var
/dev/mapper/VolGrpNagiosXI-LogVolUsrLocal
40G 4.2G 34G 12% /usr/local
/dev/mapper/VolGrpNagiosXI-LogVolVarLib
197G 30G 158G 16% /var/lib
tmpfs 512M 512M 0 100% /var/nagiosramdisk



[ df -i output ]
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroupDR-LogVolRoot
519168 20080 499088 4% /
tmpfs 8246755 1 8246754 1% /dev/shm
/dev/sda1 63104 54 63050 1% /boot
/dev/mapper/VolGroupDR-LogVolHome
194688 221 194467 1% /home
/dev/mapper/VolGroupDR-LogVolOpt
393216 2222 390994 1% /opt
/dev/mapper/VolGroupDR-LogVolTmp
65536 13772 51764 22% /tmp
/dev/mapper/VolGroupDR-LogVolUsr
262144 82106 180038 32% /usr
/dev/mapper/VolGroupDR-LogVolVar
327680 327680 0 100% /var
/dev/mapper/VolGrpNagiosXI-LogVolUsrLocal
2621440 1296572 1324868 50% /usr/local
/dev/mapper/VolGrpNagiosXI-LogVolVarLib
13107200 10030 13097170 1% /var/lib
tmpfs 8246755 8246755 0 100% /var/nagiosramdisk



[ ls /var/nagiosramdisk/spool/xidpe | wc -l output ]
0

[ ls /var/nagiosramdisk/spool/perfdata | wc -l output ]
0

[ ls /var/nagiosramdisk/spool/ | wc -l output ]
8246749
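
Going by the df -i output above, both /var and /var/nagiosramdisk are at 100% inode use, so the shortage looks like inodes, not just bytes. A small sketch for spotting that at a glance follows. It reads `df -iP` style output on stdin (the -P option keeps each filesystem on one line, which avoids the wrapped device names seen above) and prints mount points at or above a threshold. The function name and the default threshold of 90 are assumptions for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical filter: read `df -iP` style output on stdin and print
# mount points whose inode usage is at or above a threshold (default 90).
full_inode_mounts() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR == 1 { next }                 # skip the header line
        {
            pct = $5
            sub(/%$/, "", pct)           # "100%" -> "100"
            # Ignore rows without a numeric percentage (some pseudo filesystems show "-").
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
                print $6, pct "%"
        }'
}

# Usage:
#   df -iP | full_inode_mounts 95
```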