NagiosXI Zombie process troubles

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NagiosXI Zombie process troubles

Post by tgriep »

The nagiosramdisk is full as well. Here is how to fix that.
Run the following as root:

Code: Select all

service nagios stop
service npcd stop
service crond stop
umount /var/nagiosramdisk/
service nagios start
service npcd start
service crond start 
You can truncate the /usr/local/nagiosxi/var/sysstat.log file to free up the space.
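A minimal sketch of that truncation step, assuming the goal is to empty the log in place rather than delete it, so that any process holding the file open keeps writing to the same file. The helper name truncate_log is only illustrative and is not part of Nagios XI.

```shell
#!/bin/sh
# Empty a log file in place without removing it.
# Usage: truncate_log /path/to/file
truncate_log() {
    log_file=$1
    if [ ! -f "$log_file" ]; then
        echo "not a regular file: $log_file" >&2
        return 1
    fi
    # ': >' reopens the file with truncation, so its size becomes 0
    # while the file itself (and its permissions) stay in place.
    : > "$log_file"
}

# Example, run as root on the XI server:
#   truncate_log /usr/local/nagiosxi/var/sysstat.log
```

Running df -h before and after shows whether the space was actually released.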
Be sure to check out our Knowledgebase for helpful articles and solutions!
ejmorrow
Posts: 13
Joined: Fri May 13, 2016 9:02 am

Re: NagiosXI Zombie process troubles

Post by ejmorrow »

This was done. Made no difference.

Eric
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NagiosXI Zombie process troubles

Post by tgriep »

Can you run the following commands as root on the Nagios server and post the output?

Code: Select all

df -h
df -i
ls -l /var/nagiosramdisk/spool/
ls -l /var/nagiosramdisk/spool/checkresults/
Thanks
ejmorrow
Posts: 13
Joined: Fri May 13, 2016 9:02 am

Re: NagiosXI Zombie process troubles

Post by ejmorrow »

Well, it's filled up again. Nagios is not processing checkresults, so this is expected behavior.

Eric
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NagiosXI Zombie process troubles

Post by tgriep »

Can you run the following as root and post the output?

Code: Select all

df -h
df -i
ls -lR /var/nagiosramdisk
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
We need this information to further troubleshoot this issue.
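When a spool directory may hold millions of files, ls sorts every name before printing, which can take a long time. A sketch of an alternative count using find, which does not sort. count_entries is a hypothetical helper, and the count assumes no file names contain newlines.

```shell
#!/bin/sh
# Count the entries directly inside a directory without sorting them.
# Usage: count_entries /path/to/dir
count_entries() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 skips the directory itself; -maxdepth 1 stops find
    # from descending into subdirectories. tr strips wc's padding.
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Example, run as root:
#   for d in xidpe perfdata checkresults; do
#       printf '%s: ' "$d"
#       count_entries "/var/nagiosramdisk/spool/$d"
#   done
```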
Pres-Gas
Posts: 52
Joined: Thu Mar 22, 2012 12:09 pm

Re: NagiosXI Zombie process troubles

Post by Pres-Gas »

Hello! I am ejmorrow's co-worker and am putting "another pair of eyes" on this. Our one large issue at the moment is that nagios will not start at all because of errors in the object config files. Once I can get it started again, I hope to start from square one.

Here is what we currently get when running "service nagios start":

Code: Select all

[root@esnagxiprd01 storebacknagiosxi]# service nagios start
Starting nagios:
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Error: Unexpected EOF in file '/usr/local/nagios/etc/services/esappj64.uits.iupui.edu.cfg' on line 283 - check for a missing closing bracket.
Error: Failed to locate check_period 'xi_timeperiod_24x7' for host 'absappp1.uits.iupui.edu'!
Error: Could not register host (config file '/usr/local/nagios/etc/hosts/absappp1.uits.iupui.edu.cfg', starting on line 16)
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.
I then attempted to restore a configuration snapshot and the CCM hangs at Waiting for configuration verification.(..........).

Once we can get nagios to at least start, I would then run down what we are seeing.

Thanks!
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NagiosXI Zombie process troubles

Post by tgriep »

Try this procedure to see if you can fix the configuration error that is keeping the nagios service from starting.
Go to the Core Config Manager.
Under "Tools", click "Write Config Files". If you are running a newer version of XI, the menu is called "Config File Management".
Click the "Write" button, then the "Delete" button, then the "Write" button again, and then the "Verify" button.
If you get any errors, resolve them and repeat "Delete", "Write", "Verify" until all of the errors are resolved.
Once all of the errors are resolved, click the Apply Configuration link and then the "Apply Configuration" button.

After this, the nagios service should be running.
Pres-Gas
Posts: 52
Joined: Thu Mar 22, 2012 12:09 pm

Re: NagiosXI Zombie process troubles

Post by Pres-Gas »

I just tried doing that. Once the files were written out, the notification to apply configuration appeared. It is hanging on "Waiting for configuration verification."(..........).

Nagios still does not start, and it is still reporting config file errors:

Code: Select all

[root@esnagxiprd01 perfdata]# service nagios start
Starting nagios:
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Error: Unexpected EOF in file '/usr/local/nagios/etc/services/esappj64.uits.iupui.edu.cfg' on line 283 - check for a missing closing bracket.
Error: Invalid max_check_attempts value for host 'localhost'
Error: Could not register host (config file '/usr/local/nagios/etc/hosts/localhost.cfg', starting on line 16)
   Error processing object config files!


***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

[root@esnagxiprd01 perfdata]# service nagios status
No lock file found in /usr/local/nagios/var/nagios.lock
What is our next step? We have the following config snapshots available to us if we have to attempt a fallback:

Code: Select all

2017-08-22 10:53:32	Config Error	1503413612.tar.gz
2017-08-21 13:06:23	Config Error	1503335183.tar.gz
2017-08-21 10:24:12	Config Ok	1503325452.tar.gz
2017-08-18 16:41:15	Config Ok	1503088875.tar.gz
2017-08-18 11:44:07	Config Ok	1503071047.tar.gz
2017-08-17 17:13:14	Config Ok	1503004394.tar.gz
2017-08-17 17:04:31	Config Ok	1503003871.tar.gz
2017-08-17 11:24:17	Config Ok	1502983457.tar.gz
2017-08-17 10:48:31	Config Ok	1502981311.tar.gz
2017-08-17 10:21:37	Config Ok	1502979697.tar.gz
2017-08-16 13:54:04	Config Ok	1502906044.tar.gz
2017-08-16 12:38:50	Config Ok	1502901530.tar.gz ***
2017-08-16 12:20:54	Config Error	1502900454.tar.gz
2017-08-14 16:06:16	Config Error	1502741176.tar.gz
2017-08-14 10:46:27	Config Error	1502721987.tar.gz
2017-08-11 16:31:04	Config Error	1502483464.tar.gz
2017-08-10 10:05:29	Config Error	1502373929.tar.gz
2017-06-20 10:26:33	Config Error	1497968793.tar.gz
2017-06-20 10:24:25	Config Error	1497968665.tar.gz
2017-06-20 10:19:23	Config Error	1497968363.tar.gz
Thanks!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NagiosXI Zombie process troubles

Post by scottwilkerson »

You were asked for this earlier, but we never saw the output. Please provide:

Code: Select all

df -h
df -i
ls -lR /var/nagiosramdisk
ls /var/nagiosramdisk/spool/xidpe | wc -l
ls /var/nagiosramdisk/spool/perfdata/ | wc -l
ls /var/nagiosramdisk/spool/checkresults/ | wc -l
A missing } is often caused by running out of disk space.
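A rough way to spot config files that were cut off mid-write is to compare the number of opening and closing braces in each one. The sketch below is only a heuristic under that assumption; check_braces is a hypothetical helper, not a Nagios tool, and it skips full-line comments starting with # or ;.

```shell
#!/bin/sh
# Print "<open> <close>" brace counts for one config file and
# return non-zero when the counts differ.
# Usage: check_braces /path/to/file.cfg
check_braces() {
    cfg=$1
    awk '
        /^[ \t]*[#;]/ { next }   # ignore full-line comments
        {
            # gsub returns how many matches it replaced on this line
            opens  += gsub(/[{]/, "{")
            closes += gsub(/[}]/, "}")
        }
        END {
            printf "%d %d\n", opens, closes
            exit (opens == closes ? 0 : 1)
        }
    ' "$cfg"
}

# Example: list service configs whose braces do not balance.
#   for f in /usr/local/nagios/etc/services/*.cfg; do
#       check_braces "$f" >/dev/null || echo "unbalanced: $f"
#   done
```

Any file this flags was probably truncated and will need to be rewritten once space and inodes are available again.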
ejmorrow
Posts: 13
Joined: Fri May 13, 2016 9:02 am

Re: NagiosXI Zombie process troubles

Post by ejmorrow »

The ls -lR output for /var/nagiosramdisk is a 441MB file that I can't attach here.

[ df -h output ]
Filesystem                                 Size  Used Avail Use% Mounted on
/dev/mapper/VolGroupDR-LogVolRoot          7.8G  2.8G  4.7G  37% /
tmpfs                                       32G     0   32G   0% /dev/shm
/dev/sda1                                  954M  134M  771M  15% /boot
/dev/mapper/VolGroupDR-LogVolHome          2.9G  7.7M  2.8G   1% /home
/dev/mapper/VolGroupDR-LogVolOpt           5.8G  947M  4.6G  17% /opt
/dev/mapper/VolGroupDR-LogVolTmp           976M  212M  714M  23% /tmp
/dev/mapper/VolGroupDR-LogVolUsr           3.9G  2.3G  1.5G  61% /usr
/dev/mapper/VolGroupDR-LogVolVar           4.9G  4.3G  407M  92% /var
/dev/mapper/VolGrpNagiosXI-LogVolUsrLocal   40G  4.2G   34G  12% /usr/local
/dev/mapper/VolGrpNagiosXI-LogVolVarLib    197G   30G  158G  16% /var/lib
tmpfs                                      512M  512M     0 100% /var/nagiosramdisk



[ df -i output ]
Filesystem                                   Inodes     IUsed     IFree IUse% Mounted on
/dev/mapper/VolGroupDR-LogVolRoot            519168     20080    499088    4% /
tmpfs                                       8246755         1   8246754    1% /dev/shm
/dev/sda1                                     63104        54     63050    1% /boot
/dev/mapper/VolGroupDR-LogVolHome            194688       221    194467    1% /home
/dev/mapper/VolGroupDR-LogVolOpt             393216      2222    390994    1% /opt
/dev/mapper/VolGroupDR-LogVolTmp              65536     13772     51764   22% /tmp
/dev/mapper/VolGroupDR-LogVolUsr             262144     82106    180038   32% /usr
/dev/mapper/VolGroupDR-LogVolVar             327680    327680         0  100% /var
/dev/mapper/VolGrpNagiosXI-LogVolUsrLocal   2621440   1296572   1324868   50% /usr/local
/dev/mapper/VolGrpNagiosXI-LogVolVarLib    13107200     10030  13097170    1% /var/lib
tmpfs                                       8246755   8246755         0  100% /var/nagiosramdisk



[ ls /var/nagiosramdisk/spool/xidpe | wc -l output ]
0

[ ls /var/nagiosramdisk/spool/perfdata | wc -l output ]
0

[ ls /var/nagiosramdisk/spool/ | wc -l output ]
8246749
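The df -i output shows IUse% at 100% for both /var and /var/nagiosramdisk, so it is inodes, not just bytes, that have run out. A small sketch for flagging that condition, assuming GNU df, where -P keeps each filesystem on one line. full_inodes is a hypothetical helper, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Read `df -Pi` output on stdin and print "<mount> <IUse%>" for every
# filesystem at or above the given inode usage percentage.
# Usage: df -Pi | full_inodes 90
full_inodes() {
    threshold=${1:-90}
    awk -v limit="$threshold" '
        NR == 1 { next }                  # skip the header line
        {
            pct = $5                      # IUse% column, e.g. "100%"
            sub(/%$/, "", pct)
            # filesystems without inode data report "-" and are skipped
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit) print $6, $5
        }
    '
}
```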