
Blank Host Groups and Services

Posted: Fri Aug 21, 2015 10:53 am
by CFT6Server
Our Nagios XI server crashed last night. It has been locking up and giving us CPU soft lockup errors. After last night's hang, some hosts and services show up while others are just empty. If I go into the host detail view, I can see the hosts, but when I click on one of them, it shows an empty list of services.

I need some help and guidance on how to troubleshoot this, and I hope this isn't corruption in the db somewhere.

Re: Blank Host Groups and Services

Posted: Fri Aug 21, 2015 11:14 am
by lmiltchev
DB corruption is the first thing that comes to mind. Check mysqld.log for errors or crashed tables:

Code:

tail -50 /var/log/mysqld.log
If you see issues in the log, repair the database by following the steps outlined here:

https://assets.nagios.com/downloads/nag ... tabase.pdf

Your mysql database is not offloaded to a remote server, is it?
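
For the log check above, here is a minimal sketch that filters the MySQL log down to the lines worth reading. The helper name `scan_mysql_log` and the two search patterns are assumptions for illustration, not something shipped with Nagios XI or MySQL, so adjust them to whatever your log actually contains.

```shell
#!/bin/sh
# scan_mysql_log is a hypothetical helper, not part of Nagios XI or MySQL.
# It prints log lines that contain "[ERROR]" or the word "crashed".
scan_mysql_log() {
    # Default path is the one used in this thread; pass another as $1.
    log_file="${1:-/var/log/mysqld.log}"
    # -i: ignore case; -E: extended regex; \[ERROR\] matches the literal tag.
    grep -iE '\[ERROR\]|crashed' "$log_file"
}
```

Running `scan_mysql_log /var/log/mysqld.log | tail -20` would show the most recent matches. No output only means the patterns did not match, so it is still worth reading the tail of the log by eye.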

Re: Blank Host Groups and Services

Posted: Fri Aug 21, 2015 11:42 am
by CFT6Server
Thanks. I traced through all the logs and didn't find anything. I decided to do a second reboot and then applied the configuration, and everything seems to be back to normal now. I am still trying to figure out where the hang-up is. Perhaps the server is reaching capacity?

Code:

# tail -50 /var/log/mysqld.log
150817  9:24:48  InnoDB: Completed initialization of buffer pool
150817  9:24:49  InnoDB: Started; log sequence number 0 44233
150817  9:24:49 [Note] Event Scheduler: Loaded 0 events
150817  9:24:49 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
150820 11:26:47 [Note] /usr/libexec/mysqld: Normal shutdown

150820 11:26:47 [Note] Event Scheduler: Purging the queue. 0 events
150820 11:26:47  InnoDB: Starting shutdown...
150820 11:26:53  InnoDB: Shutdown completed; log sequence number 0 44233
150820 11:26:53 [Note] /usr/libexec/mysqld: Shutdown complete

150820 11:26:53 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150820 11:27:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150820 11:27:43  InnoDB: Initializing buffer pool, size = 8.0M
150820 11:27:43  InnoDB: Completed initialization of buffer pool
150820 11:27:43  InnoDB: Started; log sequence number 0 44233
150820 11:27:43 [Note] Event Scheduler: Loaded 0 events
150820 11:27:43 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
150821  8:17:33 [Note] /usr/libexec/mysqld: Normal shutdown

150821  8:17:33 [Note] Event Scheduler: Purging the queue. 0 events
150821  8:17:33  InnoDB: Starting shutdown...
150821  8:17:37  InnoDB: Shutdown completed; log sequence number 0 44233
150821  8:17:37 [Note] /usr/libexec/mysqld: Shutdown complete

150821 08:17:37 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150821 08:23:09 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150821  8:23:09  InnoDB: Initializing buffer pool, size = 8.0M
150821  8:23:09  InnoDB: Completed initialization of buffer pool
150821  8:23:09  InnoDB: Started; log sequence number 0 44233
150821  8:23:09 [Note] Event Scheduler: Loaded 0 events
150821  8:23:09 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
150821  9:09:00 [Note] /usr/libexec/mysqld: Normal shutdown

150821  9:09:00 [Note] Event Scheduler: Purging the queue. 0 events
150821  9:09:00  InnoDB: Starting shutdown...
150821  9:09:02  InnoDB: Shutdown completed; log sequence number 0 44233
150821  9:09:02 [Note] /usr/libexec/mysqld: Shutdown complete

150821 09:09:02 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150821 09:09:51 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150821  9:09:51  InnoDB: Initializing buffer pool, size = 8.0M
150821  9:09:51  InnoDB: Completed initialization of buffer pool
150821  9:09:51  InnoDB: Started; log sequence number 0 44233
150821  9:09:51 [Note] Event Scheduler: Loaded 0 events
150821  9:09:51 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution

Re: Blank Host Groups and Services

Posted: Fri Aug 21, 2015 11:54 am
by lmiltchev
Can you run the following commands and show the output?

Code:

top | head -5
df -h
df -i
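
If you want to turn the disk and inode checks into a quick pass/fail, something like the sketch below could help. It assumes POSIX-format output from `df -P` (one line per filesystem, usage in the fifth column, mount point in the sixth) and a 90% cut-off. Both the helper name `full_filesystems` and the threshold are made up for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# full_filesystems is a hypothetical helper. It reads `df -P` style output on
# stdin and prints "mountpoint use%" for filesystems at or above the threshold.
full_filesystems() {
    threshold="${1:-90}"   # assumed cut-off in percent, not a Nagios default
    awk -v limit="$threshold" '
        NR > 1 {                 # skip the header row
            use = $5
            sub(/%/, "", use)    # "58%" -> "58"
            if (use + 0 >= limit) print $6, $5
        }'
}
```

Usage would be `df -P | full_filesystems 90` for disk space and `df -Pi | full_filesystems 90` for inodes. No output means nothing crossed the threshold.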

Re: Blank Host Groups and Services

Posted: Fri Aug 21, 2015 12:17 pm
by CFT6Server

Code:

top - 10:14:36 up  1:05,  1 user,  load average: 0.36, 1.00, 1.84
Tasks: 314 total,   1 running, 312 sleeping,   0 stopped,   1 zombie
Cpu(s): 18.9%us,  7.8%sy,  0.0%ni, 64.9%id,  7.5%wa,  0.1%hi,  1.0%si,  0.0%st
Mem:   5992380k total,  5512208k used,   480172k free,   273136k buffers
Swap:  2064380k total,    14064k used,  2050316k free,  3663468k cached

Code:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      286G  156G  116G  58% /
tmpfs                 2.9G     0  2.9G   0% /dev/shm
/dev/sda1             477M   66M  386M  15% /boot

Code:

# df -i
Filesystem             Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
                     18819840 540969 18278871    3% /
tmpfs                  749047      1   749046    1% /dev/shm
/dev/sda1              128016     50   127966    1% /boot
Currently the load looks OK, but it does get a bit heavier as the day goes on.

Re: Blank Host Groups and Services

Posted: Fri Aug 21, 2015 2:12 pm
by tmcdonald
Do you have multiple nagios processes fighting it out?

ps -ef | grep bin/nagios

You should see two main nagios processes at most.
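
To make that count less error-prone, here is a rough sketch that counts only the processes started with `-d` and skips the `--worker` children and the grep itself. The helper name `count_nagios_daemons` is hypothetical, and the match on `bin/nagios -d` assumes the daemon was started the usual way with the `-d` flag right after the binary path.

```shell
#!/bin/sh
# count_nagios_daemons is a hypothetical helper. It reads `ps -ef` output on
# stdin and prints how many nagios processes were started with -d.
count_nagios_daemons() {
    # First grep keeps the daemon lines (workers use --worker, so they drop out).
    # Second grep removes any grep process that shows up in the ps listing.
    grep 'bin/nagios -d' | grep -v grep | wc -l | tr -d ' '
}
```

Usage: `ps -ef | count_nagios_daemons`. A result above 2 would suggest duplicate daemons.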

Re: Blank Host Groups and Services

Posted: Tue Aug 25, 2015 9:56 am
by CFT6Server
Looks like I have multiple...

Code:

# ps -ef | grep bin/nagios
nagios   12830     1  4 Aug21 ?        04:22:31 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   12832 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12833 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12834 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12835 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12836 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12837 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12838 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12839 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12840 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12948 12830  0 Aug21 ?        00:00:15 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     30314 20088  0 07:44 pts/0    00:00:00 grep bin/nagios
Also, now that the blank groups have been fixed, our XI box has been hanging every few days, with the performance graphs no longer showing anything from around 2am. The only way I've been able to restore them is to restart the server. I think this could be related to this thread: https://support.nagios.com/forum/viewto ... 82#p149982

Re: Blank Host Groups and Services

Posted: Tue Aug 25, 2015 1:29 pm
by ssax
Having those two is normal, so that's not the problem. However, the next time you experience the issue, run those commands before you reboot.

Are you seeing anything about load or TIMEOUT in your /usr/local/nagios/var/npcd.log?

You are likely correct about the perfdata. See the FAQ here about the timeout and the load_threshold:

https://support.nagios.com/wiki/index.p ... ta_Timeout
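
For the npcd.log question, a quick tally can tell you whether it is worth digging further. This is only a sketch: the helper name `npcd_summary` is hypothetical, and it simply counts lines containing the words "timeout" or "load" in any case, since the exact wording of the NPCD messages is not shown in this thread.

```shell
#!/bin/sh
# npcd_summary is a hypothetical helper. It counts lines in the NPCD log that
# mention "timeout" or "load" (case-insensitive) and prints both counts.
npcd_summary() {
    log_file="${1:-/usr/local/nagios/var/npcd.log}"
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true.
    timeouts=$(grep -ci 'timeout' "$log_file" || true)
    loads=$(grep -ci 'load' "$log_file" || true)
    echo "timeout=$timeouts load=$loads"
}
```

Run `npcd_summary` with no arguments to use the default path. If either count is non-zero, `grep -inE 'load|timeout' /usr/local/nagios/var/npcd.log | tail -20` shows the most recent matching lines with their line numbers.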