Blank Host Groups and Services

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Blank Host Groups and Services

Post by CFT6Server »

Our Nagios XI server crashed last night. It has been locking up and reporting CPU soft lockup errors. Since last night's hang, some host groups and services display normally while others are just empty. In Host Detail I can see the hosts, but when I click on one of them, its service list is empty.

I need some guidance on how to troubleshoot this. Hopefully this isn't database corruption somewhere?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Blank Host Groups and Services

Post by lmiltchev »

Database corruption is the first thing that comes to mind. Check mysqld.log for errors or crashed tables:

tail -50 /var/log/mysqld.log
If you see issues in the log, repair the database by following the steps outlined here:

https://assets.nagios.com/downloads/nag ... tabase.pdf
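The log check above can be narrowed to the interesting lines with a simple filter. A minimal sketch, with a hypothetical function name: it prints lines from a MySQL error log that mention crashed or corrupt tables or are tagged [ERROR]. The default path is the one used in this thread, and the keyword list is an assumption (MySQL usually reports a damaged MyISAM table as "... is marked as crashed and should be repaired", but other failures may be worded differently).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print mysqld log lines that hint at table damage.
# Assumption: the log lives at /var/log/mysqld.log unless a path is given.
scan_mysqld_log() {
    local log="${1:-/var/log/mysqld.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -i: case-insensitive; -E: extended regex. grep exits 1 when nothing
    # matches, which here means no suspicious lines were found.
    grep -iE 'crashed|corrupt|\[ERROR\]' "$log"
}
```

Run scan_mysqld_log with no argument (or pass another log path); empty output means none of these keywords appear.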

Your MySQL database is not offloaded to a remote server, is it?
Be sure to check out our Knowledgebase for helpful articles and solutions!
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Host Groups and Services

Post by CFT6Server »

Thanks. I went through all the logs and didn't find anything. I decided to do a second reboot and then applied the configuration, and everything seems to be back to normal now. I am still trying to figure out what is causing the hangs. Perhaps the server is reaching capacity?

# tail -50 /var/log/mysqld.log
150817  9:24:48  InnoDB: Completed initialization of buffer pool
150817  9:24:49  InnoDB: Started; log sequence number 0 44233
150817  9:24:49 [Note] Event Scheduler: Loaded 0 events
150817  9:24:49 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
150820 11:26:47 [Note] /usr/libexec/mysqld: Normal shutdown

150820 11:26:47 [Note] Event Scheduler: Purging the queue. 0 events
150820 11:26:47  InnoDB: Starting shutdown...
150820 11:26:53  InnoDB: Shutdown completed; log sequence number 0 44233
150820 11:26:53 [Note] /usr/libexec/mysqld: Shutdown complete

150820 11:26:53 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150820 11:27:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150820 11:27:43  InnoDB: Initializing buffer pool, size = 8.0M
150820 11:27:43  InnoDB: Completed initialization of buffer pool
150820 11:27:43  InnoDB: Started; log sequence number 0 44233
150820 11:27:43 [Note] Event Scheduler: Loaded 0 events
150820 11:27:43 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
150821  8:17:33 [Note] /usr/libexec/mysqld: Normal shutdown

150821  8:17:33 [Note] Event Scheduler: Purging the queue. 0 events
150821  8:17:33  InnoDB: Starting shutdown...
150821  8:17:37  InnoDB: Shutdown completed; log sequence number 0 44233
150821  8:17:37 [Note] /usr/libexec/mysqld: Shutdown complete

150821 08:17:37 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150821 08:23:09 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150821  8:23:09  InnoDB: Initializing buffer pool, size = 8.0M
150821  8:23:09  InnoDB: Completed initialization of buffer pool
150821  8:23:09  InnoDB: Started; log sequence number 0 44233
150821  8:23:09 [Note] Event Scheduler: Loaded 0 events
150821  8:23:09 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
150821  9:09:00 [Note] /usr/libexec/mysqld: Normal shutdown

150821  9:09:00 [Note] Event Scheduler: Purging the queue. 0 events
150821  9:09:00  InnoDB: Starting shutdown...
150821  9:09:02  InnoDB: Shutdown completed; log sequence number 0 44233
150821  9:09:02 [Note] /usr/libexec/mysqld: Shutdown complete

150821 09:09:02 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150821 09:09:51 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150821  9:09:51  InnoDB: Initializing buffer pool, size = 8.0M
150821  9:09:51  InnoDB: Completed initialization of buffer pool
150821  9:09:51  InnoDB: Started; log sequence number 0 44233
150821  9:09:51 [Note] Event Scheduler: Loaded 0 events
150821  9:09:51 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
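In the excerpt above, every restart is preceded by a "Normal shutdown" and "Shutdown complete" pair, which is what a clean stop looks like. A rough way to check that across a whole log, sketched under the assumption that each successful start logs "mysqld: ready for connections" and each clean stop logs "mysqld: Shutdown complete" (as in this excerpt): count both, allowing one extra start for the instance that is currently running. The function name is hypothetical, and on a truncated log such as tail -50 output the counts can be off by one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare clean shutdowns with starts in a mysqld log.
check_clean_restarts() {
    local log="${1:-/var/log/mysqld.log}"
    local starts stops
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # Match the "mysqld:" prefix so InnoDB's own "Shutdown completed" lines
    # are not counted as a second shutdown.
    starts=$(grep -c 'mysqld: ready for connections' "$log")
    stops=$(grep -c 'mysqld: Shutdown complete' "$log")
    echo "starts=$starts clean_shutdowns=$stops"
    # One start may be unmatched: the server that is running right now.
    if [ "$starts" -gt $((stops + 1)) ]; then
        echo "unmatched starts (possible crash or hard reset): $((starts - stops - 1))"
        return 1
    fi
    return 0
}
```

A non-zero return only says the counts do not line up; the timestamps around the unmatched start still need to be read by hand.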
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Blank Host Groups and Services

Post by lmiltchev »

Can you run the following commands and show the output?

top -b -n 1 | head -5
df -h
df -i
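To spot a nearly full filesystem (or inode table) in that output, a small filter helps. A minimal sketch with a hypothetical function name and an assumed 90% default threshold: it reads POSIX-format df output on stdin, where -P keeps each filesystem on one line so a long LVM name such as /dev/mapper/VolGroup-lv_root does not wrap onto a second line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print mount points at or above a usage threshold.
# Reads `df -P` (blocks) or `df -Pi` (inodes) output on stdin; column 5 is
# the use percentage and column 6 the mount point in both formats.
# Mount points containing spaces are not handled.
fs_over_threshold() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # "58%" -> "58"; "-" evaluates to 0
            if (use + 0 >= t + 0) print $6, $5
        }'
}
```

Usage: df -P | fs_over_threshold 90 for disk space, and df -Pi | fs_over_threshold 90 for inodes. No output means nothing is over the threshold.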
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Host Groups and Services

Post by CFT6Server »

top - 10:14:36 up  1:05,  1 user,  load average: 0.36, 1.00, 1.84
Tasks: 314 total,   1 running, 312 sleeping,   0 stopped,   1 zombie
Cpu(s): 18.9%us,  7.8%sy,  0.0%ni, 64.9%id,  7.5%wa,  0.1%hi,  1.0%si,  0.0%st
Mem:   5992380k total,  5512208k used,   480172k free,   273136k buffers
Swap:  2064380k total,    14064k used,  2050316k free,  3663468k cached

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      286G  156G  116G  58% /
tmpfs                 2.9G     0  2.9G   0% /dev/shm
/dev/sda1             477M   66M  386M  15% /boot

# df -i
Filesystem             Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
                     18819840 540969 18278871    3% /
tmpfs                  749047      1   749046    1% /dev/shm
/dev/sda1              128016     50   127966    1% /boot
The load currently looks OK, but it gets a bit heavier as the day goes on.
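On the "reaching capacity?" question: in the top output above, most of the "used" memory is page cache, which the kernel can reclaim. Subtracting free, buffers and cache from the total gives 5992380 - 480172 - 273136 - 3663468 = 1575604k, roughly 1.5 GiB actually held by processes out of about 5.7 GiB. A minimal sketch that does the same sum from /proc/meminfo (the field names are standard on Linux; the function name is hypothetical, and kernels from 3.14 on also report a MemAvailable field that is a better estimate):

```shell
#!/usr/bin/env bash
# Hypothetical helper: memory held by processes, in kB, excluding
# reclaimable buffers and page cache (MemTotal - MemFree - Buffers - Cached).
app_memory_kb() {
    local meminfo="${1:-/proc/meminfo}"
    # Anchoring with ^ keeps "SwapCached:" from matching the Cached pattern.
    awk '/^MemTotal:/ { t = $2 }
         /^MemFree:/  { f = $2 }
         /^Buffers:/  { b = $2 }
         /^Cached:/   { c = $2 }
         END { print t - f - b - c }' "$meminfo"
}
```

Sampling this at the busy time of day gives a fairer picture than a single reading taken just after a reboot.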
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Blank Host Groups and Services

Post by tmcdonald »

Do you have multiple nagios processes fighting it out?

ps -ef | grep bin/nagios

You should see two main nagios processes at most.
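Counting by eye works, but the grep command itself also shows up in that listing. A rough sketch for counting top-level Nagios daemons, assuming (as in this thread) that the daemon runs as .../bin/nagios -d ...: a -d process whose parent is another -d process is treated as a child of the main daemon, and only the rest are counted. The function name is hypothetical; a result above 1 would suggest duplicate instances.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Nagios daemons that are not children of another
# Nagios daemon. Reads `ps -ef` output on stdin (column 2 = PID, 3 = PPID).
count_top_level_nagios() {
    awk '
        /bin\/nagios -d / { parent[$2] = $3 }
        END {
            n = 0
            for (pid in parent)
                if (!(parent[pid] in parent)) n++
            print n
        }'
}
```

Usage: ps -ef | count_top_level_nagios. When listing the processes directly, ps -ef | grep '[b]in/nagios' is a common way to keep the grep process out of the results.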
Former Nagios employee
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Blank Host Groups and Services

Post by CFT6Server »

Looks like I have multiple...

# ps -ef | grep bin/nagios
nagios   12830     1  4 Aug21 ?        04:22:31 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   12832 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12833 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12834 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12835 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12836 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12837 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12838 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12839 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12840 12830  0 Aug21 ?        00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   12948 12830  0 Aug21 ?        00:00:15 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     30314 20088  0 07:44 pts/0    00:00:00 grep bin/nagios
Also, now that the blank groups have been fixed: our XI box has been hanging every few days, and the performance graphs stop showing data from around 2am. The only way I've been able to restore them is to restart the server. I think this could be related to this thread: https://support.nagios.com/forum/viewto ... 82#p149982
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Blank Host Groups and Services

Post by ssax »

Having those two is normal, so that's not the problem. However, the next time you experience the issue, run those commands before you reboot.
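One way to make sure that output is captured while the problem is happening is to save it to a timestamped file before rebooting. A minimal sketch; the function name and the /tmp default are assumptions. It uses top -b -n 1 because -b (batch mode) is the option top provides for sending its output to a pipe or file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: capture the diagnostics requested in this thread into
# one timestamped file and print its path.
snapshot_diagnostics() {
    local outdir="${1:-/tmp}"
    local out
    out="$outdir/nagios-diag-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== top =="
        top -b -n 1 | head -5
        echo "== df -h =="
        df -h
        echo "== df -i =="
        df -i
        echo "== nagios processes =="
        # [b] keeps this grep process out of its own results
        ps -ef | grep '[b]in/nagios'
    } > "$out" 2>&1
    echo "$out"
}
```

Errors from any of the commands go into the same file, so a partial snapshot is still kept if one of them fails.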

Are you seeing anything about load or TIMEOUT in your /usr/local/nagios/var/npcd.log?

You are likely correct about the perfdata. See this FAQ about the timeout and load_threshold:

https://support.nagios.com/wiki/index.p ... ta_Timeout
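To check npcd.log for the messages mentioned above, a keyword filter is enough as a first pass. This is a coarse sketch: the exact npcd message wording is not shown in this thread, so it simply matches "load" or "timeout" in any case (which may also catch unrelated lines), and the function name is hypothetical. Since the graphs stop around 2am, the timestamps on the matching lines are the thing to look at.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent npcd.log lines mentioning load or
# timeouts. Default path as given in this thread; pass another path if needed.
npcd_warnings() {
    local log="${1:-/usr/local/nagios/var/npcd.log}"
    local lines="${2:-50}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -i: case-insensitive, so "TIMEOUT" and "timeout" both match.
    grep -iE 'load|timeout' "$log" | tail -n "$lines"
}
```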