Our Nagios XI server crashed last night. It has been locking up and giving us CPU soft lockup errors. After last night's hang, some hosts and services show up, while others are just empty. In the host detail view I can see the hosts, but when I click on one of them, its services come up empty.
I need some help and guidance on how to troubleshoot this, and I hope this isn't corruption somewhere in the db.
Blank Host Groups and Services
Re: Blank Host Groups and Services
DB corruption is the first thing that comes to mind. Check mysqld.log for errors or crashed tables:
Code: Select all
tail -50 /var/log/mysqld.log
If you see issues in the log, repair the database by following the steps outlined here:
https://assets.nagios.com/downloads/nag ... tabase.pdf
Your MySQL database is not offloaded to a remote server, is it?
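If the last 50 lines are not enough, a rough way to scan the whole log is sketched below. The helper name count_mysql_table_errors and the search patterns are assumptions for illustration (typical MySQL wording such as "marked as crashed"), not an exhaustive or official list, so a count of 0 does not prove the tables are healthy.

```shell
# Hypothetical helper: count log lines that hint at crashed or corrupt tables.
# The patterns are assumptions based on common MySQL messages.
count_mysql_table_errors() {
    logfile="${1:-/var/log/mysqld.log}"
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper from failing in that case.
    grep -ciE 'marked as crashed|is corrupt|\[ERROR\]' "$logfile" || true
}
```

Calling it with no argument reads /var/log/mysqld.log; pass another path if your log lives elsewhere.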
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Blank Host Groups and Services
Thanks. I traced through all the logs and didn't find anything. I decided to do a second reboot and then applied configuration, and everything seems to be back to normal now. I am still trying to figure out where the hang-up is. Perhaps the server is reaching capacity?
Code: Select all
# tail -50 /var/log/mysqld.log
150817 9:24:48 InnoDB: Completed initialization of buffer pool
150817 9:24:49 InnoDB: Started; log sequence number 0 44233
150817 9:24:49 [Note] Event Scheduler: Loaded 0 events
150817 9:24:49 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
150820 11:26:47 [Note] /usr/libexec/mysqld: Normal shutdown
150820 11:26:47 [Note] Event Scheduler: Purging the queue. 0 events
150820 11:26:47 InnoDB: Starting shutdown...
150820 11:26:53 InnoDB: Shutdown completed; log sequence number 0 44233
150820 11:26:53 [Note] /usr/libexec/mysqld: Shutdown complete
150820 11:26:53 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150820 11:27:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150820 11:27:43 InnoDB: Initializing buffer pool, size = 8.0M
150820 11:27:43 InnoDB: Completed initialization of buffer pool
150820 11:27:43 InnoDB: Started; log sequence number 0 44233
150820 11:27:43 [Note] Event Scheduler: Loaded 0 events
150820 11:27:43 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
150821 8:17:33 [Note] /usr/libexec/mysqld: Normal shutdown
150821 8:17:33 [Note] Event Scheduler: Purging the queue. 0 events
150821 8:17:33 InnoDB: Starting shutdown...
150821 8:17:37 InnoDB: Shutdown completed; log sequence number 0 44233
150821 8:17:37 [Note] /usr/libexec/mysqld: Shutdown complete
150821 08:17:37 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150821 08:23:09 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150821 8:23:09 InnoDB: Initializing buffer pool, size = 8.0M
150821 8:23:09 InnoDB: Completed initialization of buffer pool
150821 8:23:09 InnoDB: Started; log sequence number 0 44233
150821 8:23:09 [Note] Event Scheduler: Loaded 0 events
150821 8:23:09 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
150821 9:09:00 [Note] /usr/libexec/mysqld: Normal shutdown
150821 9:09:00 [Note] Event Scheduler: Purging the queue. 0 events
150821 9:09:00 InnoDB: Starting shutdown...
150821 9:09:02 InnoDB: Shutdown completed; log sequence number 0 44233
150821 9:09:02 [Note] /usr/libexec/mysqld: Shutdown complete
150821 09:09:02 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150821 09:09:51 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150821 9:09:51 InnoDB: Initializing buffer pool, size = 8.0M
150821 9:09:51 InnoDB: Completed initialization of buffer pool
150821 9:09:51 InnoDB: Started; log sequence number 0 44233
150821 9:09:51 [Note] Event Scheduler: Loaded 0 events
150821 9:09:51 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
Re: Blank Host Groups and Services
Can you run the following commands and show the output?
Code: Select all
top | head -5
df -h
df -i
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Blank Host Groups and Services
Code: Select all
top - 10:14:36 up 1:05, 1 user, load average: 0.36, 1.00, 1.84
Tasks: 314 total, 1 running, 312 sleeping, 0 stopped, 1 zombie
Cpu(s): 18.9%us, 7.8%sy, 0.0%ni, 64.9%id, 7.5%wa, 0.1%hi, 1.0%si, 0.0%st
Mem: 5992380k total, 5512208k used, 480172k free, 273136k buffers
Swap: 2064380k total, 14064k used, 2050316k free, 3663468k cached
Code: Select all
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
286G 156G 116G 58% /
tmpfs 2.9G 0 2.9G 0% /dev/shm
/dev/sda1 477M 66M 386M 15% /boot
Code: Select all
# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
18819840 540969 18278871 3% /
tmpfs 749047 1 749046 1% /dev/shm
/dev/sda1 128016 50 127966 1% /boot
Re: Blank Host Groups and Services
Do you have multiple nagios processes fighting it out?
ps -ef | grep bin/nagios
You should see two main nagios processes at most.
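Since the ps output also lists the worker processes and the grep itself, it can help to filter those out before counting. This is only a sketch: count_main_nagios is a made-up helper name, and it assumes workers are started with --worker on the command line.

```shell
# Hypothetical helper: count "main" Nagios processes in `ps -ef` output
# read from stdin, ignoring --worker helpers and the grep command itself.
count_main_nagios() {
    grep 'bin/nagios' | grep -v -e '--worker' -e 'grep' | wc -l | tr -d ' '
}
```

Run it as `ps -ef | count_main_nagios`; a result above 2 would suggest extra main processes.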
Former Nagios employee
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Blank Host Groups and Services
Looks like I have multiple...
Also, now that the blank groups have been fixed, our XI box has been hanging up every few days: around 2am the performance graphs stop showing anything. The only way I've been able to restore them is to restart the server. I think this could be related to this: https://support.nagios.com/forum/viewto ... 82#p149982
Code: Select all
# ps -ef | grep bin/nagios
nagios 12830 1 4 Aug21 ? 04:22:31 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 12832 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12833 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12834 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12835 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12836 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12837 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12838 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12839 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12840 12830 0 Aug21 ? 00:00:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12948 12830 0 Aug21 ? 00:00:15 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 30314 20088 0 07:44 pts/0 00:00:00 grep bin/nagios
Re: Blank Host Groups and Services
Having those two is normal, so that's not the problem. Once you are experiencing the issue again, though, run those commands before you reboot.
Are you seeing anything about load or TIMEOUT in your /usr/local/nagios/var/npcd.log?
You are likely correct about the perfdata, see the FAQ here about the timeout and the load_threshold:
https://support.nagios.com/wiki/index.p ... ta_Timeout
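For checking the log quickly, something along these lines may do. It is a sketch only: recent_npcd_warnings is a hypothetical helper name, and the case-insensitive match on "timeout" and "load" is an assumption about how those messages are worded, so expect some unrelated hits.

```shell
# Hypothetical helper: print the most recent npcd.log lines that mention
# a timeout or load (case-insensitive), at most 20 of them.
recent_npcd_warnings() {
    logfile="${1:-/usr/local/nagios/var/npcd.log}"
    grep -iE 'timeout|load' "$logfile" | tail -n 20
}
```

With no argument it reads /usr/local/nagios/var/npcd.log; empty output means no matching lines were found.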