
Nagios XI Server Broke

Posted: Mon Aug 31, 2015 8:56 am
by phil821
Hello all,

We have been running a Nagios XI server for a little under half a year. On Friday the server started misbehaving: it reported that all services were offline, and its CPU usage got so high that we couldn't even SSH into it.

At the time, we confirmed that there were no problems with the network or anything of that sort. In fact, we can't think of any reason why something outside the server would have broken Nagios (all other servers on the network were fine).

I have made a short write-up of the basic troubleshooting we have done. Please read the attached RTF.

P.S. When we start the services, the Nagios server's CPU usage spikes and it becomes practically unusable.
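Since the CPU spike is the main symptom, a reasonable first step (a sketch, not something suggested in the thread) is to see which processes are using the CPU shortly after the services start. The helper name `top_cpu` is hypothetical; it assumes procps `ps` (the standard `ps` on CentOS/RHEL) and its `--sort` option.

```shell
#!/usr/bin/env bash
# top_cpu N: hypothetical helper that prints the ps header plus the N
# processes with the highest %CPU, highest first (assumes procps ps).
top_cpu() {
  local n="${1:-10}"
  # ppid helps spot many children forked by one parent;
  # etime shows how long each process has been running.
  ps -eo pid,ppid,pcpu,etime,comm --sort=-pcpu | head -n "$((n + 1))"
}
```

Note that the `%CPU` column in procps `ps` is averaged over each process's lifetime, so for a momentary spike a single batch snapshot from `top -b -n 1` may give a more current picture.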

Re: Nagios XI Server Broke

Posted: Mon Aug 31, 2015 9:04 am
by phil821
Here are more screenshots.

Re: Nagios XI Server Broke

Posted: Mon Aug 31, 2015 9:06 am
by phil821
And more...

Re: Nagios XI Server Broke

Posted: Mon Aug 31, 2015 9:07 am
by phil821
Last one.

Re: Nagios XI Server Broke

Posted: Mon Aug 31, 2015 9:21 am
by hsmith
Okay, so it looks like you have a lot of different things going on.

The first two things I would like to see are the output of:

Code: Select all

df -ih

Code: Select all

df -h
When you say you repaired the database tables, what method did you use to do this?

Re: Nagios XI Server Broke

Posted: Mon Aug 31, 2015 10:18 am
by phil821

Code: Select all

[root@vfmsrv107 ~]# df -ih
Filesystem           Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
                       484K   96K  388K   20% /
tmpfs                  984K     1  984K    1% /dev/shm
/dev/sda1              126K    50  125K    1% /boot
/dev/mapper/vg_nagios_data-lv_opt
                       609K  6.3K  603K    2% /opt
[root@vfmsrv107 ~]# lvs
  LV      VG             Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_root VolGroup       -wi-ao---- 7.54g
  lv_swap VolGroup       -wi-ao---- 1.97g
  lv_opt  vg_nagios_data -wi-ao---- 9.51g
[root@vfmsrv107 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      7.3G  4.4G  2.6G  63% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/sda1             477M   66M  386M  15% /boot
/dev/mapper/vg_nagios_data-lv_opt
                      9.3G  5.9G  3.0G  67% /opt
This output was taken with the services stopped, because it becomes next to impossible to SSH into the server when they are running.
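The `df -ih` and `df -h` requests presumably rule out a full filesystem or inode exhaustion; in the output above the highest figures are 67% of blocks on /opt and 20% of inodes on /, so nothing is full. To repeat that check quickly, a sketch like the one below could be used. The helper name `check_fs_usage` and the 90% default threshold are illustrative assumptions. It reads `df -P` / `df -Pi` output, because the POSIX format keeps each filesystem on one line instead of wrapping long LVM device names as seen above.

```shell
#!/usr/bin/env bash
# check_fs_usage THRESHOLD: hypothetical helper. Reads `df -P` or `df -Pi`
# output on stdin and prints "<mount point> <use%>" for every filesystem
# whose Capacity (or IUse%) is at or above THRESHOLD percent.
check_fs_usage() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR > 1 {                      # skip the header line
      pct = $5                    # Capacity / IUse% column in POSIX format
      sub(/%$/, "", pct)
      # some filesystems report "-" for inode usage; ignore those
      if (pct != "-" && pct + 0 >= t + 0) print $6, $5
    }'
}
```

For example, `df -P | check_fs_usage 90` checks block usage and `df -Pi | check_fs_usage 90` checks inode usage; empty output means no filesystem is at or above the threshold. Mount points containing spaces would confuse the column split.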

We repaired the tables with:

Code: Select all

service mysqld stop
/usr/local/nagiosxi/scripts/repairmysql.sh nagios
service mysqld start
If that is unsuccessful, run:

Code: Select all

service mysqld stop
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
service mysqld start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
/usr/local/nagiosxi/cron/dbmaint.php
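The second procedure needs the name of the corrupted table. One way to find it (an assumption, not something from this thread) is to run `mysqlcheck --check nagios` while mysqld is still running and keep only the tables that reported an `error` line. The sketch below assumes mysqlcheck's usual layout (`db.table  OK` for healthy tables, `error    : ...` lines under the table name otherwise); the helper name `find_corrupt_tables` is hypothetical.

```shell
#!/usr/bin/env bash
# find_corrupt_tables: hypothetical helper. Reads `mysqlcheck --check` output
# on stdin and prints each db.table that has at least one "error" line.
find_corrupt_tables() {
  awk '
    # Message lines look like "error    : Corrupt" or "status   : OK".
    tolower($1) ~ /^(error|warning|note|status|info)$/ {
      if (tolower($1) == "error" && tbl != "" && !(tbl in seen)) {
        seen[tbl] = 1
        print tbl
      }
      next
    }
    # Any other non-empty line starts with a table name ("nagios.nagios_x ...").
    NF > 0 { tbl = $1 }'
}
```

For example, `mysqlcheck -u root -p --check nagios | find_corrupt_tables` lists the damaged tables; dropping the `nagios.` database prefix gives the table name to pass to `myisamchk` after stopping mysqld. Warnings without an error (such as a table that was not closed properly) are deliberately not reported.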

Re: Nagios XI Server Broke

Posted: Mon Aug 31, 2015 12:15 pm
by tgriep
I think an email ticket was opened for this system, and I have been working on it.
If it is the same system, do you want to continue with the ticket or with this post?