Nagios XI Server Broke

phil821
Posts: 20
Joined: Mon May 11, 2015 10:11 am

Nagios XI Server Broke

Post by phil821 »

Hello all,

We have been running a Nagios XI server for a little under half a year. On Friday, the server started misbehaving: it reported that all services were offline, and its CPU usage got so high that we couldn't even SSH into the server.

At the time, we made sure that there were no problems with the network or anything of that sort. In fact, we really can't think of anything outside the server that could have broken Nagios (all other servers on the network were fine).

I have made a short write-up of the basic troubleshooting we have done. PLEASE READ THE ATTACHED RTF.

P.S. When we start the services, Nagios CPU usage spikes and the server becomes pretty much unusable.
Last edited by phil821 on Mon Aug 31, 2015 9:11 am, edited 1 time in total.
phil821
Posts: 20
Joined: Mon May 11, 2015 10:11 am

Re: Nagios XI Server Broke

Post by phil821 »

Here are more screenshots.
phil821
Posts: 20
Joined: Mon May 11, 2015 10:11 am

Re: Nagios XI Server Broke

Post by phil821 »

And more...
phil821
Posts: 20
Joined: Mon May 11, 2015 10:11 am

Re: Nagios XI Server Broke

Post by phil821 »

Last one .....
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Nagios XI Server Broke

Post by hsmith »

Okay, so it looks like you have a lot of different things going on.

First two things I would like to see:

Code:

df -ih

Code:

df -h
When you say you repaired the database tables, what method did you use to do this?
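As a rough sketch of what those two checks are looking for, the output can be filtered for filesystems that are close to full. The `check_usage` helper and the 90% threshold below are illustrative assumptions, not part of Nagios XI. The `-P` flag is used so that long device names are not wrapped onto a second line, and `-i` together with `-P` assumes GNU df.

```shell
#!/bin/sh
# Hypothetical helper: flag filesystems at or above a usage threshold.
# Reads `df -P` style output on stdin: six columns, with the usage
# percentage in column 5 and the mount point in column 6.
# Mount points containing spaces are not handled by this sketch.
check_usage() {
    threshold="$1"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        # Skip rows where the percentage is not numeric (for example "-").
        if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0)
            print $6 " is at " use "%"
    }'
}

# Usage on a live system (not run here):
#   df -P  | check_usage 90    # disk space
#   df -Pi | check_usage 90    # inodes
```

If neither command prints anything, disk space and inode exhaustion are unlikely to be the cause.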
phil821
Posts: 20
Joined: Mon May 11, 2015 10:11 am

Re: Nagios XI Server Broke

Post by phil821 »

Code:

[root@vfmsrv107 ~]# df -ih
Filesystem           Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
                       484K   96K  388K   20% /
tmpfs                  984K     1  984K    1% /dev/shm
/dev/sda1              126K    50  125K    1% /boot
/dev/mapper/vg_nagios_data-lv_opt
                       609K  6.3K  603K    2% /opt
[root@vfmsrv107 ~]# lvs
  LV      VG             Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_root VolGroup       -wi-ao---- 7.54g
  lv_swap VolGroup       -wi-ao---- 1.97g
  lv_opt  vg_nagios_data -wi-ao---- 9.51g
[root@vfmsrv107 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      7.3G  4.4G  2.6G  63% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/sda1             477M   66M  386M  15% /boot
/dev/mapper/vg_nagios_data-lv_opt
                      9.3G  5.9G  3.0G  67% /opt
This output was taken with the services stopped, because it becomes next to impossible to SSH into the server while they are running.

We repaired the database tables with:

Code:

service mysqld stop
/usr/local/nagiosxi/scripts/repairmysql.sh nagios
service mysqld start
If unsuccessful, then run

Code:

service mysqld stop
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_<corrupted_table>
service mysqld start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
/usr/local/nagiosxi/cron/dbmaint.php
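To work out which table name belongs in place of `<corrupted_table>`, each MyISAM index file can be checked in turn. The sketch below is only an illustration: the `find_bad_tables` function and the `MYISAMCHK` override are hypothetical names, and it assumes that `myisamchk -c -s` exits with a non-zero status when a table is damaged. As in the steps above, it should only be run while mysqld is stopped.

```shell
#!/bin/sh
# Hypothetical helper: print the names of MyISAM tables that fail a check.
# Assumption: the checker exits non-zero for a damaged table.
# MYISAMCHK can be set to substitute a different checker binary.
find_bad_tables() {
    dir="$1"
    checker="${MYISAMCHK:-myisamchk}"
    for idx in "$dir"/*.MYI; do
        # Skip the unexpanded glob when the directory has no .MYI files.
        [ -e "$idx" ] || continue
        # -c checks the table; -s stays silent unless an error is found.
        if ! "$checker" -c -s "$idx" >/dev/null 2>&1; then
            basename "$idx" .MYI
        fi
    done
}

# Usage on a live system (not run here, mysqld must be stopped first):
#   find_bad_tables /var/lib/mysql/nagios
```

Each name printed is a candidate for the `myisamchk -r -f` repair step.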
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI Server Broke

Post by tgriep »

I think an email ticket that I have been working on was opened for this system.
If it is the same system, do you want to continue with the ticket or with this post?