Crashed DB Table

Posted: Mon Jun 18, 2018 12:06 pm
by jameyw
I upgraded to the latest version a few weeks ago and since then I have been fighting with alarms from the localhost (Nagios XI machine) regarding Current Load. I did some searching and see that system load problems can be caused by crashed database tables. I ran a System Profile and in the logs I see that the table nagios_logentries is marked as crashed. I have followed the procedure documented here https://assets.nagios.com/downloads/nag ... tabase.pdf several times but I still have this problem.

Re: Crashed DB Table

Posted: Mon Jun 18, 2018 3:29 pm
by npolovenko
Hello, @jameyw.

What is the output of the repair_databases script when you run it as root?

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
Is the main problem that you're receiving constant email notifications, or that the load is actually high?
What is the current load in the plugin output?
What is the output of the top command?
How many hosts and services are you monitoring with this XI server?
What is the alert threshold for the load check?

Re: Crashed DB Table

Posted: Tue Jun 19, 2018 9:40 am
by jameyw
Answers threaded in below...

Is the main problem that you're receiving constant email notifications, or that the load is actually high?
I don't receive constant emails but I get a text message and email almost every night around 4:00 AM alerting me that system load is high.

What is the current load in the plugin output?
This is what was in the email from overnight: load average: 3.81, 4.04, 3.44
This is current: - load average: 0.66, 1.21, 1.34

What is the output of the top command?

Code:

top - 09:33:19 up 31 days, 16:47,  2 users,  load average: 2.96, 1.54, 1.44
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.7%us,  2.4%sy,  0.0%ni, 82.5%id,  0.2%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:   3923980k total,  2717952k used,  1206028k free,   135204k buffers
Swap:  2064380k total,        0k used,  2064380k free,  1858232k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7439 nagios    20   0 50672 2556 1036 S  5.0  0.1   0:11.13 ndo2db
 7423 nagios    20   0 26356 8040 1312 S  4.7  0.2   0:04.60 nagios
 7244 mysql     20   0 2220m  46m 6248 S  3.7  1.2   0:13.72 mysqld
  509 root      20   0     0    0    0 S  1.0  0.0 248:22.60 jbd2/dm-0-8
 4735 root      20   0 15292 1596  996 S  0.3  0.0 113:17.18 top
 7428 nagios    20   0 10108 1020  684 S  0.3  0.0   0:00.25 nagios
    1 root      20   0 19360 1248  944 S  0.0  0.0   0:14.93 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0  69:39.40 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0  25:55.81 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/0
    6 root      RT   0     0    0    0 S  0.0  0.0  37:36.04 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0  19:12.37 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/1
    9 root      20   0     0    0    0 S  0.0  0.0   6:50.69 ksoftirqd/1
   10 root      RT   0     0    0    0 S  0.0  0.0  16:03.89 watchdog/1
   11 root      RT   0     0    0    0 S  0.0  0.0  46:09.38 migration/2
   12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/2
   13 root      20   0     0    0    0 S  0.0  0.0  27:17.99 ksoftirqd/2
   14 root      RT   0     0    0    0 S  0.0  0.0  35:07.52 watchdog/2
   15 root      RT   0     0    0    0 S  0.0  0.0  19:40.30 migration/3
   16 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/3
   17 root      20   0     0    0    0 S  0.0  0.0   6:03.08 ksoftirqd/3
   18 root      RT   0     0    0    0 S  0.0  0.0  14:58.51 watchdog/3
   19 root      20   0     0    0    0 S  0.0  0.0 209:38.93 events/0
   20 root      20   0     0    0    0 S  0.0  0.0  56:13.19 events/1
   21 root      20   0     0    0    0 S  0.0  0.0  86:45.62 events/2
   22 root      20   0     0    0    0 S  0.0  0.0  85:00.76 events/3
   23 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/0
   24 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/1
   25 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/2
   26 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/3
   27 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/0
   28 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/1
   29 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/2
   30 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/3
   31 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   32 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   33 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   34 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   35 root      20   0     0    0    0 S  0.0  0.0   0:00.00 cgroup
   36 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper
   37 root      20   0     0    0    0 S  0.0  0.0   0:00.00 netns
   38 root      20   0     0    0    0 S  0.0  0.0   0:00.00 async/mgr
   39 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pm
   40 root      20   0     0    0    0 S  0.0  0.0   4:04.84 sync_supers
   41 root      20   0     0    0    0 S  0.0  0.0   4:12.17 bdi-default
   42 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/0
   43 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/1
   44 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/2
   45 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/3
   46 root      20   0     0    0    0 S  0.0  0.0 100:51.31 kblockd/0
   47 root      20   0     0    0    0 S  0.0  0.0  69:40.04 kblockd/1
   48 root      20   0     0    0    0 S  0.0  0.0 107:04.87 kblockd/2
   49 root      20   0     0    0    0 S  0.0  0.0  67:08.68 kblockd/3
   50 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpid
   51 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_notify
   52 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_hotplug
   53 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_aux
   54 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/0
   55 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/1
   56 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/2
   57 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/3
How many hosts and services are you monitoring with this XI server?
158 Hosts
1716 Services
What is the alert threshold for the load check?
Since I am not a Linux expert, I left it at the default. I assumed it would be correct.
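
A load average is easier to judge once it is divided by the number of CPU cores. The top output above lists kernel threads migration/0 through migration/3, which suggests (but does not prove) a four-core VM. A minimal sketch of the arithmetic follows; the function name per_core_load is hypothetical and only used for illustration:

```shell
# per_core_load LOAD CORES
# Print LOAD divided by CORES, rounded to two decimals.
# Values near or above 1.00 mean the cores are, on average, fully busy.
per_core_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on the monitored host (reads the live 1-minute load and core count):
#   per_core_load "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

With the overnight figure from the email, per_core_load 3.81 4 prints 0.95. If the four-core assumption holds, that is roughly one busy core's worth of work per core: high, but not necessarily a sign of a fault.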

Re: Crashed DB Table

Posted: Tue Jun 19, 2018 9:42 am
by jameyw
I forgot...

What is the output of the repair_databases script when you run it as root?

===============
REPAIR COMPLETE
===============
Stopping ndo2db: done.
Starting ndo2db: done.
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.

=======================
nagios database repair succeeded
nagiosql database repair succeeded
nagiosxi database repair succeeded
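
Since the repair script reports success while the System Profile still shows nagios_logentries as crashed, it may help to check the table directly with mysqlcheck. This is a rough sketch, not part of the documented repair procedure. It assumes the MySQL root account is usable and that healthy tables are reported on a single line ending in OK; the helper name not_ok_tables is hypothetical:

```shell
# not_ok_tables
# Read mysqlcheck output on stdin and print every non-empty line whose
# last field is not "OK". Healthy tables appear as "db.table  OK", so
# anything printed here deserves a closer look.
not_ok_tables() {
    awk 'NF > 0 && $NF != "OK"'
}

# Usage (prompts for the MySQL root password; assumption: root access works):
#   mysqlcheck -u root -p --check nagios nagios_logentries | not_ok_tables
```

No output would suggest the table now checks clean, and that the "crashed" entry in the log may predate the repair.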

Re: Crashed DB Table

Posted: Tue Jun 19, 2018 10:44 am
by npolovenko
@jameyw, do you have any third-party backup software that runs every night? Or perhaps an antivirus scan?
Let's check the crontab:

Code:

crontab -l
ls -la /etc/cron.daily/
One way to see what's going on is to run the top command at the time of peak load.

Also, we could just increase the notification threshold to 0.4 - 0.5?

Let me know.
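
Because the peak happens around 4:00 AM, running top by hand is impractical. One option is to record a batch-mode snapshot whenever the load is high. The sketch below is only an illustration: the function names, the threshold, and the log path are all hypothetical, and it assumes a Linux host with /proc/loadavg.

```shell
# load_exceeds LOAD THRESHOLD
# Succeed (exit status 0) when LOAD is numerically greater than THRESHOLD.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l > t) }'
}

# snapshot_if_loaded THRESHOLD LOGFILE
# Append a timestamp and the first 30 lines of one batch-mode top run
# to LOGFILE when the current 1-minute load average exceeds THRESHOLD.
snapshot_if_loaded() {
    load="$(cut -d' ' -f1 /proc/loadavg)"
    if load_exceeds "$load" "$1"; then
        { date; top -b -n 1 | head -n 30; } >> "$2"
    fi
}

# Usage (hypothetical threshold and log path), e.g. from a script that
# cron runs every minute between 3 and 5 AM:
#   snapshot_if_loaded 3.0 /tmp/load_snapshots.log
```

Note that crontab -l only shows the current user's crontab. Jobs can also live in /etc/crontab, /etc/cron.d/, and other users' crontabs, so those are worth a look too.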

Re: Crashed DB Table

Posted: Tue Jun 19, 2018 11:45 am
by jameyw
No backups at that time. No AV either. Since it is running as a VM in a VMware cluster, I do back it up every night at 10:00 PM, but the backup only takes about 10 minutes, so I'm pretty sure the 4:00 AM system load isn't related.

Code:

[root@localhost ~]# crontab -l
no crontab for root
[root@localhost ~]# ls -la /etc/cron.daily/
total 16
drwxr-xr-x.  2 root root 4096 Feb  6  2017 .
drwxr-xr-x. 85 root root 4096 May 18 16:45 ..
-rwx------   1 root root  180 Jul  9  2003 logrotate
-rwxr-xr-x   1 root root  905 Feb 21  2013 makewhatis.cron
[root@localhost ~]#
I am going to adjust the thresholds up a little and see if that solves the problem.

Re: Crashed DB Table

Posted: Tue Jun 19, 2018 12:01 pm
by npolovenko
@jameyw, sounds like a plan. Do you know how to change the thresholds? In the XI GUI, go to the Core Config Manager, then Services, find the load check service, and change the values under $ARG2$, I believe. Then save and click Apply Configuration.
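
To see how a set of thresholds would treat the load figures from the overnight email, the comparison can be sketched in shell. The thread never states the defaults, so two things here are assumptions: the thresholds 5.0,4.0,3.0 (warning) and 10.0,6.0,4.0 (critical), taken from the stock Nagios sample localhost configuration, and the comparison rule (strictly greater than, per 1/5/15-minute value). The function name load_state is hypothetical.

```shell
# load_state "L1 L5 L15" "W1,W5,W15" "C1,C5,C15"
# Print OK, WARNING, or CRITICAL by comparing each load average with the
# matching warning and critical threshold. This only approximates what
# the check_load plugin does; it is not the plugin itself.
load_state() {
    echo "$1 $2 $3" | tr ',' ' ' | awk '{
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Fields 1-3: loads, 4-6: warning, 7-9: critical thresholds.
            if ($i > $(i + 6)) { state = "CRITICAL"; break }
            if ($i > $(i + 3)) state = "WARNING"
        }
        print state
    }'
}

# Usage with the overnight figures and the assumed default thresholds:
#   load_state "3.81 4.04 3.44" "5.0,4.0,3.0" "10.0,6.0,4.0"
```

Under those assumed defaults the overnight values produce WARNING, because the 5-minute and 15-minute averages sit just above 4.0 and 3.0. In the stock command definition the warning values are usually passed as $ARG1$ and the critical values as $ARG2$, so it is worth checking both fields when raising the thresholds.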

Re: Crashed DB Table

Posted: Tue Jun 19, 2018 1:42 pm
by jameyw
Got it

Re: Crashed DB Table

Posted: Tue Jun 19, 2018 4:31 pm
by scottwilkerson
Let us know if you have further issues

Re: Crashed DB Table

Posted: Wed Jun 27, 2018 2:58 pm
by jameyw
Resolved.