Crashed DB Table

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

Post by jameyw »

I upgraded to the latest version a few weeks ago and since then I have been fighting with alarms from the localhost (Nagios XI machine) regarding Current Load. I did some searching and see that system load problems can be caused by crashed database tables. I ran a System Profile and in the logs I see that the table nagios_logentries is marked as crashed. I have followed the procedure documented here https://assets.nagios.com/downloads/nag ... tabase.pdf several times but I still have this problem.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Crashed DB Table

Post by npolovenko »

Hello, @jameyw.

What is the output of the repair_databases script when you run it as root?

/usr/local/nagiosxi/scripts/repair_databases.sh
Is the main problem that you're receiving constant email notifications, or that the load is actually high?
What is the current load in the plugin output?
What is the output of the top command?
How many hosts and services are you monitoring with this XI server?
What is the alert threshold for the load check?
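
Separately, if you want to confirm whether nagios_logentries is still flagged after the repair script reports success, here is a minimal sketch using mysqlcheck. The database and table names come from this thread; the MySQL user "root" is only an assumption, so adjust it for your install. By default the function just prints the command it would run, so nothing touches the database until you set DRY_RUN=0.

```shell
# Sketch: build (and optionally run) a mysqlcheck command for one table.
# Assumptions: database "nagios" and table "nagios_logentries" as named in
# this thread; the MySQL user "root" is a guess, not a confirmed value.

check_table() (
    db="${1:-nagios}"
    table="${2:-nagios_logentries}"
    user="${3:-root}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command, so nothing is executed.
        echo "mysqlcheck --check --user=$user --password $db $table"
    else
        # --check inspects the table without modifying it;
        # --password with no value prompts for the password.
        mysqlcheck --check --user="$user" --password "$db" "$table"
    fi
)

check_table
```

Running it with DRY_RUN=0 performs the actual check and prompts for the MySQL password.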
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

Re: Crashed DB Table

Post by jameyw »

Answers threaded in below...

Is the main problem that you're receiving constant email notifications, or that the load is actually high?
I don't receive constant emails but I get a text message and email almost every night around 4:00 AM alerting me that system load is high.

What is the current load in the plugin output?
This is what was in the email from overnight; - load average: 3.81, 4.04, 3.44
This is current: - load average: 0.66, 1.21, 1.34

What is the output of the top command?

top - 09:33:19 up 31 days, 16:47,  2 users,  load average: 2.96, 1.54, 1.44
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.7%us,  2.4%sy,  0.0%ni, 82.5%id,  0.2%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:   3923980k total,  2717952k used,  1206028k free,   135204k buffers
Swap:  2064380k total,        0k used,  2064380k free,  1858232k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7439 nagios    20   0 50672 2556 1036 S  5.0  0.1   0:11.13 ndo2db
 7423 nagios    20   0 26356 8040 1312 S  4.7  0.2   0:04.60 nagios
 7244 mysql     20   0 2220m  46m 6248 S  3.7  1.2   0:13.72 mysqld
  509 root      20   0     0    0    0 S  1.0  0.0 248:22.60 jbd2/dm-0-8
 4735 root      20   0 15292 1596  996 S  0.3  0.0 113:17.18 top
 7428 nagios    20   0 10108 1020  684 S  0.3  0.0   0:00.25 nagios
    1 root      20   0 19360 1248  944 S  0.0  0.0   0:14.93 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0  69:39.40 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0  25:55.81 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/0
    6 root      RT   0     0    0    0 S  0.0  0.0  37:36.04 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0  19:12.37 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/1
    9 root      20   0     0    0    0 S  0.0  0.0   6:50.69 ksoftirqd/1
   10 root      RT   0     0    0    0 S  0.0  0.0  16:03.89 watchdog/1
   11 root      RT   0     0    0    0 S  0.0  0.0  46:09.38 migration/2
   12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/2
   13 root      20   0     0    0    0 S  0.0  0.0  27:17.99 ksoftirqd/2
   14 root      RT   0     0    0    0 S  0.0  0.0  35:07.52 watchdog/2
   15 root      RT   0     0    0    0 S  0.0  0.0  19:40.30 migration/3
   16 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/3
   17 root      20   0     0    0    0 S  0.0  0.0   6:03.08 ksoftirqd/3
   18 root      RT   0     0    0    0 S  0.0  0.0  14:58.51 watchdog/3
   19 root      20   0     0    0    0 S  0.0  0.0 209:38.93 events/0
   20 root      20   0     0    0    0 S  0.0  0.0  56:13.19 events/1
   21 root      20   0     0    0    0 S  0.0  0.0  86:45.62 events/2
   22 root      20   0     0    0    0 S  0.0  0.0  85:00.76 events/3
   23 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/0
   24 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/1
   25 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/2
   26 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events/3
   27 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/0
   28 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/1
   29 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/2
   30 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_long/3
   31 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   32 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   33 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   34 root      20   0     0    0    0 S  0.0  0.0   0:00.00 events_power_ef
   35 root      20   0     0    0    0 S  0.0  0.0   0:00.00 cgroup
   36 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper
   37 root      20   0     0    0    0 S  0.0  0.0   0:00.00 netns
   38 root      20   0     0    0    0 S  0.0  0.0   0:00.00 async/mgr
   39 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pm
   40 root      20   0     0    0    0 S  0.0  0.0   4:04.84 sync_supers
   41 root      20   0     0    0    0 S  0.0  0.0   4:12.17 bdi-default
   42 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/0
   43 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/1
   44 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/2
   45 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kintegrityd/3
   46 root      20   0     0    0    0 S  0.0  0.0 100:51.31 kblockd/0
   47 root      20   0     0    0    0 S  0.0  0.0  69:40.04 kblockd/1
   48 root      20   0     0    0    0 S  0.0  0.0 107:04.87 kblockd/2
   49 root      20   0     0    0    0 S  0.0  0.0  67:08.68 kblockd/3
   50 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpid
   51 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_notify
   52 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kacpi_hotplug
   53 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_aux
   54 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/0
   55 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/1
   56 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/2
   57 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ata_sff/3

How many hosts and services are you monitoring with this XI server?
158 Hosts
1716 Services

What is the alert threshold for the load check?
Since I am not a Linux expert, I left it at the default. I assumed it would be correct.
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

Re: Crashed DB Table

Post by jameyw »

I forgot...

What is the output of the repair_databases script when you run it as root?

===============
REPAIR COMPLETE
===============
Stopping ndo2db: done.
Starting ndo2db: done.
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.

=======================
nagios database repair succeeded
nagiosql database repair succeeded
nagiosxi database repair succeeded
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Crashed DB Table

Post by npolovenko »

@jameyw, do you have any third-party backup software that runs every night? Or maybe an antivirus scan?
Let's check the crontab:

crontab -l
ls -la /etc/cron.daily/
One way to see what's going on is to run the top command at a time of the peak load.
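
Since the peak is around 4:00 AM, you probably don't want to sit at the console for it. A rough sketch of a helper that saves one batch-mode snapshot of top to a timestamped file is below. The function name, the output directory, and the cron schedule are all illustrative assumptions, not anything XI ships with.

```shell
# Sketch: save one non-interactive snapshot of top to a timestamped file.
# The function name, directory, and cron schedule are illustrative only.

snapshot_load() (
    dir="${1:?usage: snapshot_load DIR [COMMAND...]}"
    shift
    # Default to a single batch-mode iteration of top if no command is given.
    [ "$#" -gt 0 ] || set -- top -b -n 1
    mkdir -p "$dir" || exit 1
    out="$dir/load-$(date +%Y%m%d-%H%M%S).txt"
    # Capture both stdout and stderr of the command into the snapshot file.
    "$@" > "$out" 2>&1
    # Print the path of the file that was written.
    echo "$out"
)

# Example cron entry (hypothetical script path), every 5 minutes
# during the 03:00 and 04:00 hours:
# */5 3-4 * * * /root/snapshot_load.sh
```

If you save this as a script for cron, add a call such as `snapshot_load /var/tmp/load-snapshots` at the end, then look at the files from around the time of the alert to see which processes were at the top.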

Also, we could just increase the notification threshold to 0.4 - 0.5?

Let me know.
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

Re: Crashed DB Table

Post by jameyw »

No backups at that time. No AV either. Since it is running as a VM in a VMware cluster, I do back it up every night at 10:00 PM, but the backup only takes about 10 minutes, so I'm pretty sure the 4:00 AM system load isn't related.

[root@localhost ~]# crontab -l
no crontab for root
[root@localhost ~]# ls -la /etc/cron.daily/
total 16
drwxr-xr-x.  2 root root 4096 Feb  6  2017 .
drwxr-xr-x. 85 root root 4096 May 18 16:45 ..
-rwx------   1 root root  180 Jul  9  2003 logrotate
-rwxr-xr-x   1 root root  905 Feb 21  2013 makewhatis.cron
[root@localhost ~]#
I am going to adjust the thresholds up a little and see if that solves the problem.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Crashed DB Table

Post by npolovenko »

@jameyw, sounds like a plan. Do you know how to change the thresholds? In the XI GUI, go to the Core Configuration Manager, then Services, find the load check service, and change the values under $ARG2$, I believe. Then save and click Apply Configuration.
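
To get a feel for how new threshold values would treat the overnight numbers you posted (3.81, 4.04, 3.44), here is a rough sketch. It assumes the load check behaves like the standard check_load plugin: warning and critical are each given as comma-separated 1, 5 and 15-minute values, and the state is raised when any one average exceeds its threshold. The threshold values in the example calls are placeholders, not values read from your system.

```shell
# Sketch: classify a load-average triple against warning/critical triples.
# Assumption: the state is raised when ANY of the 1/5/15-minute averages
# exceeds its threshold. The thresholds in the calls below are placeholders.

load_state() {
    # $1 = load averages, $2 = warning thresholds, $3 = critical thresholds,
    # each given as "1min,5min,15min".
    awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
        split(l, L, ","); split(w, W, ","); split(c, C, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Adding 0 forces a numeric comparison instead of a string one.
            if (L[i] + 0 > C[i] + 0) { state = "CRITICAL"; break }
            if (L[i] + 0 > W[i] + 0) state = "WARNING"
        }
        print state
    }'
}

# Overnight load from this thread against two hypothetical threshold sets.
load_state "3.81,4.04,3.44" "5.0,4.0,3.0" "10.0,6.0,4.0"
load_state "3.81,4.04,3.44" "5.5,4.5,3.5" "10.0,6.0,4.0"
```

With these placeholder values the first call prints WARNING (the 5 and 15-minute averages are over their limits) and the second prints OK, so check your real values in the CCM before picking new ones.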
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

Re: Crashed DB Table

Post by jameyw »

Got it
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Crashed DB Table

Post by scottwilkerson »

Let us know if you have further issues
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

Re: Crashed DB Table

Post by jameyw »

Resolved.