Crashed DB Table
I upgraded to the latest version a few weeks ago and since then I have been fighting with alarms from the localhost (Nagios XI machine) regarding Current Load. I did some searching and see that system load problems can be caused by crashed database tables. I ran a System Profile and in the logs I see that the table nagios_logentries is marked as crashed. I have followed the procedure documented here https://assets.nagios.com/downloads/nag ... tabase.pdf several times but I still have this problem.
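For anyone following along, one way to confirm whether the table is still being reported as crashed after a repair is to scan the MySQL error log for the "is marked as crashed" message. The snippet below is a minimal sketch, not part of the linked procedure: the function name `crashed_tables` is hypothetical, and the default log path `/var/log/mysqld.log` is an assumption that may differ on your system.

```shell
#!/bin/sh
# Sketch: list tables that a MySQL error log reports as crashed.
# ASSUMPTION: the default log path below is a guess; pass the real path as $1.
crashed_tables() {
    log_file="${1:-/var/log/mysqld.log}"
    [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 1; }
    # Keep only the "marked as crashed" lines, extract the quoted table path,
    # and print each table once.
    grep 'is marked as crashed' "$log_file" \
        | sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" \
        | sort -u
}

# Example call (run as a user that can read the log):
#   crashed_tables /var/log/mysqld.log
```

If the function prints nothing after a repair and a night of normal operation, the crash reports in the log are probably old entries rather than a recurring problem.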
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Crashed DB Table
Hello, @jameyw.
What is the output of the repair_databases script when you run it as root?
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
Is the main problem that you're receiving constant email notifications, or that the load is actually high?
What is the current load in the plugin output?
What is the output of the top command?
How many hosts and services are you monitoring with this XI server?
What is the alert threshold for the load check?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Crashed DB Table
Answers threaded in below...
Is the main problem that you're receiving constant email notifications, or that the load is actually high?
I don't receive constant emails but I get a text message and email almost every night around 4:00 AM alerting me that system load is high.
What is the current load in the plugin output?
This is what was in the email from overnight; - load average: 3.81, 4.04, 3.44
This is current: - load average: 0.66, 1.21, 1.34
What is the output of the top command?
Code: Select all
top - 09:33:19 up 31 days, 16:47, 2 users, load average: 2.96, 1.54, 1.44
Tasks: 179 total, 1 running, 178 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.7%us, 2.4%sy, 0.0%ni, 82.5%id, 0.2%wa, 0.1%hi, 0.1%si, 0.0%st
Mem: 3923980k total, 2717952k used, 1206028k free, 135204k buffers
Swap: 2064380k total, 0k used, 2064380k free, 1858232k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7439 nagios 20 0 50672 2556 1036 S 5.0 0.1 0:11.13 ndo2db
7423 nagios 20 0 26356 8040 1312 S 4.7 0.2 0:04.60 nagios
7244 mysql 20 0 2220m 46m 6248 S 3.7 1.2 0:13.72 mysqld
509 root 20 0 0 0 0 S 1.0 0.0 248:22.60 jbd2/dm-0-8
4735 root 20 0 15292 1596 996 S 0.3 0.0 113:17.18 top
7428 nagios 20 0 10108 1020 684 S 0.3 0.0 0:00.25 nagios
1 root 20 0 19360 1248 944 S 0.0 0.0 0:14.93 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 69:39.40 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 25:55.81 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/0
6 root RT 0 0 0 0 S 0.0 0.0 37:36.04 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 19:12.37 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/1
9 root 20 0 0 0 0 S 0.0 0.0 6:50.69 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 16:03.89 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 46:09.38 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/2
13 root 20 0 0 0 0 S 0.0 0.0 27:17.99 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 35:07.52 watchdog/2
15 root RT 0 0 0 0 S 0.0 0.0 19:40.30 migration/3
16 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/3
17 root 20 0 0 0 0 S 0.0 0.0 6:03.08 ksoftirqd/3
18 root RT 0 0 0 0 S 0.0 0.0 14:58.51 watchdog/3
19 root 20 0 0 0 0 S 0.0 0.0 209:38.93 events/0
20 root 20 0 0 0 0 S 0.0 0.0 56:13.19 events/1
21 root 20 0 0 0 0 S 0.0 0.0 86:45.62 events/2
22 root 20 0 0 0 0 S 0.0 0.0 85:00.76 events/3
23 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/0
24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/1
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/2
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/3
27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/0
28 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/1
29 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/2
30 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/3
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
32 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
33 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
34 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
35 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup
36 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
37 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns
38 root 20 0 0 0 0 S 0.0 0.0 0:00.00 async/mgr
39 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pm
40 root 20 0 0 0 0 S 0.0 0.0 4:04.84 sync_supers
41 root 20 0 0 0 0 S 0.0 0.0 4:12.17 bdi-default
42 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/0
43 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/1
44 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/2
45 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/3
46 root 20 0 0 0 0 S 0.0 0.0 100:51.31 kblockd/0
47 root 20 0 0 0 0 S 0.0 0.0 69:40.04 kblockd/1
48 root 20 0 0 0 0 S 0.0 0.0 107:04.87 kblockd/2
49 root 20 0 0 0 0 S 0.0 0.0 67:08.68 kblockd/3
50 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpid
51 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify
52 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_hotplug
53 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_aux
54 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/0
55 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/1
56 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/2
57 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/3
158 Hosts
1716 Services
What is the alert threshold for the load check?
Since I am not a Linux expert, I left it at the default. I assumed it would be correct.
Re: Crashed DB Table
I forgot...
What is the output of the repair_databases script when you run it as root?
===============
REPAIR COMPLETE
===============
Stopping ndo2db: done.
Starting ndo2db: done.
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.
=======================
nagios database repair succeeded
nagiosql database repair succeeded
nagiosxi database repair succeeded
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Crashed DB Table
@jameyw, do you have any third-party backup software that runs every night? Or an antivirus scan, maybe?
Let's check the crontab:
Code: Select all
crontab -l
ls -la /etc/cron.daily/
One way to see what's going on is to run the top command at the time of peak load.
Also, we could just increase the notification threshold to 0.4 - 0.5?
Let me know.
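Since the peak happens around 4:00 AM, running top by hand is inconvenient. A rough sketch of capturing the same information unattended is shown here; the function name `snapshot_load` and the log location are assumptions made for illustration, not something shipped with XI.

```shell
#!/bin/sh
# Sketch: append a timestamped snapshot of load and busiest processes to a log.
# ASSUMPTION: the log path is arbitrary; use any location the user can write to.
LOG_FILE="${LOG_FILE:-/tmp/load_snapshot.log}"

snapshot_load() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # 1, 5 and 15 minute load averages as reported by the kernel.
        cat /proc/loadavg
        # One batch-mode iteration of top, trimmed to the summary and top processes.
        top -b -n 1 | head -n 20
    } >> "$LOG_FILE" 2>&1
}

snapshot_load
```

Saved as a script and scheduled from cron every few minutes between roughly 3:30 and 4:30 AM (for example with a `*/5 3-4 * * *` schedule), this should leave a record of which processes were busy when the alert fired.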
Re: Crashed DB Table
No backups at that time. No AV either. Since it is running as a VM in a VMware cluster, I do back it up every night at 10:00 PM, but the backup only takes about 10 minutes, so I'm pretty sure the 4:00 AM system load isn't related.
I am going to adjust the thresholds up a little and see if that solves the problem.
Code: Select all
[root@localhost ~]# crontab -l
no crontab for root
[root@localhost ~]# ls -la /etc/cron.daily/
total 16
drwxr-xr-x. 2 root root 4096 Feb 6 2017 .
drwxr-xr-x. 85 root root 4096 May 18 16:45 ..
-rwx------ 1 root root 180 Jul 9 2003 logrotate
-rwxr-xr-x 1 root root 905 Feb 21 2013 makewhatis.cron
[root@localhost ~]#
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Crashed DB Table
@jameyw, sounds like a plan. Do you know how to change the thresholds? In the XI GUI, go to the Core Config Manager, then Services, find the load check service, and change the values under $ARG2$, I believe. Then save and click Apply Configuration.
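To get a feel for how new threshold values would behave before applying them, the comparison can be approximated locally. This is only a sketch under stated assumptions: it assumes the check raises WARNING or CRITICAL when any of the three load averages exceeds its matching threshold, the function name `classify_load` is hypothetical, and the threshold triples used below are illustrative values, since the thread never states what the defaults actually are.

```shell
#!/bin/sh
# Sketch: classify three load averages against warning/critical triples.
# ASSUMPTION: thresholds are passed in by the caller; nothing is read from XI.
classify_load() {
    # $1 = "load1,load5,load15"  $2 = "warn1,warn5,warn15"  $3 = "crit1,crit5,crit15"
    awk -v load="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        split(load, l, ","); split(warn, w, ","); split(crit, c, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Adding 0 forces a numeric comparison instead of a string one.
            if (l[i] + 0 > c[i] + 0) { state = "CRITICAL"; break }
            if (l[i] + 0 > w[i] + 0) { state = "WARNING" }
        }
        print state
    }'
}

# Overnight and daytime values quoted earlier in the thread,
# checked against illustrative thresholds.
classify_load "3.81,4.04,3.44" "5.0,4.0,3.0" "10.0,6.0,4.0"
classify_load "0.66,1.21,1.34" "5.0,4.0,3.0" "10.0,6.0,4.0"
```

With these illustrative thresholds, the overnight values come out as WARNING (the 5 and 15 minute averages are over their limits) and the daytime values as OK, so raising the 5 and 15 minute warning values slightly would be enough to quiet the nightly alert.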
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Crashed DB Table
Let us know if you have further issues.