Fusion database lock issue

Post by **hbouma** » Tue Oct 30, 2018 6:47 am

Overnight, our Fusion database got into a locked state and had to be restarted. For reference, we have had multiple stability issues with Fusion databases in the past (https://support.nagios.com/forum/viewto ... 17&t=49945 and https://support.nagios.com/forum/viewto ... 17&t=50159). I am attaching the mariadb.log file in the hope that someone can help us diagnose the issue and prevent it from happening again.

We are running Fusion 4.1.5 on Red Hat 7 64-bit VMs with 8 cores and 16GB of RAM.

my.cnf file is as follows:

Code: Select all

[mysqld]
max_connections=818
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd

thread_cache_size = 16

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
max_allowed_packet = 32M

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
query_cache_size = 6M

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
query_cache_limit = 4M

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
tmp_table_size = 64M

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
max_heap_table_size = 64M

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
key_buffer_size = 32M

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
table_open_cache = 32

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
innodb_file_per_table = 1

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
innodb_log_buffer_size = 32M

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
innodb_buffer_pool_size = 6G

# Added by Nagios 2018-08-27 09:20:12 -0400 EDT
innodb_log_file_size = 256M

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

#
# include all files from the config directory
#
!includedir /etc/my.cnf.d

Post by **ssax** » Tue Oct 30, 2018 2:15 pm

After reviewing multiple pages, it seems this may be related to innodb_adaptive_hash_index being set to ON.

Please try setting innodb_adaptive_hash_index=0 in the [mysqld] section of your /etc/my.cnf and then restart the mariadb service:

Code: Select all

service mariadb restart

https://stackoverflow.com/a/24910831
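
A rough sketch of how the edit could be double-checked before restarting, assuming the option is placed in the [mysqld] section of /etc/my.cnf. The helper name `ahi_setting` is hypothetical (it is not part of Fusion or MariaDB); it only reads the file and prints the configured value, if any.

```shell
# Hypothetical helper: print the value of innodb_adaptive_hash_index as set
# in the [mysqld] section of a my.cnf-style file (the last occurrence wins).
# Prints nothing if the option is not set in that section.
ahi_setting() {
    awk '
        # Track which section we are in; only [mysqld] counts.
        /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\][ \t]*$/); next }
        in_mysqld && /^[ \t]*innodb[_-]adaptive[_-]hash[_-]index[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to the "="
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            value = $0
        }
        END { if (value != "") print value }
    ' "$1"
}

# Example: ahi_setting /etc/my.cnf
# After the restart, the running value can be confirmed from a MariaDB
# account that is allowed to connect, for example:
#   mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index';"
```

The helper reads only the one file it is given, so it does not see anything pulled in through `!includedir /etc/my.cnf.d`. The `SHOW GLOBAL VARIABLES` check is the more reliable confirmation.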

Post by **ssax** » Tue Oct 30, 2018 2:19 pm

In addition, are you seeing any semaphore errors in /var/log/messages?

Post by **hbouma** » Tue Oct 30, 2018 2:44 pm

I don't see anything about semaphores, but I do see that we apparently ran out of swap space, which is odd because we have 16GB of RAM and 8GB of swap. We usually run with about 8GB of RAM free.

Code: Select all

KiB Mem : 16266712 total,  7960080 free,  1762268 used,  6544364 buff/cache
KiB Swap:  8388600 total,  7880432 free,   508168 used. 14066716 avail Mem
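
The swap figures above can be cross-checked against /proc/meminfo. This is a minimal sketch, assuming a Linux host. `swap_used_kib` is a hypothetical helper name, and it reads meminfo-formatted text on standard input so it can also be pointed at a saved copy.

```shell
# Hypothetical helper: print swap in use (KiB) as SwapTotal minus SwapFree.
# Reads /proc/meminfo-style text on stdin.
swap_used_kib() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END { print total - free }
    '
}

# Example: swap_used_kib < /proc/meminfo
```

With the totals shown above (8388600 total, 7880432 free) this gives 508168 KiB, which matches the "used" column. Note that this is only a snapshot taken after the fact and not a record of usage at the time of the failure.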

Post by **ssax** » Tue Oct 30, 2018 3:37 pm

With the DB lock preventing queries from finishing, I could see poll_subsys.php spooling up multiple copies because the earlier ones never completed. Let us know if setting innodb_adaptive_hash_index=0 in your /etc/my.cnf resolves your issue.
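
One way to confirm that copies of poll_subsys.php are piling up is to count them in the process list. This is only a sketch: `count_pollers` is a hypothetical helper, and it assumes the script name appears in each process's command line as reported by ps.

```shell
# Hypothetical helper: count process-list lines that mention poll_subsys.php.
# Reads ps output on stdin and always prints a number (0 if none).
count_pollers() {
    # grep -c exits non-zero when the count is 0, so mask that exit status.
    grep -c 'poll_subsys\.php' || true
}

# Example (pid, elapsed time, full command line for every process):
#   ps -eo pid=,etime=,args= | count_pollers
```

A count that keeps rising between checks, together with long elapsed times, would suggest that earlier runs are not finishing.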

Thank you

Post by **hbouma** » Mon Nov 05, 2018 7:34 am

Unfortunately, the issue occurred again this weekend. Same issue of running out of memory.

Post by **tgriep** » Mon Nov 05, 2018 3:09 pm

Do you know which process was using up the memory?
Can you get the /var/log/messages file from when it was failing and post that here so we can check it for any errors?
And get this file from the Fusion server and post it as well so we can check its settings.

Code: Select all

/etc/sysctl.conf

Thanks
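
On the question of which process was using the memory, a small sketch that ranks processes by resident memory may help the next time it starts to climb. `top_rss` is a hypothetical helper name; it sorts "PID RSS COMMAND" lines from standard input, with RSS in KiB as reported by ps.

```shell
# Hypothetical helper: print the N largest processes by resident memory.
# Reads "PID RSS COMMAND" lines on stdin; N defaults to 5.
top_rss() {
    sort -k2,2nr | head -n "${1:-5}"
}

# Example: ps -eo pid=,rss=,comm= | top_rss 10
```

This only shows the current state, so it would need to be run (or scheduled) while memory use is growing in order to catch the culprit.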

Post by **hbouma** » Mon Nov 05, 2018 3:19 pm

PM sent with logs.

Post by **tgriep** » Mon Nov 05, 2018 3:53 pm

It looks like the poll_subsys.php script had some sort of problem and kept starting additional copies until they used up all of the memory, so I will need to see the following files from yesterday.

Code: Select all

/var/log/cron
/usr/local/nagiosfusion/var/log/poll_subsys.log

Can you run the following as root and post the /tmp/info.txt file?

Code: Select all

echo 'select * from servers;' |mysql -t -u fusion -pfusion fusion >/tmp/info.txt
echo 'select * from sysstat;' |mysql -t -u fusion -pfusion fusion >>/tmp/info.txt
echo 'select * from polled_averages;' |mysql -t -u fusion -pfusion fusion >>/tmp/info.txt
echo 'select * from polled_deltas;' |mysql -t -u fusion -pfusion fusion >>/tmp/info.txt
echo 'select * from polling_lock;' |mysql -t -u fusion -pfusion fusion >>/tmp/info.txt
echo 'select * from options;' |mysql -t -u fusion -pfusion fusion >>/tmp/info.txt
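
The six commands above can also be written as a loop. The sketch below is an equivalent alternative, not a required change. `fusion_queries` is a hypothetical helper name; the table names and credentials are the ones used above.

```shell
# Hypothetical helper: emit one "select *" statement per table of interest.
fusion_queries() {
    for table in servers sysstat polled_averages polled_deltas polling_lock options; do
        printf 'select * from %s;\n' "$table"
    done
}

# Example: send all statements through a single mysql session.
# --force tells the client to keep going if one statement fails.
#   fusion_queries | mysql -t --force -u fusion -pfusion fusion > /tmp/info.txt
```

Because everything goes through one mysql session, the output file is written with a single redirect instead of one overwrite followed by five appends.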

Post by **tgriep** » Mon Nov 05, 2018 5:10 pm

The cron file only covered today, so you will have to get one of the older archived .gz files and post it here.

Also, get this file as well. If any of them do not show any data for today, get the archived copy instead.

Code: Select all

/usr/local/nagiosfusion/var/log/dberrors.log

Any log files from this folder for yesterday would help as well if you add them to the post.

Code: Select all

/usr/local/nagiosfusion/var/log
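
To find the relevant entries in rotated logs without unpacking them first, something like the following may be handy. It is a sketch that assumes zgrep (shipped with gzip) is installed; `search_logs` is a hypothetical helper name.

```shell
# Hypothetical helper: search plain and gzip-compressed log files alike.
# Usage: search_logs PATTERN FILE...
search_logs() {
    pattern=$1
    shift
    # zgrep decompresses .gz files on the fly and passes plain files through.
    zgrep -h -e "$pattern" "$@"
}

# Example: search_logs poll_subsys /var/log/cron*
```

The same call works for the Fusion log folder above by passing its files in place of /var/log/cron*.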
