Nagios XI slowness

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
RIDS_I2MP
Posts: 751
Joined: Thu Mar 13, 2014 9:25 am

Nagios XI slowness

Post by RIDS_I2MP »

Hello Team,

We are facing a slowness issue with Nagios XI. We have many servers and services to configure, and we are using the Bulk Host Cloning and Import Config wizards. We are also doing bulk configuration for switches and routers.

When we apply the configuration afterwards, it takes longer than usual to complete. Even once it completes, the status of hosts, services and switches/routers takes a very long time to show up, sometimes as long as 2-3 hours.

They also remain in "pending" status for many hours.

We are not sure what is causing this.

Could you please help us resolve this issue as a priority? We still have many servers and services to configure. Below are the details of the Nagios server:

[root@vmaz-nagiosxi scripts]# uname -a
Linux vmaz-nagiosxi 4.18.0-193.6.3.el8_2.x86_64 #1 SMP Wed Jun 10 11:09:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

OS version: CentOS 8

[root@vmaz-nagiosxi scripts]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.9G 0 7.9G 0% /dev
tmpfs 7.9G 0 7.9G 0% /dev/shm
tmpfs 7.9G 825M 7.1G 11% /run
tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
/dev/sdb2 30G 24G 5.6G 81% /
/dev/sdb1 496M 109M 387M 22% /boot
/dev/sdb15 495M 6.8M 488M 2% /boot/efi
/dev/sdc1 32G 49M 30G 1% /mnt/resource
/dev/mapper/VG_MAIN-lv_nagios 55G 53M 52G 1% /nagiosmon
tmpfs 1.6G 0 1.6G 0% /run/user/1004
tmpfs 1.6G 0 1.6G 0% /run/user/1005
[root@vmaz-nagiosxi scripts]#

Total Number of services: 9338
Total number of hosts: 1228
RAM: 16 GB
Disk space: 128 GB

Nagios XI version: 5.7.2

Let me know if you need any information.
Thanks & Regards,
I2MP Team.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Nagios XI slowness

Post by mbellerue »

Can you send in a system profile? To get a profile, go to Admin -> System Profile -> Download Profile.

Also, if you go to Home -> Monitoring Process -> Performance in your Nagios XI interface and send us a screenshot of the two dashlets there, that will help as well.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Nagios XI slowness

Post by RIDS_I2MP »

Hello,

Thank you for your reply!!

I have sent you the profile file by PM. I am attaching the dashlet screenshot here.
Thanks & Regards,
I2MP Team.

Re: Nagios XI slowness

Post by mbellerue »

Excellent, thank you! The first thing I want to try is modifying your reaper settings. Go to Configure -> Core Config Manager and select Core Configs, the second-to-last item in the left menu.

In the General tab, find the settings "check_result_reaper_frequency" and "max_check_result_reaper_time" and change their values to:

Code: Select all

check_result_reaper_frequency=3
max_check_result_reaper_time=10
Click Save Changes. Then go to Admin, under the System Component Status dashlet, click the gear icon next to Monitoring Engine, and click restart. Give your XI instance 30 minutes to an hour, and check back to see how the performance is doing.
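
After the restart, the two values can also be confirmed from the shell. The following is only a minimal sketch: it assumes the main config file is at /usr/local/nagios/etc/nagios.cfg (pass a different path if yours differs), and the function name show_reaper_settings is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the active (uncommented) reaper settings from a Nagios main config file.
# Assumption: the default path below; pass another path as the first argument.
show_reaper_settings() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    grep -E '^(check_result_reaper_frequency|max_check_result_reaper_time)=' "$cfg"
}

# Example: show_reaper_settings
```

If the output does not show the values entered in Core Config Manager, the change was probably not saved or applied.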

There are some additional performance tips and tricks in the following document. But let's start with this one, and see if it helps resolve the issue before trying another.
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Re: Nagios XI slowness

Post by RIDS_I2MP »

Hello,

Thank you for your help!!

I have made the changes and will update you again with the status.

One more thing to add here: below is the filesystem layout of the Nagios server. We created a separate partition called "nagiosmon", intending to store the files and check results generated by Nagios there in order to avoid space issues.

Please suggest whether we can do something like that to avoid space-related issues, as I can see "/" is 94% used.

[root@vmaz-nagiosxi var]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.9G 0 7.9G 0% /dev
tmpfs 7.9G 0 7.9G 0% /dev/shm
tmpfs 7.9G 790M 7.1G 10% /run
tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
/dev/sdb2 30G 28G 2.0G 94% /
/dev/sdb1 496M 109M 387M 22% /boot
/dev/sdb15 495M 6.8M 488M 2% /boot/efi
/dev/sdc1 32G 49M 30G 1% /mnt/resource
/dev/mapper/VG_MAIN-lv_nagios 55G 53M 52G 1% /nagiosmon
tmpfs 1.6G 0 1.6G 0% /run/user/1004
tmpfs 1.6G 0 1.6G 0% /run/user/1005
[root@vmaz-nagiosxi var]#


Let me know if you need any additional information.
Thanks & Regards,
I2MP Team.

Re: Nagios XI slowness

Post by mbellerue »

Is this a virtual machine? We have a document for expanding a virtual disk.
https://assets.nagios.com/downloads/nag ... M-Disk.pdf

Let me know if that will help.
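
Whether or not the disk is expanded, it can help to see what is actually filling "/". The following is a rough sketch using du; the function name largest_dirs and the default of 15 results are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# List the largest directories under a starting path, biggest first (sizes in KB).
# -x keeps du on the starting filesystem, so separate mounts such as
# /nagiosmon or /mnt/resource are not counted against "/".
largest_dirs() {
    local start="${1:-/}" count="${2:-15}"
    du -x -k -d 2 "$start" 2>/dev/null | sort -rn | head -n "$count"
}

# Example: largest_dirs / 15
```

Run as root so that du can read every directory; unreadable directories are skipped silently because errors are discarded.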

Re: Nagios XI slowness

Post by RIDS_I2MP »

Hello,

We will add the disk space.

However, the slowness is still the same. When I try to check the notifications, the page keeps loading and the data is never displayed.

Could you please suggest other possible ways to resolve this?
Thanks & Regards,
I2MP Team.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios XI slowness

Post by benjaminsmith »

Hi,

Appreciate the profile. Looking over the logs, there are some database issues, and there are PHP timeouts in the Apache log. Unfortunately, the database log was not in the profile. What is the output of the following command?

Code: Select all

tail -n 100 /var/log/mysql/mysqld.log
I'd also like to check the php-fpm log. Please post the output of:

Code: Select all

tail -n 50  /var/log/php-fpm/error.log
Thanks, Benjamin
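
If a log has just been rotated, the current file may be empty, in which case the newest rotated file is the one to read. The following is a small sketch that picks the most recently modified non-empty file for a given log prefix; the function name newest_nonempty_log is made up for illustration, and the paths are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Print the most recently modified non-empty file whose name starts with the
# given prefix, e.g. /var/log/mysql/mysqld.log also matches mysqld.log-YYYYMMDD.
newest_nonempty_log() {
    local prefix="$1" f newest=""
    for f in "$prefix"*; do
        [ -s "$f" ] || continue              # skip empty or missing files
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example: tail -n 100 "$(newest_nonempty_log /var/log/mysql/mysqld.log)"
```

The function returns a non-zero status and prints nothing when no non-empty file matches.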

Re: Nagios XI slowness

Post by RIDS_I2MP »

Hello,

mysqld.log is empty, but there is a rotated log, mysqld.log-20200809. Below is the output:

[root@vmaz-nagiosxi ~]# cd /var/log/mysql/
[root@vmaz-nagiosxi mysql]# ls -lrt
total 48
-rw-r-----. 1 mysql mysql 5463 Jun 28 10:30 mysqld.log-20200627
-rw-r-----. 1 mysql mysql 12400 Jul 27 20:41 mysqld.log-20200629
-rw-r-----. 1 mysql mysql 11431 Aug 7 17:21 mysqld.log-20200728
-rw-r-----. 1 mysql mysql 0 Aug 9 08:32 mysqld.log
-rw-r-----. 1 mysql mysql 11419 Aug 11 04:24 mysqld.log-20200809
[root@vmaz-nagiosxi mysql]#


[root@vmaz-nagiosxi mysql]# tail -n 100 /var/log/mysql/mysqld.log-20200809
2020-08-07T13:23:59.309591Z 0 [System] [MY-010116] [Server] /usr/libexec/mysqld (mysqld 8.0.17) starting as process 944291
2020-08-07T13:24:14.410120Z 0 [System] [MY-010229] [Server] Starting crash recovery...
2020-08-07T13:24:14.552347Z 0 [System] [MY-010232] [Server] Crash recovery finished.
2020-08-07T13:24:21.035465Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-08-07T13:24:21.058553Z 0 [System] [MY-010931] [Server] /usr/libexec/mysqld: ready for connections. Version: '8.0.17' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution.
2020-08-07T13:24:21.100623Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/lib/mysql/mysqlx.sock' bind-address: '::' port: 33060
2020-08-07T20:57:42.648953Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T20:57:54.705309Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:07:42.725601Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:07:54.810147Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:17:42.798905Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:17:54.927167Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:27:42.873943Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:27:55.047931Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:37:42.957392Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:37:55.145690Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:47:43.054092Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:47:55.203653Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:57:43.145528Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T21:57:55.286266Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-07T22:07:43.231116Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on d
2020-08-08T07:00:48.531149Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:08:01.029967Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:10:48.639232Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:18:01.085420Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:20:48.741117Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:28:01.139840Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:30:48.853939Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:38:01.195744Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:40:48.957458Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:48:01.265227Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:50:49.047489Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T07:58:01.337413Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T08:00:49.145384Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T08:08:01.392755Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T08:10:49.219956Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T08:18:01.463452Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T08:20:49.276479Z 23551 [ERROR] [MY-000035] [Server] Disk is full writing '/var/tmp/STfd=76' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2020-08-08T08:28:01.520063Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 se
2020-08-09T04:02:23.047749Z 179226 [ERROR] [MY-011072] [Server] Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server..
04:02:23 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7f4a6c0222b0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2020-08-09T04:02:23.050449Z 23551 [Warning] [MY-010754] [Server] Warning: Optimize table got errno 28 on nagios.nagios_logentries, retrying
stack_bottom = 7f4ab40a8d50 thread_stack 0x46000
/usr/libexec/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x41) [0x5637534472b1]
/usr/libexec/mysqld(handle_fatal_signal+0x333) [0x56375255a1a3]
/lib64/libpthread.so.0(+0x12dd0) [0x7f4ae5556dd0]
/lib64/libc.so.6(gsignal+0x10f) [0x7f4ae2bd370f]
/lib64/libc.so.6(abort+0x127) [0x7f4ae2bbdb25]
/usr/libexec/mysqld(+0xce37fd) [0x56375227b7fd]
/usr/libexec/mysqld(MYSQL_BIN_LOG::handle_binlog_flush_or_sync_error(THD*, bool)+0x2f0) [0x5637530fb9d0]
/usr/libexec/mysqld(MYSQL_BIN_LOG::ordered_commit(THD*, bool, bool)+0x11b) [0x56375310b42b]
/usr/libexec/mysqld(MYSQL_BIN_LOG::commit(THD*, bool)+0x391) [0x56375310c7b1]
/usr/libexec/mysqld(ha_commit_trans(THD*, bool, bool)+0x4be) [0x5637526677ee]
/usr/libexec/mysqld(trans_commit_stmt(THD*, bool)+0x40) [0x56375251c840]
/usr/libexec/mysqld(mysql_execute_command(THD*, bool)+0x3f28) [0x563752428a78]
/usr/libexec/mysqld(mysql_parse(THD*, Parser_state*)+0x363) [0x56375242a323]
/usr/libexec/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2d44) [0x56375242d574]
/usr/libexec/mysqld(do_command(THD*)+0x1bc) [0x56375242e06c]
/usr/libexec/mysqld(+0xfb37e0) [0x56375254b7e0]
/usr/libexec/mysqld(+0x23b6250) [0x56375394e250]
/lib64/libpthread.so.0(+0x82de) [0x7f4ae554c2de]
/lib64/libc.so.6(clone+0x43) [0x7f4ae2c97e83]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f4a6c299be8): is an invalid pointer
Connection ID (thread ID): 179226
Status: KILL_CONNECTION

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2020-08-09T04:05:26.061988Z 0 [System] [MY-010116] [Server] /usr/libexec/mysqld (mysqld 8.0.17) starting as process 2032570
2020-08-09T04:05:38.477433Z 0 [System] [MY-010229] [Server] Starting crash recovery...
2020-08-09T04:05:38.676084Z 0 [System] [MY-010232] [Server] Crash recovery finished.
2020-08-09T04:05:43.430373Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-08-09T04:05:43.457122Z 0 [System] [MY-010931] [Server] /usr/libexec/mysqld: ready for connections. Version: '8.0.17' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution.
2020-08-09T04:05:43.501213Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/lib/mysql/mysqlx.sock' bind-address: '::' port: 33060
2020-08-11T00:24:47.552450Z 144204 [Warning] [MY-012111] [InnoDB] Trying to access missing tablespace 38823
[root@vmaz-nagiosxi mysql]#


==========================



[root@vmaz-nagiosxi php-fpm]# ls -lrt
total 4348
-rw-------. 1 root root 56 Jul 12 03:46 error.log-20200719
-rw-r--r--. 1 apache apache 134 Jul 12 19:17 www-error.log-20200719
-rw-------. 1 root root 56 Jul 19 03:08 error.log-20200726
-rw-r--r--. 1 apache apache 382036 Jul 24 17:05 www-error.log-20200726
-rw-------. 1 root root 637 Jul 27 16:48 error.log-20200802
-rw-r--r--. 1 apache apache 672539 Jul 31 18:11 www-error.log-20200802
-rw-------. 1 root root 164 Aug 8 08:59 error.log-20200809
-rw-r--r--. 1 apache apache 806912 Aug 9 08:01 www-error.log-20200809
-rw-------. 1 root root 56 Aug 9 08:32 error.log
-rw-r--r--. 1 apache apache 2551809 Aug 11 21:51 www-error.log
[root@vmaz-nagiosxi php-fpm]# tail -n 50 /var/log/php-fpm/error.log
[09-Aug-2020 08:32:03] NOTICE: error log file re-opened
[root@vmaz-nagiosxi php-fpm]#
Thanks & Regards,
I2MP Team.

Re: Nagios XI slowness

Post by benjaminsmith »

Hi,

That's very helpful. The server ran low on disk space, which caused the database to abort and may have damaged tables, and that is likely why the current logs are empty. The relevant entries are:

2020-08-08T08:28:01.520063Z 23668 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000048' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 se
2020-08-09T04:02:23.047749Z 179226 [ERROR] [MY-011072] [Server] Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server..

Since adding the additional disk space, is the database service running? If not, can you restart it?

Code: Select all

systemctl status mysqld
systemctl restart mysqld
If it's running, execute the following command from the shell as root to check and repair the database tables.

Code: Select all

mysqlcheck -r -f -uroot -pnagiosxi --all-databases
Then post the output of the following query to the thread and let me know if you notice any improvement.

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
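
Because the earlier crash came from a full disk, and repairing or optimizing tables can itself need temporary space, it may be worth checking free space before running the commands above. The following is a hedged sketch: the helper names free_kb and check_free and the 5 GB default threshold are arbitrary choices for illustration. /var/tmp appears in the log above, and /var/lib/mysql is assumed to be the data directory because the MySQL socket lives there.

```shell
#!/usr/bin/env bash
# Print the free space, in KB, on the filesystem that holds the given path.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Report whether a path has at least min_kb free (default 5 GB, an arbitrary figure).
# Returns 0 when there is enough space, 1 when low, 2 when the path cannot be checked.
check_free() {
    local path="$1" min_kb="${2:-5242880}" avail
    avail=$(free_kb "$path")
    [ -n "$avail" ] || return 2
    if [ "$avail" -lt "$min_kb" ]; then
        echo "LOW: $path has ${avail} KB free"
        return 1
    fi
    echo "OK: $path has ${avail} KB free"
}

# Example (paths are assumptions, see above):
#   check_free /var/tmp && check_free /var/lib/mysql
```

If either path reports LOW, free or add space first; otherwise the repair may stall on the same "No space left on device" error seen in the log.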