Re: Performance Issue on Nagios XI server
Posted: Wed Sep 09, 2015 10:37 am
by tthomas
Hi tgriep,
Thank you for the reply.
We have been seeing the following error on our Nagios server:
Sep 9 17:31:23 ndo2db: Message sent to queue
Sep 9 17:31:23 ndo2db: Error: queue send error, retrying...
Could you please tell us what is wrong here?
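One way to check whether the data queue in front of ndo2db is backing up: ndo2db in this setup appears to receive data through a System V message queue, so "queue send error, retrying" would be consistent with that queue being full. The sketch below (the helper `queue_fill_report` is hypothetical) compares each queue's used bytes from `ipcs -q` with the per-queue limit `kernel.msgmnb`. Treat it as a diagnostic aid under that assumption, not a confirmed root cause.

```shell
#!/bin/bash
# Sketch: show how full each System V message queue is compared with the
# per-queue byte limit (kernel.msgmnb). ASSUMPTION: the ndo2db queue is one of
# the queues listed by "ipcs -q"; queue_fill_report is a hypothetical helper.

# Read "ipcs -q" output on stdin and print one line per queue.
# $1: per-queue byte limit (the value of kernel.msgmnb).
queue_fill_report() {
    local limit="$1"
    awk -v limit="$limit" '
        $1 ~ /^0x/ {
            # Data columns: key msqid owner perms used-bytes messages
            pct = (limit > 0) ? 100 * $5 / limit : 0
            printf "msqid=%s owner=%s used_bytes=%s messages=%s fill=%.1f%%\n", $2, $3, $5, $6, pct
        }'
}

# Live check, only where ipcs and /proc/sys are available.
if command -v ipcs >/dev/null 2>&1 && [ -r /proc/sys/kernel/msgmnb ]; then
    ipcs -q | queue_fill_report "$(cat /proc/sys/kernel/msgmnb)"
fi
```

If a queue stays close to 100% while the retries are being logged, that would suggest ndo2db cannot drain it fast enough, which points at database performance rather than at the kernel limits themselves.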
Re: Performance Issue on Nagios XI server
Posted: Wed Sep 09, 2015 12:53 pm
by lmiltchev
Did you notice this error after truncating the tables? Have you run the database repair script after truncating the tables? Do you have any new errors in the mysqld.log?
The issue could be related to kernel tuning. For more info, see this post:
https://support.nagios.com/wiki/index.p ... 3.x_Issues
What is the output of the following command?
Re: Performance Issue on Nagios XI server
Posted: Thu Sep 10, 2015 1:56 am
by tthomas
Hi
We have not run the repair yet; we will do it in the near future.
The kernel parameters are already set to the correct values:
Code:
[root@ ~]# grep msgmnb /etc/sysctl.conf
kernel.msgmnb = 131072000
[root@ ~]# grep msgmax /etc/sysctl.conf
kernel.msgmax = 131072000
[root@ ~]# grep shmmax /etc/sysctl.conf
kernel.shmmax = 68719476736
[root@ ~]# grep shmall /etc/sysctl.conf
kernel.shmall = 4294967296
[root@ ~]# grep msgmni /etc/sysctl.conf
kernel.msgmni = 256000
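Note that these greps only show what is written in /etc/sysctl.conf; the running kernel picks those values up only after `sysctl -p` or a reboot. A minimal sketch for comparing the file with the live values under /proc/sys (the helpers `conf_value` and `running_value` are illustrative names, not Nagios XI tools):

```shell
#!/bin/bash
# Sketch: compare IPC limits written in /etc/sysctl.conf with the values the
# running kernel uses (read from /proc/sys). conf_value and running_value are
# illustrative helpers, not Nagios XI tools.

# Print the value of KEY from a sysctl-style FILE ("key = value"; last one wins).
conf_value() {
    local file="$1" key="$2"
    [ -r "$file" ] || return 0
    awk -F= -v key="$key" '
        /^[ \t]*#/ { next }                      # skip comments
        {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); val = v }
        }
        END { if (val != "") print val }' "$file"
}

# Print the running value of KEY (kernel.msgmnb -> /proc/sys/kernel/msgmnb).
running_value() {
    local path="/proc/sys/${1//.//}"
    if [ -r "$path" ]; then cat "$path"; fi
}

for key in kernel.msgmnb kernel.msgmax kernel.msgmni kernel.shmmax kernel.shmall; do
    conf=$(conf_value /etc/sysctl.conf "$key")
    live=$(running_value "$key")
    if [ -z "$conf" ]; then status="not set in file"
    elif [ "$conf" != "$live" ]; then status="MISMATCH (run sysctl -p?)"
    else status="OK"; fi
    echo "$key configured=${conf:-unset} running=${live:-unknown} $status"
done
```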
Here are the system details:
Code:
Nagios XI 2012R1.2
Linux 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Please see the mysqld log attached.
mysqld_log.txt
Re: Performance Issue on Nagios XI server
Posted: Thu Sep 10, 2015 9:26 am
by lmiltchev
Let us know if truncating the tables and running the database repair script fixed your issue. Check the log again AFTER running the repair script just in case.
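For reference, a minimal sketch of what "truncating the tables" amounts to in SQL, assuming the NDOUtils database is called `nagios` and its tables carry the usual `nagios_` prefix (verify both before running anything). It only prints statements: a size query to compare before and after, and TRUNCATE statements for the two tables discussed in this thread.

```shell
#!/bin/bash
# Sketch: print (not run) the SQL for checking table sizes and truncating the
# two NDOUtils tables discussed here. ASSUMPTIONS: the database is named
# "nagios" and the tables use the "nagios_" prefix; verify both first.

DB_NAME="${DB_NAME:-nagios}"   # assumed NDOUtils database name

# Ten largest tables in database $1, biggest first.
size_query() {
    cat <<EOF
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = '$1'
ORDER BY (data_length + index_length) DESC
LIMIT 10;
EOF
}

# One TRUNCATE statement per table name; $1 is the database.
truncate_sql() {
    local db="$1" t
    shift
    for t in "$@"; do
        printf 'TRUNCATE TABLE `%s`.`%s`;\n' "$db" "$t"
    done
}

size_query "$DB_NAME"
truncate_sql "$DB_NAME" nagios_logentries nagios_statehistory
```

After taking a backup, the printed statements can be reviewed and then fed to the `mysql` client. Truncating discards the historical rows permanently.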
Re: Performance Issue on Nagios XI server
Posted: Thu Sep 24, 2015 9:46 am
by tthomas
Hi
We tried it on a test Nagios XI instance: we repaired a 1 GB database and truncated the logentries and statehistory tables.
Even though we did not see any improvement in server load or GUI speed, we are going to do the same on our production Nagios XI instance this weekend.
I have a question about truncating the logentries and statehistory tables: how will it be reflected in the GUI? What data will no longer be available in the GUI?
Regards
Tino
Re: Performance Issue on Nagios XI server
Posted: Thu Sep 24, 2015 11:54 am
by tgriep
Truncating those tables will affect some of the reports for historical data.
The reports that will be affected are State History, Event Log, Availability and parts of the Exec Summary report.
Re: Performance Issue on Nagios XI server
Posted: Mon Sep 28, 2015 9:31 am
by tthomas
Hi
We performed the repair on the production Nagios XI instance, but we still do not see much improvement.
Mysql_Repair_Log.txt
Code:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
 6548 mysql     20   0 8362m 1.1g 3500 S 338.5  3.7  4556:17 /usr/libexec/mysqld --basedir=/usr --datadir=/u01/appl/mysql --user=mysql --log-error=/var/log/mysqld.log
24882 apache    20   0  651m  33m 9604 R  64.2  0.1 19:01.39 /usr/sbin/httpd
  858 apache    20   0  702m  83m 8740 S  42.4  0.3  6:04.95 /usr/sbin/httpd
30968 apache    20   0  608m  85m 9268 S  41.0  0.3  3:39.17 /usr/sbin/httpd
15168 apache    20   0  600m  78m 9260 R  24.8  0.2  2:59.79 /usr/sbin/httpd
18488 apache    20   0  619m  95m 8520 S  23.8  0.3  1:48.22 /usr/sbin/httpd
32733 apache    20   0  718m 101m 9568 R   7.9  0.3 18:59.36 /usr/sbin/httpd
18747 apache    20   0  583m  61m 9280 R   5.3  0.2  1:15.72 /usr/sbin/httpd
We still see high load on the server, and applying the configuration sometimes fails. All the information about the server was provided in earlier posts.
Should we tune Apache as well?
Regards
Tino
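To put a top snapshot like this in perspective, a small sketch that sums %CPU per program from `top -b -n 1`. It assumes procps top's default column layout (%CPU in field 9, COMMAND in field 12); the helper name `cpu_by_command` is made up for this example.

```shell
#!/bin/bash
# Sketch: total %CPU per program from one batch-mode top snapshot.
# ASSUMPTION: procps top default columns (%CPU = field 9, COMMAND = field 12).

# Read top output on stdin; print "program total_cpu process_count", busiest first.
cpu_by_command() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 12 {
            cmd = $12; sub(/.*\//, "", cmd)   # keep only the program name
            cpu[cmd] += $9; n[cmd]++
        }
        END { for (c in cpu) printf "%s %.1f %d\n", c, cpu[c], n[c] }' |
    sort -k2,2nr
}

if command -v top >/dev/null 2>&1; then
    top -b -n 1 | cpu_by_command | head -n 5
fi
```

%CPU is per core, so 338.5 for mysqld means roughly three and a half cores busy; the seven httpd workers in the snapshot above add up to about 209%.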
Re: Performance Issue on Nagios XI server
Posted: Mon Sep 28, 2015 10:34 am
by ssax
Are you still getting errors in your /var/log/mysqld.log? Please post the latest copy.
From the original log you uploaded:
Code:
150909 16:05:51 [ERROR] /usr/libexec/mysqld: Incorrect key file for table '/tmp/#sql_bc3_4.MYI'; try to repair it
150909 16:05:49 [ERROR] /usr/libexec/mysqld: Sort aborted
These errors could be related to a lack of disk space. You only have 3.8 GB available, and the temporary files that MySQL creates can get quite large, so I recommend freeing up more space.
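A minimal sketch for keeping an eye on this. MySQL writes its temporary sort tables to its tmpdir, which here appears to be /tmp given the `/tmp/#sql_...MYI` path in the error (`SHOW VARIABLES LIKE 'tmpdir';` confirms it). The 10 GB threshold is an example value, and /u01/appl/mysql is the data directory from the mysqld command line earlier in the thread.

```shell
#!/bin/bash
# Sketch: warn when the filesystem holding a directory is low on free space.
# The 10 GB threshold is an arbitrary example; /u01/appl/mysql is the datadir
# from the mysqld command line earlier in this thread.

# Read "df -Pk" output on stdin and print the Available column (in KB).
avail_kb_from_df() {
    awk 'NR == 2 { print $4 }'
}

# check_free DIR MIN_GB: print a status line; return 1 when below MIN_GB.
check_free() {
    local dir="$1" min_gb="$2" kb gb
    kb=$(df -Pk "$dir" | avail_kb_from_df)
    [ -n "$kb" ] || return 2
    gb=$(( kb / 1024 / 1024 ))
    if [ "$gb" -lt "$min_gb" ]; then
        echo "LOW: $dir has ${gb} GB free (want at least ${min_gb} GB)"
        return 1
    fi
    echo "OK: $dir has ${gb} GB free"
}

for d in /tmp /u01/appl/mysql; do
    if [ -d "$d" ]; then
        check_free "$d" 10 || true   # keep checking the remaining paths
    fi
done
```

With about 3.8 GB free, `check_free /tmp 10` would report LOW and return 1, so it could also be run from cron.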
Re: Performance Issue on Nagios XI server
Posted: Tue Sep 29, 2015 2:44 am
by tthomas
Hi,
The error last appeared at 150927 4:37:53 (September 27, 2015):
Code:
[root@tmp]# tail /var/log/mysqld.log
150927 4:37:33 [ERROR] /usr/libexec/mysqld: Sort aborted
150927 4:37:33 [ERROR] /usr/libexec/mysqld: Sort aborted
150927 4:37:34 [ERROR] /usr/libexec/mysqld: Sort aborted
150927 4:37:53 [ERROR] /usr/libexec/mysqld: Sort aborted
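To see how often the sort errors occur and when the most recent one happened, a small sketch (the helper `sort_aborted_summary` is hypothetical) that counts "Sort aborted" entries per day in the `YYMMDD H:MM:SS` log format shown above:

```shell
#!/bin/bash
# Sketch: count "Sort aborted" errors per day in a mysqld.log that uses the
# "YYMMDD H:MM:SS [ERROR] ..." format, and show the last one in the file.
# sort_aborted_summary is a hypothetical helper name.

sort_aborted_summary() {
    local log="${1:-/var/log/mysqld.log}"
    awk '
        /\[ERROR\].*Sort aborted/ { count[$1]++; last = $1 " " $2 }
        END {
            for (d in count) printf "%s %d\n", d, count[d]
            if (last != "") printf "last %s\n", last
        }' "$log" | sort
}

if [ -r /var/log/mysqld.log ]; then
    sort_aborted_summary /var/log/mysqld.log
fi
```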
I can see that the following directory is filling up quickly with XML logs. Are they related to Nagios?
Code:
[root@host_3954948670_80]# pwd
/home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80
[root@host_3954948670_80]# du -sh *
5.3G alert
4.0K cdump
4.0K incident
4.0K incpkg
4.0K lck
260K metadata
4.0K metadata_dgif
4.0K metadata_pv
4.0K stage
4.0K sweep
1.4G trace
Code:
[root@alert]# pwd
/home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80/alert
-rw-rw---- 1 nagios users 10486797 Sep 26 09:33 log_537.xml
-rw-rw---- 1 nagios users 10486395 Sep 28 13:59 log_538.xml
-rw-rw---- 1 nagios users 10485770 Sep 29 06:46 log_539.xml
-rw-rw---- 1 nagios users 1604457 Sep 29 09:23 log.xml
11M log_97.xml
11M log_98.xml
11M log_99.xml
11M log_9.xml
These files are being filled with the following content repeatedly:
Code:
<msg time='2015-09-29T06:46:24.630+02:00' org_id='oracle' comp_id='clients'
type='UNKNOWN' level='16' host_id='NAGIOSHOST'
host_addr='NAGIOS_IP'>
<txt>Directory does not exist for read/write [/usr/lib/oracle/11.2/client64/log] [/usr/lib/oracle/11.2/client64/log/diag]
</txt>
</msg>
Can we delete these old XML log files to free up some space on /?
Regards
Tino
Re: Performance Issue on Nagios XI server
Posted: Tue Sep 29, 2015 7:08 am
by lmiltchev
Yes, you can delete some of the old XML log files to free up space. Also make sure the Oracle client was installed properly. Did you follow our documentation?
http://api.nagios.com/tps/viewdashboard ... board_id=1
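A cautious way to clean up is to list the rotated alert files first and delete only in a second pass. The sketch assumes the rotated files are named `log_*.xml`, as in the listing above, and leaves the active `log.xml` alone; the 7-day cutoff and the helper name `prune_alert_logs` are examples.

```shell
#!/bin/bash
# Sketch: list, or with --delete remove, rotated Oracle client alert logs
# (log_*.xml) older than N days. The active log.xml does not match the pattern.
# ALERT_DIR is the path from this thread; adjust it for your host.

ALERT_DIR="/home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80/alert"

# prune_alert_logs DIR DAYS [--delete]
prune_alert_logs() {
    local dir="$1" days="$2" mode="$3"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
    if [ "$mode" = "--delete" ]; then
        find "$dir" -maxdepth 1 -type f -name 'log_*.xml' -mtime +"$days" -print -delete
    else
        # Dry run: only print what would be removed.
        find "$dir" -maxdepth 1 -type f -name 'log_*.xml' -mtime +"$days" -print
    fi
}

if [ -d "$ALERT_DIR" ]; then
    prune_alert_logs "$ALERT_DIR" 7
fi
```

Once the list looks right, `prune_alert_logs "$ALERT_DIR" 7 --delete` removes those files.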
I have seen a bug report regarding the error below:
Directory does not exist for read/write [/usr/lib/oracle/11.2/client64/log] [/usr/lib/oracle/11.2/client64/log/diag]
When the log directory is created, the client also requests the subdirectories "diag/clients/". The workaround is to create these directories manually.
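A minimal sketch of that workaround, assuming the client home from the error message (/usr/lib/oracle/11.2/client64) and that the checks run as the `nagios` user; `make_client_log_dirs` is a hypothetical helper. The logs ending up under /home/nagios/oradiag_nagios suggests the client fell back to the user's home directory because it could not use the log directory under the client home; creating it should stop the repeated "Directory does not exist" messages, but that is worth verifying after the change.

```shell
#!/bin/bash
# Sketch of the manual workaround: create the log/diag/clients directories the
# Oracle client expects under its home, writable by the user running the checks.
# ASSUMPTIONS: client home /usr/lib/oracle/11.2/client64 (from the error message)
# and user "nagios"; adjust both. Run as root.

ORACLE_CLIENT_HOME="${ORACLE_CLIENT_HOME:-/usr/lib/oracle/11.2/client64}"

# make_client_log_dirs BASE [OWNER]
make_client_log_dirs() {
    local base="$1" owner="$2"
    mkdir -p "$base/log/diag/clients" || return 1
    # Only change ownership when the owner account actually exists.
    if [ -n "$owner" ] && id -u "$owner" >/dev/null 2>&1; then
        chown -R "$owner" "$base/log" || return 1
    fi
    chmod -R u+rwX "$base/log"
}

if [ "$(id -u)" -eq 0 ] && [ -d "$ORACLE_CLIENT_HOME" ]; then
    make_client_log_dirs "$ORACLE_CLIENT_HOME" nagios
fi
```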
After clearing some space, I would recommend upgrading to Nagios XI 5R1.0, as it includes some performance improvements. You can review the change log here:
https://assets.nagios.com/downloads/nag ... NGES-5.TXT