Performance Issue on Nagios XI server

Post by **tthomas** » Wed Sep 09, 2015 10:37 am

Hi tgriep,

Thank you for the reply.

We have been seeing the following error on our Nagios server.

Sep 9 17:31:23 ndo2db: Message sent to queue
Sep 9 17:31:23 ndo2db: Error: queue send error, retrying...

Could you please tell us what is wrong here?

Post by **lmiltchev** » Wed Sep 09, 2015 12:53 pm

Did you notice this error after truncating the tables? Have you run the database repair script after truncating the tables? Do you have any new errors in the mysqld.log?

Code:

tail -30 /var/log/mysqld.log

The issue could be related to kernel tuning. For more info, see this post:

https://support.nagios.com/wiki/index.p ... 3.x_Issues

What is the output of the following command?

Code:

ulimit -a

Post by **tthomas** » Thu Sep 10, 2015 1:56 am

Hi

We have not performed the repair activity yet. We will be doing it in the near future.

The kernel parameters are already set to the correct values.

Code:

[root@ ~]# grep msgmnb /etc/sysctl.conf
kernel.msgmnb = 131072000

[root@ ~]# grep msgmax /etc/sysctl.conf
kernel.msgmax = 131072000

[root@ ~]# grep shmmax /etc/sysctl.conf
kernel.shmmax = 68719476736

[root@ ~]# grep shmall /etc/sysctl.conf
kernel.shmall = 4294967296

[root@ ~]# grep msgmni /etc/sysctl.conf
kernel.msgmni = 256000
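One thing worth ruling out here: entries in /etc/sysctl.conf only take effect once they are loaded (at boot or with `sysctl -p`), so the file alone does not prove what the kernel is using. A minimal sketch for comparing the file with the live values under /proc/sys follows. The function names `conf_value` and `check_sysctl` are made up for this example and are not part of Nagios XI.

```shell
#!/bin/sh
# Print the value configured for a sysctl key in a sysctl.conf-style file.
# Usage: conf_value KEY [FILE]
conf_value() {
    key=$1
    file=${2:-/etc/sysctl.conf}
    # Match "key = value" lines, skipping comments; the last match wins.
    awk -F= -v k="$key" '
        /^[ \t]*[#;]/ { next }
        {
            name = $1; gsub(/[ \t]/, "", name)
            if (name == k) { val = $2; gsub(/[ \t]/, "", val) }
        }
        END { if (val != "") print val }
    ' "$file"
}

# Compare each configured value with the one the kernel is actually using.
# Usage: check_sysctl [FILE]
check_sysctl() {
    file=${1:-/etc/sysctl.conf}
    for key in kernel.msgmnb kernel.msgmax kernel.msgmni kernel.shmmax kernel.shmall; do
        want=$(conf_value "$key" "$file")
        # /proc/sys mirrors sysctl names with the dots replaced by slashes.
        live=$(cat "/proc/sys/$(echo "$key" | tr . /)" 2>/dev/null)
        if [ "$want" = "$live" ]; then
            echo "OK       $key = $live"
        else
            echo "MISMATCH $key: configured=$want running=$live"
        fi
    done
}
```

Running `check_sysctl` with no arguments reads /etc/sysctl.conf. If any line reports MISMATCH, `sysctl -p` reloads the file.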

Please find the system details.

Code:

Nagios XI 2012R1.2 
Linux  2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.4 (Santiago)

Please see the mysqld log attached.

mysqld_log.txt

Post by **lmiltchev** » Thu Sep 10, 2015 9:26 am

Let us know if truncating the tables and running the database repair script fixed your issue. Check the log again AFTER running the repair script just in case.

Post by **tthomas** » Thu Sep 24, 2015 9:46 am

Hi

We did it on a test Nagios XI instance. We repaired a 1GB database and truncated the logentries and statehistory tables.

Even though we did not see any improvement in server load or GUI speed, we are going to do the same on our production Nagios XI instance this weekend.

I have a question about truncating the logentries and statehistory tables. How will it be reflected in the GUI? What data will no longer be available in the GUI?

Regards
Tino

Post by **tgriep** » Thu Sep 24, 2015 11:54 am

Truncating those tables will affect some of the reports for historical data.
The reports that will be affected are State History, Event Log, Availability and parts of the Exec Summary report.

Post by **tthomas** » Mon Sep 28, 2015 9:31 am

Hi

We performed the repair on the production Nagios XI instance, but we still do not see much improvement.

Mysql_Repair_Log.txt

Code:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6548 mysql     20   0 8362m 1.1g 3500 S 338.5  3.7   4556:17 /usr/libexec/mysqld --basedir=/usr --datadir=/u01/appl/mysql --user=mysql --log-error=/var/log/mysqld.log
24882 apache    20   0  651m  33m 9604 R 64.2  0.1  19:01.39 /usr/sbin/httpd
  858 apache    20   0  702m  83m 8740 S 42.4  0.3   6:04.95 /usr/sbin/httpd
30968 apache    20   0  608m  85m 9268 S 41.0  0.3   3:39.17 /usr/sbin/httpd
15168 apache    20   0  600m  78m 9260 R 24.8  0.2   2:59.79 /usr/sbin/httpd
18488 apache    20   0  619m  95m 8520 S 23.8  0.3   1:48.22 /usr/sbin/httpd
32733 apache    20   0  718m 101m 9568 R  7.9  0.3  18:59.36 /usr/sbin/httpd
18747 apache    20   0  583m  61m 9280 R  5.3  0.2   1:15.72 /usr/sbin/httpd

We still see high load on the server, and applying configuration sometimes fails. All the information about the server is provided in earlier posts.

Should we tune Apache as well?

Regards
Tino

Post by **ssax** » Mon Sep 28, 2015 10:34 am

Are you still getting errors in your /var/log/mysqld.log? Please post the latest copy.

From the original log you uploaded:

Code:

150909 16:05:51 [ERROR] /usr/libexec/mysqld: Incorrect key file for table '/tmp/#sql_bc3_4.MYI'; try to repair it
150909 16:05:49 [ERROR] /usr/libexec/mysqld: Sort aborted

These errors could be related to a lack of disk space. You only have 3.8GB available, and the temporary files that MySQL creates can get quite large. My recommendation is to free up more disk space.
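As a rough way to keep an eye on this, the free space on the filesystem holding MySQL's temporary files can be checked with `df`. The sketch below assumes /tmp is the temporary directory, as the path in the error above suggests. The helper names and any threshold passed in are illustrative only.

```shell
#!/bin/sh
# Print the free space, in kilobytes, on the filesystem holding a directory.
# Usage: avail_kb DIR
avail_kb() {
    # -P forces one line per filesystem; column 4 is the "Available" figure.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Warn when a directory's filesystem has less than a given amount free.
# Usage: check_free DIR MIN_KB
check_free() {
    free=$(avail_kb "$1")
    if [ "$free" -lt "$2" ]; then
        echo "LOW: $1 has ${free} KB free, wanted at least $2 KB"
        return 1
    fi
    echo "OK: $1 has ${free} KB free"
}
```

For example, `check_free /tmp 10485760` would flag anything under 10GB. That figure is an arbitrary example, not a documented MySQL or Nagios XI requirement.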

Post by **tthomas** » Tue Sep 29, 2015 2:44 am

Hi,

This error last appeared on 150927 at 4:37:53.

Code:

[root@tmp]# tail /var/log/mysqld.log
150927  4:37:33 [ERROR] /usr/libexec/mysqld: Sort aborted
150927  4:37:33 [ERROR] /usr/libexec/mysqld: Sort aborted
150927  4:37:34 [ERROR] /usr/libexec/mysqld: Sort aborted
150927  4:37:53 [ERROR] /usr/libexec/mysqld: Sort aborted

I can see that the following directory is filling up fast with XML logs. Are they related to Nagios?

Code:

[root@host_3954948670_80]# pwd
/home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80

[root@host_3954948670_80]# du -sh *
5.3G    alert
4.0K    cdump
4.0K    incident
4.0K    incpkg
4.0K    lck
260K    metadata
4.0K    metadata_dgif
4.0K    metadata_pv
4.0K    stage
4.0K    sweep
1.4G    trace

Code:

[root@alert]# pwd
/home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80/alert

-rw-rw---- 1 nagios users 10486797 Sep 26 09:33 log_537.xml
-rw-rw---- 1 nagios users 10486395 Sep 28 13:59 log_538.xml
-rw-rw---- 1 nagios users 10485770 Sep 29 06:46 log_539.xml
-rw-rw---- 1 nagios users  1604457 Sep 29 09:23 log.xml

11M     log_97.xml
11M     log_98.xml
11M     log_99.xml
11M     log_9.xml

These files are being filled repeatedly with the following content:

Code:

<msg time='2015-09-29T06:46:24.630+02:00' org_id='oracle' comp_id='clients'
 type='UNKNOWN' level='16' host_id='NAGIOSHOST'
 host_addr='NAGIOS_IP'>
 <txt>Directory does not exist for read/write [/usr/lib/oracle/11.2/client64/log] [/usr/lib/oracle/11.2/client64/log/diag]
 </txt>
</msg>

Can we delete these old XML log files to free up some space on / ?

Regards
Tino

Post by **lmiltchev** » Tue Sep 29, 2015 7:08 am

Yes, you can delete some of the old XML log files to free up space. Make sure the Oracle client was installed properly. Did you follow our documentation?

http://api.nagios.com/tps/viewdashboard ... board_id=1
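For the cleanup itself, one cautious approach is to list the rotated log_<n>.xml files older than some cutoff first, and delete them only after reviewing the list. This is a sketch and not an official procedure. The function names and the 7-day cutoff in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# List rotated Oracle client alert logs older than a given number of days.
# Usage: old_alert_logs DIR DAYS
old_alert_logs() {
    # Only the rotated log_<n>.xml files match; the active log.xml does not.
    find "$1" -maxdepth 1 -type f -name 'log_*.xml' -mtime +"$2" -print
}

# Delete the same set of files. Run old_alert_logs first to review the list.
# Usage: purge_alert_logs DIR DAYS
purge_alert_logs() {
    find "$1" -maxdepth 1 -type f -name 'log_*.xml' -mtime +"$2" -exec rm -f {} +
}
```

With the path posted above, that would be `old_alert_logs /home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80/alert 7`, followed by `purge_alert_logs` with the same arguments once the list looks right.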

I have seen a bug report in regards to the error below:

Directory does not exist for read/write [/usr/lib/oracle/11.2/client64/log] [/usr/lib/oracle/11.2/client64/log/diag]

Once the log directory has been created, the client then looks for the subdirectories "diag/clients/" under it. The workaround would be to create these directories manually.
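Creating the directories by hand could look roughly like the sketch below. The helper name `make_diag_dirs` is hypothetical. Handing ownership to the user that runs the checks is an assumption on my part, based on the oradiag files above being owned by nagios.

```shell
#!/bin/sh
# Create the diagnostic directory tree the Oracle client is asking for.
# Usage: make_diag_dirs ORACLE_CLIENT_HOME [OWNER]
make_diag_dirs() {
    home=$1
    owner=${2:-}
    # -p creates log, log/diag and log/diag/clients in one step and is
    # harmless if any of them already exist.
    mkdir -p "$home/log/diag/clients" || return 1
    # Assumption: the checks run as a non-root user that must be able to
    # write here, so ownership is handed over only when an owner is given.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$home/log"
    fi
}
```

For the paths in the error message this would be run as root: `make_diag_dirs /usr/lib/oracle/11.2/client64 nagios`.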

After clearing some space, I would recommend upgrading to Nagios XI 5R1.0, as it includes some performance improvements. You can review the change log here:

https://assets.nagios.com/downloads/nag ... NGES-5.TXT
