Performance Issue on Nagios XI server

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tthomas
Posts: 39
Joined: Mon Jun 01, 2015 6:54 am

Re: Performance Issue on Nagios XI server

Post by tthomas »

Hi tgriep,

Thank you for the reply.

We have been noticing the error below on our Nagios server:
Sep 9 17:31:23 ndo2db: Message sent to queue
Sep 9 17:31:23 ndo2db: Error: queue send error, retrying...
Could you please tell us what is wrong here?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Performance Issue on Nagios XI server

Post by lmiltchev »

Did you notice this error after truncating the tables? Have you run the database repair script after truncating the tables? Do you have any new errors in the mysqld.log?

Code:

tail -30 /var/log/mysqld.log
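If the log is long, a per-day count of the [ERROR] lines can make it easier to see whether a problem such as `Sort aborted` keeps recurring after a change. This is a hypothetical helper (`summarize_mysqld_errors` is not part of Nagios XI or MySQL), and it assumes the `YYMMDD HH:MM:SS [ERROR] ...` line format seen in the mysqld.log excerpts in this thread.

```shell
#!/usr/bin/env bash
# Count [ERROR] lines in an old-style (MySQL 5.1) error log, per day and message.
# Assumed line format: "150927  4:37:33 [ERROR] /usr/libexec/mysqld: Sort aborted"
# Usage: summarize_mysqld_errors [logfile]   (default: /var/log/mysqld.log)
summarize_mysqld_errors() {
  local log="${1:-/var/log/mysqld.log}"
  awk '
    $3 == "[ERROR]" {
      day = $1                        # YYMMDD, e.g. 150927
      msg = $0
      sub(/^.*\[ERROR\] /, "", msg)   # drop timestamp and severity
      sub(/^[^ ]*mysqld: /, "", msg)  # drop the binary path prefix
      count[day " " msg]++
    }
    END { for (k in count) print count[k], k }
  ' "$log" | sort -k2,2 -k1,1nr       # by day, then most frequent first
}
```

Each output line has the form `count day message`, so a day that shows a high count for `Sort aborted` stands out immediately.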
The issue could be related to kernel tuning. For more info, see this post:

https://support.nagios.com/wiki/index.p ... 3.x_Issues
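As a rough guide, assuming NDOUtils 2.x (where ndo2db passes data between its own processes over a SysV message queue), a `queue send error, retrying...` message typically means that queue was full, which is why the kernel message-queue limits (kernel.msgmnb, kernel.msgmax, kernel.msgmni) come up here. The sketch below reads `ipcs -q` output and reports how full each queue is relative to kernel.msgmnb. The function name is hypothetical, and the column positions assume the usual Linux `ipcs -q` layout (key, msqid, owner, perms, used-bytes, messages).

```shell
#!/usr/bin/env bash
# Report how full each SysV message queue is, relative to kernel.msgmnb.
# Reads `ipcs -q` output on stdin, so saved output can be checked as well.
# Usage: ipcs -q | queue_fill [msgmnb_bytes]
queue_fill() {
  local msgmnb="${1:-$(cat /proc/sys/kernel/msgmnb)}"
  # Avoid a division by zero if the limit could not be read.
  [ -n "$msgmnb" ] && [ "$msgmnb" -gt 0 ] || return 1
  awk -v max="$msgmnb" '
    # Data rows start with the hex key, e.g. 0x0001e240
    $1 ~ /^0x/ {
      printf "key=%s owner=%s used=%d bytes messages=%d fill=%.1f%%\n",
             $1, $3, $5, $6, ($5 * 100) / max
    }
  '
}
```

A queue that sits close to 100% fill while the error appears would point at the consumer (ndo2db writing to MySQL) not keeping up, rather than at the limits alone.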

What is the output of the following command?

Code:

ulimit -a
Be sure to check out our Knowledgebase for helpful articles and solutions!
tthomas
Posts: 39
Joined: Mon Jun 01, 2015 6:54 am

Re: Performance Issue on Nagios XI server

Post by tthomas »

Hi

We have not performed the repair activity yet. We will be doing it in the near future.

The kernel parameters are already set to the correct values:

Code:

[root@ ~]# grep msgmnb /etc/sysctl.conf
kernel.msgmnb = 131072000

[root@ ~]# grep msgmax /etc/sysctl.conf
kernel.msgmax = 131072000

[root@ ~]# grep shmmax /etc/sysctl.conf
kernel.shmmax = 68719476736

[root@ ~]# grep shmall /etc/sysctl.conf
kernel.shmall = 4294967296

[root@ ~]# grep msgmni /etc/sysctl.conf
kernel.msgmni = 256000
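Note that grepping /etc/sysctl.conf only shows the configured values; they take effect after `sysctl -p` or a reboot. A minimal sketch for confirming that the running kernel matches the file, assuming the live values are readable under /proc/sys. The function name and the optional second argument (a different directory to read from, useful for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Compare the IPC-related values in a sysctl config file with the values the
# running kernel is actually using (read from /proc/sys).
# Usage: check_sysctl_applied [config_file] [proc_sys_dir]
check_sysctl_applied() {
  local conf="${1:-/etc/sysctl.conf}" procdir="${2:-/proc/sys}"
  local key want have status=0
  while IFS='=' read -r key want || [ -n "$key" ]; do
    key="$(printf '%s' "$key" | tr -d '[:space:]')"
    want="$(printf '%s' "$want" | tr -d '[:space:]')"
    case "$key" in
      kernel.msgmnb|kernel.msgmax|kernel.msgmni|kernel.shmmax|kernel.shmall) ;;
      *) continue ;;   # skip comments and unrelated settings
    esac
    # kernel.msgmnb -> <procdir>/kernel/msgmnb
    have="$(tr -d '[:space:]' < "$procdir/$(printf '%s' "$key" | tr . /)")"
    if [ "$want" = "$have" ]; then
      echo "OK $key=$have"
    else
      echo "MISMATCH $key config=$want running=$have"
      status=1
    fi
  done < "$conf"
  return "$status"
}
```

Running `check_sysctl_applied` with no arguments checks /etc/sysctl.conf against the live kernel and returns non-zero if any of the listed settings differ.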
Here are the system details:

Code:

Nagios XI 2012R1.2 
Linux  2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Please see the mysqld log attached.
mysqld_log.txt
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Performance Issue on Nagios XI server

Post by lmiltchev »

Let us know if truncating the tables and running the database repair script fixed your issue. Check the log again AFTER running the repair script just in case.
tthomas
Posts: 39
Joined: Mon Jun 01, 2015 6:54 am

Re: Performance Issue on Nagios XI server

Post by tthomas »

Hi

We did it on a test Nagios XI instance: we repaired a 1GB database and truncated the logentries and statehistory tables.

Even though we did not see any improvement in server load or GUI speed, we are going to do the same on our production Nagios XI instance this weekend.

I have a question about truncating the logentries and statehistory tables. How will it be reflected in the GUI? What data will no longer be available in the GUI?

Regards
Tino
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Performance Issue on Nagios XI server

Post by tgriep »

Truncating those tables will affect some of the reports for historical data.
The reports that will be affected are State History, Event Log, Availability and parts of the Exec Summary report.
tthomas
Posts: 39
Joined: Mon Jun 01, 2015 6:54 am

Re: Performance Issue on Nagios XI server

Post by tthomas »

Hi

We performed the repair on the production Nagios XI instance, but we still do not see much improvement.
Mysql_Repair_Log.txt

Code:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6548 mysql     20   0 8362m 1.1g 3500 S 338.5  3.7   4556:17 /usr/libexec/mysqld --basedir=/usr --datadir=/u01/appl/mysql --user=mysql --log-error=/var/log/mysqld.log
24882 apache    20   0  651m  33m 9604 R 64.2  0.1  19:01.39 /usr/sbin/httpd
  858 apache    20   0  702m  83m 8740 S 42.4  0.3   6:04.95 /usr/sbin/httpd
30968 apache    20   0  608m  85m 9268 S 41.0  0.3   3:39.17 /usr/sbin/httpd
15168 apache    20   0  600m  78m 9260 R 24.8  0.2   2:59.79 /usr/sbin/httpd
18488 apache    20   0  619m  95m 8520 S 23.8  0.3   1:48.22 /usr/sbin/httpd
32733 apache    20   0  718m 101m 9568 R  7.9  0.3  18:59.36 /usr/sbin/httpd
18747 apache    20   0  583m  61m 9280 R  5.3  0.2   1:15.72 /usr/sbin/httpd
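To put the top snapshot above in perspective, it can help to total %CPU per command. From those lines, mysqld alone is at 338.5% (roughly three and a half cores busy) and the seven httpd processes shown add up to 209.4%. A small awk-based sketch of that sum, assuming the default `top -b -n 1` column order shown above; the function name is hypothetical, and the totals only cover the processes top actually printed.

```shell
#!/usr/bin/env bash
# Sum %CPU per command from `top -b -n 1` style output read on stdin.
# Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
# Usage: top -b -n 1 | cpu_by_command
cpu_by_command() {
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 12 {     # process rows start with a numeric PID
      cmd = $12
      sub(/.*\//, "", cmd)            # /usr/sbin/httpd -> httpd
      cpu[cmd] += $9
      procs[cmd]++
    }
    END {
      for (c in cpu) printf "%s %.1f%% across %d process(es)\n", c, cpu[c], procs[c]
    }
  ' | sort -t ' ' -k2,2nr             # busiest command first
}
```

For example, `top -b -n 1 | cpu_by_command | head` shows whether MySQL or Apache dominates at that moment, which helps decide where tuning effort is better spent.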

We still see high load on the server, and applying the configuration sometimes fails. All the information about the server was provided in earlier posts.

Should we tune Apache as well?

Regards
Tino
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Performance Issue on Nagios XI server

Post by ssax »

Are you still getting errors in your /var/log/mysqld.log? Please post the latest copy.

From the original log you uploaded:

Code:

150909 16:05:51 [ERROR] /usr/libexec/mysqld: Incorrect key file for table '/tmp/#sql_bc3_4.MYI'; try to repair it
150909 16:05:49 [ERROR] /usr/libexec/mysqld: Sort aborted
These errors could be related to a lack of disk space. You only have 3.8GB available, and the temporary files that MySQL creates can get quite large, so I recommend freeing up more space.
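One way to keep an eye on this is to check free space on the filesystem MySQL uses for temporary files; running `SHOW VARIABLES LIKE 'tmpdir'` in the mysql client shows which directory that is (the error above points at /tmp). A minimal sketch using `df -Pk`; the function names and the threshold you pass in are hypothetical.

```shell
#!/usr/bin/env bash
# Print the free space (whole GB) of the filesystem holding a path.
free_gb() {
  # -P forces one POSIX-format line per filesystem; column 4 is available KB.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Warn (and return 1) when free space drops below a threshold in GB.
# Usage: check_free_space <path> <min_gb>
check_free_space() {
  local path="$1" min_gb="$2" avail
  avail="$(free_gb "$path")"
  [ -n "$avail" ] || return 2        # df failed, e.g. path does not exist
  if [ "$avail" -lt "$min_gb" ]; then
    echo "WARNING: only ${avail} GB free on the filesystem holding $path (want >= ${min_gb} GB)"
    return 1
  fi
  echo "OK: ${avail} GB free on the filesystem holding $path"
}
```

For example, `check_free_space /tmp 10` could be run from cron, or wrapped as a local check, so the disk does not quietly fill up again.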
tthomas
Posts: 39
Joined: Mon Jun 01, 2015 6:54 am

Re: Performance Issue on Nagios XI server

Post by tthomas »

Hi,

This error last appeared on 150927 at 4:37:53:

Code:

[root@tmp]# tail /var/log/mysqld.log
150927  4:37:33 [ERROR] /usr/libexec/mysqld: Sort aborted
150927  4:37:33 [ERROR] /usr/libexec/mysqld: Sort aborted
150927  4:37:34 [ERROR] /usr/libexec/mysqld: Sort aborted
150927  4:37:53 [ERROR] /usr/libexec/mysqld: Sort aborted

I can see that the following directory is filling up quickly with XML logs. Are they related to Nagios?

Code:

[root@host_3954948670_80]# pwd
/home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80

[root@host_3954948670_80]# du -sh *
5.3G    alert
4.0K    cdump
4.0K    incident
4.0K    incpkg
4.0K    lck
260K    metadata
4.0K    metadata_dgif
4.0K    metadata_pv
4.0K    stage
4.0K    sweep
1.4G    trace

Code:

[root@alert]# pwd
/home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80/alert

-rw-rw---- 1 nagios users 10486797 Sep 26 09:33 log_537.xml
-rw-rw---- 1 nagios users 10486395 Sep 28 13:59 log_538.xml
-rw-rw---- 1 nagios users 10485770 Sep 29 06:46 log_539.xml
-rw-rw---- 1 nagios users  1604457 Sep 29 09:23 log.xml

11M     log_97.xml
11M     log_98.xml
11M     log_99.xml
11M     log_9.xml
These files are being filled with the following content, repeatedly:

Code:

<msg time='2015-09-29T06:46:24.630+02:00' org_id='oracle' comp_id='clients'
 type='UNKNOWN' level='16' host_id='NAGIOSHOST'
 host_addr='NAGIOS_IP'>
 <txt>Directory does not exist for read/write [/usr/lib/oracle/11.2/client64/log] [/usr/lib/oracle/11.2/client64/log/diag]
 </txt>
</msg>
Can we delete these old XML log files to free up some space on /?

Regards
Tino
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Performance Issue on Nagios XI server

Post by lmiltchev »

Yes, you can delete some of the old XML log files to free up space. Make sure the Oracle client was installed properly. Did you follow our documentation?

http://api.nagios.com/tps/viewdashboard ... board_id=1
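For the cleanup, a cautious approach is to list the rotated `log_<n>.xml` files older than a given age first, and only then delete them, leaving the active log.xml alone. A sketch using `find`, assuming GNU find (for `-delete`) and the alert directory path posted above; the function name and the retention period are hypothetical.

```shell
#!/usr/bin/env bash
# List (default) or delete rotated Oracle client alert logs (log_<n>.xml)
# older than a given number of days. The active log.xml never matches.
# Usage: prune_alert_logs <alert_dir> <days> [--delete]
prune_alert_logs() {
  local dir="$1" days="$2" mode="${3:-}"
  if [ "$mode" = "--delete" ]; then
    find "$dir" -maxdepth 1 -type f -name 'log_*.xml' -mtime +"$days" -print -delete
  else
    # Dry run: only show what would be removed.
    find "$dir" -maxdepth 1 -type f -name 'log_*.xml' -mtime +"$days" -print
  fi
}
```

For example, run `prune_alert_logs /home/nagios/oradiag_nagios/diag/clients/user_nagios/host_3954948670_80/alert 7` first, review the list, then repeat it with `--delete`.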

I have seen a bug report regarding the error below:
Directory does not exist for read/write [/usr/lib/oracle/11.2/client64/log] [/usr/lib/oracle/11.2/client64/log/diag]
When the log directory is created, the client also requests the subdirectories "diag/clients/". The workaround would be to create these directories manually.
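The manual workaround amounts to creating the missing directories and making them writable by the user the Oracle checks run as. A sketch, assuming that user is `nagios` (as the oradiag_nagios path suggests) and that the client home is /usr/lib/oracle/11.2/client64 as in the error message; the function name and the exact ownership and permissions are assumptions, not taken from the bug report.

```shell
#!/usr/bin/env bash
# Pre-create the Oracle client log directories named in the error message and
# make them writable by the monitoring user.
# Usage (as root): create_oracle_log_dirs /usr/lib/oracle/11.2/client64 nagios
create_oracle_log_dirs() {
  local client_home="$1" owner="$2"
  mkdir -p "$client_home/log/diag/clients" || return 1
  # Assumption: the checks run as $owner, which must be able to write here.
  chown -R "$owner" "$client_home/log" || return 1
  chmod -R u+rwX "$client_home/log"
}
```

After creating the directories, the oradiag_nagios alert logs should stop growing with the "Directory does not exist for read/write" message if this was the cause.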

After clearing some space, I would recommend upgrading to Nagios XI 5R1.0, as it includes some performance improvements. You can review the change log here:

https://assets.nagios.com/downloads/nag ... NGES-5.TXT