Hi,
Looking through the logs, I see a lot of errors and failures. We'll need to clean them up.
1. First in database_log.txt:
210728 14:23:44 InnoDB: Error: trying to open a table, but could not
InnoDB: open the tablespace file './nagiosxi/#sql-611e_d2c08.ibd'!
InnoDB: Have you moved InnoDB .ibd files around without using the
InnoDB: commands DISCARD TABLESPACE and IMPORT TABLESPACE?
InnoDB: It is also possible that this is a temporary table #sql...,
InnoDB: and MySQL removed the .ibd file for this.
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/innodb-troubleshooting-datadict.html
InnoDB: for how to resolve the issue.
210728 14:23:44 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
Not sure what is going on there, but for starters please look at:
https://assets.nagios.com/downloads/nag ... tabase.pdf
to see if repairing the database helps.
Please send me any output from the repair operation.
Once you have restarted Nagios, please wait about half an hour, then take
another System Profile and send it to me. I'll take a look at it on the off chance that
the repair also solves issues 2 and 3 below. So please wait to hear back from me before going
on to steps 2 and 3.
2. Next, look at the attached errors.pdf.
You will see items like the following (the key part is "status -1"):
00017: Jul 29 03:10:01 brnagios1 rrdcached[31059]: queue_thread_main: rrd_update_r
(/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.rrd) failed with status -1.
(/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.rrd: found extra data on update argument: 159976.21:160978.44)
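That usually means the check is now returning more performance values than the RRD file was created with,
either because the service command changed or because what is being monitored has changed. For example, with
the Disk Usage checks, was another drive (D:\) added? A quick way to clean this up is to remove the files: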
/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.rrd
/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.xml
The downside of this is that you will lose that performance data.
If you want to try and save the data please let me know and I will track down how to do that.
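If there are many of these, a small script can list the affected files for you. This is only a rough sketch: it assumes you have the relevant log saved as plain text, that the failures look like the rrdcached line above, and that each .xml file sits next to its .rrd with the same base name. The function name failing_rrd_files is my own, not part of Nagios. It only prints paths and does not delete anything.

```python
import re
from pathlib import PurePosixPath

# Matches the RRD path in rrdcached messages of the form
# "rrd_update_r (<path>.rrd) failed with status -1".
# \s* also tolerates a line wrap between "rrd_update_r" and the path.
FAILED_UPDATE = re.compile(
    r"rrd_update_r\s*\((?P<rrd>[^()\s]+\.rrd)\)\s*failed with status -1"
)


def failing_rrd_files(log_text):
    """Return sorted, unique (rrd_path, xml_path) pairs for failed updates."""
    pairs = set()
    for match in FAILED_UPDATE.finditer(log_text):
        rrd = PurePosixPath(match.group("rrd"))
        # Assumption: the matching .xml has the same base name as the .rrd.
        pairs.add((str(rrd), str(rrd.with_suffix(".xml"))))
    return sorted(pairs)


if __name__ == "__main__":
    import sys

    # Usage: python script.py /path/to/saved_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for rrd_path, xml_path in failing_rrd_files(handle.read()):
            print(rrd_path)
            print(xml_path)
```

Please review the printed list before removing anything, since every file on it means lost performance data for that service.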
3. Within the attached file there are 159 instances of:
00138: Jul 29 03:12:30 brnagios1 nagios: SERVICE ALERT: stg01autabweb01.ux.corp.local;Swap Usage;CRITICAL;SOFT;2;(Service Check Timed Out On
Worker: brnagios1.ux.corp.local)
These might go away once the perf data issue (#2) is cleaned up.
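To check whether the timeouts actually drop off after the cleanup, you could count them per host and service before and after. Again, this is just a sketch: it assumes the log is saved as plain text with each alert on a single line in the usual "SERVICE ALERT: host;service;state;type;attempt;output" layout, and the function name timeout_counts is made up for this example.

```python
import re
from collections import Counter

# Matches SERVICE ALERT lines whose output starts with
# "(Service Check Timed Out" and captures the host and service names.
TIMEOUT = re.compile(
    r"SERVICE ALERT: (?P<host>[^;\n]+);(?P<service>[^;\n]+);"
    r"[^;\n]*;[^;\n]*;\d+;\(Service Check Timed Out"
)


def timeout_counts(log_text):
    """Count timed-out service checks per (host, service) pair."""
    return Counter(
        (match.group("host"), match.group("service"))
        for match in TIMEOUT.finditer(log_text)
    )


if __name__ == "__main__":
    import sys

    # Usage: python script.py /path/to/saved_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        counts = timeout_counts(handle.read())
    # Most frequent offenders first.
    for (host, service), count in counts.most_common():
        print(f"{count:5d}  {host}  {service}")
```

If the same few hosts dominate the list both before and after, the timeouts are probably not related to the perf data issue.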
Thanks