Hi nms,
Hope you are having a great day!
I looked at your profile.zip and noticed a few things.
The connection to your database was lost repeatedly. This error appears several times:
SQL Error [nagiosxi] : MySQL server has gone away
There are also many failed INSERT queries from NDO, for example:
[1625748999] NDO-3: The following query failed while MySQL appears to be connected:
[1625748999] NDO-3: INSERT INTO nagios_servicechecks (instance_id, start_time, start_time_usec, end_time, end_time_usec, service_object_id, check_type, current_check_attempt, max_check_attempts, state, state_type, timeout, early_timeout, execution_time, latency, return_code, output, long_output, perfdata, command_object_id, command_args, command_line) VALUES (1,FROM_UNIXTIME(1625748996),424887,FROM_UNIXTIME(1625748997),772627,35300,0,1,3,0,1,120,0,1.347740,5.354317,0,'NACK statistics on voicemo for VFNL-WYLS are nack_insf=0:nack_cris=0:nack_nacc=0:nack_nbty=0:nack_nrat=0:nack_wdis=0:nack_tmny=0:nack_nena=0:nack_nbill=0:','','nack_insf=0;nack_cris=0;nack_nacc=0;nack_nbty=0;nack_nrat=0;nack_wdis=0;nack_tmny=0;nack_nena=0;nack_nbill=0;',0,'','') ON DUPLICATE KEY UPDATE instance_id = VALUES(instance_id), start_time = VALUES(start_time), start_time_usec = VALUES(start_time_usec), end_time = VALUES(end_time), end_time_usec = VALUES(end_time_usec), service_object_id = VALUES(service_object_id), check_type = VALUES(check_type), current_check_attempt = VALUES(current_check_attempt), max_check_attempts = VALUES(max_check_attempts), state = VALUES(state), state_type = VALUES(state_type), timeout = VALUES(timeout), early_timeout = VALUES(early_timeout), execution_time = VALUES(execution_time), latency = VALUES(latency), return_code = VALUES(return_code), output = VALUES(output), long_output = VALUES(long_output), perfdata = VALUES(perfdata), command_object_id = VALUES(command_object_id), command_args = VALUES(command_args), command_line = VALUES(command_line)
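If you want to see which tables are affected most, a quick sketch like the one below tallies the failed queries per table. It assumes each failure is logged the way it is in the excerpt above (an "NDO-3: The following query failed" line followed by an "NDO-3: INSERT INTO <table>" line). The function name count_failed_inserts is just a hypothetical helper for this example.

```shell
# Hypothetical helper: count failed NDO INSERT queries per table, most frequent first.
# Assumes each failure is an "NDO-3: The following query failed..." line
# immediately followed by an "NDO-3: INSERT INTO <table> ..." line.
count_failed_inserts() {
    # Read the log named in $1, or stdin when no file is given ("-").
    awk '
        /NDO-3: The following query failed/ {
            # The next line holds the failing query itself.
            if ((getline line) > 0) {
                n = split(line, f, " ")
                # f[1]=[timestamp] f[2]=NDO-3: f[3]=INSERT f[4]=INTO f[5]=table
                if (n >= 5 && f[3] == "INSERT" && f[4] == "INTO") count[f[5]]++
            }
        }
        END { for (t in count) print count[t], t }
    ' "${1:--}" | sort -rn
}
```

For example, run `count_failed_inserts /path/to/nagios.log`, replacing the placeholder path with the log these NDO-3 lines came from.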
The server also ran out of memory, and the kernel OOM killer terminated several nagios processes:
Jul 8 08:56:41 bru-nms-nagios-p kernel: Out of memory: Kill process 20860 (nagios) score 922 or sacrifice child
Jul 8 08:56:41 bru-nms-nagios-p kernel: Killed process 20866 (nagios) total-vm:10844kB, anon-rss:176kB, file-rss:0kB, shmem-rss:0kB
Jul 8 08:56:41 bru-nms-nagios-p kernel: Out of memory: Kill process 20860 (nagios) score 922 or sacrifice child
Jul 8 08:56:41 bru-nms-nagios-p kernel: Killed process 20872 (nagios) total-vm:10844kB, anon-rss:180kB, file-rss:0kB, shmem-rss:0kB
Jul 8 08:56:41 bru-nms-nagios-p kernel: Out of memory: Kill process 20860 (nagios) score 922 or sacrifice child
Jul 8 08:56:41 bru-nms-nagios-p kernel: Killed process 20860 (nagios) total-vm:127009036kB, anon-rss:15040676kB, file-rss:0kB, shmem-rss:0kB
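To pull every OOM kill out of the log and sort it by memory use, something like this should work. It is only a sketch that assumes the kernel message format shown above, and oom_kills is a hypothetical helper name. In your excerpt, the main nagios process (PID 20860) was using roughly 14 GiB of anonymous memory (anon-rss:15040676kB) when it was killed.

```shell
# Hypothetical helper: list processes killed by the OOM killer, largest anon-rss first.
# Assumes lines like:
#   ... kernel: Killed process <pid> (<name>) total-vm:<n>kB, anon-rss:<n>kB, ...
oom_kills() {
    awk '
        /kernel: Killed process/ {
            pid = ""; name = ""; rss = ""
            for (i = 1; i <= NF; i++) {
                # "Killed process <pid> (<name>)": take the two fields after "process"
                if ($i == "process") { pid = $(i + 1); name = $(i + 2) }
                # "anon-rss:<n>kB," -> keep only the number of kB
                if ($i ~ /^anon-rss:/) {
                    rss = $i
                    sub(/^anon-rss:/, "", rss)
                    sub(/kB,?$/, "", rss)
                }
            }
            gsub(/[()]/, "", name)
            printf "%s kB\t%s\t%s\n", rss, pid, name
        }
    ' "${1:--}" | sort -rn
}
```

Example: `oom_kills /var/log/messages`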
Please try the following commands:
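# Repair the Nagios XI databases, then restart MariaDB
/usr/local/nagiosxi/scripts/repair_databases.sh
systemctl restart mariadb.service
# Stop the services that use the database
systemctl stop httpd
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
# Kill any leftover processes owned by the nagios and apache users
pkill -9 -u nagios
pkill -9 -u apache
# Remove leftover message queues owned by nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
systemctl restart mariadb
# Empty the XI event tables (adjust -pnagiosxi if your MySQL root password differs)
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -h 127.0.0.1 -uroot -pnagiosxi nagiosxi
# Start the services again
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
# Remove a stale database maintenance lock, then run database maintenance manually
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php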
As for the out-of-memory issue, I noticed a very large number of checks running at around 08:56 AM today.
You can see this in /var/log/messages.
Please check why so many checks were running at once. I suspect that is what caused the server to run out of memory.
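A rough way to spot the burst is to count matching syslog lines per minute. This is only a sketch: it counts log lines, not checks, and it assumes the syslog timestamp format shown above ("Jul 8 08:56:41 host ..."). The helper name busiest_minutes and the keyword you pass to it are assumptions, so adjust the keyword to whatever the check-related lines in your log contain.

```shell
# Hypothetical helper: count log lines containing a keyword per minute, busiest first.
# Usage: busiest_minutes <keyword> [file]   (reads stdin when no file is given)
busiest_minutes() {
    keyword=$1
    file=${2:--}
    awk -v kw="$keyword" '
        index($0, kw) {
            # Key on month, day and HH:MM (first five characters of the time field).
            count[$1 " " $2 " " substr($3, 1, 5)]++
        }
        END { for (m in count) print count[m], m }
    ' "$file" | sort -rn
}
```

For example: `busiest_minutes nagios /var/log/messages | head`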
Best Regards,
Vinh