
Nagios email notification failure

Posted: Wed Aug 21, 2024 3:35 am
by djkcs2
Greetings!

Our Nagios XI has started behaving strangely, and I cannot find the cause or a fix.

When a check changes state, the change appears in the GUI and in the logs as it should. However, the email notification does not arrive on time, and when it finally does arrive it carries the current timestamp even though the error happened something like 6 months ago. We are currently getting email notifications even for hosts that have already been removed.

Is there any way to resolve this? Has anyone experienced similar behavior? Thanks for any info and help!

Re: Nagios email notification failure

Posted: Wed Aug 21, 2024 9:33 am
by gregbeyer
I have had similar issues: events being issued for services that were already deleted, with the config applied successfully. Disclaimer: I am not a DBA, and I should not have to be one to keep XI healthy, but here we are. However, the following, which I obtained through my trouble tickets about the database, may prove helpful to you. #1: before you do anything else, make sure last night's XI backup ran. If it did not, do a full backup of Nagios XI.
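A rough way to check for a recent backup is sketched below. It assumes backups land in /store/backups/nagiosxi as .tar.gz files and that the stock backup script is /usr/local/nagiosxi/scripts/backup_xi.sh; both are the usual defaults as far as I know, but verify them on your own system. The function name recent_backup_exists is just something made up for this example.

```shell
# Assumption: Nagios XI writes scheduled backups to /store/backups/nagiosxi
# (the usual default); override BACKUP_DIR if your install differs.
BACKUP_DIR="${BACKUP_DIR:-/store/backups/nagiosxi}"

# Succeeds if the directory holds at least one .tar.gz modified in the last 24 hours.
# Uses GNU find options (-maxdepth, -quit).
recent_backup_exists() {
    [ -d "$1" ] && [ -n "$(find "$1" -maxdepth 1 -name '*.tar.gz' -mmin -1440 -print -quit)" ]
}

if recent_backup_exists "$BACKUP_DIR"; then
    echo "Recent backup found in $BACKUP_DIR"
else
    echo "No backup from the last 24 hours in $BACKUP_DIR; run a full backup first"
    # Assumption: this is where the backup script lives on a default install.
    # /usr/local/nagiosxi/scripts/backup_xi.sh
fi
```

If the check reports no recent backup, take one manually before touching the database.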

How to obtain your mysql root password: sudo /usr/local/nagiosxi/scripts/get_mysql_passwords.sh

First thing to try:

systemctl stop npcd
systemctl stop nagios
systemctl stop crond

su nagios -c '/usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php >> /usr/local/nagiosxi/var/dbmaint.log'

rm -rf /usr/local/nagiosxi/var/dbmaint.lock
/usr/bin/php /usr/local/nagiosxi/cron/dbmaint.php

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -h 127.0.0.1 -uroot -pYOURPASSWORD nagiosxi

systemctl restart httpd
systemctl restart php-fpm
systemctl start npcd
systemctl start crond
systemctl start nagios
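Before truncating the three XI tables in the steps above, it may be worth looking at how many rows they hold, so you can see whether there really is a backlog. This is only a sketch: build_count_sql is a helper invented for this example, and I am assuming (not certain) that a very large count in xi_events or xi_eventqueue is what goes along with the stale notifications.

```shell
# Builds one SELECT per table so all counts come back from a single mysql call.
build_count_sql() {
    for t in "$@"; do
        printf "SELECT '%s', COUNT(*) FROM %s;\n" "$t" "$t"
    done
}

SQL="$(build_count_sql xi_events xi_meta xi_eventqueue)"
echo "$SQL"

# To run it, pipe the SQL into mysql; -p prompts for the root password
# obtained from get_mysql_passwords.sh, -N suppresses the column headers:
# echo "$SQL" | mysql -h 127.0.0.1 -uroot -p -N nagiosxi
```

Running the same query again after the restart should show the counts starting over from a small number.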


If that doesn't work, next steps:

systemctl stop npcd
systemctl stop nagios
systemctl stop crond

echo 'truncate nagios_hoststatus; truncate nagios_hosts; truncate nagios_services; truncate nagios_servicestatus; truncate nagios_servicechecks; truncate nagios_hostchecks; truncate nagios_downtimehistory; truncate nagios_commenthistory;' | mysql -u root -pnagiosxi nagios
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --all-databases --use-frm | grep -vwE "note"

systemctl restart mysqld
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
rm -f /usr/local/nagiosxi/tmp/*.lock

systemctl restart httpd
systemctl restart php-fpm
systemctl start npcd
systemctl start crond
systemctl start nagios

Re: Nagios email notification failure

Posted: Wed Aug 21, 2024 11:56 am
by bbahn
Hello @djkcs2,

There is currently a bug in Nagios Core that sometimes causes checks to be delayed indefinitely, or at least for a very long time, as you seem to have experienced: https://github.com/NagiosEnterprises/na ... issues/947. We will handle this as soon as possible, but the bug dates back more than 5 maintainers, so it may be some time before we can get it fixed.