Service recovery emails generated without status change

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Service recovery emails generated without status change

Post by rferebee »

Hello,

Every day, for reasons we can't pin down, Nagios XI generates service recovery emails for a subset of our Linux hosts. This sends out a lot of "spam" and wastes administrators' time checking their hosts to see whether they actually went Critical.

It seems to affect only Linux hosts, and one contact group in particular. I opened a similar request several months ago and was told that version 5.5.7 would resolve the bug, but we're now on version 5.5.11 and it's still happening.

Any help would be appreciated. Thank you!
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Service recovery emails generated without status change

Post by rferebee »

Here's a screenshot of some of the recovery notifications being generated. At no point did these services change state in a way that would explain a recovery.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Service recovery emails generated without status change

Post by benjaminsmith »

Hello @rferebee,

Nagios Core 4.4.3 fixed a few issues with notifications and recoveries. Could you send your system profile for us to review?

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and share it in a private message or upload it to the ticket.

Thanks.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Service recovery emails generated without status change

Post by rferebee »

PM with system profile sent. Thank you.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Service recovery emails generated without status change

Post by benjaminsmith »

Hi @rferebee,

Thanks for sending over the system profile. I noticed that the nagios_logentries table is corrupted. Please run the following as root from the terminal to repair the database:

/usr/local/nagiosxi/scripts/repair_databases.sh

Next, I see that you have recovery notifications enabled. Is this intentional?

define host {
    host_name                   pug
    use                         xiwizard_generic_host
    address                     ip-address
    hostgroups                  AIXDWSSProdServer
    max_check_attempts          5
    check_interval              5
    retry_interval              1
    check_period                xi_timeperiod_24x7
    contact_groups              SUGContact,Welfare DBA,Welfare Websphere Group
    notification_interval       1440
    notification_period         xi_timeperiod_24x7
    first_notification_delay    7
    notification_options        d,u,r,f
    notifications_enabled       1
    _xiwizard                   autodiscovery
    register                    1
}
define service {
    host_name                   pug
    service_description         Disk Check /db/database
    use                         AIXDiskServiceOra
    check_command               check_nrpe!check_disk1!20!20% 10% "/db/database"!!!!!
    max_check_attempts          5
    check_interval              5
    retry_interval              1
    check_period                xi_timeperiod_24x7
    notification_interval       1440
    notification_period         xi_timeperiod_24x7
    notification_options        w,c,u,r,f
    notifications_enabled       1
    contact_groups              SUGContact,Welfare DBA,Welfare Websphere Group
    register                    1
}
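
The single-letter flags in the notification_options lines above decide which state changes produce emails. On the host, d,u,r,f covers DOWN, UNREACHABLE, recovery and flapping; on the service, w,c,u,r,f covers WARNING, CRITICAL, UNKNOWN, recovery and flapping. The "r" flag is what enables recovery emails. As a minimal illustration, the Python sketch below decodes an options string using the flag meanings from the Nagios Core object definitions; the helper names are hypothetical and not part of Nagios XI.

```python
from typing import Dict, List

# Flag meanings from the Nagios Core object definitions. Hosts and services
# use slightly different letters. These helpers are hypothetical, for illustration only.
HOST_FLAGS: Dict[str, str] = {
    "d": "DOWN", "u": "UNREACHABLE", "r": "RECOVERY",
    "f": "FLAPPING", "s": "DOWNTIME", "n": "NONE",
}
SERVICE_FLAGS: Dict[str, str] = {
    "w": "WARNING", "u": "UNKNOWN", "c": "CRITICAL", "r": "RECOVERY",
    "f": "FLAPPING", "s": "DOWNTIME", "n": "NONE",
}


def decode_notification_options(options: str, object_type: str) -> List[str]:
    """Translate a comma-separated notification_options string into readable names."""
    table = HOST_FLAGS if object_type == "host" else SERVICE_FLAGS
    flags = [flag.strip() for flag in options.split(",") if flag.strip()]
    return [table[flag] for flag in flags]


def sends_recovery_notifications(options: str) -> bool:
    """Return True when the 'r' (recovery) flag is present."""
    return "r" in [flag.strip() for flag in options.split(",")]


if __name__ == "__main__":
    print(decode_notification_options("d,u,r,f", "host"))
    # ['DOWN', 'UNREACHABLE', 'RECOVERY', 'FLAPPING']
    print(decode_notification_options("w,c,u,r,f", "service"))
    # ['WARNING', 'CRITICAL', 'UNKNOWN', 'RECOVERY', 'FLAPPING']
    print(sends_recovery_notifications("w,c,u,r,f"))
    # True
```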

The next step would be to pull the state history report for the host and service in question to determine whether it experienced a hard recovery. Go to Reports > State History, limit the report to pug for 04-16-2019, then select "Both" for State Type and "Any" for State.

If the services in question did experience a hard recovery, then Nagios is notifying as expected.
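
A "hard" recovery is a return to OK from a hard problem state, that is, a problem that persisted through max_check_attempts consecutive checks (5 in the service definition above). A recovery from a soft state does not send a notification. The Python sketch below is a deliberately simplified model of that rule, not Nagios code: it assumes only OK and CRITICAL results and ignores notification_period, first_notification_delay, flapping and downtime. It illustrates why a recovery email implies that Nagios recorded a hard CRITICAL state first, which is what the State History report can confirm.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceStateModel:
    """Simplified model of Nagios soft/hard state logic (OK and CRITICAL only)."""

    max_check_attempts: int = 5
    notify_on_recovery: bool = True  # stands in for the 'r' notification option
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1
    notifications: List[str] = field(default_factory=list)

    def process(self, result: str) -> None:
        """Apply one check result, either "OK" or "CRITICAL"."""
        if result == "OK":
            # Only a recovery from a HARD problem state notifies; soft recoveries are silent.
            if self.state == "CRITICAL" and self.state_type == "HARD" and self.notify_on_recovery:
                self.notifications.append("RECOVERY")
            self.state, self.state_type, self.attempt = "OK", "HARD", 1
        elif self.state == "OK":
            # First failure: SOFT problem state (HARD at once if max_check_attempts is 1).
            self.state, self.attempt = "CRITICAL", 1
            self._maybe_go_hard()
        elif self.state_type == "SOFT":
            # Further failures are retried until max_check_attempts is reached.
            self.attempt += 1
            self._maybe_go_hard()
        # Already HARD CRITICAL: no new state change. Re-notification
        # (notification_interval) is not modeled here.

    def _maybe_go_hard(self) -> None:
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
            self.notifications.append("PROBLEM")
        else:
            self.state_type = "SOFT"


if __name__ == "__main__":
    svc = ServiceStateModel(max_check_attempts=5)
    for result in ["CRITICAL", "CRITICAL", "OK"]:
        svc.process(result)
    print(svc.notifications)  # [] (soft recovery, no email)

    for result in ["CRITICAL"] * 5 + ["OK"]:
        svc.process(result)
    print(svc.notifications)  # ['PROBLEM', 'RECOVERY']
```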

Reference
State Types
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Service recovery emails generated without status change

Post by rferebee »

OK, I actually ran a database repair this morning before I sent you the system profile, so you're still seeing the table as corrupted even after the repair. Perhaps my database repairs aren't working?

We do have recovery notifications enabled intentionally: if something goes critical, we want to know when it recovers.

I read the reference article you supplied, but I'm still having trouble understanding what a HARD recovery is. The state of the service hasn't changed since January, so I don't understand why Nagios would send a notice that it recovered.

Can you elaborate?
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Service recovery emails generated without status change

Post by npolovenko »

@rferebee, the report screenshot you sent shows that the Disk Check /db/database service was in a hard CRITICAL state until today and then recovered, so I'd expect a recovery email notification.

Can you run the following commands, which truncate the email-related tables in the nagiosxi database and then repair all MySQL databases, and let us know if that fixes the problem?
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi nagiosxi
mysqlcheck -r -f -uroot -pnagiosxi --all-databases --use_frm
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Service recovery emails generated without status change

Post by rferebee »

Well, I can say beyond a shadow of a doubt that this service check has not been critical since January. I monitor our Nagios environment almost all day and send weekly reports of which services are in critical and warning states.

I don't dispute that that's what the report says, but it definitely wasn't in a critical state for 3 months. There must be a disconnect somewhere in our environment.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Service recovery emails generated without status change

Post by rferebee »

Also, I got an error when I tried to run the first command provided:

root@nagiosxi> echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi nagiosxi
ERROR 1049 (42000): Unknown database 'nagiosxi'
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Service recovery emails generated without status change

Post by npolovenko »

@rferebee, it looks like you're using PostgreSQL for the nagiosxi database. Please run the following commands instead:
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | psql nagiosxi nagiosxi
echo "vacuum;vacuum analyze;"|psql nagiosxi postgres
service postgresql restart

Then generate the state history report again for the same host and service, and make sure you select "Both" for the State Type.

Finally, could you send in your Nagios XI System Profile so I can review it?
To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and send it to me in a private message, or upload it to this thread.