Take a close look at this nagios.log snippet (UNIX timestamps converted to human-readable form). I've replaced the service name with ServiceName and the host name with HostName to keep some details confidential. The important part runs from 05:26:47 through 05:27:54:
Wed Mar 13 00:11:27 2019 - SERVICE ALERT: HostName;ServiceName;CRITICAL;SOFT;1;CRITICAL - Plugin timed out
Wed Mar 13 00:12:29 2019 - SERVICE ALERT: HostName;ServiceName;OK;SOFT;2;UK-BACKUP:true UK-LTS:true UK-VMtemplates:true
Wed Mar 13 05:25:36 2019 - SERVICE ALERT: HostName;ServiceName;CRITICAL;SOFT;1;CRITICAL - Plugin timed out
Wed Mar 13 05:26:47 2019 - SERVICE ALERT: HostName;ServiceName;CRITICAL;SOFT;2;CRITICAL - Plugin timed out
Wed Mar 13 05:27:54 2019 - SERVICE NOTIFICATION: HostName;ServiceName;OK;notify-service-by-email;UK-BACKUP:true UK-LTS:true UK-VMtemplates:true
Wed Mar 13 05:27:54 2019 - SERVICE ALERT: HostName;ServiceName;OK;HARD;3;UK-BACKUP:true UK-LTS:true UK-VMtemplates:true
Wed Mar 13 05:53:12 2019 - SERVICE ALERT: HostName;ServiceName;CRITICAL;SOFT;1;CRITICAL - Plugin timed out
Wed Mar 13 05:54:14 2019 - SERVICE ALERT: HostName;ServiceName;OK;SOFT;2;UK-BACKUP:true UK-LTS:true UK-VMtemplates:true
This customer has LOTS of examples of improper notifications, but most of them stopped after the 5.5.11 upgrade (although, for reasons I can't go into here, they never upgrade in place; they always do a fresh reinstall). This is the first one since then that sent an OK;HARD notification without first sending a CRITICAL;HARD notification. Also, shouldn't the OK;HARD have been an OK;SOFT in the first place? After all, the service never reached CRITICAL;HARD.
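To check other customer logs for the same pattern, here is a rough sketch (my own heuristic, not anything Nagios ships). It flags two things: an OK;HARD alert when the last HARD state was already OK, and an OK notification when the current problem never reached a HARD non-OK state. Assumptions: lines use the converted "date - SERVICE ..." form shown above, fields are in the same order as in the excerpt, and the service was HARD OK before the excerpt starts. The function name `find_suspect_recoveries` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches the human-readable form used in the excerpt:
#   "<date> - SERVICE ALERT: host;service;state;state_type;attempt;output"
#   "<date> - SERVICE NOTIFICATION: host;service;state;command;output"
# Assumption: field order follows the excerpt; adjust the indexes if your
# lines carry extra fields (for example a contact name).
LINE_RE = re.compile(
    r"^(?P<ts>.+?) - (?P<kind>SERVICE ALERT|SERVICE NOTIFICATION): (?P<body>.*)$"
)


def find_suspect_recoveries(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, kind, reason) for recoveries with no prior HARD problem."""
    # Last HARD state per (host, service); assumed "OK" before the log starts.
    last_hard: Dict[Tuple[str, str], str] = {}
    # Whether the current problem episode ever reached a HARD non-OK state.
    had_hard_problem: Dict[Tuple[str, str], bool] = {}
    findings: List[Tuple[str, str, str]] = []

    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore lines that are not service alerts/notifications
        ts, kind = m.group("ts"), m.group("kind")
        fields = m.group("body").split(";")
        key, state = (fields[0], fields[1]), fields[2]
        prev_hard = last_hard.get(key, "OK")

        if kind == "SERVICE ALERT":
            state_type = fields[3]
            if state_type == "HARD":
                if state == "OK" and prev_hard == "OK":
                    findings.append((ts, kind, "HARD OK without a preceding HARD problem"))
                elif state != "OK":
                    had_hard_problem[key] = True
                last_hard[key] = state
            elif state != "OK" and prev_hard == "OK":
                # A SOFT problem starting from a HARD OK baseline opens a new episode.
                had_hard_problem[key] = False
        elif state == "OK" and not had_hard_problem.get(key, False):
            # Recovery notification, but this episode never went HARD non-OK.
            findings.append((ts, kind, "recovery notification without a HARD problem"))

    return findings
```

Run against the excerpt above, it should report both 05:27:54 lines and nothing else. Raw nagios.log lines that still carry bracketed UNIX timestamps would need the pattern adjusted. The episode flag is only reset by a new SOFT problem, so it does not matter whether the recovery notification is logged before or after the OK;HARD alert.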
Thanks.