Issue with manual and scheduled suppression

pubstars
Posts: 5
Joined: Wed Mar 09, 2016 10:07 am

Post by pubstars »

Hello all,

We are running Nagios Core 4.0.8 in our live environment and recently ran into an issue: a user scheduled suppression, manual suppression was then applied, and the two cancelled each other out. I ran some tests and could reproduce the bug on our dev system as well. Below are the logs with headings.

I did 3 tests this morning trying to replicate this. Here are the steps from one of those tests.
Downtime Scheduled
[1455193232] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;<hostname>;1455193800;1455199200;1;0;7200;<user>;xvdfv
[1455193232] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;<hostname>;Temperature-Check;1455193800;1455199200;1;0;7200;<user>;xvdfv
[1455193232] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;<hostname>;Humidity-Check;1455193800;1455199200;1;0;7200;<user>;xvdfv
[1455193232] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;<hostname>;ping;1455193800;1455199200;1;0;7200;<user>;xvdfv
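For anyone decoding the commands above, here is a minimal sketch that splits the host downtime line into named fields. It assumes the documented field order for SCHEDULE_HOST_DOWNTIME (host, start time, end time, fixed, trigger ID, duration, author, comment); the function name `parse_host_downtime` is made up for illustration.

```python
from datetime import datetime, timezone

def parse_host_downtime(line):
    """Split a SCHEDULE_HOST_DOWNTIME log line into named fields."""
    # Everything after "EXTERNAL COMMAND: " is the semicolon-separated command.
    command = line.split("EXTERNAL COMMAND: ", 1)[1]
    # maxsplit=8 keeps any semicolons inside the trailing comment intact.
    (name, host, start, end, fixed,
     trigger_id, duration, author, comment) = command.split(";", 8)
    return {
        "command": name,
        "host": host,
        "start": int(start),
        "end": int(end),
        "fixed": fixed == "1",
        "trigger_id": int(trigger_id),
        "duration": int(duration),
        "author": author,
        "comment": comment,
    }

line = ("[1455193232] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;"
        "<hostname>;1455193800;1455199200;1;0;7200;<user>;xvdfv")
downtime = parse_host_downtime(line)
window_seconds = downtime["end"] - downtime["start"]
start_utc = datetime.fromtimestamp(downtime["start"], tz=timezone.utc)
print(downtime["fixed"], window_seconds, start_utc.isoformat())
```

Read this way, the request is for a fixed window of 5400 seconds (90 minutes). With fixed set to 1, the 7200-second duration field should only matter for flexible downtime, so it is probably not relevant here.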

Device enters period of scheduled downtime
[1455193799] HOST DOWNTIME ALERT: <hostname>;STARTED; Host has entered a period of scheduled downtime
[1455193799] SERVICE DOWNTIME ALERT: <hostname>;ping;STARTED; Service has entered a period of scheduled downtime
[1455193799] SERVICE DOWNTIME ALERT: <hostname>;Temperature-Check;STARTED; Service has entered a period of scheduled downtime
[1455193799] SERVICE DOWNTIME ALERT: <hostname>;Humidity-Check;STARTED; Service has entered a period of scheduled downtime

Device notifications manually disabled
[1455195893] EXTERNAL COMMAND: DISABLE_HOST_SVC_NOTIFICATIONS;<hostname>
[1455195893] EXTERNAL COMMAND: DISABLE_HOST_NOTIFICATIONS;<hostname>
[1455195893] EXTERNAL COMMAND: DISABLE_HOST_SVC_NOTIFICATIONS;<hostname>
[1455195893] EXTERNAL COMMAND: DISABLE_HOST_NOTIFICATIONS;<hostname>

Device goes offline and starts to alert
[1455195954] SERVICE ALERT: <hostname>;Temperature-Check;UNKNOWN;SOFT;1;UNKNOWN - Can't establish SNMP connection
[1455195964] SERVICE ALERT: <hostname>;Humidity-Check;UNKNOWN;SOFT;1;UNKNOWN - Can't establish SNMP connection
[1455196011] SERVICE ALERT: <hostname>;Temperature-Check;UNKNOWN;SOFT;1;UNKNOWN - Can't establish SNMP connection
[1455196038] SERVICE ALERT: <hostname>;Humidity-Check;UNKNOWN;SOFT;1;UNKNOWN - Can't establish SNMP connection
[1455196067] SERVICE ALERT: <hostname>;ping;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
[1455196108] SERVICE ALERT: <hostname>;ping;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
[1455196134] SERVICE ALERT: <hostname>;Temperature-Check;UNKNOWN;SOFT;2;UNKNOWN - Can't establish SNMP connection
[1455196144] SERVICE ALERT: <hostname>;Humidity-Check;UNKNOWN;SOFT;2;UNKNOWN - Can't establish SNMP connection
[1455196190] SERVICE ALERT: <hostname>;Temperature-Check;UNKNOWN;SOFT;2;UNKNOWN - Can't establish SNMP connection
[1455196217] SERVICE ALERT: <hostname>;Humidity-Check;UNKNOWN;SOFT;2;UNKNOWN - Can't establish SNMP connection
[1455196247] SERVICE ALERT: <hostname>;ping;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
[1455196288] SERVICE ALERT: <hostname>;ping;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
[1455196313] SERVICE ALERT: <hostname>;Temperature-Check;UNKNOWN;HARD;3;UNKNOWN - Can't establish SNMP connection
[1455196324] SERVICE ALERT: <hostname>;Humidity-Check;UNKNOWN;HARD;3;UNKNOWN - Can't establish SNMP connection
[1455196371] SERVICE ALERT: <hostname>;Temperature-Check;UNKNOWN;HARD;3;UNKNOWN - Can't establish SNMP connection
[1455196398] SERVICE ALERT: <hostname>;Humidity-Check;UNKNOWN;HARD;3;UNKNOWN - Can't establish SNMP connection
[1455196427] SERVICE ALERT: <hostname>;ping;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%
[1455196468] SERVICE ALERT: <hostname>;ping;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%

High severity TT cut
[1455196468] SERVICE NOTIFICATION: cut-ticket-sev2-criticial;<hostname>;ping;CRITICAL;cut-service-tt;PING CRITICAL - Packet loss = 100%
[1455196474] EXTERNAL COMMAND: ADD_SVC_COMMENT;<hostname>;ping;1;Nagios;<webb address to cut TT too>
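The timing of that notification can be cross-checked against the earlier excerpts. This is a minimal sketch using only timestamps copied from the log lines in this post; the constant and function names are made up for illustration.

```python
# Epoch timestamps copied from the log excerpts in this post.
DOWNTIME_STARTED = 1455193799        # HOST DOWNTIME ALERT ... STARTED
DOWNTIME_SCHEDULED_END = 1455199200  # end time in SCHEDULE_HOST_DOWNTIME
NOTIFICATIONS_DISABLED = 1455195893  # DISABLE_HOST_SVC_NOTIFICATIONS
NOTIFICATION_SENT = 1455196468       # SERVICE NOTIFICATION for ping

def inside(timestamp, start, end):
    """Return True if timestamp falls in the half-open window [start, end)."""
    return start <= timestamp < end

# Was the notification sent while the scheduled window was still open?
in_downtime = inside(NOTIFICATION_SENT, DOWNTIME_STARTED, DOWNTIME_SCHEDULED_END)
# Was it sent after notifications had been manually disabled?
after_disable = NOTIFICATION_SENT >= NOTIFICATIONS_DISABLED

print("inside scheduled downtime:", in_downtime)
print("after manual disable:", after_disable)
```

Both checks hold for this excerpt, which is the surprising part: on the face of it, either suppression alone should have held the notification back.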

Notifications manually re-enabled
[1455196823] EXTERNAL COMMAND: ENABLE_HOST_SVC_NOTIFICATIONS;<hostname>
[1455196823] EXTERNAL COMMAND: ENABLE_HOST_NOTIFICATIONS;<hostname>

Scheduled downtime cancelled
[1455196848] HOST DOWNTIME ALERT: <hostname>;CANCELLED; Scheduled downtime for host has been cancelled.
[1455196848] SERVICE DOWNTIME ALERT: <hostname>;ping;CANCELLED; Scheduled downtime for service has been cancelled.
[1455196848] SERVICE DOWNTIME ALERT: <hostname>;Humidity-Check;CANCELLED; Scheduled downtime for service has been cancelled.
[1455196848] SERVICE DOWNTIME ALERT: <hostname>;Temperature-Check;CANCELLED; Scheduled downtime for service has been cancelled.

Device brought back online
[1455197028] SERVICE ALERT: <hostname>;Temperature-Check;OK;HARD;3;TEMPERATURE OK: 2 sensor(s) in normal scope
[1455197038] SERVICE ALERT: <hostname>;Humidity-Check;OK;HARD;3;HUMIDITY OK: 2 sensor(s) in normal scope
[1455197084] SERVICE ALERT: <hostname>;Temperature-Check;OK;HARD;3;TEMPERATURE OK: 2 sensor(s) in normal scope
[1455197111] SERVICE ALERT: <hostname>;Humidity-Check;OK;HARD;3;HUMIDITY OK: 2 sensor(s) in normal scope
[1455197241] SERVICE ALERT: <hostname>;ping;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 27.15 ms
[1455197282] SERVICE ALERT: <hostname>;ping;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 28.65 ms
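To put numbers on the sequence, here is a rough sketch that parses four of the log lines above and measures how early the downtime was cancelled and how soon that followed the manual re-enable. The regular expression and helper names are illustrative, not anything Nagios ships.

```python
import re

# Four lines copied from the excerpts in this post.
LOG = """\
[1455193232] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;<hostname>;1455193800;1455199200;1;0;7200;<user>;xvdfv
[1455193799] HOST DOWNTIME ALERT: <hostname>;STARTED; Host has entered a period of scheduled downtime
[1455196823] EXTERNAL COMMAND: ENABLE_HOST_NOTIFICATIONS;<hostname>
[1455196848] HOST DOWNTIME ALERT: <hostname>;CANCELLED; Scheduled downtime for host has been cancelled.
"""

# "[epoch] ENTRY TYPE: semicolon-separated fields"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")

def parse(log_text):
    """Return a list of (timestamp, entry_type, fields) tuples."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            events.append((int(match.group(1)), match.group(2),
                           match.group(3).split(";")))
    return events

events = parse(LOG)
scheduled_end = next(int(f[3]) for _, kind, f in events
                     if kind == "EXTERNAL COMMAND"
                     and f[0] == "SCHEDULE_HOST_DOWNTIME")
enabled_at = next(ts for ts, kind, f in events
                  if kind == "EXTERNAL COMMAND"
                  and f[0] == "ENABLE_HOST_NOTIFICATIONS")
cancelled_at = next(ts for ts, kind, f in events
                    if kind == "HOST DOWNTIME ALERT" and f[1] == "CANCELLED")

seconds_early = scheduled_end - cancelled_at
seconds_after_enable = cancelled_at - enabled_at
print(seconds_early, seconds_after_enable)
```

For this excerpt the downtime was cancelled 2352 seconds before its scheduled end and 25 seconds after notifications were re-enabled. The excerpt shows no downtime deletion command in between, so what triggered the cancellation is not visible from these lines alone.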

Regards
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Issue with manual and scheduled suppression

Post by ssax »

Looks like you found a bug. I'll lab it up on Monday and let you know if I can replicate it before we submit it.

What Linux distro and version are you running as well?

Code:

uname -a
Thank you

Re: Issue with manual and scheduled suppression

Post by pubstars »

Please see below:

Linux <hostname> 3.2.45-0.6.acc.624.45.283.<org>1acc.x86_64 #1 SMP Fri Nov 21 22:39:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Issue with manual and scheduled suppression

Post by hsmith »

What's the output of this command?

Code:

cat /etc/*release*
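If it is easier to gather both answers in one go, the two commands requested in this thread (`uname -a` and `cat /etc/*release*`) can be approximated from Python. This is only a sketch; the function name `collect_system_info` is made up, and the `uname` string will not match the shell output character for character.

```python
import glob
import platform

def collect_system_info(release_glob="/etc/*release*"):
    """Gather kernel details and the contents of any release files."""
    # platform.uname() carries roughly the same fields as `uname -a`.
    info = {"uname": " ".join(platform.uname())}
    releases = {}
    for path in sorted(glob.glob(release_glob)):
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                releases[path] = handle.read().strip()
        except OSError:
            # Skip directories and files we are not allowed to read.
            continue
    info["releases"] = releases
    return info

info = collect_system_info()
print(info["uname"])
for path, text in info["releases"].items():
    print(path)
    print(text)
```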
Former Nagios Employee.
me.

Re: Issue with manual and scheduled suppression

Post by pubstars »

Red Hat Enterprise Linux Server release 5.3 (Tikanga)
<company> Linux Bare Metal release 2012.03
cpe:/o:<company>:linux:2012.03:ga

Re: Issue with manual and scheduled suppression

Post by ssax »

Do you still have the full log for that time period (you can check under /usr/local/nagios/var/archive/)? We would like to take a look at it for reference.
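One way to look for the relevant entries is to scan the archive directory for lines whose timestamp falls within the test window. This is a minimal sketch, assuming the archives are plain-text files ending in `.log` with each line starting with `[epoch]`; the function name `lines_in_window` is made up for illustration.

```python
import glob
import os
import re

# Log lines start with an epoch timestamp in square brackets.
TIMESTAMP_RE = re.compile(r"^\[(\d+)\]")

def lines_in_window(archive_dir, start, end):
    """Yield (path, line) for log lines with start <= timestamp <= end."""
    for path in sorted(glob.glob(os.path.join(archive_dir, "*.log"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                match = TIMESTAMP_RE.match(line)
                if match and start <= int(match.group(1)) <= end:
                    yield path, line.rstrip("\n")
```

For the test in this thread, something like `list(lines_in_window("/usr/local/nagios/var/archive", 1455193232, 1455197282))` would cover the span from the first to the last timestamp in the posted excerpts. It returns nothing if the files have already been rotated away.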

Re: Issue with manual and scheduled suppression

Post by pubstars »

Hello,

No, we don't have the logs for that time period; they are flushed once a month.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Issue with manual and scheduled suppression

Post by tmcdonald »

Is this something you can replicate? Without the log it'll be hard to compare notes, so to speak. If you can replicate the behavior and share that log we can see about testing this on our end.
Former Nagios employee

Re: Issue with manual and scheduled suppression

Post by pubstars »

I could replicate it, and the logs from the test are above. I will try to replicate this again over the next week.

Re: Issue with manual and scheduled suppression

Post by hsmith »

Let us know what happens. Thanks!