Re: Long check output (>255) and dropped State History

Posted: Wed Feb 10, 2016 10:09 am
by DigNetwerk
Thanks! (Sorry, I forgot to look here for a few days.)

Do you have a reference number or URL for the bug?

Re: Long check output (>255) and dropped State History

Posted: Wed Feb 10, 2016 12:57 pm
by tgriep
The response I received from the developers is that leaving repeated OK states with differing output out of the State History report is the preferred behavior.
The Event Log will have the necessary information when the output differs for an OK state.

Re: Long check output (>255) and dropped State History

Posted: Thu Feb 11, 2016 6:07 am
by DigNetwerk
Seriously?

What is state stalking on the service OK state supposed to do, then?

https://assets.nagios.com/downloads/nag ... lking.html
https://assets.nagios.com/downloads/nag ... ml#service

I quote from that 2nd page:
stalking_options: This directive determines which service states "stalking" is enabled for. Valid options are a combination of one or more of the following: o = stalk on OK states, w = stalk on WARNING states, u = stalk on UNKNOWN states, and c = stalk on CRITICAL states. More information on state stalking can be found here.
o = STALK ON OK STATES

So if I specify 'o' for a service, what is it supposed to do, if not keep logging even while the state remains OK?

You just diverged from your own documentation. This is NEITHER PREFERRED NOR EXPECTED behavior in my opinion. If you don't want to fix it, then call it what it is: a WONTFIX.

Re: Long check output (>255) and dropped State History

Posted: Thu Feb 11, 2016 2:46 pm
by bheden
I went ahead and created a test scenario. I set up SNMP Traps on a fresh install of XI using the following documentation: https://assets.nagios.com/downloads/nag ... ios_XI.pdf

Then I added an additional MIB in /usr/share/snmp/mibs/UCD-TRAP-TEST-MIB.txt:

Code:

demoTrap2 TRAP-TYPE
 ENTERPRISE demotraps
 VARIABLES { sysLocation }
 DESCRIPTION "Another example"
 ::= 18

I re-added my MIB, and then opened /etc/snmp/snmptt.conf:

Code:

EVENT demoTrap .1.3.6.1.4.1.2021.13.990.0.17 "Status Events" Normal
EVENT demoTrap2 .1.3.6.1.4.1.2021.13.990.0.18 "Status Events" Warning

I restarted the snmptt service. Then, from within Nagios, I used the SNMP Trap Wizard to create a service on my localhost host. I ensured that this service had stalking enabled on the OK state. The series of commands I used throughout my testing is as follows:

Code:

snmptrap -v 1 -c public 127.0.0.1 UCD-TRAP-TEST-MIB::demotraps "" 6 17 "" SNMPv2-MIB::sysLocation.0 s "OK Status #1"
snmptrap -v 1 -c public 127.0.0.1 UCD-TRAP-TEST-MIB::demotraps "" 6 17 "" SNMPv2-MIB::sysLocation.0 s "OK Status #2"
snmptrap -v 1 -c public 127.0.0.1 UCD-TRAP-TEST-MIB::demotraps "" 6 18 "" SNMPv2-MIB::sysLocation.0 s "Warning Status #1"
snmptrap -v 1 -c public 127.0.0.1 UCD-TRAP-TEST-MIB::demotraps "" 6 18 "" SNMPv2-MIB::sysLocation.0 s "Warning Status #2"

In the Event Log Report, here is an attached screenshot of the relevant output:
[Attachment: stalking_ok.PNG]
Then, I enabled stalking on Warning (and kept the OK state stalking enabled as well), and re-issued the same commands. Here is the attached screenshot of the relevant output:
[Attachment: stalking_ok_and_warning.PNG]
Finally, I disabled all state stalking for this service, and re-issued the same commands. Here is the attached screenshot of the relevant output:
[Attachment: no_stalking.PNG]
The behavior is as described in the documentation that you linked:
https://assets.nagios.com/downloads/nag ... lking.html
"With state stalking enabled, Nagios would have examined the output from each service check to see if it differed from the output of the previous check. If the output differed and the state of the service didn't change between the two checks, the result of the newer service check would get logged."
As you can see from the attached screenshots, when stalking was enabled for ok, multiple states [where the output differed] WERE logged, unlike when stalking was not enabled.
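
For anyone following along, the rule quoted from the documentation can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Nagios Core source code: the function names `parse_stalking_options` and `should_log_stalked_result` are made up for this example, and only the four flags quoted earlier in the thread (o, w, u, c) are handled.

```python
# Illustrative sketch only; NOT taken from Nagios Core.
# Flag letters follow the stalking_options documentation quoted in this thread.
STATE_FLAGS = {"o": "OK", "w": "WARNING", "u": "UNKNOWN", "c": "CRITICAL"}


def parse_stalking_options(value):
    """Turn a stalking_options string such as "o,w" into a set of state names."""
    flags = [flag.strip() for flag in value.split(",") if flag.strip()]
    return {STATE_FLAGS[flag] for flag in flags}


def should_log_stalked_result(prev_state, prev_output, state, output, stalked_states):
    """Return True when the state is unchanged, the output differs,
    and stalking is enabled for that state."""
    return state == prev_state and output != prev_output and state in stalked_states


# Stalking on OK only, as in the first test run above.
stalked = parse_stalking_options("o")
print(should_log_stalked_result("OK", "OK Status #1", "OK", "OK Status #2", stalked))
print(should_log_stalked_result("WARNING", "Warning Status #1",
                                "WARNING", "Warning Status #2", stalked))
```

With stalking on OK only, the first call reports that the second OK result would be logged, while the repeated Warning result would not be.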

If volatility is enabled, and stalking is disabled, the State History report should show multiple non-OK states in a row, whether the output is different or not. If you need to see multiple consecutive OK states with differing output, please use the Event Log report.

Hope this helps clear it up.

Re: Long check output (>255) and dropped State History

Posted: Fri Feb 12, 2016 5:03 am
by DigNetwerk
bheden wrote: As you can see from the attached screenshots, when stalking was enabled for ok, multiple states [where the output differed] WERE logged, unlike when stalking was not enabled.
I know that; I mentioned this myself when I helped troubleshoot this bug.

But who looks at the Event Log for checking what happened with a single service? Why doesn't that show up in the State History just like for non-OK states? It's totally illogical and unexpected!

Re: Long check output (>255) and dropped State History

Posted: Fri Feb 12, 2016 11:17 am
by bheden
DigNetwerk wrote: It's totally illogical and unexpected!
I won't completely disagree with you on that one ;) This threw me off a bit at first, as well. With that said, now we know where to look and we can plan for it!

On the other hand though, the majority of our user base has requested that the State History be a clean and concise list of State Changes, and not Output Changes.

This definitely brings up the possibility of a report that specifically lists Output Changes though. We'll be adding that internally as a Feature Request. Thanks for bringing this to our attention.
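
To make the distinction between State Changes and Output Changes concrete, here is a rough Python sketch that splits a sequence of check results into the two lists. It is a simplified model for discussion, not how Nagios XI builds its reports; `split_reports` is a hypothetical helper, and the sample data reuses the trap messages from the test above.

```python
# Simplified model for discussion; NOT how Nagios XI generates its reports.
def split_reports(results):
    """Split (state, output) results, given in check order, into
    state changes and output-only changes."""
    state_changes = []
    output_changes = []
    previous = None
    for state, output in results:
        if previous is None or state != previous[0]:
            # First result or a different state: a state change.
            state_changes.append((state, output))
        elif output != previous[1]:
            # Same state, different output: an output change.
            output_changes.append((state, output))
        previous = (state, output)
    return state_changes, output_changes


results = [
    ("OK", "OK Status #1"),
    ("OK", "OK Status #2"),
    ("WARNING", "Warning Status #1"),
    ("WARNING", "Warning Status #2"),
]
state_changes, output_changes = split_reports(results)
print(state_changes)
print(output_changes)
```

In this model the "#2" results land only in the output-changes list, which is the kind of entry a dedicated Output Changes report would surface.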

Re: Long check output (>255) and dropped State History

Posted: Mon Feb 15, 2016 5:02 am
by DigNetwerk
Hello,
bheden wrote: On the other hand though, the majority of our user base has requested that the State History be a clean and concise list of State Changes, and not Output Changes.
I totally get this; most of the time that is what I want too. It is the default behavior, and I'm not advocating changing that at all! But in this case I don't want it, so I enable the stalking option, *explicitly for OK states*, and it still doesn't show up!

So what does this stalking for OK states option do at the moment, then? Nothing. So it's a bug or an unimplemented feature. You can decide what to call it, but it's one of those; please admit this.

In this light there is also no need for a separate Output Changes report. You want output changes logged, even for OK states, in State History? -> Enable stalking on OK states. You don't want it? -> You don't need to do anything!

Re: Long check output (>255) and dropped State History

Posted: Mon Feb 15, 2016 5:45 pm
by tmcdonald
DigNetwerk wrote: So it's a bug or an unimplemented feature. You can decide how to call it, but it's one of those, please admit this.
I would need to defer to our Core dev on this, but I agree it definitely sounds like one or the other. I'll see if we can get a quote from him, but since it is the end of our day he may not reply until tomorrow.

Re: Long check output (>255) and dropped State History

Posted: Thu Feb 18, 2016 3:27 am
by DigNetwerk
Thank you tmcdonald, any word yet?

Re: Long check output (>255) and dropped State History

Posted: Thu Feb 18, 2016 5:54 pm
by bheden
bheden wrote: This definitely brings up the possibility of a report that specifically lists Output Changes though. We'll be adding that internally as a Feature Request.
I've created the Feature Request, Task ID 7814. It would speed up development considerably if some additional users could chime in on whether they'd use this functionality as well.