Availability report is showing negative values

Post by **dennisg** » Mon Jun 12, 2017 12:49 am

As described earlier, the error only occurs when a scheduled downtime took place during the report period, so the log file format is obviously not the issue.
Setting the "First Assumed Service State" to "Service OK" leads to the same (erroneous) result.

Post by **tgriep** » Mon Jun 12, 2017 9:06 am

I cannot find any information in the log files for any host or service that was put into downtime, so there isn't much that can be done.
The only option is to file a bug report at the Nagios Core GitHub site at the link below.
https://github.com/NagiosEnterprises/nagioscore/issues

Post by **dennisg** » Tue Jun 13, 2017 12:52 am

hmm...

Please take a look at, e.g., nagios-04-11-2017-00.log. This is the date mentioned in the detailed description / screenshots.
There's an entry that reads:

[1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime

Maybe you overlooked that?
I would be very happy if you could have another look at the issue. If that still doesn't turn up anything more useful, I will be happy to file a bug report on GitHub.

Many thanks,
Dennis

Post by **tgriep** » Tue Jun 13, 2017 11:08 am

I must have had a typo in my grep command. I do see the downtime entries now that I have rechecked them.

Code:

4/11
[1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/12
[1491926047] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491926370] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/13
[1491992158] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime
[1491998238] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;myhost;my-webapp;1491998182;1492523782;1;0;7200;nagiosadmin;(JE) RW
[1491998239] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491998580] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/14
[1492073607] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1492073976] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/19
[1492523782] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime

What could be causing it is that the service was put into downtime multiple times but exited far fewer times, which could throw the math off.
I tried to simulate the issue but could not reproduce it.
Now that you have upgraded, can you run the same report from that day until now to see if the issue is still there?

If the report works after the upgrade, then the issue may have been fixed, but if it still fails, you would have to put in a bug report.
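
To check the imbalance between downtime starts and stops without relying on a hand-typed grep, the alerts can be tallied per service. The following is a minimal sketch, assuming the log lines have the format shown in the excerpt above; the regular expression and the function name `tally_downtime_events` are illustrative and not part of Nagios.

```python
import re
from collections import Counter

# Matches lines such as:
# [1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered ...
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE DOWNTIME ALERT: ([^;]+);([^;]+);(STARTED|STOPPED);"
)

def tally_downtime_events(lines):
    """Count STARTED and STOPPED downtime alerts per (host, service, state)."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            _, host, service, state = match.groups()
            counts[(host, service, state)] += 1
    return counts

# Log lines copied from the excerpt above (the date labels are omitted).
sample = [
    "[1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1491926047] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1491926370] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1491992158] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime",
    "[1491998238] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;myhost;my-webapp;1491998182;1492523782;1;0;7200;nagiosadmin;(JE) RW",
    "[1491998239] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1491998580] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1492073607] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1492073976] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1492523782] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime",
]

counts = tally_downtime_events(sample)
started = counts[("myhost", "my-webapp", "STARTED")]
stopped = counts[("myhost", "my-webapp", "STOPPED")]
print(started, stopped)  # prints: 7 2
```

For this excerpt the tally is seven STARTED alerts against two STOPPED alerts.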

Post by **dennisg** » Wed Jun 14, 2017 12:51 am

The report still fails with the same result after the upgrade.
I will schedule a new 2-hour downtime today on my sandbox, which consists of only this single service, so that I can keep track of the logs.
No idea why there would be multiple starts and just a single stop.

If that still fails (everything is on Nagios Core 4.3.2 now), I will file a bug report.

Thanks a lot,
Dennis

Post by **dennisg** » Wed Jun 14, 2017 4:54 am

Okay, today's test on my sandbox with a fixed downtime of 2 hrs showed a good result, as well as a single start and a single stop for the downtime.
The error seems to be caused (as you guessed earlier) by multiple starts of a downtime with an unequal number of stops.

Funny questions are:
1.) What would have caused such a behaviour?!
2.) This seems to have happened quite regularly (even though I don't know how to reproduce it deliberately yet). Take a look at my logs from PROD; the first command counts the scheduled downtimes manually configured by us admins:

nagios@my-host[/usr/local/nagios/var/archives]> grep "SCHEDULE_SVC_DOWNTIME" nagios*-2017-*.log | grep "my-service" | wc -l
9

nagios@my-host[/usr/local/nagios/var/archives]> grep "SERVICE DOWNTIME ALERT:.*STARTED" nagios*-2017-*.log | grep "my-service" | wc -l
253

nagios@my-host[/usr/local/nagios/var/archives]> grep "SERVICE DOWNTIME ALERT:.*STOPPED" nagios*-2017-*.log | grep "my-service" | wc -l
54

Anyone seen that odd behaviour before?!

3.) Any idea on how to fix that data in order to get proper reports? I cannot just "bulk-delete" the entries from the logs...
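
One possible approach to question 3 is to work on a copy of the archive and drop every STARTED alert that arrives while the service is already in downtime. The following is a minimal sketch of that filter; the function name `drop_repeated_starts` is hypothetical, and the thread does not confirm that a log cleaned this way yields correct reports, so it should only be tried on a copy.

```python
import re

# Assumed line format, taken from the log excerpts in this thread.
ALERT_RE = re.compile(
    r"^\[\d+\] SERVICE DOWNTIME ALERT: ([^;]+);([^;]+);(STARTED|STOPPED);"
)

def drop_repeated_starts(lines):
    """Yield lines, skipping STARTED alerts for a service already in downtime."""
    in_downtime = set()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            host, service, state = match.groups()
            key = (host, service)
            if state == "STARTED":
                if key in in_downtime:
                    continue  # repeated start: drop this line
                in_downtime.add(key)
            else:
                in_downtime.discard(key)
        yield line

# Four consecutive lines from the earlier excerpt in this thread.
sample = [
    "[1491992158] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime",
    "[1491998238] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;myhost;my-webapp;1491998182;1492523782;1;0;7200;nagiosadmin;(JE) RW",
    "[1491998239] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1491998580] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
]

cleaned = list(drop_repeated_starts(sample))
print(len(cleaned))  # prints: 3
```

All non-alert lines pass through unchanged, so the cleaned copy differs from the original only by the removed repeated starts.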

Post by **dennisg** » Wed Jun 14, 2017 6:53 am

Okay, an unequal number of starts and stops seems to be explainable: reloading the config causes active downtimes to be logged as started again. See the following example:

On Wed, 14 Jun 2017 11:27:13 GMT I scheduled a downtime for 2 minutes, starting at 1497439680 (i.e. Wed, 14 Jun 2017 11:28:00 GMT), see line 1.
The downtime kicked in as expected, see line 2.
I reloaded the Nagios config. The exact same downtime got started again, see line 25.
The downtime stopped, see the last line.

Code:

[1497439633] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;myhost;my-webapp;1497439680;1497439800;1;0;7200;nagiosadmin;[DG] short test
[1497439680] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1497439695] Caught SIGHUP, restarting...
[1497439695] Event broker module 'NERD' deinitialized successfully.
[1497439695] Warning: use_embedded_perl_implicitly is deprecated and will be removed.
[1497439695] Warning: enable_embedded_perl is deprecated and will be removed.
[1497439695] Warning: p1_file is deprecated and will be removed.
[1497439695] Warning: sleep_time is deprecated and will be removed.
[1497439695] Warning: external_command_buffer_slots is deprecated and will be removed. All commands are always processed upon arrival
[1497439695] Warning: command_check_interval is deprecated and will be removed. Commands are always handled on arrival
[1497439695] Nagios 4.3.2 starting... (PID=18327)
[1497439695] Local time is Wed Jun 14 13:28:15 CEST 2017
[1497439695] LOG VERSION: 2.0
[1497439695] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1497439695] qh: core query handler registered
[1497439695] nerd: Channel hostchecks registered successfully
[1497439695] nerd: Channel servicechecks registered successfully
[1497439695] nerd: Channel opathchecks registered successfully
[1497439695] nerd: Fully initialized and ready to rock!
[1497439695] wproc: Successfully registered manager as @wproc with query handler
[1497439695] wproc: Registry request: name=Core Worker 15359;pid=15359
[1497439695] wproc: Registry request: name=Core Worker 15358;pid=15358
[1497439695] wproc: Registry request: name=Core Worker 15357;pid=15357
[1497439695] wproc: Registry request: name=Core Worker 15356;pid=15356
[1497439695] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1497439799] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime

So this seems to be more or less the "expected behaviour"?!

The corresponding report (last 24 hrs) shows no "negative values"; however, the timing still seems to be incorrect, e.g. Event Duration "15m 34s+".

Any ideas / advice?
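
For comparison with the report, the expected downtime length can be computed from the log itself by treating a repeated STARTED alert as a continuation of the downtime that is already open. The following is a minimal sketch under that assumption; the function name `downtime_intervals` is illustrative, and the input is an abridged copy of the log above (the restart messages between the two STARTED lines are reduced to one).

```python
import re
from datetime import datetime, timezone

ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE DOWNTIME ALERT: ([^;]+);([^;]+);(STARTED|STOPPED);"
)

def downtime_intervals(lines):
    """Return (start, stop) epoch pairs; a repeated STARTED keeps the first start."""
    open_since = {}
    intervals = []
    for line in lines:
        match = ALERT_RE.match(line)
        if not match:
            continue
        ts, host, service, state = match.groups()
        key = (host, service)
        if state == "STARTED":
            # setdefault ignores a second STARTED for an already open downtime.
            open_since.setdefault(key, int(ts))
        elif key in open_since:
            intervals.append((open_since.pop(key), int(ts)))
    return intervals

log = [
    "[1497439680] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1497439695] Caught SIGHUP, restarting...",
    "[1497439695] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime",
    "[1497439799] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime",
]

intervals = downtime_intervals(log)
for start, stop in intervals:
    started_at = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
    print(started_at, stop - start)  # prints: 2017-06-14T11:28:00+00:00 119
```

By this reading the downtime lasted 119 seconds, which matches the scheduled 2 minutes and gives a baseline for checking the durations shown in the report.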

Post by **tgriep** » Wed Jun 14, 2017 9:34 am

Actually, I was mistaken. Regarding this log entry:

Code:

[1497439695] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime

I think the system is just logging that the host is still in downtime after the daemon has restarted. It is not restarting the downtime, which ended after 2 minutes as it should.
The entry only records that the system is currently in downtime.

There does seem to be a bug in the report: in the last line of the Log Entries field, the Event Duration increases every time the report is run.
If you run the report several times, a few minutes apart, the value keeps growing.

Try and see if that is happening on your system and if it is, you will have to put in a bug report for it.

Nagios Support Forum
