Availability report is showing negative values

dennisg
Posts: 14
Joined: Wed May 31, 2017 7:28 am

Re: Availability report is showing negative values

Post by dennisg »

As described earlier, the error only occurs when a scheduled downtime took place during the report period, so the log file format is evidently not the issue.
Setting the "First Assumed Service State" to "Service OK" leads to the same (erroneous) result.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Availability report is showing negative values

Post by tgriep »

I cannot find any information in the log files for any host or service that was put into downtime, so there isn't much that can be done.
The only option is to file a bug report at the Nagios Core GitHub site at the link below.
https://github.com/NagiosEnterprises/nagioscore/issues
Be sure to check out our Knowledgebase for helpful articles and solutions!
dennisg
Posts: 14
Joined: Wed May 31, 2017 7:28 am

Re: Availability report is showing negative values

Post by dennisg »

hmm...

Please take a look at, e.g., nagios-04-11-2017-00.log. That is the date mentioned in the detailed description and screenshots.
There's an entry that reads
[1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
Maybe you overlooked it?
I would be very happy if you could have another look at the issue. If that still doesn't turn up anything more useful, I will be happy to file a bug report on GitHub.

Many thanks,
Dennis
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Availability report is showing negative values

Post by tgriep »

I must have had a typo in my grep command; I do see the downtime entries now that I have rechecked.

Code:

4/11
[1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/12
[1491926047] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491926370] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/13
[1491992158] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime
[1491998238] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;myhost;my-webapp;1491998182;1492523782;1;0;7200;nagiosadmin;(JE) RW
[1491998239] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491998580] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/14
[1492073607] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1492073976] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
4/19
[1492523782] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime
What could be causing it is that the service was put into downtime multiple times but only exited once, and that could throw the math off.
I tried to simulate the issue but could not duplicate it.
Since the upgrade, can you run the same report from that day until now to see if the issue is still there?

If the report works after the upgrade, then the issue may have been fixed, but if it still fails, you would have to put in a bug report.
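
One way to check whether starts and stops are balanced is to count the downtime alerts per host and service. The following is only a minimal sketch in Python: the regular expression is an assumption derived from the log lines quoted above, and `count_downtime_alerts` is a hypothetical helper name, not part of Nagios.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt quoted above:
# [1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has ...
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE DOWNTIME ALERT: ([^;]+);([^;]+);(STARTED|STOPPED);"
)


def count_downtime_alerts(lines):
    """Return {(host, service): Counter} with STARTED/STOPPED totals."""
    counts = {}
    for line in lines:
        match = ALERT_RE.match(line)
        if match is None:
            continue  # not a service downtime alert
        _timestamp, host, service, action = match.groups()
        counts.setdefault((host, service), Counter())[action] += 1
    return counts


# The entries from the excerpt above (date headings omitted).
SAMPLE = """\
[1491812175] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491926047] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491926370] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491992158] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime
[1491998238] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;myhost;my-webapp;1491998182;1492523782;1;0;7200;nagiosadmin;(JE) RW
[1491998239] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1491998580] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1492073607] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1492073976] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1492523782] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime
"""

for (host, service), c in count_downtime_alerts(SAMPLE.splitlines()).items():
    print(f"{host};{service}: STARTED={c['STARTED']} STOPPED={c['STOPPED']}")
```

To run it against real archives, the same function could be fed the lines of each log file instead of `SAMPLE`.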
dennisg
Posts: 14
Joined: Wed May 31, 2017 7:28 am

Re: Availability report is showing negative values

Post by dennisg »

The report still fails with the same result after the upgrade.
Today I will schedule a new 2-hour downtime on my sandbox, which consists only of this single service, so that I can keep track of the logs.
No idea why there would be multiple starts and just a single stop.

If that still fails (all of this on Nagios Core 4.3.2), I will file a bug report.

Thanks a lot,
Dennis
dennisg
Posts: 14
Joined: Wed May 31, 2017 7:28 am

Re: Availability report is showing negative values

Post by dennisg »

Okay, today's test on my sandbox with a fixed downtime of 2 hrs showed a good result, as well as a single start and a single stop for the downtime.
The error seems to be due, as you guessed earlier, to multiple starts of a downtime with an unequal number of stops.

The interesting questions are:
1.) What would have caused such behaviour?!
2.) This seems to have happened quite regularly (even though I don't know how to reproduce it deliberately yet). Take a look at my logs from PROD; the first command counts the scheduled downtimes manually configured by us admins:

nagios@my-host[/usr/local/nagios/var/archives]> grep "SCHEDULE_SVC_DOWNTIME" nagios*-2017-*.log | grep "my-service" | wc -l
9

nagios@my-host[/usr/local/nagios/var/archives]> grep "SERVICE DOWNTIME ALERT:.*STARTED" nagios*-2017-*.log | grep "my-service" | wc -l
253

nagios@my-host[/usr/local/nagios/var/archives]> grep "SERVICE DOWNTIME ALERT:.*STOPPED" nagios*-2017-*.log | grep "my-service" | wc -l
54

Anyone seen that odd behaviour before?!

3.) Any idea on how to fix that data in order to get proper reports? I cannot just "bulk-delete" the entries from the logs...
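
One possible approach to question 3, sketched here in Python, is to write a cleaned copy of the logs in which a STARTED alert is dropped when the same service is already in downtime. This rests on the assumption that every repeated STARTED is redundant. Whether the availability report then shows correct values is not confirmed anywhere in this thread, and `drop_repeated_starts` is a hypothetical helper name.

```python
import re

# Assumed line layout, based on the log lines quoted in this thread.
ALERT_RE = re.compile(
    r"^\[\d+\] SERVICE DOWNTIME ALERT: ([^;]+);([^;]+);(STARTED|STOPPED);"
)


def drop_repeated_starts(lines):
    """Yield lines, skipping STARTED alerts for services already in downtime.

    All other lines, including every STOPPED alert, pass through unchanged.
    """
    in_downtime = set()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            key = (match.group(1), match.group(2))
            if match.group(3) == "STARTED":
                if key in in_downtime:
                    continue  # assumed redundant repeat of an active downtime
                in_downtime.add(key)
            else:
                in_downtime.discard(key)
        yield line


# Made-up example lines in the same format as the real log.
EXAMPLE = [
    "[100] SERVICE DOWNTIME ALERT: myhost;my-service;STARTED; Service has entered a period of scheduled downtime",
    "[160] SERVICE DOWNTIME ALERT: myhost;my-service;STARTED; Service has entered a period of scheduled downtime",
    "[200] SERVICE DOWNTIME ALERT: myhost;my-service;STOPPED; Service has exited from a period of scheduled downtime",
]

for kept in drop_repeated_starts(EXAMPLE):
    print(kept)
```

The filter keeps its state only for the lines it is given, so archive files would have to be processed in chronological order as one stream, and the output should go to a copy rather than overwrite the original archives.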
dennisg
Posts: 14
Joined: Wed May 31, 2017 7:28 am

Re: Availability report is showing negative values

Post by dennisg »

Okay, an uneven number of starts and stops seems to be explainable: reloading the config causes active downtimes to be logged as started again. See the following example:

On Wed, 14 Jun 2017 11:27:13 GMT I scheduled a downtime for 2 minutes, starting at 1497439680 (i.e. Wed, 14 Jun 2017 11:28:00 GMT), see line 1.
The downtime kicked in as expected, see line 2.
I reloaded the Nagios config. The exact same downtime got started again, see line 25.
The downtime stopped, see the last line.

Code:

[1497439633] EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME;myhost;my-webapp;1497439680;1497439800;1;0;7200;nagiosadmin;[DG] short test
[1497439680] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1497439695] Caught SIGHUP, restarting...
[1497439695] Event broker module 'NERD' deinitialized successfully.
[1497439695] Warning: use_embedded_perl_implicitly is deprecated and will be removed.
[1497439695] Warning: enable_embedded_perl is deprecated and will be removed.
[1497439695] Warning: p1_file is deprecated and will be removed.
[1497439695] Warning: sleep_time is deprecated and will be removed.
[1497439695] Warning: external_command_buffer_slots is deprecated and will be removed. All commands are always processed upon arrival
[1497439695] Warning: command_check_interval is deprecated and will be removed. Commands are always handled on arrival
[1497439695] Nagios 4.3.2 starting... (PID=18327)
[1497439695] Local time is Wed Jun 14 13:28:15 CEST 2017
[1497439695] LOG VERSION: 2.0
[1497439695] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1497439695] qh: core query handler registered
[1497439695] nerd: Channel hostchecks registered successfully
[1497439695] nerd: Channel servicechecks registered successfully
[1497439695] nerd: Channel opathchecks registered successfully
[1497439695] nerd: Fully initialized and ready to rock!
[1497439695] wproc: Successfully registered manager as @wproc with query handler
[1497439695] wproc: Registry request: name=Core Worker 15359;pid=15359
[1497439695] wproc: Registry request: name=Core Worker 15358;pid=15358
[1497439695] wproc: Registry request: name=Core Worker 15357;pid=15357
[1497439695] wproc: Registry request: name=Core Worker 15356;pid=15356
[1497439695] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
[1497439799] SERVICE DOWNTIME ALERT: myhost;my-webapp;STOPPED; Service has exited from a period of scheduled downtime
So this seems to be more or less the "expected behaviour"?!
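
To make the timeline easier to follow, here is a small Python sketch that converts the epoch timestamps from the log above to UTC and pairs the first STARTED with the next STOPPED, ignoring the repeated STARTED. The event labels and function names are made up for illustration, and this is not how the availability report itself calculates durations.

```python
from datetime import datetime, timezone

# Timestamps copied from the log above; the labels are illustrative only.
EVENTS = [
    (1497439633, "SCHEDULED"),  # EXTERNAL COMMAND: SCHEDULE_SVC_DOWNTIME
    (1497439680, "STARTED"),
    (1497439695, "RESTART"),    # Caught SIGHUP, restarting...
    (1497439695, "STARTED"),    # repeated after the reload
    (1497439799, "STOPPED"),
]


def to_utc(epoch):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def downtime_intervals(events):
    """Pair the first STARTED with the next STOPPED; skip repeated STARTED."""
    intervals = []
    start = None
    for timestamp, kind in events:
        if kind == "STARTED" and start is None:
            start = timestamp
        elif kind == "STOPPED" and start is not None:
            intervals.append((start, timestamp))
            start = None
    return intervals


for start, stop in downtime_intervals(EVENTS):
    print(f"{to_utc(start).isoformat()} -> {to_utc(stop).isoformat()} ({stop - start} s)")
```

For the log above this yields a single interval from 11:28:00 to 11:29:59 UTC, i.e. 119 seconds, which is in line with the 2 minutes that were scheduled.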

The corresponding report (last 24 hrs) shows no negative values; however, the timing still seems to be incorrect, e.g. Event Duration "15m 34s+".
(Attachment: nagios.png)
Any ideas or advice?
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Availability report is showing negative values

Post by tgriep »

Actually, I was mistaken. The log entry

Code:

[1497439695] SERVICE DOWNTIME ALERT: myhost;my-webapp;STARTED; Service has entered a period of scheduled downtime
is, I think, just the system logging that the service is still in downtime after the daemon has restarted; it is not restarting the downtime, which ended after 2 minutes as it should.
It only logs that the service is currently in downtime.

There does seem to be a bug in the report: in the last line of the Log Entries field, the Event Duration increases every time the report is run.
If you run it a few times, with a few minutes between runs, the value keeps growing.

Check whether that is happening on your system; if it is, you will have to put in a bug report for it.