Inaccurate Performance Graph
Posted: Tue Apr 21, 2020 11:57 am
by conston_rd
Hi,
We are running Nagios XI 5.6.6 on CentOS 7.
The Core version is 4.4.3.
We are using the NCPA agent to collect metrics through active checks.
Today there was a server outage because a filesystem filled up. The server was down for more than 20 minutes.
I see "servicecheck timeout" errors in the Nagios event log.
Interestingly, the performance graph shows performance data even for the period when the host was down, which is not correct.
We would like to get this fixed. Could you please look into it and help us resolve this issue?
An RRD dump of the performance data also shows data for the host-down period.
I have attached the RRD file and a nagios.log extract for the specific server.
I have also attached a screen grab of the performance graph.
Re: Inaccurate Performance Graph
Posted: Tue Apr 21, 2020 4:17 pm
by benjaminsmith
Hello,
Thank you for posting the screenshot and Nagios log; they are very helpful. When the server is down, no plugin data is generated, so RRDtool, which creates the graphs, plots the line from one data point to the next.
[1587452400] CURRENT SERVICE STATE: p1l00462g;Root Volume;OK;HARD;1;OK: Used_percent was 78.00 %
[1587462217] SERVICE ALERT: p1l00462g;Root Volume;WARNING;SOFT;1;WARNING: Used_percent was 84.30 %
[1587462337] SERVICE ALERT: p1l00462g;Root Volume;WARNING;SOFT;2;WARNING: Used_percent was 84.30 %
[1587462395] SERVICE ALERT: p1l00462g;Root Volume;WARNING;HARD;3;WARNING: Used_percent was 84.30 %
[1587464056] SERVICE NOTIFICATION: constond;p1l00462g;Root Volume;WARNING;notify-service-by-email;WARNING: Used_percent was 84.30 %
[1587464056] SERVICE NOTIFICATION: servicenow_integration;p1l00462g;Root Volume;WARNING;notify_servicenow_service;WARNING: Used_percent was 84.30 %
[1587465950] SERVICE NOTIFICATION: constond;p1l00462g;Root Volume;OK;notify-service-by-email;OK: Used_percent was 63.90 %
[1587465950] SERVICE NOTIFICATION: servicenow_integration;p1l00462g;Root Volume;OK;notify_servicenow_service;OK: Used_percent was 63.90 %
[1587465950] SERVICE ALERT: p1l00462g;Root Volume;OK;HARD;3;OK: Used_percent was 63.90 %
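To line the graph up against the outage, the alert lines in a log extract like this one can be pulled apart programmatically. The following is a minimal sketch with a hypothetical helper (`parse_service_alerts` is not part of Nagios XI). It assumes the nagios.log layout seen above, `[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output`, skips every other line type, and converts the epoch stamps to UTC.

```python
from datetime import datetime, timezone


def parse_service_alerts(log_text):
    """Extract SERVICE ALERT entries from nagios.log-style text (hypothetical helper).

    Expected line shape:
    [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
    Other lines (notifications, CURRENT SERVICE STATE, ...) are ignored.
    """
    alerts = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("[") or "] SERVICE ALERT: " not in line:
            continue
        stamp, rest = line[1:].split("] SERVICE ALERT: ", 1)
        # maxsplit=5 keeps any ';' inside the plugin output intact
        host, service, state, state_type, attempt, output = rest.split(";", 5)
        alerts.append({
            "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
            "host": host,
            "service": service,
            "state": state,
            "state_type": state_type,
            "attempt": int(attempt),
            "output": output,
        })
    return alerts
```

In this extract, the HARD WARNING at 1587462395 and the HARD OK recovery at 1587465950 are 3555 seconds (about 59 minutes) apart, which gives a window to compare against the graph.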
Re: Inaccurate Performance Graph
Posted: Wed Apr 22, 2020 1:37 am
by conston_rd
Thank you for the response.
Regarding "the RRD tool that creates the graphs is plotting the line from one data point to the next": why, then, do I see data in the RRD file? When no data is received from the plugin, the RRD file should not contain any data. Is my understanding correct?
Won't this result in inaccurate graph data?
Is there a way to find the intervals in which no data was collected?
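One possible approach is to export the RRD with `rrdtool fetch <file> AVERAGE` and scan the output for unknown values. The sketch below uses a hypothetical helper (`unknown_intervals` is not a Nagios XI or RRDtool function) and assumes the plain-text layout `rrdtool fetch` prints: a header of data-source names, a blank line, then `timestamp: value ...` rows, with unknown values shown as `nan` or `-nan`. It can only find gaps that RRDtool actually stored as unknown; an outage shorter than the data source's heartbeat is filled in and will not show up here.

```python
import math


def unknown_intervals(fetch_output):
    """Return (first_ts, last_ts) pairs for consecutive rows whose values are all unknown.

    Assumes `rrdtool fetch`-style text: header and blank lines are skipped,
    data rows look like "1587452700: -nan" or "1587452400: 7.8e+01 1.0e+00".
    """
    runs = []
    start = prev = None
    for line in fetch_output.splitlines():
        if ":" not in line:
            continue  # header (data-source names) or blank line
        ts_text, values_text = line.split(":", 1)
        try:
            ts = int(ts_text.strip())
        except ValueError:
            continue
        # float() accepts "nan" and "-nan", returning NaN for both
        values = [float(v) for v in values_text.split()]
        missing = bool(values) and all(math.isnan(v) for v in values)
        if missing:
            if start is None:
                start = ts
            prev = ts
        elif start is not None:
            runs.append((start, prev))
            start = None
    if start is not None:  # run of unknowns reaches the end of the output
        runs.append((start, prev))
    return runs
```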
Regards,
Conston
Re: Inaccurate Performance Graph
Posted: Wed Apr 22, 2020 10:00 am
by benjaminsmith
Hi Conston,
That is correct: RRDtool is filling in the missing data points in order to plot the performance graph. Currently there is no option to change this; however, we are planning significant improvements to the performance graphs in Nagios XI 6.
See:
https://www.nagios.com/roadmaps/
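The filling-in described above is governed by the data source's heartbeat: if two updates arrive no more than `heartbeat` seconds apart, RRDtool treats the interval between them as known; if the gap is longer, that interval is stored as unknown. The sketch below is a deliberately simplified model of that rule for a GAUGE data source. It is not RRDtool code, `fill_pdps` is a hypothetical name, and it ignores the time weighting RRDtool applies when a step straddles two updates. The heartbeat actually set on an RRD file appears as `ds[...].minimal_heartbeat` in `rrdtool info` output.

```python
import math


def fill_pdps(updates, step, heartbeat):
    """Simplified model of how a GAUGE data source fills primary data points.

    updates:   (unix_time, value) pairs sorted by time.
    step:      RRD step in seconds; slots end on multiples of step.
    heartbeat: maximum allowed gap between two updates, in seconds.

    Each update's value is applied to every slot boundary since the previous
    update, unless the gap exceeds the heartbeat, in which case those slots
    become unknown (NaN). Time weighting inside a slot is ignored.
    """
    pdps = {}
    for (t_prev, _), (t_cur, v_cur) in zip(updates, updates[1:]):
        value = v_cur if t_cur - t_prev <= heartbeat else math.nan
        t = (t_prev // step + 1) * step  # first slot boundary after t_prev
        while t <= t_cur:
            pdps[t] = value
            t += step
    return pdps


if __name__ == "__main__":
    # Hypothetical samples: one reading every 300 s, then a 1500 s outage.
    samples = [(0, 78.0), (300, 80.0), (1800, 63.9), (2100, 64.0)]
    print(fill_pdps(samples, step=300, heartbeat=600))   # outage slots are nan
    print(fill_pdps(samples, step=300, heartbeat=3600))  # outage slots filled with 63.9
```

With a 300-second step, the 1500-second gap in these made-up samples is stored as unknown when the heartbeat is 600 seconds, but is filled with the next reading when the heartbeat is 3600 seconds.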