Missing data in graph, but not in RRD file

Posted: Mon Mar 11, 2019 11:25 pm
by steliopappas
Hi guys

The bandwidth graphs for a few services on various devices appear to flatline, even though they cover sites which I know are very busy. See attached file.

Yet:

I can see in the logs that there were no alerts for the host or any of its services, neither soft nor hard.

I can see in the RRD files that data was collected between 1130 and 1240:

<!-- 2019-03-12 11:00:00 AEDT / 1552348800 --> <row><v>1.2398265197e+06</v><v>4.1112917288e+05</v></row>
<!-- 2019-03-12 11:05:00 AEDT / 1552349100 --> <row><v>7.6194192902e+05</v><v>4.0204441940e+05</v></row>
<!-- 2019-03-12 11:10:00 AEDT / 1552349400 --> <row><v>6.9885840581e+05</v><v>3.9511638017e+05</v></row>
<!-- 2019-03-12 11:15:00 AEDT / 1552349700 --> <row><v>7.7762195580e+05</v><v>4.0907030249e+05</v></row>
<!-- 2019-03-12 11:20:00 AEDT / 1552350000 --> <row><v>9.8319642992e+05</v><v>4.3611351321e+05</v></row>
<!-- 2019-03-12 11:25:00 AEDT / 1552350300 --> <row><v>8.7911808872e+05</v><v>4.1632360036e+05</v></row>
<!-- 2019-03-12 11:30:00 AEDT / 1552350600 --> <row><v>7.9436729022e+05</v><v>4.0140363477e+05</v></row>
<!-- 2019-03-12 11:35:00 AEDT / 1552350900 --> <row><v>7.7679226776e+05</v><v>3.6681294279e+05</v></row>
<!-- 2019-03-12 11:40:00 AEDT / 1552351200 --> <row><v>7.9141455103e+05</v><v>3.0769344657e+05</v></row>
<!-- 2019-03-12 11:45:00 AEDT / 1552351500 --> <row><v>7.8583399771e+05</v><v>3.3321798670e+05</v></row>
<!-- 2019-03-12 11:50:00 AEDT / 1552351800 --> <row><v>7.4663968405e+05</v><v>4.1539744069e+05</v></row>
<!-- 2019-03-12 11:55:00 AEDT / 1552352100 --> <row><v>7.1444082883e+05</v><v>4.3066283573e+05</v></row>
<!-- 2019-03-12 12:00:00 AEDT / 1552352400 --> <row><v>7.3323758221e+05</v><v>4.3419256227e+05</v></row>
<!-- 2019-03-12 12:05:00 AEDT / 1552352700 --> <row><v>8.0664616235e+05</v><v>4.3236378968e+05</v></row>
<!-- 2019-03-12 12:10:00 AEDT / 1552353000 --> <row><v>8.1965119271e+05</v><v>4.9922621852e+05</v></row>
<!-- 2019-03-12 12:15:00 AEDT / 1552353300 --> <row><v>1.1655828116e+06</v><v>4.6916202725e+05</v></row>
<!-- 2019-03-12 12:20:00 AEDT / 1552353600 --> <row><v>9.7416678840e+05</v><v>4.7729565887e+05</v></row>
<!-- 2019-03-12 12:25:00 AEDT / 1552353900 --> <row><v>7.9438118690e+05</v><v>4.0431589512e+05</v></row>
<!-- 2019-03-12 12:30:00 AEDT / 1552354200 --> <row><v>6.2331786459e+05</v><v>3.7357329595e+05</v></row>
<!-- 2019-03-12 12:35:00 AEDT / 1552354500 --> <row><v>9.8577805092e+05</v><v>3.5353273956e+05</v></row>
<!-- 2019-03-12 12:40:00 AEDT / 1552354800 --> <row><v>1.0148949842e+06</v><v>3.5310831593e+05</v></row>
<!-- 2019-03-12 12:45:00 AEDT / 1552355100 --> <row><v>1.0119677374e+06</v><v>3.7949896026e+05</v></row>
<!-- 2019-03-12 12:50:00 AEDT / 1552355400 --> <row><v>9.7848669939e+05</v><v>4.8221537319e+05</v></row>
<!-- 2019-03-12 12:55:00 AEDT / 1552355700 --> <row><v>8.9367299607e+05</v><v>4.4694902613e+05</v></row>
<!-- 2019-03-12 13:00:00 AEDT / 1552356000 --> <row><v>9.2221729876e+05</v><v>4.3149172719e+05</v></row>

For some reason though, the graphs show zero during this time.

Do you have any ideas?

Is there a way to force a rebuild of the graphs?

Thanks in advance.
Stel

Re: Missing data in graph, but not in RRD file

Posted: Tue Mar 12, 2019 10:56 am
by cdienger
Can you attach the rrd and xml file for this service? I'd like to take a closer look at the data in both.

Re: Missing data in graph, but not in RRD file

Posted: Tue Mar 12, 2019 6:42 pm
by steliopappas
Thanks cdienger.

Until now, I had been looking at:

/var/lib/mrtg/10.29.133.58_4.rrd
/etc/mrtg/conf.d/10.29.133.58.cfg

I didn't know about the other files until I started looking for the xml file. While looking for it, I found:

/usr/local/nagios/share/perfdata/10.29.133.58/_HOST_.rrd
/usr/local/nagios/share/perfdata/10.29.133.58/_HOST_.xml

I've attached the first two files in this post. It seems there is a limit of three files to a post, so I'll make another post with the next two files.

Thanks
Stel

Re: Missing data in graph, but not in RRD file

Posted: Tue Mar 12, 2019 6:43 pm
by steliopappas
Here are the next two files as promised.

Re: Missing data in graph, but not in RRD file

Posted: Wed Mar 13, 2019 11:28 am
by cdienger
In /usr/local/nagios/share/perfdata/10.29.133.58/ or possibly /usr/local/nagios/share/perfdata/boraptmesc01/, do you have an RRD file named Internet_Bandwidth.rrd? This is the file that would have the graphing data.
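
If it helps, here is a minimal Python sketch for checking both candidate directories at once. The directory names and the service name are the ones mentioned above; the helper name `find_service_files` is made up for illustration and is not part of Nagios.

```python
from pathlib import Path

# Perfdata root as given in this thread.
PERFDATA_ROOT = Path("/usr/local/nagios/share/perfdata")


def find_service_files(root, host_dirs, service):
    """Return the existing <service>.rrd and <service>.xml files under each host directory."""
    found = []
    for host in host_dirs:
        for suffix in (".rrd", ".xml"):
            candidate = Path(root) / host / f"{service}{suffix}"
            if candidate.is_file():
                found.append(candidate)
    return found


# Prints nothing if neither directory holds the files.
for path in find_service_files(
    PERFDATA_ROOT, ["10.29.133.58", "boraptmesc01"], "Internet_Bandwidth"
):
    print(path)
```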

Re: Missing data in graph, but not in RRD file

Posted: Wed Mar 13, 2019 4:34 pm
by steliopappas
Yes. I found the following:

ls -al /usr/local/nagios/share/perfdata/boraptmesc01/
total 3116
drwxrwxr-x    2 nagios nagios    4096 Mar 14 08:11 .
drwxrwxr-x 2103 nagios nagios   61440 Mar 11 17:08 ..
-rw-rw-r--    1 nagios nagios 1534768 Mar 14 08:11 _HOST_.rrd
-rw-rw-r--    1 nagios nagios    3998 Mar 14 08:11 _HOST_.xml
-rw-rw-r--    1 nagios nagios  768224 Mar 14 08:09 Internet_Bandwidth.rrd
-rw-rw-r--    1 nagios nagios    2958 Mar 14 08:09 Internet_Bandwidth.xml
-rw-rw-r--    1 nagios nagios  768224 Mar 14 08:10 MPLS_Bandwidth.rrd
-rw-rw-r--    1 nagios nagios    2830 Mar 14 08:10 MPLS_Bandwidth.xml

I had a quick peek inside Internet_Bandwidth.rrd and found the zeroed data. That explains the corresponding flatline in the graph, but it leads me to a number of other questions.

1. Why are there multiple locations for the rrd files?
2. Why do we appear to be doubling up on the data?
3. Why would one location have data, while the other does not?

Thanks in advance
Stel

Re: Missing data in graph, but not in RRD file

Posted: Thu Mar 14, 2019 10:53 am
by cdienger
The bandwidth check just queries the RRD file created by MRTG and stores the result in a second RRD file, which is then used to create the graph. https://support.nagios.com/kb/article/n ... re-62.html has more details about how this works. As for why one file shows data and the other doesn't, it would appear that the check failed to run during that time, and so did not update the second RRD file, for some reason. /usr/local/nagios/var/perfdata.log and npcd.log may have some clues as to why.
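
To pin down exactly which intervals differ between the two files, something like the rough Python sketch below could be run against the text of an `rrdtool dump` of each file. It assumes the row format quoted in the first post and that you feed it the rows of a single RRA (a full dump contains several archives, and rows with the same timestamp would overwrite each other here). The function names are made up for illustration, and the zeroed sample rows are hypothetical stand-ins, not data from the attached files.

```python
import math
import re

# One row of `rrdtool dump` output, in the format quoted in the first post:
# <!-- 2019-03-12 11:30:00 AEDT / 1552350600 --> <row><v>...</v><v>...</v></row>
ROW_RE = re.compile(r"<!--.*?/\s*(\d+)\s*-->\s*<row>(.*?)</row>")
VALUE_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def parse_rows(dump_text):
    """Return {epoch: [values]} for every row found in the dump text."""
    rows = {}
    for match in ROW_RE.finditer(dump_text):
        epoch = int(match.group(1))
        rows[epoch] = [float(v) for v in VALUE_RE.findall(match.group(2))]
    return rows


def flat_timestamps(rows, start, end):
    """Return the epochs in [start, end] whose values are all zero or NaN."""
    flat = []
    for epoch in sorted(rows):
        values = rows[epoch]
        if start <= epoch <= end and values and all(
            v == 0.0 or math.isnan(v) for v in values
        ):
            flat.append(epoch)
    return flat


# Two rows copied from the MRTG dump in the first post.
mrtg_sample = """
<!-- 2019-03-12 11:30:00 AEDT / 1552350600 --> <row><v>7.9436729022e+05</v><v>4.0140363477e+05</v></row>
<!-- 2019-03-12 11:35:00 AEDT / 1552350900 --> <row><v>7.7679226776e+05</v><v>3.6681294279e+05</v></row>
"""

# HYPOTHETICAL zeroed rows standing in for the perfdata RRD (not from the thread).
perfdata_sample = """
<!-- 2019-03-12 11:30:00 AEDT / 1552350600 --> <row><v>0.0000000000e+00</v><v>0.0000000000e+00</v></row>
<!-- 2019-03-12 11:35:00 AEDT / 1552350900 --> <row><v>0.0000000000e+00</v><v>0.0000000000e+00</v></row>
"""

# Window from the first post: 11:30 to 12:40 AEDT.
START, END = 1552350600, 1552354800
print("MRTG flat intervals:    ", flat_timestamps(parse_rows(mrtg_sample), START, END))
print("perfdata flat intervals:", flat_timestamps(parse_rows(perfdata_sample), START, END))
```

Timestamps that come back flat for the perfdata file but not for the MRTG file are the intervals to look up in perfdata.log and npcd.log.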