check_interval and RRD graphing issue

Pitone_Maledetto
Posts: 69
Joined: Fri Jul 01, 2016 4:11 am
Location: Liverpool, United Kingdom

Post by Pitone_Maledetto »

Ciao Nagios people! Pitone Maledetto checking in again.
I have a growing Nagios Core (4.2.1) installation, and by the time I have added all the hundreds of servers that we manage it will be huge! Thank you! :shock: :D

Now, to cut to the chase: I have posted the same issue to the Nagiosgraph people, but I feel it has something to do with Nagios, or at least Nagios could be partly the culprit.

"Hi all, after changing the Nagios check interval from 5 minutes to 15 minutes for my load checks, Nagiosgraph has stopped writing perf data into the relevant graphs. The same happened for the Windows disk checks when the interval was raised to 60 minutes. Every other check where this value has not been changed is working fine. I double-checked every configuration, although I have not changed anything else, and there is nothing in the log that points to a particular problem. The RRD files are being updated (the timestamps change), and Nagios performs the checks and alerts as normal; it is just that the perf data are not being translated or "transported" into the graphs.
Any help pointing me to a solution would be greatly appreciated.
Thank you."

The above is what I asked on the Nagiosgraph help forum.

When I say that the RRD file has been updated, that is not really what is happening: the file is 'touched' but no data is written into it. I double-checked by running:

Code:

rrdtool dump Load___load15.rrd  > /home/myhome/load.xml
and found a gap starting exactly when I changed the check_interval for the Load service in question:

Code:

            <!-- 2017-02-13 13:10:00 GMT / 1486991400 --> <row><v>2.1212666667e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:15:00 GMT / 1486991700 --> <row><v>2.1817000000e+00</v><v>1.8314666667e+01</v><v>2.1514666667e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:20:00 GMT / 1486992000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:25:00 GMT / 1486992300 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
hence:

Image
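
The start of such a gap can also be found programmatically instead of reading the XML by eye. Below is a minimal Python sketch, assuming the dump uses the row layout shown above (a comment carrying the epoch, followed by one <row> per line). The helper name first_gap is hypothetical, and it only reports the first gap it meets, so a full dump containing several archives would need to be checked archive by archive.

```python
import re
from datetime import datetime, timezone

# One row of `rrdtool dump` output: "<!-- date / epoch --> <row>...</row>"
ROW_RE = re.compile(r"<!--\s*[^/]+/\s*(\d+)\s*-->\s*<row>(.*?)</row>")
VALUE_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def first_gap(dump_text):
    """Return the epoch of the first all-NaN row that follows a row
    containing data, or None if no such row exists."""
    seen_data = False
    for epoch, body in ROW_RE.findall(dump_text):
        values = VALUE_RE.findall(body)
        all_nan = bool(values) and all(v.lower() == "nan" for v in values)
        if all_nan and seen_data:
            return int(epoch)
        if not all_nan:
            seen_data = True
    return None


# The four rows quoted in the post above.
SAMPLE = """
<!-- 2017-02-13 13:10:00 GMT / 1486991400 --> <row><v>2.1212666667e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
<!-- 2017-02-13 13:15:00 GMT / 1486991700 --> <row><v>2.1817000000e+00</v><v>1.8314666667e+01</v><v>2.1514666667e+01</v><v>0.0000000000e+00</v></row>
<!-- 2017-02-13 13:20:00 GMT / 1486992000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
<!-- 2017-02-13 13:25:00 GMT / 1486992300 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
"""

gap = first_gap(SAMPLE)
if gap is not None:
    # Print the gap start in UTC so it can be compared with the config change time.
    print(gap, datetime.fromtimestamp(gap, tz=timezone.utc).isoformat())
```

Comparing the reported time with the moment the check_interval was changed is a quick way to confirm that the two events line up.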

The same happens with the above-mentioned Windows disk checks.
Now I have commented out the new check_interval (is normal_check_interval deprecated?) and the RRD files and graphs are being populated again.
Could anyone shed some light on the issue? It would be nice to use different check intervals for different services, but if I don't get usable graphs then I have to live with the default set-up.
Thank you for your time and help.
Ciao
"It is impossible to work in information technology without also engaging in social engineering"
Jaron Lanier
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

If you wait the full 15 minutes, does it populate? It seems to me the check hasn't actually run in the log you sent. Considering the time you took to post both here and there, I'm guessing not, but I just want to be sure since you haven't given us enough of the log.

Are you using rrdcached? It might be best to disable it, at least temporarily. You can just comment out RRD_DAEMON_OPTS in /usr/local/nagios/etc/pnp/process_perfdata.cfg.

Post by Pitone_Maledetto »

Hi dwhitfield,
Thank you for the reply.
I am using nagiosgraph, and I think the /pnp/ directory belongs to pnp4nagios, since I don't have that location in my installation.
I ran find for process_perfdata.cfg, but I don't have that configuration file either.

I am posting a bigger chunk of the RRD dump. I have noticed that the rows are spaced 5 minutes apart even though I changed the configuration of the check to run every 15 minutes. That could be a lead on the issue here.
The entries after the change should be every 15 minutes, not 5. Have a look:

Code:

<!-- 2017-02-13 12:10:00 GMT / 1486987800 --> <row><v>1.9968000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:15:00 GMT / 1486988100 --> <row><v>2.1385000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:20:00 GMT / 1486988400 --> <row><v>2.0715000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:25:00 GMT / 1486988700 --> <row><v>2.0878666667e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:30:00 GMT / 1486989000 --> <row><v>2.0087333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:35:00 GMT / 1486989300 --> <row><v>2.0785000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:40:00 GMT / 1486989600 --> <row><v>1.9536333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:45:00 GMT / 1486989900 --> <row><v>1.9840333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:50:00 GMT / 1486990200 --> <row><v>1.9187333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:55:00 GMT / 1486990500 --> <row><v>1.9523333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:00:00 GMT / 1486990800 --> <row><v>1.9004333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:05:00 GMT / 1486991100 --> <row><v>1.9785000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:10:00 GMT / 1486991400 --> <row><v>2.1212666667e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:15:00 GMT / 1486991700 --> <row><v>2.1817000000e+00</v><v>1.8314666667e+01</v><v>2.1514666667e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:20:00 GMT / 1486992000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:25:00 GMT / 1486992300 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:30:00 GMT / 1486992600 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:35:00 GMT / 1486992900 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:40:00 GMT / 1486993200 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:45:00 GMT / 1486993500 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:50:00 GMT / 1486993800 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:55:00 GMT / 1486994100 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:00:00 GMT / 1486994400 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:05:00 GMT / 1486994700 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:10:00 GMT / 1486995000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:15:00 GMT / 1486995300 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:20:00 GMT / 1486995600 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:25:00 GMT / 1486995900 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:30:00 GMT / 1486996200 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:35:00 GMT / 1486996500 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:40:00 GMT / 1486996800 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:45:00 GMT / 1486997100 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:50:00 GMT / 1486997400 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:55:00 GMT / 1486997700 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
While the graph was empty, Nagios actually reported some Warning states on Load, so I am pretty sure that Nagios was performing the check; it is just the RRD file that was not getting updated.
After reverting to the 5-minute default, everything went back to normal, as you can see from the graph below:

Image

Is there any other information that I can provide?
Thank you for your time.
Regards
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

Take a look at this link, it sounds exactly like the issue you are having.
https://sourceforge.net/p/nagiosgraph/w ... spotty-why
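
The wiki note quoted later in this thread refers to the RRD stepsize and heartbeat. In RRDtool, the heartbeat is the longest time allowed between two updates before the data for that interval is stored as unknown (NaN), which would explain both symptoms: rows stay on the 5-minute step fixed at file creation, and checks arriving every 15 or 60 minutes are discarded. The Python sketch below illustrates that rule. The 600-second heartbeat is an assumption (the thread never states the configured value), and both function names and the factor of two are hypothetical, the latter being a common rule of thumb rather than a nagiosgraph recommendation.

```python
def updates_are_recorded(check_interval_s, heartbeat_s):
    """Return True if updates arriving every check_interval_s seconds fall
    within the heartbeat, i.e. RRDtool would not mark the interval unknown."""
    return check_interval_s <= heartbeat_s


def suggested_heartbeat(longest_check_interval_s, factor=2):
    """Rule-of-thumb heartbeat: a multiple of the longest check interval,
    leaving headroom for late or skipped checks (factor is an assumption)."""
    return factor * longest_check_interval_s


HEARTBEAT_S = 600  # assumed value; check the actual RRD files or nagiosgraph config

# The three intervals mentioned in this thread: 5, 15 and 60 minutes.
for minutes in (5, 15, 60):
    interval_s = minutes * 60
    status = "recorded" if updates_are_recorded(interval_s, HEARTBEAT_S) else "stored as NaN"
    print(f"check every {minutes:>2} min: {status}")

print("heartbeat covering 60-minute checks:", suggested_heartbeat(60 * 60), "seconds")
```

Under that assumed heartbeat, only the 5-minute checks would be stored, which matches the behaviour reported above.
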
Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by Pitone_Maledetto »

Hi tgriep,
yes, that looks very much like the cause of the issue here.
I will try to change the parameters later today if I can.
It will be interesting to see whether I can set the various timings to accommodate several check_interval periods.
Thank you for your time.

Post by dwhitfield »

Apologies for the pnp/nagiosgraph issue, although my point about the logs and rrdcached still stood. :) Thanks for posting more log output!

Let us know if that nagiosgraph wiki doesn't end up working for you.

Post by Pitone_Maledetto »

Hi guys,
thank you for your help but:
"Note that the stepsize and heartbeat are set when an RRD file is created. If you change the stepsize and/or heartbeat, you must either delete the corresponding RRD file(s) so that nagiosgraph can create a new one with the new stepsize/heartbeat, or manually modify the stepsize and/or heartbeat in the RRD files(s) by doing a dump/edit/restore."

I have too much data to lose if I delete all the relevant RRD files, and dump/edit/restore is only feasible if you have a few files to do; I have too many.
So I have decided to live with the 5-minute interval and keep this in mind for my next installation.
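
For the heartbeat alone, a third route may be worth testing on a copy of one file first: the rrdtool tune command accepts --heartbeat ds-name:seconds and changes the value in place, without a dump/edit/restore. The Python sketch below only builds those commands from the text printed by rrdtool info; it does not run anything. The data source names in the sample (data, warn) and the target of 1800 seconds are assumptions for illustration, and tune_commands is a hypothetical helper. The sketch does not touch the stepsize.

```python
import re

# Lines such as: ds[data].minimal_heartbeat = 600
DS_HEARTBEAT_RE = re.compile(
    r"^ds\[(?P<name>[^\]]+)\]\.minimal_heartbeat\s*=\s*(?P<hb>\d+)",
    re.MULTILINE,
)


def tune_commands(rrd_path, info_text, new_heartbeat_s):
    """Build one `rrdtool tune` argument list for every data source whose
    current heartbeat is lower than new_heartbeat_s."""
    commands = []
    for match in DS_HEARTBEAT_RE.finditer(info_text):
        if int(match.group("hb")) < new_heartbeat_s:
            commands.append([
                "rrdtool", "tune", rrd_path,
                "--heartbeat", f"{match.group('name')}:{new_heartbeat_s}",
            ])
    return commands


# Hypothetical excerpt of `rrdtool info Load___load15.rrd`; real names may differ.
SAMPLE_INFO = """\
step = 300
ds[data].index = 0
ds[data].type = "GAUGE"
ds[data].minimal_heartbeat = 600
ds[warn].index = 1
ds[warn].type = "GAUGE"
ds[warn].minimal_heartbeat = 600
"""

for command in tune_commands("Load___load15.rrd", SAMPLE_INFO, 1800):
    # Printed for review only; run them by hand after backing up the file.
    print(" ".join(command))
```

Because the commands are generated per file, the same loop could be pointed at every RRD file in turn, which avoids hand-editing each dump.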

Thank you very much again.

Post by dwhitfield »

You could always just move that historical data to an archive graph. You don't actually need to delete it. Admittedly, it's still a pain to deal with two sets of graphs.

Anyway, are we ready to lock this up?

Post by Pitone_Maledetto »

Dear dwhitfield,
yes please and thank you to all again.
"Nanu Nanu"