check_interval and RRD graphing issue

Post by Pitone_Maledetto » Tue Feb 14, 2017 3:00 am

Ciao Nagios people! Pitone Maledetto checking in again.
I have a growing Nagios Core (4.2.1) installation, and by the time I have added all the hundreds of servers that we manage it will be huge! Thank you! :shock: :D

Now, to cut to the chase: I have posted the same issue to the Nagiosgraph people, but I feel it has something to do with Nagios, or at least that Nagios could be partly to blame.

"Hi all, after changing the Nagios check interval from 5 minutes to 15 minutes for my load checks, Nagiosgraph has stopped writing perf data into the relevant graphs. The same happened for the Windows disk checks when the interval was raised to 60 minutes. All the other checks, where this value has not been changed, are working fine. I double-checked every configuration, although I have not changed anything, and there is nothing in the log pointing to a particular problem. The RRD files are being updated (the time stamps change), and Nagios performs the checks and alerts as normal; it is just that the perf data are not being translated or "transported" into the graphs.
Any help pointing me to a solution would be greatly appreciated.
Thank you."

The above is what I asked on the Nagiosgraph help forum.

When I say that the RRD file has been updated, that is not really what is happening: the file is 'touched', but no data is written into it. I double-checked with
Code: Select all
rrdtool dump Load___load15.rrd  > /home/myhome/load.xml
and found a gap exactly where I changed the check_interval for the Load service in question:

Code: Select all
            <!-- 2017-02-13 13:10:00 GMT / 1486991400 --> <row><v>2.1212666667e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:15:00 GMT / 1486991700 --> <row><v>2.1817000000e+00</v><v>1.8314666667e+01</v><v>2.1514666667e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:20:00 GMT / 1486992000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:25:00 GMT / 1486992300 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>


hence:

Image

The same with the above mentioned Windows Disk Checks.
Now I have commented out the new check_interval (is normal_check_interval deprecated?) and the RRD files and graphs are populated again.
Could anyone shed some light on the issue? It would be nice to use different check intervals for different services, but if I don't get usable graphs then I have to live with the default set-up.
Thank you for your time and help.
Ciao
"It is impossible to work in information technology without also engaging in social engineering"
Jaron Lanier
Pitone_Maledetto
 
Posts: 18
Joined: Fri Jul 01, 2016 4:11 am

Re: check_interval and RRD graphing issue

Post by dwhitfield » Tue Feb 14, 2017 12:03 pm

If you wait the full 15 minutes, does it populate? It seems to me the check hasn't actually run in the log you sent. Considering the time you took to post both here and there, I'm guessing not, but I just want to be sure since you haven't given us enough of the log.

Are you using rrdcached? It might be best to disable it, at least temporarily. You can just comment out RRD_DAEMON_OPTS in /usr/local/nagios/etc/pnp/process_perfdata.cfg.
https://support.nagios.com/forum/viewtopic.php?f=6&t=1274 says XI at the top, but it more-or-less applies to everything.
dwhitfield
Support Tech
 
Posts: 1845
Joined: Wed Sep 21, 2016 10:29 am
Location: Nagios Enterprises, LLC

Re: check_interval and RRD graphing issue

Post by Pitone_Maledetto » Wed Feb 15, 2017 3:53 am

Hi dwhitfield,
Thank you for the reply.
I am using nagiosgraph, and I think the /pnp/ directory belongs to pnp4nagios, since I don't have that location in my installation.
I have tried running find for process_perfdata.cfg, but I don't have that configuration file either.

I am posting a bigger chunk of the RRD dump, and I have noticed that the rows are spaced 5 minutes apart even though I changed the check to run every 15 minutes. That could be a lead to the issue here.
After the change, the entries should be 15 minutes apart, not 5. Have a look:

Code: Select all
<!-- 2017-02-13 12:10:00 GMT / 1486987800 --> <row><v>1.9968000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:15:00 GMT / 1486988100 --> <row><v>2.1385000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:20:00 GMT / 1486988400 --> <row><v>2.0715000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:25:00 GMT / 1486988700 --> <row><v>2.0878666667e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:30:00 GMT / 1486989000 --> <row><v>2.0087333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:35:00 GMT / 1486989300 --> <row><v>2.0785000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:40:00 GMT / 1486989600 --> <row><v>1.9536333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:45:00 GMT / 1486989900 --> <row><v>1.9840333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:50:00 GMT / 1486990200 --> <row><v>1.9187333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 12:55:00 GMT / 1486990500 --> <row><v>1.9523333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:00:00 GMT / 1486990800 --> <row><v>1.9004333333e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:05:00 GMT / 1486991100 --> <row><v>1.9785000000e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:10:00 GMT / 1486991400 --> <row><v>2.1212666667e+00</v><v>1.6000000000e+01</v><v>1.9200000000e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:15:00 GMT / 1486991700 --> <row><v>2.1817000000e+00</v><v>1.8314666667e+01</v><v>2.1514666667e+01</v><v>0.0000000000e+00</v></row>
            <!-- 2017-02-13 13:20:00 GMT / 1486992000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:25:00 GMT / 1486992300 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:30:00 GMT / 1486992600 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:35:00 GMT / 1486992900 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:40:00 GMT / 1486993200 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:45:00 GMT / 1486993500 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:50:00 GMT / 1486993800 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 13:55:00 GMT / 1486994100 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:00:00 GMT / 1486994400 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:05:00 GMT / 1486994700 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:10:00 GMT / 1486995000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:15:00 GMT / 1486995300 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:20:00 GMT / 1486995600 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:25:00 GMT / 1486995900 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:30:00 GMT / 1486996200 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:35:00 GMT / 1486996500 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:40:00 GMT / 1486996800 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:45:00 GMT / 1486997100 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:50:00 GMT / 1486997400 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
            <!-- 2017-02-13 14:55:00 GMT / 1486997700 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
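
A gap like the one in the dump above can also be located without reading the XML by eye. The following is a minimal Python sketch, not something from the original posts: it assumes the rows look exactly like the excerpt above (a timestamp comment followed by a row element), and the function name first_gap is made up for illustration. A full rrdtool dump contains several archives, each of which may begin with NaN rows, so it is best fed an excerpt from a single archive.

```python
import re

# One dump row: "<!-- date / epoch --> <row> ... </row>"
ROW_RE = re.compile(r"<!--\s*(.+?)\s*/\s*(\d+)\s*-->\s*<row>(.*?)</row>")
# One value inside a row: "<v>...</v>"
VAL_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def first_gap(dump_text):
    """Return (label, epoch) of the first row whose values are all NaN, or None."""
    for label, epoch, body in ROW_RE.findall(dump_text):
        values = VAL_RE.findall(body)
        if values and all(v.lower() == "nan" for v in values):
            return label, int(epoch)
    return None


# Two rows copied from the excerpt above.
SAMPLE = """\
<!-- 2017-02-13 13:15:00 GMT / 1486991700 --> <row><v>2.1817000000e+00</v><v>1.8314666667e+01</v><v>2.1514666667e+01</v><v>0.0000000000e+00</v></row>
<!-- 2017-02-13 13:20:00 GMT / 1486992000 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
"""

print(first_gap(SAMPLE))
```

For the two sample rows, the function reports the 13:20:00 row, which is where the gap starts in the dump.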


While nothing was graphed, Nagios actually reported some Warning states on Load, so I am pretty sure that Nagios was performing the check; it is just the RRD file that was not getting updated.
After reverting to the 5-minute default, everything went back to normal, as you can see from the graph below:

Image

Is there any other information that I can provide?
Thank you for your time.
Regards

Re: check_interval and RRD graphing issue

Post by tgriep » Wed Feb 15, 2017 4:51 pm

Take a look at this link, it sounds exactly like the issue you are having.
https://sourceforge.net/p/nagiosgraph/wiki/Home/#my-graphs-are-fragmentedspotty-why
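
The RRD rule behind that kind of symptom is the heartbeat of a data source: when two updates arrive further apart than the heartbeat, RRD treats the interval between them as unknown (NaN). A rough Python sketch of that rule follows. It is a simplified model rather than rrdtool's actual code, the 600-second heartbeat is an assumed value (check your own nagiosgraph.conf and RRD files), and the update times are illustrative.

```python
def classify_intervals(update_times, heartbeat):
    """Label each gap between consecutive updates as known or unknown.

    Simplified model: a gap is known only if it is no longer than
    the heartbeat (both in seconds).
    """
    labelled = []
    for previous, current in zip(update_times, update_times[1:]):
        gap = current - previous
        labelled.append((gap, gap <= heartbeat))
    return labelled


# Illustrative update times: three checks 5 minutes apart, then two 15 minutes apart.
updates = [1486991100, 1486991400, 1486991700, 1486992600, 1486993500]

for gap, known in classify_intervals(updates, heartbeat=600):
    print(gap, "known" if known else "unknown (NaN)")
```

In this model the 300-second gaps are known and the 900-second gaps are not, which matches a graph that goes blank as soon as the check interval is raised to 15 minutes. With a heartbeat longer than the longest check interval (1800, say), every gap would be classified as known.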
Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep
Madmin
 
Posts: 4160
Joined: Thu Oct 30, 2014 9:02 am

Re: check_interval and RRD graphing issue

Post by Pitone_Maledetto » Thu Feb 16, 2017 7:45 am

Hi tgriep,
yes, that looks very likely to be the cause of the issue here.
I will try to change the parameters later today, if I can.
It will be interesting to see if I can set the various timings to accommodate several check_interval periods.
Thank you for your time.

Re: check_interval and RRD graphing issue

Post by dwhitfield » Thu Feb 16, 2017 8:56 am

Apologies for the pnp/nagiosgraph issue, although my point about the logs and rrdcached still stood. :) Thanks for posting more log output!

Let us know if that nagiosgraph wiki doesn't end up working for you.

Re: check_interval and RRD graphing issue

Post by Pitone_Maledetto » Tue Feb 21, 2017 8:09 am

Hi guys,
thank you for your help but:
"Note that the stepsize and heartbeat are set when an RRD file is created. If you change the stepsize and/or heartbeat, you must either delete the corresponding RRD file(s) so that nagiosgraph can create a new one with the new stepsize/heartbeat, or manually modify the stepsize and/or heartbeat in the RRD files(s) by doing a dump/edit/restore."

I have too much data to lose if I delete all the relevant RRD files, and dump/edit/restore is only feasible if you have a few files to process; I have too many.
So I have decided to live with the 5-minute interval and keep this in mind for my next installation.
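
For reference, rrdtool also has a tune subcommand whose --heartbeat option changes the heartbeat of a data source in an existing file, without a dump/edit/restore (changing the step size is a different matter). The Python sketch below only builds the commands and does not run them. It assumes it is given the text printed by rrdtool info for one file; the data source names in the sample (data, warn) and the new heartbeat of 1800 seconds are hypothetical, and the function name tune_commands is made up for illustration. Back up the RRD files before trying anything like this.

```python
import re

# Matches lines such as "ds[data].minimal_heartbeat = 600" in `rrdtool info` output.
HEARTBEAT_RE = re.compile(
    r"^ds\[(?P<name>[^\]]+)\]\.minimal_heartbeat\s*=\s*(?P<hb>\d+)",
    re.MULTILINE,
)


def tune_commands(rrd_path, info_text, new_heartbeat):
    """Build one `rrdtool tune` command per data source whose heartbeat is too short."""
    commands = []
    for match in HEARTBEAT_RE.finditer(info_text):
        if int(match.group("hb")) < new_heartbeat:
            ds_spec = f"{match.group('name')}:{new_heartbeat}"
            commands.append(["rrdtool", "tune", rrd_path, "--heartbeat", ds_spec])
    return commands


# Hypothetical `rrdtool info` excerpt; the data source names are assumptions.
SAMPLE_INFO = """\
filename = "Load___load15.rrd"
step = 300
ds[data].type = "GAUGE"
ds[data].minimal_heartbeat = 600
ds[warn].type = "GAUGE"
ds[warn].minimal_heartbeat = 600
"""

for command in tune_commands("Load___load15.rrd", SAMPLE_INFO, 1800):
    print(" ".join(command))
```

Because the commands are returned as argument lists, they could be reviewed first and then passed to subprocess.run in a loop over many files.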

Thank you very much again.

Re: check_interval and RRD graphing issue

Post by dwhitfield » Tue Feb 21, 2017 10:01 am

You could always just move that historical data to an archive graph; you don't actually need to delete it. That said, it's still a pain to deal with two sets of graphs.

Are we ready to lock this up?

Re: check_interval and RRD graphing issue

Post by Pitone_Maledetto » Wed Feb 22, 2017 7:15 am

Dear dwhitfield,
yes please and thank you to all again.
"Nanu Nanu"

