Broke Charting again...

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Broke Charting again...

Post by scottwilkerson »

Can we attempt to restart rrdcached and verify that the PID updates?

Code: Select all

service rrdcached restart
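One way to script the PID verification, as a minimal sketch: the helper below reads a PID file and reports whether the PID it contains belongs to a running process. The function name `pidfile_is_live` is made up for illustration and is not part of Nagios XI or rrdtool. It relies on `kill -0`, so it is assumed to run as root or as the user that owns the process.

```shell
# Hypothetical helper (not part of Nagios XI or rrdtool).
# Returns 0 if the PID stored in the given file is a running process.
pidfile_is_live() {
    pid_file="$1"
    [ -r "$pid_file" ] || return 1
    pid=$(tr -d '[:space:]' < "$pid_file")
    # Reject empty or non-numeric content before signalling anything.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only checks that the process exists and can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

Running it against the rrdcached PID file before and after the restart, and comparing the file contents each time, would show both that the PID changed and that the new one is alive.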
Finally, let's look at the journal files again and see if they are changing size:

Code: Select all

ll /tmp | grep journal
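To make "changing size" easier to compare between runs, here is a rough sketch that sums the bytes of all `rrd.journal.*` files in a directory. The function name `journal_bytes` is hypothetical, and the `/tmp` default is simply the directory listed in the command above.

```shell
# Hypothetical helper: print the total size in bytes of all rrd.journal.*
# files in a directory (defaults to /tmp, as used in the command above).
journal_bytes() {
    journal_dir="${1:-/tmp}"
    total=0
    for f in "$journal_dir"/rrd.journal.*; do
        # Skip the unexpanded glob when no journal file exists.
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}
```

Calling `journal_bytes` twice, a minute or so apart, and getting 0 both times would match a journal that is not being written to.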
Former Nagios employee
jlmoldan
Posts: 27
Joined: Mon Aug 05, 2013 2:32 pm

Re: Broke Charting again...

Post by jlmoldan »

Code: Select all

[root@ust-nagios1t ~]#  ls -la /var/rrdtool/rrdcached
total 12
drwxr-xr-x 2 nagios nagios 4096 Jan 28 10:45 .
drwxr-xr-x 3 nagios nagios 4096 Jun 25  2014 ..
-rw-r--r-- 1 nagios users     6 Jan 28 10:45 rrdcached.pid
srw-rw---- 1 nagios nagios    0 Jan 28 10:45 rrdcached.sock

[root@ust-nagios1t ~]# date
Wed Jan 28 10:59:40 CST 2015

[root@ust-nagios1t ~]# cat rrdtool^C
[root@ust-nagios1t ~]# cat /var/rrdtool/rrdcached/rrdcached.pid 
56166

[root@ust-nagios1t ~]# ps -ef |grep rrdcached
root     22995 18469  0 10:59 pts/1    00:00:00 grep rrdcached
nagios   56166     1  0 10:45 ?        00:00:00 /usr/bin/rrdcached -p /var/rrdtool/rrdcached/rrdcached.pid -s nagios -m 0660 -l unix:/var/rrdtool/rrdcached/rrdcached.sock -F -w 900 -z 90 -j /tmp/ -b /var/rrdtool/rrdcached -P FLUSH,PENDING

[root@ust-nagios1t ~]#     service rrdcached restart
Stopping rrdcached:                                        [  OK  ]
Starting rrdcached:                                        [  OK  ]

[root@ust-nagios1t ~]# ls -la /var/rrdtool/rrdcached
total 12
drwxr-xr-x 2 nagios nagios 4096 Jan 28 11:00 .
drwxr-xr-x 3 nagios nagios 4096 Jun 25  2014 ..
-rw-r--r-- 1 nagios users     6 Jan 28 11:00 rrdcached.pid
srw-rw---- 1 nagios nagios    0 Jan 28 11:00 rrdcached.sock
You have new mail in /var/spool/mail/root

[root@ust-nagios1t ~]# cat /var/rrdtool/rrdcached/rrdcached.pid
25578

[root@ust-nagios1t ~]# ps -ef |grep rrdcached
nagios   25578     1  0 11:00 ?        00:00:00 /usr/bin/rrdcached -p /var/rrdtool/rrdcached/rrdcached.pid -s nagios -m 0660 -l unix:/var/rrdtool/rrdcached/rrdcached.sock -F -w 900 -z 90 -j /tmp/ -b /var/rrdtool/rrdcached -P FLUSH,PENDING
root     28527 18469  0 11:00 pts/1    00:00:00 grep rrdcached

[root@ust-nagios1t ~]#     ll /tmp | grep journal
-rw-r--r--  1 nagios users           0 Jan 28 11:00 rrd.journal.1422464416.635904

[root@ust-nagios1t ~]#     ll /tmp | grep journal
-rw-r--r--  1 nagios users           0 Jan 28 11:00 rrd.journal.1422464416.635904

[root@ust-nagios1t ~]#
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Broke Charting again...

Post by abrist »

Let's see if perfdata is actually spooling. You should see the integer output from the following watch command change over time. It should go up and down.

Code: Select all

watch 'ls /usr/local/nagios/var/spool/perfdata | wc -l'
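If an interactive `watch` is inconvenient (for example when pasting results into a post), the same count can be sampled a few times and printed. This is only a sketch: the function names and the sample/interval defaults are assumptions, and the spool path is the one from the command above.

```shell
# Hypothetical helpers: sample the number of entries in the perfdata spool.
count_spool_files() {
    spool_dir="${1:-/usr/local/nagios/var/spool/perfdata}"
    ls "$spool_dir" 2>/dev/null | wc -l | tr -d ' '
}

# Print the count <samples> times, sleeping <interval> seconds in between.
sample_spool() {
    sample_dir="${1:-/usr/local/nagios/var/spool/perfdata}"
    samples="${2:-5}"
    interval="${3:-3}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        count_spool_files "$sample_dir"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Running `sample_spool` with no arguments prints five counts a few seconds apart, which can then be checked for variation.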
If it is not changing, there is an issue spooling the perfdata, which will require further troubleshooting of the earlier parts of the perfdata workflow. If the number is indeed changing, then perfdata is getting reaped and we need to check the RRDs for newly inserted data. Choose any service that you know is returning recent, properly formatted performance data and then run the following command (replacing <host> and <service> with the correct values for the service in question):

Code: Select all

rrdtool dump /usr/local/nagios/share/perfdata/<host>/<service>.rrd | tail -25
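Rather than eyeballing the rows, the dump output can be summarized. The sketch below reads `rrdtool dump` output on stdin and prints two numbers: the total number of `<row>` lines and how many of them contain only NaN values. The function name `rrd_row_summary` is invented for this example.

```shell
# Hypothetical helper: read `rrdtool dump` output on stdin and print
# "<total rows> <rows where every value is NaN>".
rrd_row_summary() {
    awk '
        /<row>/ {
            total++
            row = $0
            # Remove every NaN cell; any <v> left over holds real data.
            gsub(/<v>[ \t]*NaN[ \t]*<\/v>/, "", row)
            if (row !~ /<v>/) nan++
        }
        END { printf "%d %d\n", total, nan }
    '
}
```

Appending `| rrd_row_summary` to the command above would print two equal numbers if every one of the last rows is NaN.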
If the values output are all NaN, you may have an issue with the RRD updates. In that case, please post the output of the logs once again:

Code: Select all

tail -25 /usr/local/nagios/var/npcd.log
tail -25 /usr/local/nagios/var/perfdata.log
jlmoldan
Posts: 27
Joined: Mon Aug 05, 2013 2:32 pm

Re: Broke Charting again...

Post by jlmoldan »

Step 1, the integer on the watch does change. Mostly 2 or 4 or 0, but it does change.

Step 2, all NaNs for them

[root@ust-nagios1p ~]# rrdtool dump /usr/local/nagios/share/perfdata/zeus.stthomas.edu/Load.rrd | tail -25
<!-- 2015-01-21 06:00:00 CST / 1421841600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-21 12:00:00 CST / 1421863200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-21 18:00:00 CST / 1421884800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 00:00:00 CST / 1421906400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 06:00:00 CST / 1421928000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 12:00:00 CST / 1421949600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 18:00:00 CST / 1421971200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 00:00:00 CST / 1421992800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 06:00:00 CST / 1422014400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 12:00:00 CST / 1422036000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 18:00:00 CST / 1422057600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 00:00:00 CST / 1422079200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 06:00:00 CST / 1422100800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 12:00:00 CST / 1422122400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 18:00:00 CST / 1422144000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 00:00:00 CST / 1422165600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 06:00:00 CST / 1422187200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 12:00:00 CST / 1422208800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 18:00:00 CST / 1422230400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-26 00:00:00 CST / 1422252000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-26 06:00:00 CST / 1422273600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-26 12:00:00 CST / 1422295200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
</database>
</rra>
</rrd>
jlmoldan
Posts: 27
Joined: Mon Aug 05, 2013 2:32 pm

Re: Broke Charting again...

Post by jlmoldan »

Step 3,

[root@ust-nagios1p ~]# tail -25 /usr/local/nagios/var/npcd.log
[01-28-2015 19:56:56] NPCD: Regular File: 1422496609.perfdata.service
[01-28-2015 19:56:56] NPCD: A thread was started on thread_counter = 3
[01-28-2015 19:56:56] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[01-28-2015 19:56:56] NPCD: Processing file 1422496609.perfdata.service with ID 140079351723776 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1422496609.perfdata.service
[01-28-2015 19:56:56] NPCD: Processing file '1422496609.perfdata.service'
[01-28-2015 19:56:59] NPCD: No more files to process... waiting for 15 seconds
[01-28-2015 19:57:14] NPCD: Found 4 files in /usr/local/nagios/var/spool/perfdata/
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 0/5 File is .
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 0/5 File is ..
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 0/5 File is 1422496624.perfdata.host
[01-28-2015 19:57:14] NPCD: Regular File: 1422496624.perfdata.host
[01-28-2015 19:57:14] NPCD: A thread was started on thread_counter = 0
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 1/5 File is 1422496624.perfdata.service
[01-28-2015 19:57:14] NPCD: Regular File: 1422496624.perfdata.service
[01-28-2015 19:57:14] NPCD: Processing file 1422496624.perfdata.host with ID 140079593539328 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1422496624.perfdata.host
[01-28-2015 19:57:14] NPCD: A thread was started on thread_counter = 1
[01-28-2015 19:57:14] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[01-28-2015 19:57:14] NPCD: Processing file '1422496624.perfdata.host'
[01-28-2015 19:57:14] NPCD: Processing file 1422496624.perfdata.service with ID 140079583049472 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1422496624.perfdata.service
[01-28-2015 19:57:14] NPCD: Processing file '1422496624.perfdata.service'
[01-28-2015 19:57:16] NPCD: No more files to process... waiting for 15 seconds

[root@ust-nagios1p ~]# tail -25 /usr/local/nagios/var/perfdata.log
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_nrpe.php
2015-01-28 19:57:32 [6015] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_nrpe.php
2015-01-28 19:57:32 [6015] [2] data2rrd called
2015-01-28 19:57:32 [6015] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2015-01-28 19:57:32 [6015] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/dbprod.stthomas.edu/CPU_Stats.rrd 1422496639:8:1:0:91:0.0:0.0
2015-01-28 19:57:32 [6015] [1] rrdtool update returns 256
2015-01-28 19:57:32 [6015] [2] Processing Line 310
2015-01-28 19:57:32 [6015] [2] Datatype set to 'SERVICEPERFDATA'
2015-01-28 19:57:32 [6015] [1] Found Performance Data for entwebapp5t.stthomas.edu / RFS (time=0.002921s;;;0.000000 size=502B;;;0)
2015-01-28 19:57:32 [6015] [2] No Custom Template found for check_xi_service_http (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_http.cfg)
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_xi_service_http.php
2015-01-28 19:57:32 [6015] [2] No Custom Template found for check_xi_service_http (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_http.cfg)
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_xi_service_http.php
2015-01-28 19:57:32 [6015] [2] data2rrd called
2015-01-28 19:57:32 [6015] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2015-01-28 19:57:32 [6015] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/entwebapp5t.stthomas.edu/RFS.rrd 1422496639:0.002921:502
2015-01-28 19:57:32 [6015] [1] rrdtool update returns 256
2015-01-28 19:57:32 [6015] [1] 310 Lines processed
2015-01-28 19:57:32 [6015] [1] /usr/local/nagios/var/spool/perfdata//1422496639.perfdata.service-PID-6015 deleted
2015-01-28 19:57:32 [6015] [1] PNP exiting (runtime 0.456617s) ...
[root@ust-nagios1p ~]#
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Broke Charting again...

Post by Box293 »

jlmoldan wrote: [root@ust-nagios1t ~]# rrdtool --version
RRDtool 1.3.8 Copyright 1997-2009 by Tobias Oetiker <[email protected]>
jlmoldan wrote: 2015-01-28 19:57:32 [6015] [1] rrdtool update returns 256
I'm pretty sure this is erroring out but I don't have a good answer as to why.
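As a rough way to quantify this, the sketch below counts the failed updates in perfdata.log and decodes the logged number. It assumes the logged value is the raw wait status that Perl's `system()` reports (the log says it is falling back to an rrdtool system call), in which case the exit code is the status shifted right by 8 bits, so 256 would mean rrdtool exited with code 1. Both function names are hypothetical.

```shell
# Hypothetical helper: read perfdata.log on stdin and count the
# "rrdtool update returns N" lines where N is not zero.
failed_rrd_updates() {
    awk '/rrdtool update returns/ { if ($NF != 0) n++ } END { print n + 0 }'
}

# Assumption: the logged number is a raw wait status, so the process
# exit code is stored in the high byte.
decode_wait_status() {
    echo $(( $1 >> 8 ))
}
```

For example, `tail -500 /usr/local/nagios/var/perfdata.log | failed_rrd_updates` gives the number of failing updates in the recent log. An exit code of 1 only says that rrdtool failed, not why.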


I believe rrdtool needs to be version 1.4.4 for all of this to work.
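A quick sketch for checking this on a given server: one helper pulls the version number out of the first line of `rrdtool --version` output, and the other compares dotted versions numerically. The function names are made up for illustration, and the 1.4.4 threshold is just the version mentioned above, not a confirmed requirement.

```shell
# Hypothetical helper: read `rrdtool --version` output on stdin and print
# the version number (second field of the first line).
parse_rrdtool_version() {
    awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: return 0 if dotted version $1 >= dotted version $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0
            yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

Something like `version_ge "$(rrdtool --version | parse_rrdtool_version)" 1.4.4 || echo "rrdtool is older than 1.4.4"` would flag the 1.3.8 install quoted above.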

Following this document:
http://assets.nagios.com/downloads/nagi ... ios_XI.pdf

You've probably run this before but perhaps for some reason rrdtool-1.4.4.tar.gz is not being downloaded/upgraded.

Line 114 in the xi-rrdcached.sh script is downloading this file:
wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.4.4.tar.gz

If for some reason the server cannot download this file, download it and transfer it to the XI server into the same directory as the xi-rrdcached.sh script. Comment out line 114 and then re-run the script.

NOTE: the xi-rrdcached.sh script requires internet access to complete successfully.
jlmoldan
Posts: 27
Joined: Mon Aug 05, 2013 2:32 pm

Re: Broke Charting again...

Post by jlmoldan »

Awesome, it looks like that may have fixed it. I'll have to do a more thorough test/validation tomorrow, but I see data again on some of the charts. THANKS!
jlmoldan
Posts: 27
Joined: Mon Aug 05, 2013 2:32 pm

Re: Broke Charting again...

Post by jlmoldan »

Alright, 75% there. Charting has started again and appears to be working in test and production for all servers/services. On our production instance of Nagios, however, I am not getting any charts yet for switches and routers (in the test environment that is working).

So now off to troubleshooting network charting, which I believe is a different set of troubleshooting, right?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Broke Charting again...

Post by tgriep »

Could you run the following and post the results?

Code: Select all

LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
ll /var/lib/mrtg/
service crond status
cat /etc/cron.d/mrtg
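The last command is there to confirm that a cron job for MRTG actually exists. As a minimal sketch of that check, the helper below succeeds only if the cron file is readable and has at least one uncommented line that calls `/usr/bin/mrtg` (the binary path used in the first command above). The function name is hypothetical, and the exact contents of a stock cron entry are not assumed here.

```shell
# Hypothetical helper: return 0 if the cron file exists and contains an
# uncommented line that invokes /usr/bin/mrtg.
has_mrtg_cron() {
    cron_file="${1:-/etc/cron.d/mrtg}"
    [ -r "$cron_file" ] || return 1
    grep -v '^[[:space:]]*#' "$cron_file" | grep -q '/usr/bin/mrtg'
}
```

A one-liner such as `has_mrtg_cron || echo "no active mrtg cron entry found"` makes a missing or commented-out job obvious.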
jlmoldan
Posts: 27
Joined: Mon Aug 05, 2013 2:32 pm

Re: Broke Charting again...

Post by jlmoldan »

BOOM. There it was. Lack of an mrtg cron. Dang it, that simple.

Thanks!!! When I speak of you, I will speak highly.