Re: Broke Charting again...
Posted: Wed Jan 28, 2015 11:22 am
by scottwilkerson
Can we attempt to restart rrdcached and verify that the pid updates?
Finally, let's look at the journal files again and see if they are changing size.
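A minimal sketch of those two checks, for anyone who wants to script them. The helper names `pid_is_running` and `journal_sizes` are made up for illustration; the pid file and journal locations in the usage note are the ones that appear in the `ps` output in this thread.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of rrdcached or Nagios XI.

# Succeeds when the PID stored in the given pid file belongs to a live process.
# kill -0 sends no signal; run it as root or as the user that owns the process.
pid_is_running() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Prints "<file name> <size in bytes>" for every rrd.journal.* file in a directory.
journal_sizes() {
    for f in "$1"/rrd.journal.*; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "${f##*/}" "$(( $(wc -c < "$f") ))"
    done
}
```

For example, `pid_is_running /var/rrdtool/rrdcached/rrdcached.pid && echo running`, then `journal_sizes /tmp` twice a minute or so apart. If the sizes never move off 0, rrdcached is probably not receiving any updates.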
Re: Broke Charting again...
Posted: Wed Jan 28, 2015 12:01 pm
by jlmoldan
Code: Select all
[root@ust-nagios1t ~]# ls -la /var/rrdtool/rrdcached
total 12
drwxr-xr-x 2 nagios nagios 4096 Jan 28 10:45 .
drwxr-xr-x 3 nagios nagios 4096 Jun 25 2014 ..
-rw-r--r-- 1 nagios users 6 Jan 28 10:45 rrdcached.pid
srw-rw---- 1 nagios nagios 0 Jan 28 10:45 rrdcached.sock
[root@ust-nagios1t ~]# date
Wed Jan 28 10:59:40 CST 2015
[root@ust-nagios1t ~]# cat rrdtool^C
[root@ust-nagios1t ~]# cat /var/rrdtool/rrdcached/rrdcached.pid
56166
[root@ust-nagios1t ~]# ps -ef |grep rrdcached
root 22995 18469 0 10:59 pts/1 00:00:00 grep rrdcached
nagios 56166 1 0 10:45 ? 00:00:00 /usr/bin/rrdcached -p /var/rrdtool/rrdcached/rrdcached.pid -s nagios -m 0660 -l unix:/var/rrdtool/rrdcached/rrdcached.sock -F -w 900 -z 90 -j /tmp/ -b /var/rrdtool/rrdcached -P FLUSH,PENDING
[root@ust-nagios1t ~]# service rrdcached restart
Stopping rrdcached: [ OK ]
Starting rrdcached: [ OK ]
[root@ust-nagios1t ~]# ls -la /var/rrdtool/rrdcached
total 12
drwxr-xr-x 2 nagios nagios 4096 Jan 28 11:00 .
drwxr-xr-x 3 nagios nagios 4096 Jun 25 2014 ..
-rw-r--r-- 1 nagios users 6 Jan 28 11:00 rrdcached.pid
srw-rw---- 1 nagios nagios 0 Jan 28 11:00 rrdcached.sock
You have new mail in /var/spool/mail/root
[root@ust-nagios1t ~]# cat /var/rrdtool/rrdcached/rrdcached.pid
25578
[root@ust-nagios1t ~]# ps -ef |grep rrdcached
nagios 25578 1 0 11:00 ? 00:00:00 /usr/bin/rrdcached -p /var/rrdtool/rrdcached/rrdcached.pid -s nagios -m 0660 -l unix:/var/rrdtool/rrdcached/rrdcached.sock -F -w 900 -z 90 -j /tmp/ -b /var/rrdtool/rrdcached -P FLUSH,PENDING
root 28527 18469 0 11:00 pts/1 00:00:00 grep rrdcached
[root@ust-nagios1t ~]# ll /tmp | grep journal
-rw-r--r-- 1 nagios users 0 Jan 28 11:00 rrd.journal.1422464416.635904
[root@ust-nagios1t ~]# ll /tmp | grep journal
-rw-r--r-- 1 nagios users 0 Jan 28 11:00 rrd.journal.1422464416.635904
[root@ust-nagios1t ~]#
Re: Broke Charting again...
Posted: Wed Jan 28, 2015 6:07 pm
by abrist
Let's see if perfdata is actually spooling. You should see the integer output from the following watch command change over time. It should go up and down.
Code: Select all
watch 'ls /usr/local/nagios/var/spool/perfdata | wc -l'
If it is not changing, there is an issue spooling the perfdata, and that will require further troubleshooting at earlier parts of the perfdata workflow. If the number is indeed changing, then perfdata is getting reaped and we need to check the rrds for newly inserted data. Choose any service that you know is returning recent, properly formatted performance data and then run the following command (replacing <host> and <service> with the correct values for the service in question):
Code: Select all
rrdtool dump /usr/local/nagios/share/perfdata/<host>/<service>.rrd | tail -25
If the values output are all NaN, you may have an issue with the rrd updates. In that case, please post the output of the logs once again:
Code: Select all
tail -25 /usr/local/nagios/var/npcd.log
tail -25 /usr/local/nagios/var/perfdata.log
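The NaN check can also be done by counting instead of eyeballing the last 25 lines. This is only a sketch: `count_data_rows` is a hypothetical helper, and it assumes the dump writes each data row on one line as `<row><v> ... </v></row>`, which is how the output posted in this thread looks.

```shell
#!/bin/sh
# Sketch only: count_data_rows is a hypothetical helper name.

# Reads `rrdtool dump` output on stdin and prints
# "<rows holding at least one number> <total rows>".
count_data_rows() {
    awk '/<row>/ { total++; if ($0 ~ /<v> *[-+0-9]/) real++ }
         END { printf "%d %d\n", real, total }'
}
```

Usage would be along the lines of `rrdtool dump /usr/local/nagios/share/perfdata/<host>/<service>.rrd | count_data_rows`. A first number of 0 means every row in every archive is NaN.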
Re: Broke Charting again...
Posted: Wed Jan 28, 2015 8:57 pm
by jlmoldan
Step 1, the integer on the watch does change. Mostly 2 or 4 or 0, but it does change.
Step 2, all NaNs for them
[root@ust-nagios1p ~]# rrdtool dump /usr/local/nagios/share/perfdata/zeus.stthomas.edu/Load.rrd | tail -25
<!-- 2015-01-21 06:00:00 CST / 1421841600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-21 12:00:00 CST / 1421863200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-21 18:00:00 CST / 1421884800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 00:00:00 CST / 1421906400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 06:00:00 CST / 1421928000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 12:00:00 CST / 1421949600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-22 18:00:00 CST / 1421971200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 00:00:00 CST / 1421992800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 06:00:00 CST / 1422014400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 12:00:00 CST / 1422036000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-23 18:00:00 CST / 1422057600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 00:00:00 CST / 1422079200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 06:00:00 CST / 1422100800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 12:00:00 CST / 1422122400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-24 18:00:00 CST / 1422144000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 00:00:00 CST / 1422165600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 06:00:00 CST / 1422187200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 12:00:00 CST / 1422208800 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-25 18:00:00 CST / 1422230400 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-26 00:00:00 CST / 1422252000 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-26 06:00:00 CST / 1422273600 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
<!-- 2015-01-26 12:00:00 CST / 1422295200 --> <row><v> NaN </v><v> NaN </v><v> NaN </v></row>
</database>
</rra>
</rrd>
Re: Broke Charting again...
Posted: Wed Jan 28, 2015 8:58 pm
by jlmoldan
Step 3,
[root@ust-nagios1p ~]# tail -25 /usr/local/nagios/var/npcd.log
[01-28-2015 19:56:56] NPCD: Regular File: 1422496609.perfdata.service
[01-28-2015 19:56:56] NPCD: A thread was started on thread_counter = 3
[01-28-2015 19:56:56] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[01-28-2015 19:56:56] NPCD: Processing file 1422496609.perfdata.service with ID 140079351723776 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1422496609.perfdata.service
[01-28-2015 19:56:56] NPCD: Processing file '1422496609.perfdata.service'
[01-28-2015 19:56:59] NPCD: No more files to process... waiting for 15 seconds
[01-28-2015 19:57:14] NPCD: Found 4 files in /usr/local/nagios/var/spool/perfdata/
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 0/5 File is .
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 0/5 File is ..
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 0/5 File is 1422496624.perfdata.host
[01-28-2015 19:57:14] NPCD: Regular File: 1422496624.perfdata.host
[01-28-2015 19:57:14] NPCD: A thread was started on thread_counter = 0
[01-28-2015 19:57:14] NPCD: DEBUG: load 1.030000/20.000000
[01-28-2015 19:57:14] NPCD: ThreadCounter 1/5 File is 1422496624.perfdata.service
[01-28-2015 19:57:14] NPCD: Regular File: 1422496624.perfdata.service
[01-28-2015 19:57:14] NPCD: Processing file 1422496624.perfdata.host with ID 140079593539328 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1422496624.perfdata.host
[01-28-2015 19:57:14] NPCD: A thread was started on thread_counter = 1
[01-28-2015 19:57:14] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[01-28-2015 19:57:14] NPCD: Processing file '1422496624.perfdata.host'
[01-28-2015 19:57:14] NPCD: Processing file 1422496624.perfdata.service with ID 140079583049472 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1422496624.perfdata.service
[01-28-2015 19:57:14] NPCD: Processing file '1422496624.perfdata.service'
[01-28-2015 19:57:16] NPCD: No more files to process... waiting for 15 seconds
[root@ust-nagios1p ~]# tail -25 /usr/local/nagios/var/perfdata.log
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_nrpe.php
2015-01-28 19:57:32 [6015] [2] No Custom Template found for check_nrpe (/usr/local/nagios/etc/pnp/check_commands/check_nrpe.cfg)
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_nrpe.php
2015-01-28 19:57:32 [6015] [2] data2rrd called
2015-01-28 19:57:32 [6015] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2015-01-28 19:57:32 [6015] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/dbprod.stthomas.edu/CPU_Stats.rrd 1422496639:8:1:0:91:0.0:0.0
2015-01-28 19:57:32 [6015] [1] rrdtool update returns 256
2015-01-28 19:57:32 [6015] [2] Processing Line 310
2015-01-28 19:57:32 [6015] [2] Datatype set to 'SERVICEPERFDATA'
2015-01-28 19:57:32 [6015] [1] Found Performance Data for entwebapp5t.stthomas.edu / RFS (time=0.002921s;;;0.000000 size=502B;;;0)
2015-01-28 19:57:32 [6015] [2] No Custom Template found for check_xi_service_http (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_http.cfg)
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_xi_service_http.php
2015-01-28 19:57:32 [6015] [2] No Custom Template found for check_xi_service_http (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_http.cfg)
2015-01-28 19:57:32 [6015] [2] RRD Datatype is GAUGE
2015-01-28 19:57:32 [6015] [2] Template is check_xi_service_http.php
2015-01-28 19:57:32 [6015] [2] data2rrd called
2015-01-28 19:57:32 [6015] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2015-01-28 19:57:32 [6015] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/entwebapp5t.stthomas.edu/RFS.rrd 1422496639:0.002921:502
2015-01-28 19:57:32 [6015] [1] rrdtool update returns 256
2015-01-28 19:57:32 [6015] [1] 310 Lines processed
2015-01-28 19:57:32 [6015] [1] /usr/local/nagios/var/spool/perfdata//1422496639.perfdata.service-PID-6015 deleted
2015-01-28 19:57:32 [6015] [1] PNP exiting (runtime 0.456617s) ...
[root@ust-nagios1p ~]#
Re: Broke Charting again...
Posted: Wed Jan 28, 2015 9:12 pm
by Box293
jlmoldan wrote:
[root@ust-nagios1t ~]# rrdtool --version
RRDtool 1.3.8 Copyright 1997-2009 by Tobias Oetiker <[email protected]>
jlmoldan wrote:
2015-01-28 19:57:32 [6015] [1] rrdtool update returns 256
I'm pretty sure this is erroring out but I don't have a good answer as to why.
I believe rrdtool needs to be version 1.4.4 for all of this to work.
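As a side note on the 256: assuming process_perfdata.pl logs the raw status it gets back from Perl's system() call (an assumption, based on the "Falling back to rrdtool system call" line in the log), the exit code sits in the high byte. A small sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: decode_wait_status is a hypothetical helper name.

# Splits a 16-bit wait status into the exit code (high byte)
# and the terminating signal number (low 7 bits).
decode_wait_status() {
    printf 'exit=%d signal=%d\n' "$(( $1 >> 8 ))" "$(( $1 & 127 ))"
}

# decode_wait_status 256   prints: exit=1 signal=0
```

So 256 would simply mean rrdtool exited with status 1. Running one of the logged `/usr/bin/rrdtool update --daemon=...` lines by hand as the nagios user should print the actual error message.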
Following this document:
http://assets.nagios.com/downloads/nagi ... ios_XI.pdf
You've probably run this before but perhaps for some reason rrdtool-1.4.4.tar.gz is not being downloaded/upgraded.
Line 114 in the xi-rrdcached.sh script downloads this file:
wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.4.4.tar.gz
If for some reason the server cannot download this file, download it and transfer it to the XI server into the same directory as the xi-rrdcached.sh script. Comment out line 114 and then re-run the script.
NOTE: the xi-rrdcached.sh script requires internet access to complete successfully.
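To confirm whether the upgrade actually took, the installed version can be compared against 1.4.4 before and after re-running the script. This is a rough sketch, not something taken from xi-rrdcached.sh: both helper names are hypothetical, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; sort -V needs GNU coreutils.

# Reads `rrdtool --version` output on stdin and prints the version number,
# e.g. "1.3.8" from "RRDtool 1.3.8 Copyright 1997-2009 by ...".
parse_rrdtool_version() {
    awk '/^RRDtool/ { print $2; exit }'
}

# Succeeds when version $1 is greater than or equal to version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Something like `version_at_least "$(rrdtool --version | parse_rrdtool_version)" 1.4.4 || echo "rrdtool is older than 1.4.4"` should then flag the 1.3.8 install quoted above.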
Re: Broke Charting again...
Posted: Wed Jan 28, 2015 10:47 pm
by jlmoldan
Awesome, it looks like that may have fixed it. I'll have to do a more thorough test/validation tomorrow, but I see data again on some of the charts. THANKS!
Re: Broke Charting again...
Posted: Thu Jan 29, 2015 9:03 am
by jlmoldan
Alright, 75% there. Charting has started again and appears to be working in test and production for all servers/services. On our production instance of Nagios, however, I am not getting any charts yet for switches and routers (on the test environment that is working).
So now off to troubleshooting network charting, which I believe is a different set of troubleshooting, right?
Re: Broke Charting again...
Posted: Thu Jan 29, 2015 10:47 am
by tgriep
Could you run the following and post the results?
Code: Select all
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
ll /var/lib/mrtg/
service crond status
cat /etc/cron.d/mrtg
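The last two commands check that cron is running and that an MRTG job is scheduled. The second part could be sketched as a small test like the one below. `has_mrtg_job` is a hypothetical helper, and "any non-comment line mentioning mrtg" is only a rough stand-in for a valid cron entry.

```shell
#!/bin/sh
# Sketch only: has_mrtg_job is a hypothetical helper name.

# Succeeds when the file exists and has at least one
# non-comment line that mentions mrtg.
has_mrtg_job() {
    [ -f "$1" ] && grep -v '^[[:space:]]*#' "$1" | grep -q 'mrtg'
}
```

For example: `has_mrtg_job /etc/cron.d/mrtg || echo "no mrtg cron job found"`.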
Re: Broke Charting again...
Posted: Thu Jan 29, 2015 11:49 am
by jlmoldan
BOOM. There it was. Lack of an mrtg cron. Dang it, that simple.
Thanks!!! When I speak of you, I will speak highly.