2012R1.3 Update Stops MRTG

Posted: Sat Feb 02, 2013 3:32 pm
by mikew
After an update to 2012R1.3, MRTG has stopped updating files in /var/lib/mrtg. Since the 2012R1.3 update, the machine has been updated to the latest version.

The mrtg.ok file, like most of the others, is dated Jan 20. There are 3212 files dated Jan 20 and only 9 files updated on Feb 2.

Code: Select all

-rw-r--r--. 1 root root      0 Jan 20 09:52 mrtg.ok

The cron job for mrtg is working.
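
To put numbers on the stale timestamps described above, one option is to count the files in the MRTG working directory that were modified recently against those that were not. This is only a sketch: the function name `mrtg_freshness` and the 15-minute default window are assumptions made for illustration and are not part of Nagios XI or MRTG.

```shell
# Count files in a directory that were / were not modified within the
# last N minutes. Both the function name and the 15-minute default are
# illustrative assumptions.
mrtg_freshness() {
    local dir="$1" minutes="${2:-15}"
    local fresh stale
    # -mmin -N matches files modified less than N minutes ago
    fresh=$(find "$dir" -maxdepth 1 -type f -mmin "-$minutes" | wc -l | tr -d ' ')
    stale=$(find "$dir" -maxdepth 1 -type f ! -mmin "-$minutes" | wc -l | tr -d ' ')
    echo "fresh=$fresh stale=$stale"
}
```

Calling `mrtg_freshness /var/lib/mrtg 15` prints one line of the form `fresh=N stale=M`. With a 5-minute MRTG cron interval, a large stale count would suggest that most targets are not being polled.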

Of the bandwidth checks, 900 report that the rrd file does not exist. Those whose rrd files do exist all report 0 bandwidth. Perfdata files in /usr/local/nagios/share/perfdata are being updated.
rrdcached was turned off for troubleshooting, but that made no difference.

Here is the npcd.log:

Code: Select all

[02-02-2013 14:37:12] NPCD: Regular File: 1359833819.perfdata.host
[02-02-2013 14:37:12] NPCD: A thread was started on thread_counter = 0
[02-02-2013 14:37:12] NPCD: DEBUG: load 18.480000/40.000000
[02-02-2013 14:37:12] NPCD: ThreadCounter 1/4 File is 1359833819.perfdata.service
[02-02-2013 14:37:12] NPCD: Regular File: 1359833819.perfdata.service
[02-02-2013 14:37:12] NPCD: A thread was started on thread_counter = 1
[02-02-2013 14:37:12] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[02-02-2013 14:37:12] NPCD: Processing file 1359833819.perfdata.service with ID 140440491202304 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1359833819.perfdata.service
[02-02-2013 14:37:12] NPCD: Processing file '1359833819.perfdata.service'
[02-02-2013 14:37:12] NPCD: Processing file 1359833819.perfdata.host with ID 140440501692160 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1359833819.perfdata.host
[02-02-2013 14:37:12] NPCD: Processing file '1359833819.perfdata.host'

This is perfdata.log. rrdtool-perl was installed to eliminate the "RRDs Perl Modules are not installed" message shown below.

Code: Select all

2013-02-02 14:09:35 [15735] [2] No Custom Template found for check_xi_service_snmp_win_storage (/usr/local/nagios/etc/pnp/check_commands/check_xi_service_snmp_win_storage.cfg)
2013-02-02 14:09:35 [15735] [2] Template is check_xi_service_snmp_win_storage.php
2013-02-02 14:09:35 [15735] [2] data2rrd called
2013-02-02 14:09:35 [15735] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-02-02 14:09:35 [15735] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/example.com/Virtual_Memory_Usage.rrd 1359832157:1316
2013-02-02 14:09:35 [15735] [1] rrdtool update returns 0
2013-02-02 14:09:35 [15735] [2] Processing Line 423
2013-02-02 14:09:35 [15735] [2] No Perfdata. Skipping line 423
2013-02-02 14:09:35 [15735] [2] Processing Line 424
2013-02-02 14:09:35 [15735] [2] No Perfdata. Skipping line 424
2013-02-02 14:09:35 [15735] [2] Processing Line 425
2013-02-02 14:09:35 [15735] [2] No Perfdata. Skipping line 425
2013-02-02 14:09:35 [15735] [1] 425 lines processed
2013-02-02 14:09:35 [15735] [1] /usr/local/nagios/var/spool/perfdata//1359832158.perfdata.service-PID-15735 deleted
2013-02-02 14:09:35 [15735] [1] PNP exiting (runtime 2.53693s) ...
2013-02-02 14:15:10 [16798] [0] *** TIMEOUT: Timeout after 10 secs. ***
2013-02-02 14:15:10 [16798] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-02-02 14:15:10 [16798] [0] *** TIMEOUT: Please check your npcd.cfg
2013-02-02 14:15:10 [16798] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1359832484.perfdata.service-PID-16798 deleted
2013-02-02 14:15:10 [16798] [0] *** Timeout while processing Host: "example.com" Service: "FastEthernet0_40_Bandwidth"
2013-02-02 14:15:10 [16798] [0] *** process_perfdata.pl terminated on signal ALRM
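
The timeout entries at the end of perfdata.log name the host and service that was being processed when the timeout hit, so they can be tallied to see whether the same checks keep timing out. A minimal sketch, assuming the line format shown in the log above; the function name `perfdata_timeouts` is hypothetical.

```shell
# Tally "Timeout while processing" entries in a perfdata.log by host and
# service. Output is tab-separated: count, host, service (highest first).
# Assumes the log line format shown in the excerpt above.
perfdata_timeouts() {
    local log="$1"
    # Splitting on double quotes puts the host in $2 and the service in $4
    awk -F'"' '
        /Timeout while processing Host:/ { n[$2 "\t" $4]++ }
        END { for (k in n) print n[k] "\t" k }
    ' "$log" | sort -rn
}
```

Pass it the path to the perfdata.log being inspected, for example `perfdata_timeouts /path/to/perfdata.log`. The path is left as an argument because the log's location is not stated in the thread.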

Profile

Nagios XI Installation Profile
System:
Nagios XI Version : 2012R1.5b
10.107.7.111 2.6.32-279.el6.x86_64 x86_64
CentOS release 6.3 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 5.1; rv:18.0) Gecko/20100101 Firefox/18.0
Server Name: nagiosxi
Server Address: 10.107.3.125
Server Port: 80
Date/Time
PHP Timezone: America/Detroit
PHP Time: Sat, 02 Feb 2013 14:40:05 -0500
System Time: Sat, 02 Feb 2013 14:40:05 -0500
Nagios XI Data
nagios (pid 61367) is running...
NPCD running (pid 29960).
ndo2db (pid 44150) is running...
CPU Load 15: 8.42
Total Hosts: 658
Total Services: 7324
Function 'get_base_uri' returns: http://192.168.1.1/nagiosxi/
Function 'get_base_url' returns: http://192.168.1.1/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://192.168.1.1/nagiosxi/includes/co ... rofile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.058 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.047 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.047/0.054/0.059/0.010 ms
Test wget To locahost
WGET From URL: http://localhost/nagiosql/index.php
Running:

/usr/bin/wget http://localhost/nagiosql/index.php

--2013-02-02 14:40:07-- http://localhost/nagiosql/index.php
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5259 (5.1K) [text/html]
Saving to: `/tmp/nagiosql_index.tmp'

0K ..... 100% 260M=0s

2013-02-02 14:40:07 (260 MB/s) - `/tmp/nagiosql_index.tmp' saved [5259/5259]

Re: 2012R1.3 Update Stops MRTG

Posted: Sat Feb 02, 2013 5:46 pm
by mikew
I found the problem. Once the update was complete, there was an additional issue: the /var/lock/mrtg directory had filled with thousands of lock files. Although nothing showed in the logs, the lock files stopped the system from processing MRTG. Once the lock files were removed, the system began to recover. The system was under heavy load at the time of the update, so the load was certainly more of the issue than the update itself.
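
The cleanup described above can be sketched as a small function that refuses to touch anything while mrtg is still running and otherwise deletes only lock files older than a given age. The function names, the 10-minute default, and the assumption that the process shows up under the name `mrtg` (it may appear as `perl` on some systems) are all illustrative; only the /var/lock/mrtg path comes from the post.

```shell
# Returns 0 if a process named exactly "mrtg" is running.
# Assumption: the poller is visible under that process name.
mrtg_running() {
    pgrep -x mrtg >/dev/null 2>&1
}

# Delete lock files older than N minutes, but only when mrtg is idle.
# Prints each file it removes. Names and defaults are illustrative.
clear_stale_mrtg_locks() {
    local lock_dir="${1:-/var/lock/mrtg}" min_age="${2:-10}"
    if mrtg_running; then
        echo "mrtg is still running; leaving lock files alone" >&2
        return 1
    fi
    # -mmin +N matches files last modified more than N minutes ago
    find "$lock_dir" -maxdepth 1 -type f -mmin "+$min_age" -print -delete
}
```

Run as root, `clear_stale_mrtg_locks /var/lock/mrtg 10` lists each lock file it removes. Checking for a live mrtg process first matters because a lock file held by a run that is still in progress is not stale.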

Re: 2012R1.3 Update Stops MRTG

Posted: Mon Feb 04, 2013 7:32 am
by mikew
The problem has happened again. Two issues:

1. All but 15 out of 4500 rrd files are NOT getting updated
2. All Bandwidth is 0

I have turned off all performance enhancements for rrdcached and npcd, and that did not make a difference.

It looks like the lock files are going stale and not getting removed. The clock on the CentOS server is correct and synced via NTP.

LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
2013-02-04 07:33:48: ERROR: I guess another mrtg is running. A lockfile (/var/lock/mrtg/mrtg_l) aged
227 seconds is hanging around. If you are sure that no other mrtg
is running you can remove the lockfile

If I remove the lock files and run the command, the output contains many lines like this:
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok

2013-02-04 00:07:26: ERROR: Target[192.168.5.1][_OUT_] ' $target->[3089]{$mode} ' did not eval into defined data

Lock files appear again.
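
When the output contains many "did not eval into defined data" lines, it can help to reduce them to a unique list of the target names involved, which are the names used in the `Target[...]` entries of mrtg.cfg. A minimal sketch, assuming the error format shown above; the function name `failing_mrtg_targets` is hypothetical.

```shell
# Print the unique MRTG target names that appear in
# "did not eval into defined data" error lines.
# Reads the named files, or standard input when no file is given.
failing_mrtg_targets() {
    sed -n 's/.*ERROR: Target\[\([^]]*\)].*did not eval into defined data.*/\1/p' "$@" \
        | sort -u
}
```

The mrtg command from the post can be piped into it by appending `2>&1 | failing_mrtg_targets` to the command line. The resulting names can then be searched for in mrtg.cfg.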

Re: 2012R1.3 Update Stops MRTG

Posted: Mon Feb 04, 2013 8:52 am
by scottwilkerson
Mike, does the lock file appear and then not go away on subsequent runs?

Re: 2012R1.3 Update Stops MRTG

Posted: Mon Feb 04, 2013 8:56 am
by mikew
Yes, every time the lock files appear and are not removed.

Re: 2012R1.3 Update Stops MRTG

Posted: Mon Feb 04, 2013 4:01 pm
by scottwilkerson
Does mrtg appear to continue to run?

Code: Select all

ps -ef|grep mrtg

Also, just curious, is this a large mrtg.cfg or just a small one?

Can we confirm which version of mrtg you are running on this machine?

Code: Select all

LANG=C LC_ALL=C /usr/bin/mrtg

Re: 2012R1.3 Update Stops MRTG

Posted: Mon Feb 04, 2013 4:22 pm
by mikew
No, it would not run. I finally fixed it: I had to edit the 200,000-line mrtg file and remove the errors in it. The key to finding the solution was running the mrtg command that cron uses and resolving every issue it reported. It is fixed now and has been running well for several hours. Thanks.
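
For anyone repeating this, the exact command cron uses can be pulled out of the cron file rather than retyped. The sketch below assumes a system-style crontab entry (five time fields, a user, then the command); the function name `cron_commands` is hypothetical, and the location of the cron file is left as an argument because the thread does not state it.

```shell
# Print the command part of system-crontab lines that mention a pattern.
# Assumes the /etc/cron.d layout: five time fields, a user, the command.
cron_commands() {
    local cron_file="$1" pattern="${2:-mrtg}"
    grep -v '^[[:space:]]*#' "$cron_file" \
        | grep -F -- "$pattern" \
        | sed -E 's/^[[:space:]]*([^[:space:]]+[[:space:]]+){6}//'
}
```

The printed command can then be copied and run by hand in a terminal so that its errors are visible directly instead of being lost in cron's output.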

Re: 2012R1.3 Update Stops MRTG

Posted: Mon Feb 04, 2013 5:14 pm
by scottwilkerson
Glad you got it!