
Re: missing bandwidth perf data for network devices

Posted: Thu Feb 25, 2016 7:08 pm
by bosecorp
Here you go.

I would agree, this is an MRTG issue. The issue is only with the bandwidth graphs.

On the graph you can see very small gaps.

Re: missing bandwidth perf data for network devices

Posted: Thu Feb 25, 2016 9:10 pm
by Box293
Can you let us know if doubling the number of forks reduces the gaps?

Re: missing bandwidth perf data for network devices

Posted: Fri Feb 26, 2016 4:34 pm
by bosecorp
So you want me to increase it from 16 to 32?

Re: missing bandwidth perf data for network devices

Posted: Sun Feb 28, 2016 7:06 pm
by Box293
Yes, let's see how that helps.

Re: missing bandwidth perf data for network devices

Posted: Thu Mar 03, 2016 8:58 am
by bosecorp
That made things worse. As soon as I changed it to 32, I lost the graphs.

I put the setting back to 16.

Re: missing bandwidth perf data for network devices

Posted: Thu Mar 03, 2016 3:36 pm
by tgriep
The last thing we could try is to set up MRTG to run as a daemon instead of from cron. Would that be an option?

To do that, edit /etc/cron.d/mrtg, comment out the mrtg line, and restart cron by running:

Code:

service crond restart

Then edit the /etc/mrtg/mrtg.cfg file and add the following line to it:

Code:

RunAsDaemon: Yes

Then, to run MRTG as a daemon, run this command:

Code:

LANG=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg

The downside of this is that you need to manually restart the daemon if you add more switches to be monitored.
Let us know if this works.
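
The two file edits above could be scripted roughly as follows. This is only a sketch: the function names `disable_mrtg_cron` and `enable_mrtg_daemon` are made up for illustration, it assumes GNU sed, and it assumes that appending the `RunAsDaemon` line at the end of the config file is acceptable on your system. Try it on copies of the files first.

```shell
#!/bin/bash
# Hypothetical helpers for switching MRTG from cron to daemon mode.

# Comment out every active (not already commented) line that mentions mrtg
# in the given cron file. A backup is kept as <file>.bak.
disable_mrtg_cron() {
    local cron_file=$1
    sed -i.bak '/^[[:space:]]*#/!s/.*mrtg.*/#&/' "$cron_file"
}

# Append "RunAsDaemon: Yes" to the MRTG config unless a RunAsDaemon
# line is already present.
enable_mrtg_daemon() {
    local cfg=$1
    grep -qi '^[[:space:]]*RunAsDaemon:' "$cfg" || printf 'RunAsDaemon: Yes\n' >> "$cfg"
}
```

With the paths from this post, that would be `disable_mrtg_cron /etc/cron.d/mrtg` followed by `enable_mrtg_daemon /etc/mrtg/mrtg.cfg`, then the cron restart and the MRTG command shown above.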

Re: missing bandwidth perf data for network devices

Posted: Thu Mar 03, 2016 3:38 pm
by bosecorp
I don't want to do that. I am adding devices all the time.

Re: missing bandwidth perf data for network devices

Posted: Thu Mar 03, 2016 5:01 pm
by Box293
During the periods where the graphs have no data, are there correlating entries in the npcd and perfdata logs saying they are deleting data? If so, some load thresholds might need tuning.

Re: missing bandwidth perf data for network devices

Posted: Thu Mar 03, 2016 7:41 pm
by bosecorp
Nothing in the logs.

I saw a gap at 12:26.

Code:

[03-03-2016 12:25:58] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[03-03-2016 12:25:58] NPCD: Processing file 'service-perfdata.1457025946'
[03-03-2016 12:26:00] NPCD: No more files to process... waiting for 15 seconds
[03-03-2016 12:26:15] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata
[03-03-2016 12:26:15] NPCD: DEBUG: load 3.210000/500.000000
[03-03-2016 12:26:15] NPCD: ThreadCounter 0/5 File is .
[03-03-2016 12:26:15] NPCD: DEBUG: load 3.210000/500.000000
[03-03-2016 12:26:15] NPCD: ThreadCounter 0/5 File is ..
[03-03-2016 12:26:15] NPCD: DEBUG: load 3.210000/500.000000
[03-03-2016 12:26:15] NPCD: ThreadCounter 0/5 File is host-perfdata.1457025960
[03-03-2016 12:26:15] NPCD: Regular File: host-perfdata.1457025960
[03-03-2016 12:26:15] NPCD: A thread was started on thread_counter = 0
[03-03-2016 12:26:15] NPCD: Processing file host-perfdata.1457025960 with ID 140665123284736 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata/host-perfdata.1457025960
[03-03-2016 12:26:15] NPCD: DEBUG: load 3.210000/500.000000
[03-03-2016 12:26:15] NPCD: ThreadCounter 1/5 File is service-perfdata.1457025961
[03-03-2016 12:26:15] NPCD: Processing file 'host-perfdata.1457025960'
[03-03-2016 12:26:15] NPCD: Regular File: service-perfdata.1457025961
[03-03-2016 12:26:15] NPCD: Processing file service-perfdata.1457025961 with ID 140665102309120 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata/service-perfdata.1457025961
[03-03-2016 12:26:15] NPCD: A thread was started on thread_counter = 1
[03-03-2016 12:26:15] NPCD: Processing file 'service-perfdata.1457025961'
[03-03-2016 12:26:15] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[03-03-2016 12:26:19] NPCD: No more files to process... waiting for 15 seconds
[03-03-2016 12:26:34] NPCD: Found 6 files in /var/nagiosramdisk/spool/perfdata
[03-03-2016 12:26:34] NPCD: DEBUG: load 2.580000/500.000000
[03-03-2016 12:26:34] NPCD: ThreadCounter 0/5 File is .
[03-03-2016 12:26:34] NPCD: DEBUG: load 2.580000/500.000000
[03-03-2016 12:26:34] NPCD: ThreadCounter 0/5 File is ..
[03-03-2016 12:26:34] NPCD: DEBUG: load 2.580000/500.000000
[03-03-2016 12:26:34] NPCD: ThreadCounter 0/5 File is host-perfdata.1457025976
[03-03-2016 12:26:34] NPCD: Regular File: host-perfdata.1457025976
[03-03-2016 12:26:34] NPCD: A thread was started on thread_counter = 0
[03-03-2016 12:26:34] NPCD: DEBUG: load 2.580000/500.000000
[03-03-2016 12:26:34] NPCD: ThreadCounter 1/5 File is host-perfdata.1457025990
[03-03-2016 12:26:34] NPCD: Processing file host-perfdata.1457025976 with ID 140665123284736 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata/host-perfdata.1457025976
[03-03-2016 12:26:34] NPCD: Regular File: host-perfdata.1457025990
[03-03-2016 12:26:34] NPCD: Processing file 'host-perfdata.1457025976'
[03-03-2016 12:26:34] NPCD: A thread was started on thread_counter = 1
[03-03-2016 12:26:34] NPCD: Processing file host-perfdata.1457025990 with ID 140665102309120 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata/host-perfdata.1457025990
[03-03-2016 12:26:34] NPCD: Processing file 'host-perfdata.1457025990'
[03-03-2016 12:26:34] NPCD: DEBUG: load 2.580000/500.000000
[03-03-2016 12:26:34] NPCD: ThreadCounter 2/5 File is service-perfdata.1457025976
[03-03-2016 12:26:34] NPCD: Regular File: service-perfdata.1457025976
[03-03-2016 12:26:34] NPCD: A thread was started on thread_counter = 2
[03-03-2016 12:26:34] NPCD: DEBUG: load 2.580000/500.000000
[03-03-2016 12:26:34] NPCD: Processing file service-perfdata.1457025976 with ID 140665010779904 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata/service-perfdata.1457025976
[03-03-2016 12:26:34] NPCD: ThreadCounter 3/5 File is service-perfdata.1457025991
[03-03-2016 12:26:34] NPCD: Processing file 'service-perfdata.1457025976'
[03-03-2016 12:26:34] NPCD: Regular File: service-perfdata.1457025991
[03-03-2016 12:26:34] NPCD: A thread was started on thread_counter = 3
[03-03-2016 12:26:34] NPCD: Processing file service-perfdata.1457025991 with ID 140664989804288 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata/service-perfdata.1457025991
[03-03-2016 12:26:34] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[03-03-2016 12:26:34] NPCD: Processing file 'service-perfdata.1457025991'
[03-03-2016 12:26:36] NPCD: No more files to process... waiting for 15 seconds
[03-03-2016 12:26:51] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata
[03-03-2016 12:26:51] NPCD: DEBUG: load 2.070000/500.000000
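
To check a longer stretch of the log than the excerpt above, the `DEBUG: load <current>/<threshold>` lines can be filtered with awk. A minimal sketch, assuming every load line has the same layout as the ones shown here; the function names are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helpers for scanning NPCD debug output.
# Expected line format (taken from the log excerpt):
#   [03-03-2016 12:26:15] NPCD: DEBUG: load 3.210000/500.000000

# Print every load line whose current load is at or above its threshold.
npcd_load_over_threshold() {
    awk '/NPCD: DEBUG: load / {
        split($NF, v, "/")            # last field is "<load>/<threshold>"
        if (v[1] + 0 >= v[2] + 0) print
    }' "$@"
}

# Print the highest load value seen in the log, to two decimals.
npcd_max_load() {
    awk '/NPCD: DEBUG: load / {
        split($NF, v, "/")
        if (v[1] + 0 > max) max = v[1] + 0
    } END { printf "%.2f\n", max }' "$@"
}
```

Both functions read the files given as arguments, or standard input if none are given, for example `npcd_max_load /usr/local/nagios/var/npcd.log` (that log path is an assumption). If the first function prints nothing, the load never reached the threshold in that log.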

Re: missing bandwidth perf data for network devices

Posted: Thu Mar 03, 2016 10:52 pm
by Box293
The only real idea I have right now is that MRTG sometimes stalls for some reason, which makes it run longer than five minutes.

One suggestion I have is to track how long it takes for MRTG to run each time. I created this feature request some time ago:

http://tracker.nagios.com/view.php?id=650

I created a bash script that runs the MRTG job and, when the job finishes, submits a passive check result to Nagios for a localhost service.

The passive service doesn't do any alerting; it's simply for a) knowing how long each run takes and b) collecting this data in RRDs for viewing the standard runtimes.

At this point it might be worth implementing this temporarily to see if MRTG is randomly running longer than 5 minutes.
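
For anyone who wants to try the idea without the script from the tracker, a wrapper along these lines would do roughly the same thing. It is not the script attached to that feature request, only a sketch. The command file path, the service name "MRTG Runtime", and the function names are assumptions, and the passive service has to be defined in Nagios before results are accepted.

```shell
#!/bin/bash
# Sketch: time one MRTG run and report the duration to Nagios as a passive check.
# All defaults below are assumptions; override them via the environment.
MRTG_CMD=${MRTG_CMD:-"env LANG=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg"}
NAGIOS_CMD_FILE=${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}
HOST_NAME=${HOST_NAME:-localhost}
SERVICE_DESC=${SERVICE_DESC:-"MRTG Runtime"}   # hypothetical passive service

# Build one PROCESS_SERVICE_CHECK_RESULT external command line.
# Args: timestamp (epoch), duration in seconds, MRTG exit code.
build_passive_result() {
    local now=$1 duration=$2 rc=$3
    local state=0 text="OK"
    if [ "$rc" -ne 0 ]; then
        state=3
        text="UNKNOWN (mrtg exit code $rc)"
    fi
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s - MRTG run took %ss|runtime=%ss\n' \
        "$now" "$HOST_NAME" "$SERVICE_DESC" "$state" "$text" "$duration" "$duration"
}

# Run MRTG once, measure the wall-clock time, and append the result
# to the Nagios external command file.
run_and_report() {
    local start end rc=0
    start=$(date +%s)
    $MRTG_CMD || rc=$?
    end=$(date +%s)
    build_passive_result "$end" "$((end - start))" "$rc" >> "$NAGIOS_CMD_FILE"
}

# Only run when explicitly asked to, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    run_and_report
fi
```

The cron entry would then call this wrapper with `--run` in place of the plain mrtg command. Any run whose `runtime` value approaches 300 seconds would line up with the theory that MRTG is overrunning its five minute interval.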