
Missing RRD graphs

Posted: Wed Dec 23, 2015 1:49 pm
by CFT6Server
I added a couple of network checks yesterday, and it looks like the perfdata is not available. The warning I am getting is "/var/lib/mrtg/server_2.rrd does not exist."

I checked the folder /var/lib/mrtg, and it looks like the file is there....

Here's the perfdata directory

Code: Select all

perfdata]# pwd
/usr/local/nagios/share/perfdata
perfdata]# ls server
CPULoad.rrd  CPU.xml      DiskIO_sda.rrd  _HOST_.xml  SwapIO.rrd  Swap.xml
CPULoad.xml  DiskAll.rrd  DiskIO_sda.xml  RAM.rrd     SwapIO.xml
CPU.rrd      DiskAll.xml  _HOST_.rrd      RAM.xml     Swap.rrd
So it looks like it isn't getting picked up. I am tailing perfdata.log and not seeing anything that mentions the server. Is there something else I should check to see why the perfdata isn't getting there?
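One quick way to confirm whether the perfdata side ever created an RRD for a particular bandwidth service is to build the expected path and test for it. This is a hypothetical helper, not part of Nagios XI. It assumes the perfdata layout shown above (/usr/local/nagios/share/perfdata/<host>/) and that spaces in the service description become underscores, as with eth0_Bandwidth.rrd later in this thread.

```sh
# Hypothetical helper (not part of Nagios XI): build the perfdata RRD path
# for one host/service pair. Assumption: spaces in the service description
# become underscores (eth0 Bandwidth -> eth0_Bandwidth.rrd).
perfdata_rrd_path() {
    local host="$1"
    local service="$2"
    local base="${3:-/usr/local/nagios/share/perfdata}"
    printf '%s/%s/%s.rrd\n' "$base" "$host" "$(printf '%s' "$service" | tr ' ' '_')"
}

# Print OK or MISSING for that path; return 1 when the RRD is absent.
check_perfdata_rrd() {
    local path
    path="$(perfdata_rrd_path "$@")"
    if [ -f "$path" ]; then
        echo "OK: $path"
    else
        echo "MISSING: $path"
        return 1
    fi
}
```

For example, check_perfdata_rrd server2 "eth0 Bandwidth" should report MISSING until process_perfdata.pl has created that file.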

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 2:24 pm
by Box293
Usually this error appears for the first 5 minutes after running the wizard, and then it goes away.

You ran the Switch/Router wizard, right?

Let's increase the logging verbosity and then take a deeper look at the logs. Follow the FAQ entry below to increase the log level of process_perfdata and npcd:

http://support.nagios.com/wiki/index.ph ... leshooting

Wait 15-20 minutes and then get a tail of the logs:

Code: Select all

tail -250 /usr/local/nagios/var/perfdata.log > /tmp/perfdata.txt
tail -250 /usr/local/nagios/var/npcd.log > /tmp/npcd.txt
Send us a copy of /tmp/perfdata.txt and /tmp/npcd.txt.

Don't forget to turn down the log level as per the FAQ when you are done!
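Before sending the tails, it can help to check whether the host in question shows up in them at all. A minimal sketch, assuming the perfdata.log line format quoted later in this thread (RRDs::update followed by the full RRD path); the function name is hypothetical.

```sh
# Hypothetical helper: keep only the perfdata.log lines that mention one
# host's RRD directory, e.g. "RRDs::update .../perfdata/server3/...".
# The trailing slash stops "server3" from also matching "server33".
host_perfdata_lines() {
    local host="$1"
    local log="${2:-/usr/local/nagios/var/perfdata.log}"
    grep -F "/perfdata/$host/" "$log"
}
```

Usage: host_perfdata_lines server2 /tmp/perfdata.txt || echo "no RRD activity for server2"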

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 2:47 pm
by CFT6Server
Hi Box, I would have to go over the data before I send it to you, but perhaps you can tell me what you are looking for. I've already increased the verbosity of the perfdata and npcd logs, and I am not seeing any issues. I am not seeing the server in question come up in the bandwidth data updates. I should be seeing something like this, right?

Code: Select all

2015-12-23 11:46:27 [11154] [2] RRDs::update /usr/local/nagios/share/perfdata/server3/eth0_Bandwidth.rrd 1450899968:.000122:.000390
2015-12-23 11:46:27 [11154] [2] /usr/local/nagios/share/perfdata/server3/eth0_Bandwidth.rrd updated
I did run the wizard, but it has been over 12 hours and the other bandwidth graphs are already showing.
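To see at a glance which hosts are (and are not) getting bandwidth data written, the RRDs::update lines can be summarized per host. This is a hypothetical awk-based sketch; it assumes the field layout of the two log lines quoted above (date, time, PID, level, RRDs::update, path) and that the log is written in time order.

```sh
# Hypothetical helper: print, per host directory under .../perfdata/, the
# date and time of the most recent "RRDs::update" line in perfdata.log.
# A host that never appears has had no data written to its RRDs.
last_update_per_host() {
    awk '$5 == "RRDs::update" {
        n = split($6, parts, "/")   # .../perfdata/<host>/<file>.rrd
        host = parts[n - 1]
        last[host] = $1 " " $2      # assumes lines are in time order
    }
    END { for (h in last) print h, last[h] }' "${1:-/usr/local/nagios/var/perfdata.log}" | sort
}
```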

Here's the NPCD log, since there aren't any host specifics in it...

Code: Select all

[12-23-2015 11:43:25] NPCD: DEBUG: load 3.850000/20.000000
[12-23-2015 11:43:25] NPCD: ThreadCounter 3/5 File is 1450899799.perfdata.service
[12-23-2015 11:43:25] NPCD: Processing file '1450899799.perfdata.host'
[12-23-2015 11:43:25] NPCD: Regular File: 1450899799.perfdata.service
[12-23-2015 11:43:25] NPCD: Processing file 1450899799.perfdata.service with ID 140325479352064 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1450899799.perfdata.service
[12-23-2015 11:43:25] NPCD: A thread was started on thread_counter = 3
[12-23-2015 11:43:25] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[12-23-2015 11:43:25] NPCD: Processing file '1450899799.perfdata.service'
[12-23-2015 11:43:29] NPCD: No more files to process... waiting for 15 seconds
[12-23-2015 11:43:44] NPCD: Found 4 files in /usr/local/nagios/var/spool/perfdata/
[12-23-2015 11:43:44] NPCD: DEBUG: load 5.730000/20.000000
[12-23-2015 11:43:44] NPCD: ThreadCounter 0/5 File is .
[12-23-2015 11:43:44] NPCD: DEBUG: load 5.730000/20.000000
[12-23-2015 11:43:44] NPCD: ThreadCounter 0/5 File is ..
[12-23-2015 11:43:44] NPCD: DEBUG: load 5.730000/20.000000
[12-23-2015 11:43:44] NPCD: ThreadCounter 0/5 File is 1450899813.perfdata.host
[12-23-2015 11:43:44] NPCD: Regular File: 1450899813.perfdata.host
[12-23-2015 11:43:44] NPCD: A thread was started on thread_counter = 0
[12-23-2015 11:43:44] NPCD: Processing file 1450899813.perfdata.host with ID 140325513008896 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1450899813.perfdata.host
[12-23-2015 11:43:44] NPCD: DEBUG: load 5.730000/20.000000
[12-23-2015 11:43:44] NPCD: ThreadCounter 1/5 File is 1450899813.perfdata.service
[12-23-2015 11:43:44] NPCD: Processing file '1450899813.perfdata.host'
[12-23-2015 11:43:44] NPCD: Regular File: 1450899813.perfdata.service
[12-23-2015 11:43:44] NPCD: A thread was started on thread_counter = 1
[12-23-2015 11:43:44] NPCD: Processing file 1450899813.perfdata.service with ID 140325502519040 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1450899813.perfdata.service
[12-23-2015 11:43:44] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-23-2015 11:43:44] NPCD: Processing file '1450899813.perfdata.service'
[12-23-2015 11:43:47] NPCD: No more files to process... waiting for 15 seconds
[12-23-2015 11:44:02] NPCD: Found 4 files in /usr/local/nagios/var/spool/perfdata/
[12-23-2015 11:44:02] NPCD: DEBUG: load 4.960000/20.000000
[12-23-2015 11:44:02] NPCD: ThreadCounter 0/5 File is .
[12-23-2015 11:44:02] NPCD: DEBUG: load 4.960000/20.000000
[12-23-2015 11:44:02] NPCD: ThreadCounter 0/5 File is ..
[12-23-2015 11:44:02] NPCD: DEBUG: load 4.960000/20.000000
[12-23-2015 11:44:02] NPCD: ThreadCounter 0/5 File is 1450899829.perfdata.host
[12-23-2015 11:44:02] NPCD: Regular File: 1450899829.perfdata.host
[12-23-2015 11:44:02] NPCD: A thread was started on thread_counter = 0
[12-23-2015 11:44:02] NPCD: DEBUG: load 4.960000/20.000000
[12-23-2015 11:44:02] NPCD: ThreadCounter 1/5 File is 1450899829.perfdata.service
[12-23-2015 11:44:02] NPCD: Processing file 1450899829.perfdata.host with ID 140325513008896 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1450899829.perfdata.host
[12-23-2015 11:44:02] NPCD: Regular File: 1450899829.perfdata.service
[12-23-2015 11:44:02] NPCD: Processing file '1450899829.perfdata.host'
[12-23-2015 11:44:02] NPCD: A thread was started on thread_counter = 1
[12-23-2015 11:44:02] NPCD: Processing file 1450899829.perfdata.service with ID 140325502519040 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1450899829.perfdata.service
[12-23-2015 11:44:02] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[12-23-2015 11:44:02] NPCD: Processing file '1450899829.perfdata.service'
[12-23-2015 11:44:05] NPCD: No more files to process... waiting for 15 seconds
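The NPCD log only names the spool files, not the hosts inside them. To find out whether a host's perfdata reaches the spool directory at all, the files themselves can be searched. This sketch assumes the spool files use the PNP4Nagios bulk-mode template, where each record carries a HOSTNAME::<name> field; that assumption and the helper name are not confirmed by this thread.

```sh
# Hypothetical helper: list spool files that carry perfdata for one host.
# Assumption: PNP bulk-mode records contain "HOSTNAME::<name>" followed by
# whitespace or end of line. NPCD consumes these files quickly (the log
# above shows it polling every 15 seconds), so run this repeatedly.
spool_files_for_host() {
    local host="$1"
    local dir="${2:-/usr/local/nagios/var/spool/perfdata}"
    grep -lE "HOSTNAME::$host([[:space:]]|\$)" "$dir"/*.perfdata.* 2>/dev/null
}
```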

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 2:56 pm
by Box293
If it's not showing up in the logs then it's most likely not being properly processed.

This gets pretty complicated; however, you need to know all of this to understand what is happening and how best to resolve your problem.

When the wizard is run against the switch 192.168.33.22, the following happens:
- A file called /etc/mrtg/conf.d/192.168.33.22.cfg is created
- The Nagios Bandwidth services are created for each port

Now, the bandwidth graphs you see in the GUI are created by two separate jobs.

The usage data is collected from the switches by a program called MRTG; we'll call this the MRTG job:
- Every 5 minutes, a cron job on the Nagios server runs MRTG
- MRTG looks in the /etc/mrtg/conf.d/ directory for all the devices it must talk to
- Each .cfg file in this directory configures all the ports it must collect data for on that device
- MRTG takes the data it collected for EACH port and puts it into a SEPARATE file in the directory /var/lib/mrtg/
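Since MRTG rewrites every port file on each 5-minute cron run, a file that stops being updated points at the MRTG job rather than the Nagios job. A hypothetical helper using find's -mmin test; the 15-minute default threshold (about three missed runs) is an arbitrary choice.

```sh
# Hypothetical helper: list MRTG output files not written in the last
# N minutes (default 15, i.e. roughly three missed 5-minute cron runs).
stale_mrtg_files() {
    local dir="${1:-/var/lib/mrtg}"
    local minutes="${2:-15}"
    find "$dir" -maxdepth 1 -name '*.rrd' -mmin +"$minutes" -print
}
```

An empty result means every port file was refreshed recently, so MRTG itself is doing its part.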

Then there is the Nagios job:
- As mentioned earlier, the wizard creates a Bandwidth service for EACH port you select to monitor
- The service that is created checks the /var/lib/mrtg/ directory for the specific file for that port
- It takes the data in that file and populates the correct RRD file in /usr/local/nagios/share/perfdata/192.168.33.22/
- There is an RRD file for each network port being monitored
- Nagios does this every five minutes (the default check_interval defined by the wizard) for every bandwidth service that exists
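The link between the two jobs is the file name passed to the bandwidth service. Assuming, as the CCM exports later in this thread suggest, that the first ! argument of check_xi_service_mrtgtraf is the file name inside /var/lib/mrtg/, the path can be pulled out of a check_command like this (hypothetical helper):

```sh
# Hypothetical helper: print the MRTG file a bandwidth service reads,
# taken from the first "!" argument of its check_command.
mrtg_file_from_command() {
    local file
    file="$(printf '%s\n' "$1" | cut -d'!' -f2)"
    printf '%s/%s\n' "${2:-/var/lib/mrtg}" "$file"
}
```

Usage: ls -l "$(mrtg_file_from_command 'check_xi_service_mrtgtraf!server2_2.rrd!7,7!8,8!G')"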

Check the MRTG file /etc/mrtg/conf.d/192.168.33.22.cfg (yours will be different).

The port is defined as something like 192.168.33.22_2.
Make sure it is enabled (not commented out with #).
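To check which ports MRTG will actually poll, the enabled Target[...] lines in the device's .cfg file can be listed; commented-out ports drop out automatically. A minimal sketch, assuming each port is declared with a standard MRTG Target[name]: line and that the target name appears to match the file MRTG writes (e.g. server2_2 and server2_2.rrd in this thread); the helper name is hypothetical.

```sh
# Hypothetical helper: print the MRTG targets enabled in one conf.d file.
# Lines commented out with # do not match, so a disabled port simply
# does not appear in the output.
mrtg_targets() {
    sed -n 's/^[[:space:]]*Target\[\([^]]*\)].*/\1/p' "$1"
}
```

Usage: mrtg_targets /etc/mrtg/conf.d/192.168.33.22.cfg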

You could delete the /var/lib/mrtg/server_2.rrd file; that might solve the problem (it will be re-created automatically). It will take about 15 minutes for the error to disappear.

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 3:09 pm
by CFT6Server
So I decided to recreate the service check by removing it and rerunning it. It looks like it picked it up this time....

Code: Select all

2015-12-23 11:56:46 [21903] [2] RRDs::create /usr/local/nagios/share/perfdata/server4/eth0_Bandwidth.rrd RRA:AVERAGE:0.5:1:2880 RRA:AVERAGE:0.5:5:2880 RRA:AVERAGE:0.5:30:4320 RRA:AVERAGE:0.5:360:5840 RRA:MAX:0.5:1:2880 RRA:MAX:0.5:5:2880 RRA:MAX:0.5:30:4320 RRA:MAX:0.5:360:5840 RRA:MIN:0.5:1:2880 RRA:MIN:0.5:5:2880 RRA:MIN:0.5:30:4320 RRA:MIN:0.5:360:5840 DS:1:GAUGE:8460:U:U DS:2:GAUGE:8460:U:U --start=1450900582 --step=60
2015-12-23 11:56:46 [21903] [2] /usr/local/nagios/share/perfdata/server4/eth0_Bandwidth.rrd created
I did check the MRTG configurations, and they look fine. I also tried removing the RRD file in /var/lib/mrtg/ for another one I had issues with; it recreated the file, but the graph still wasn't picked up. Thanks for sharing the whole process here. I understood the MRTG side of things, but I think this indicates something wasn't created correctly on the Nagios side.

I could easily just recreate these servers, since I have two more like this, but I wanted to see if we can figure out where the issue lies. This could help with future incidents.

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 3:11 pm
by Box293
In CCM, under Monitoring > Services, can you click the disk icon for a re-created service and paste the output here in a code block?
Do the same for a service that is not working.

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 3:16 pm
by CFT6Server
Here are the sanitized checks... I think the configuration looks fine. Perhaps whatever process reads /var/lib/mrtg/ isn't picking it up?

Working one:

Code: Select all

###############################################################################
#
# Service configuration file
#
# Created by: Nagios Core Config Manager 2.3.2
# Date:	      2015-12-23 12:13:05
# Version:    Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND --- 
# Nagios CCM will overwrite all manual settings during the next update if you 
# would like to edit files manually, place them in the 'static' directory or 
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define service {
	host_name			server4
	service_description		eth0 Bandwidth
	use				xiwizard_switch_port_bandwidth_service
	servicegroups			ALL_Network_Bandwidth
	check_command			check_xi_service_mrtgtraf!server4_2.rrd!7,7!8,8!G
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

define service {
	host_name			server4
	service_description		eth0 Status
	use				xiwizard_switch_port_status_service
	servicegroups			ALL_Network_Bandwidth
	check_command			check_xi_service_ifoperstatus!nagios!2!-v 2
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

define service {
	host_name			server4
	service_description		eth1 Bandwidth
	use				xiwizard_switch_port_bandwidth_service
	servicegroups			ALL_Network_Bandwidth
	check_command			check_xi_service_mrtgtraf!server4_3.rrd!7,7!8,8!G
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

define service {
	host_name			server4
	service_description		eth1 Status
	use				xiwizard_switch_port_status_service
	servicegroups			ALL_Network_Bandwidth
	check_command			check_xi_service_ifoperstatus!nagios!3!-v 2
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################
Non-working one:

Code: Select all

###############################################################################
#
# Service configuration file
#
# Created by: Nagios Core Config Manager 2.3.2
# Date:	      2015-12-23 12:15:04
# Version:    Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND --- 
# Nagios CCM will overwrite all manual settings during the next update if you 
# would like to edit files manually, place them in the 'static' directory or 
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define service {
	host_name			server2
	service_description		eth0 Bandwidth
	use				xiwizard_switch_port_bandwidth_service
	servicegroups			WMI_NETWORK_Checks
	check_command			check_xi_service_mrtgtraf!server2_2.rrd!7,7!8,8!G
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

define service {
	host_name			server2
	service_description		eth0 Status
	use				xiwizard_switch_port_status_service
	servicegroups			WMI_NETWORK_Checks
	check_command			check_xi_service_ifoperstatus!nagios!2!-v 2
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

define service {
	host_name			server2
	service_description		eth1 Bandwidth
	use				xiwizard_switch_port_bandwidth_service
	servicegroups			WMI_NETWORK_Checks
	check_command			check_xi_service_mrtgtraf!server2_3.rrd!7,7!8,8!G
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

define service {
	host_name			server2
	service_description		eth1 Status
	use				xiwizard_switch_port_status_service
	servicegroups			WMI_NETWORK_Checks
	check_command			check_xi_service_ifoperstatus!nagios!3!-v 2
	max_check_attempts		2
	check_interval			15
	retry_interval			5
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	notifications_enabled		0
	contact_groups			admins,Generic Group,Linux Admins,Unix Admins
	_xiwizard			switch
	register			1
	}	

###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################
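Comparing the two exports by eye is error-prone. One way to let diff do it is to strip the CCM header comments and mask each host name first; whatever diff still reports (for these two exports, for instance, the different servicegroups) is a genuine difference. This is a hypothetical sketch; it treats the host name as a sed regular expression, which is fine for plain names like server2.

```sh
# Hypothetical helper: drop comment lines and replace the host name with
# HOST, so exports for different hosts become comparable.
normalize_service_cfg() {
    grep -v '^#' "$1" | sed "s/$2/HOST/g"
}

# Compare a working export ($1, host $2) with a non-working one ($3, host $4).
# Returns diff's status: 0 when nothing but the host name differs.
compare_service_cfgs() {
    local a b status
    a="$(mktemp)"
    b="$(mktemp)"
    normalize_service_cfg "$1" "$2" > "$a"
    normalize_service_cfg "$3" "$4" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Usage: compare_service_cfgs working.cfg server4 broken.cfg server2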

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 3:21 pm
by CFT6Server
Box293 wrote: Then there is the Nagios Job
As mentioned earlier, the Wizard creates a Bandwidth service for EACH port you select to monitor
The service that is created will check the /var/lib/mrtg/ directory for the specific file for that port
It will take the data in that file and populate the correct RRD file in /usr/local/nagios/share/perfdata/192.168.33.22/
There is an RRD file for each network port being monitored
This happens every five minutes by Nagios (the default check_interval defined by the Wizard) for every bandwidth service that exists
So does Nagios create a job somewhere to check for and populate the graphs, or does it just pick up anything that ends up in the folder? I have a feeling that whatever that mechanism is, that's where it went south. The wizard created the service checks and the MRTG config, but perhaps the Nagios side of things isn't picking it up, or doesn't know that it needs to process this particular perfdata?

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 3:23 pm
by Box293
CFT6Server wrote: Here are the sanitized checks... I think the configuration looks fine. Perhaps whatever process reads /var/lib/mrtg/ isn't picking it up?
Configs look correct.

Permissions perhaps?

What are the permissions of a working file in /var/lib/mrtg/ compared to one that isn't working?

Code: Select all

ls -la /var/lib/mrtg/server4_2.rrd
ls -la /var/lib/mrtg/server2_2.rrd

Re: Missing RRD graphs

Posted: Wed Dec 23, 2015 3:28 pm
by CFT6Server
# ls -l server*
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server1_2.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server1_3.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server2_2.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server2_3.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server3_2.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server3_3.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server4_2.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server4_3.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server5_2.rrd
-rw-r--r-- 1 root root 105312 Dec 23 12:02 server5_3.rrd

Looks OK here...