Graph gaps on certain services.

Posted: Tue Mar 07, 2017 1:30 pm
by CameronWP
Hello:

I am seeing strange gaps in the graphs for my Nagios service checks, as shown in the attachments. The strange thing is that it only affects certain services; others are not affected at all. I looked around the forum a bit but wasn't able to find anyone with a similar issue.

Here is a chart with the gaps:
gaps.JPG
Here is a chart from the same Nagios instance over the same time period with no gaps:
nogaps.JPG
Thanks!

Re: Graph gaps on certain services.

Posted: Tue Mar 07, 2017 4:57 pm
by ssax
Please run through this KB article and let us know if that resolves it for you:

https://support.nagios.com/kb/article.php?id=9


Thank you

Re: Graph gaps on certain services.

Posted: Mon Mar 20, 2017 1:43 pm
by CameronWP
Thanks for the reply but that article didn't help at all. I wasn't able to see anything in the logging. Any other thoughts?

Thanks!

Re: Graph gaps on certain services.

Posted: Mon Mar 20, 2017 5:02 pm
by avandemore
XI > Admin > System Profile > Download Profile

Please include the zip file in your response. You can PM me or other support personnel if you have privacy concerns.

Re: Graph gaps on certain services.

Posted: Mon Mar 20, 2017 5:02 pm
by mcapra
Can you also share the perfdata files from both of those services? If I'm understanding your configuration correctly, they should be located at:

Code:

/usr/local/nagios/share/perfdata/EPOSVR1/<service_description>.rrd
/usr/local/nagios/share/perfdata/CELHC-DRS01/<service_description>.rrd
If you're not sure which RRD files to grab, just send them all :)

Re: Graph gaps on certain services.

Posted: Tue Mar 21, 2017 3:50 pm
by mcapra
So here's the service check we're referencing:

Code:

define service {
	host_name			EXCHSVR1,EXCHSVR2
	service_description		Check CPU - Load
	use				windows_service
	hostgroup_name			WP - Administration,WP - Atrium Host Group,WP - EMC Host Group,WP - Virtual Machines
	check_command			check_win_cpu!70!80!!!!!!
	max_check_attempts		5
	check_interval			1
	retry_interval			1
	check_period			workhours
	first_notification_delay	15
	notification_period		workhours
	contact_groups			admins
	register			1
	}	
Note the check_period. Here is the timeperiod it references:

Code:

define timeperiod {
	timeperiod_name               		workhours
	alias                         		Normal Work Hours
	friday                        		09:00-17:00
	thursday                      		09:00-17:00
	wednesday                     		09:00-17:00
	tuesday                       		09:00-17:00
	monday                        		09:00-17:00
	}

So looking at the graph you provided, the gaps make sense: these checks only run for 8 hours per day (09:00-17:00, Monday through Friday), so performance data is only collected during those hours. The graph can't chart data it doesn't possess.
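
To put rough numbers on that, here is a minimal Python sketch. It is not part of Nagios; the `WORKHOURS` table and the function names are made up for illustration. It mirrors the workhours timeperiod above and works out how much of a week is actually checked and how long the longest stretch without checks is.

```python
from datetime import datetime, timedelta

# Mirrors the "workhours" timeperiod above (illustrative, not read from Nagios):
# weekday index (Monday=0) -> (start hour, end hour). Saturday/Sunday are absent.
WORKHOURS = {day: (9, 17) for day in range(5)}

def in_period(ts: datetime) -> bool:
    """Return True if ts falls inside the workhours window."""
    window = WORKHOURS.get(ts.weekday())
    if window is None:
        return False
    start, end = window
    return start <= ts.hour < end

def weekly_coverage():
    """Return (checked minutes per week, longest unchecked stretch in minutes)."""
    start = datetime(2017, 3, 6)  # a Monday at 00:00
    minutes_per_week = 7 * 24 * 60
    checked = 0
    longest_gap = 0
    current_gap = 0
    # Walk two weeks minute by minute so the weekend gap is measured whole,
    # but only count checked minutes during the first week.
    for i in range(2 * minutes_per_week):
        ts = start + timedelta(minutes=i)
        if in_period(ts):
            if i < minutes_per_week:
                checked += 1
            current_gap = 0
        else:
            current_gap += 1
            longest_gap = max(longest_gap, current_gap)
    return checked, longest_gap

checked, longest_gap = weekly_coverage()
print(f"{checked / 60:.0f} h checked per week, longest gap {longest_gap / 60:.0f} h")
```

With this timeperiod, only 40 of the 168 hours in a week produce data points, and the weekend leaves a 64-hour stretch with nothing to plot.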

That said, if the other host's CPU check already had existing data, I can see how a continuous graph might be produced, given the averaged, lossy nature of RRDs. So this boils down to: RRDs are lossy and can behave imperfectly with irregular data intervals. If you're not consistently checking at least every 4-6 hours, you can produce data sets like the one you're seeing. The "heartbeat" of the RRDs can be adjusted (/usr/local/nagios/etc/pnp/rra.cfg), but only system-wide.
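
As a rough illustration of what the heartbeat does, here is a small Python sketch. It is a simplified model, not how rrdtool or PNP is implemented: when two consecutive updates are further apart than the heartbeat, the interval between them is treated as unknown and shows up as a gap. The function name, the sample timestamps, and the heartbeat values are all assumptions for the example; check your own configuration for the real value.

```python
def unknown_intervals(update_times, heartbeat):
    """Return (previous, current) timestamp pairs that are more than
    `heartbeat` seconds apart; in this simplified model each such pair
    marks a span that would be stored as unknown."""
    gaps = []
    for prev, cur in zip(update_times, update_times[1:]):
        if cur - prev > heartbeat:
            gaps.append((prev, cur))
    return gaps

DAY = 24 * 3600

# One check per minute from 09:00 to 17:00 on two consecutive days,
# expressed as seconds since midnight of the first day.
day_one = list(range(9 * 3600, 17 * 3600, 60))
day_two = [t + DAY for t in day_one]
updates = day_one + day_two

ASSUMED_HEARTBEAT = 8460  # seconds; an assumed value for illustration only

for prev, cur in unknown_intervals(updates, ASSUMED_HEARTBEAT):
    print(f"unknown from {prev}s to {cur}s ({(cur - prev) / 3600:.1f} h)")
```

In this model the overnight pause between the last check of one day and the first check of the next is far longer than the assumed heartbeat, so it is recorded as unknown rather than interpolated.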

Re: Graph gaps on certain services.

Posted: Wed Mar 22, 2017 7:16 am
by CameronWP
Bah, thank you! Sometimes it is great to have another set of eyes on an issue.

I appreciate it!

Re: Graph gaps on certain services.

Posted: Wed Mar 22, 2017 9:04 am
by cdienger
Glad Matt was able to help you out. Did you have any more questions or are we okay to lock this thread?

Re: Graph gaps on certain services.

Posted: Wed Mar 22, 2017 2:01 pm
by CameronWP
Yep, feel free to lock it. Thanks!