Nagios Support Forum

Posted: **Tue Mar 07, 2017 1:30 pm**

Hello:

I am seeing strange gaps in the graphs on my Nagios service checks, as attached. The strange thing is that it only affects certain services; others are not affected at all. I looked around the forum a bit but wasn't able to find anyone with a similar issue.

Here is a chart with the gaps:

gaps.JPG

Here is a chart from the same Nagios instance over the same time period with no gaps:

nogaps.JPG

Thanks!

Posted: **Tue Mar 07, 2017 4:57 pm**

Please run through this KB article and let us know if that resolves it for you:

https://support.nagios.com/kb/article.php?id=9

Thank you

Posted: **Mon Mar 20, 2017 1:43 pm**

Thanks for the reply but that article didn't help at all. I wasn't able to see anything in the logging. Any other thoughts?

Thanks!

Posted: **Mon Mar 20, 2017 5:02 pm**

XI > Admin > System Profile > Download Profile

Please include the zip file in your response. You can PM me or other support personnel if you have privacy concerns.

Posted: **Mon Mar 20, 2017 5:02 pm**

Can you also share the perfdata files from both of those services? If I'm understanding your configuration correctly, they should be located at:

Code: Select all

/usr/local/nagios/share/perfdata/EPOSVR1/<service_description>.rrd
/usr/local/nagios/share/perfdata/CELHC-DRS01/<service_description>.rrd

If you're not sure which RRD to grab, just send them all.

Posted: **Tue Mar 21, 2017 3:50 pm**

So here's the service check we're referencing:

Code: Select all

define service {
	host_name			EXCHSVR1,EXCHSVR2
	service_description		Check CPU - Load
	use				windows_service
	hostgroup_name			WP - Administration,WP - Atrium Host Group,WP - EMC Host Group,WP - Virtual Machines
	check_command			check_win_cpu!70!80!!!!!!
	max_check_attempts		5
	check_interval			1
	retry_interval			1
	check_period			workhours
	first_notification_delay	15
	notification_period		workhours
	contact_groups			admins
	register			1
	}

Note the check_period, and here is that timeperiod referenced:

Code: Select all

define timeperiod {
	timeperiod_name               		workhours
	alias                         		Normal Work Hours
	friday                        		09:00-17:00
	thursday                      		09:00-17:00
	wednesday                     		09:00-17:00
	tuesday                       		09:00-17:00
	monday                        		09:00-17:00
	}

So looking at the graph you provided, the gaps make sense: these checks only run for 8 hours per day (09:00-17:00, Monday through Friday), so performance data is only collected during those hours. The graph can't chart data it doesn't possess.
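To put rough numbers on that, here is a minimal Python sketch that replays the `workhours` timeperiod above with a 1-minute `check_interval`. It is only an illustration, not how Nagios schedules checks internally; the function names and the choice of start date (Monday, March 6, 2017) are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Timeperiod from the post above: Monday-Friday, 09:00-17:00.
WORK_DAYS = {0, 1, 2, 3, 4}   # datetime.weekday(): Monday is 0
START_HOUR, END_HOUR = 9, 17


def in_workhours(ts):
    """True if the timestamp falls inside the workhours timeperiod."""
    return ts.weekday() in WORK_DAYS and START_HOUR <= ts.hour < END_HOUR


def check_times(start, days, interval_minutes=1):
    """Return the timestamps at which a check would run (hypothetical helper)."""
    step = timedelta(minutes=interval_minutes)
    end = start + timedelta(days=days)
    times = []
    ts = start
    while ts < end:
        if in_workhours(ts):
            times.append(ts)
        ts += step
    return times


def longest_gap(times):
    """Longest stretch between two consecutive checks."""
    return max((b - a for a, b in zip(times, times[1:])), default=timedelta(0))


monday = datetime(2017, 3, 6)             # assumed start date, a Monday
one_week = check_times(monday, days=7)
two_weeks = check_times(monday, days=14)  # long enough to span a weekend

print("hours with data per week:", len(one_week) / 60, "of", 7 * 24)
print("longest stretch without data:", longest_gap(two_weeks))
```

Under these assumptions only 40 of the 168 hours in a week receive any data, and the longest stretch with no data runs from Friday evening to Monday morning.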

Though, if there were existing data for the other host's CPU check, I can see where a regular graph might be produced, because RRDs average their data and are lossy. So this boils down to: RRDs are lossy and at times imperfect for odd data intervals. If you're not consistently checking every 4-6 hours, you can produce data sets like the one you're seeing. The "heartbeat" of the RRDs can be adjusted (/usr/local/nagios/etc/pnp/rra.cfg), but only system-wide.
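For anyone unfamiliar with the heartbeat: in RRDtool, if more than the heartbeat number of seconds passes between two updates of a data source, the interval in between is stored as unknown, which shows up as a gap in the graph. A rough Python sketch of that rule follows. The heartbeat value and the update timestamps are made-up assumptions for illustration, not values read from this installation, and `unknown_spans` is a hypothetical helper, not part of RRDtool or PNP.

```python
def unknown_spans(update_times, heartbeat):
    """Return (start, end) pairs where two consecutive updates are
    further apart than the heartbeat, i.e. spans graphed as gaps."""
    spans = []
    for prev, cur in zip(update_times, update_times[1:]):
        if cur - prev > heartbeat:
            spans.append((prev, cur))
    return spans


HEARTBEAT = 8640  # seconds; assumed value for illustration only

# Updates once a minute, then a 16-hour overnight pause, then updates resume.
overnight = 16 * 3600
updates = [0, 60, 120, 120 + overnight, 120 + overnight + 60]

print(unknown_spans(updates, HEARTBEAT))
```

With these numbers the overnight pause is far longer than the heartbeat, so that span is reported as unknown. A heartbeat longer than the pause would report no gap at all, which is why raising it changes how the graph looks.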

Posted: **Wed Mar 22, 2017 7:16 am**

Bah, thank you! Sometimes it is great to have another set of eyes on an issue.

I appreciate it!

Posted: **Wed Mar 22, 2017 9:04 am**

Glad Matt was able to help you out. Did you have any more questions or are we okay to lock this thread?

Posted: **Wed Mar 22, 2017 2:01 pm**

Yep, feel free to lock it. Thanks!

Graph gaps on certain services.
