Graph gaps on certain services.
Posted: Tue Mar 07, 2017 1:30 pm
by CameronWP
Hello:
I am seeing strange gaps in the graphs on my Nagios service checks, as attached. The strange thing is that this only affects certain services; others are not affected at all. I looked around the forum a bit but wasn't able to find anyone with a similar issue.
Here is a chart with the gaps:
gaps.JPG
Here is a chart from the same Nagios instance over the same time period with no gaps:
nogaps.JPG
Thanks!
Re: Graph gaps on certain services.
Posted: Tue Mar 07, 2017 4:57 pm
by ssax
Please run through this KB article and let us know if that resolves it for you:
https://support.nagios.com/kb/article.php?id=9
Thank you
Re: Graph gaps on certain services.
Posted: Mon Mar 20, 2017 1:43 pm
by CameronWP
Thanks for the reply but that article didn't help at all. I wasn't able to see anything in the logging. Any other thoughts?
Thanks!
Re: Graph gaps on certain services.
Posted: Mon Mar 20, 2017 5:02 pm
by avandemore
XI > Admin > System Profile > Download Profile
Please include the zip file in your response. You can PM me or other support personnel if you have privacy concerns.
Re: Graph gaps on certain services.
Posted: Mon Mar 20, 2017 5:02 pm
by mcapra
Can you also share the perfdata files from both of those services? If I'm understanding your configuration correctly, they should be located at:
Code:
/usr/local/nagios/share/perfdata/EPOSVR1/<service_description>.rrd
/usr/local/nagios/share/perfdata/CELHC-DRS01/<service_description>.rrd
If you're not sure which rrd to grab, just send them all.

Re: Graph gaps on certain services.
Posted: Tue Mar 21, 2017 3:50 pm
by mcapra
So here's the service check we're referencing:
Code:
define service {
    host_name                   EXCHSVR1,EXCHSVR2
    service_description         Check CPU - Load
    use                         windows_service
    hostgroup_name              WP - Administration,WP - Atrium Host Group,WP - EMC Host Group,WP - Virtual Machines
    check_command               check_win_cpu!70!80!!!!!!
    max_check_attempts          5
    check_interval              1
    retry_interval              1
    check_period                workhours
    first_notification_delay    15
    notification_period         workhours
    contact_groups              admins
    register                    1
}
Note the check_period, and here is that timeperiod referenced:
Code:
define timeperiod {
    timeperiod_name    workhours
    alias              Normal Work Hours
    friday             09:00-17:00
    thursday           09:00-17:00
    wednesday          09:00-17:00
    tuesday            09:00-17:00
    monday             09:00-17:00
}
So looking at the graph you provided, the gaps make sense: these checks only run for 8 hours per day, Monday through Friday, and therefore only collect performance data during those hours. The graph can't chart data it doesn't possess.
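To put a rough number on that, here is a minimal Python sketch that counts how much of a week falls inside the workhours timeperiod. The names WORKHOURS, in_workhours and weekly_coverage are made up for illustration, and treating 17:00 as the exclusive end of the window is an assumption, not a statement about how Nagios evaluates timeperiods.

```python
from datetime import datetime, timedelta

# Weekday index (Monday=0) -> (start_hour, end_hour), mirroring the
# "workhours" timeperiod. Saturday and Sunday have no entry.
# Assumption: the end hour is treated as exclusive.
WORKHOURS = {day: (9, 17) for day in range(5)}


def in_workhours(ts):
    """Return True if the timestamp falls inside the workhours window."""
    window = WORKHOURS.get(ts.weekday())
    return window is not None and window[0] <= ts.hour < window[1]


def weekly_coverage():
    """Count the minutes of one full week that fall inside workhours."""
    start = datetime(2017, 3, 6)  # a Monday, chosen arbitrarily
    minutes = [start + timedelta(minutes=m) for m in range(7 * 24 * 60)]
    covered = sum(1 for ts in minutes if in_workhours(ts))
    return covered, len(minutes)


covered, total = weekly_coverage()
print(f"{covered / 60:.0f} of {total / 60:.0f} hours per week "
      f"({100 * covered / total:.1f}%) have check results")
```

Under those assumptions only 40 of 168 hours per week produce performance data, so most of the time axis has nothing to plot.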
That said, if the other host's CPU check already had existing data, I can see how a regular graph might be produced, given the averaged, lossy nature of RRDs. So this boils down to: RRDs are lossy and can be imperfect for odd data intervals. If you're not consistently checking every 4-6 hours, you can produce data sets like the one you're seeing. The "heartbeat" of the RRDs can be adjusted (/usr/local/nagios/etc/pnp/rra.cfg), but only system-wide.
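As a rough illustration of the heartbeat idea, the Python sketch below flags any stretch where two consecutive updates are further apart than the heartbeat. It is a simplified model, not rrdtool's actual consolidation logic. The function name unknown_spans and the 3600-second heartbeat are hypothetical values picked for the example, not settings read from your rra.cfg.

```python
def unknown_spans(update_times, heartbeat):
    """Return (start, end) pairs where consecutive updates are more than
    `heartbeat` seconds apart. Simplified model of how a gap appears."""
    spans = []
    for prev, cur in zip(update_times, update_times[1:]):
        if cur - prev > heartbeat:
            spans.append((prev, cur))
    return spans


DAY = 86400  # seconds per day

# One update per minute from 09:00 to 17:00 on two consecutive days,
# expressed as seconds since midnight of the first day.
updates = [
    day * DAY + sec
    for day in range(2)
    for sec in range(9 * 3600, 17 * 3600 + 1, 60)
]

# Hypothetical heartbeat of one hour, for illustration only.
print(unknown_spans(updates, heartbeat=3600))
```

With these made-up numbers the only flagged span runs from 17:00 on the first day to 09:00 on the second, which is the overnight gap shown in the graph.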
Re: Graph gaps on certain services.
Posted: Wed Mar 22, 2017 7:16 am
by CameronWP
Bah, thank you! Sometimes it is great to have another set of eyes on an issue.
I appreciate it!
Re: Graph gaps on certain services.
Posted: Wed Mar 22, 2017 9:04 am
by cdienger
Glad Matt was able to help you out. Did you have any more questions or are we okay to lock this thread?
Re: Graph gaps on certain services.
Posted: Wed Mar 22, 2017 2:01 pm
by CameronWP
Yep, feel free to lock it. Thanks!