Some performance graphs are "randomly" missing
Posted: Thu Mar 01, 2012 5:39 pm
by john.newman
All,
This has been an issue for a while now, and I would like to get it resolved if possible. Our XI config has the "graph explorer" page, which seems to work pretty well and is a great source of information. However, there's one odd issue: certain services under certain hosts just do not show up in here. Given that it is mostly working, I don't think it's some sort of permission problem on the monitoring server. Here is a trivialized example of what I am seeing in the graph explorer -> scalable performance graph:
Host A [this is correct]
    CPU Load
    Disk Usage
    Mem Usage
Host B
    CPU Load
    Mem Usage
Host C
    Disk Usage
    Mem Usage
Host D
    Mem Usage
Host E
    CPU Load
etc
Now in the configuration, I only have a total of three services defined (again, this is a trivialized example). All three of them (CPU, Mem, Disk) have "Retain status information" = ON and "Process perf data" = ON, and these services are simply all applied to one host group, which includes Hosts A-E. Also, Hosts B-E were originally copies of A. So how in the world would I see this inconsistency? It doesn't make a whole lot of sense to me... I could see it happening if I had defined a separate service per host and didn't check those options, but the three service objects should be distributed across every host in the group exactly the same.
The service detail page looks fine, and all the checks are working. It's just this graph page seems to pick and choose whatever it wants in some odd way.

What am I missing?
TIA

Re: Some performance graphs are "randomly" missing
Posted: Thu Mar 01, 2012 6:05 pm
by mguthrie
You might have the 1.0 version of the graph explorer, which had a bug like this. Here's the latest version; you can install it through the Admin->Manage Components page. See if it resolves your issue.
Re: Some performance graphs are "randomly" missing
Posted: Fri Mar 02, 2012 10:31 am
by john.newman
uh .. ok. And .. do you work for nagios? can i trust that link? Seems kind of odd to use a random download like that... is there anything on the website or release notes about this?
thanks
Re: Some performance graphs are "randomly" missing
Posted: Fri Mar 02, 2012 10:56 am
by scottwilkerson
john.newman wrote:uh .. ok. And .. do you work for nagios? can i trust that link? Seems kind of odd to use a random download like that... is there anything on the website or release notes about this?
thanks
John,
It's safe.
Mike does work for Nagios. Those of us who work for Nagios have bright green names with the Nagios logo above them.
Re: Some performance graphs are "randomly" missing
Posted: Fri Mar 02, 2012 10:59 am
by mguthrie
John,
The graph explorer component is considered a customer-only download, so if you don't have a support and maintenance contract you won't be able to download the new version. I thought it simpler to post it to the thread directly.

Re: Some performance graphs are "randomly" missing
Posted: Fri Mar 02, 2012 11:41 am
by john.newman
i see.. ok thanks.
one thing, you should do an svn export instead of a checkout - you gave me the .svn folder.
i can figure this out ... but while i have your attention, if you don't mind, where do i put this graphexplorer directory, and do i have to run any chown / chmod/ restart services .. thanks much
Re: Some performance graphs are "randomly" missing
Posted: Fri Mar 02, 2012 11:48 am
by scottwilkerson
You can install the whole zip file through the Admin->Manage Components page
Re: Some performance graphs are "randomly" missing
Posted: Fri Mar 02, 2012 12:14 pm
by john.newman
ok well that was easy. thanks. I'm glad I asked, I would have spent an hour or two digging through the filesystem messing around. very nice feature there.
It seems to have fixed the original problem I posted about. All of the service checks are showing up now. (at least it looks like all of them .. there's a few hundred, i'll have to take a closer look and make sure none are missing, but just a first glance looks all there.)
However, it seems to be including "retired" services. Our configuration has gone through many changes over the past several months. We used to have a "ping" service defined on all of the hosts, which is completely pointless as just defining the __HOST__ effectively creates the ping check. So we've removed that. However, now in the graph explorer, some hosts are showing this "Ping" service and some are not.
Actually this may not be a bug, if I roll the filter back to -365 days, there is some perf data there in the graph from a very long time ago. There's some other "retired services" that show up and there's old data there as well. So I guess it's probably not a bug - but is there a way to disable these from showing up, or go in and purge the old perf data for them? Any way to hide these would be nice - but this is not nearly as big of a deal as what I had in the first post.
Perhaps this was intentional, as it's historic data and until _I_ delete it, it's probably correct on your part to continue to present it. I guess it depends on how you look at it: to me the graph explorer should be a 1:1 match with the current service detail list, but perhaps you deliberately include any perf data that is still there.
Thoughts? I'm happy though as at least the original problem is fixed. Thanks for that.

Re: Some performance graphs are "randomly" missing
Posted: Fri Mar 02, 2012 1:40 pm
by mguthrie
Yeah, the tricky part with the old data is that npcd (the performance processing service) generates performance data based on hostname/service_description, while XI has unique IDs for all hosts and services, so you can rename a service without losing any historical data for it. However, upon renaming, a new set of rrd data gets created. Currently we chose to leave the old authorized services there in the event that someone wants to retain the performance data after a name change.
However, you can clear the expired data by removing the associated rrd and XML files for a particular service. These files are located in:
/usr/local/nagios/share/perfdata/<host_name>/<service_description>.rrd
/usr/local/nagios/share/perfdata/<host_name>/<service_description>.xml
At some point we need to create a "garbage cleaner" type of feature that will allow you to do most of this from the UI, but currently this must be done manually.
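For anyone who wants to script the manual cleanup in the meantime, here is a rough Python sketch, not an official XI tool. It assumes a retired service can be spotted because its .rrd file under the perfdata directory above has not been modified for some number of days; that is only a heuristic, and the function names (find_stale, remove_pair) and the 30-day threshold in the usage note are made up for illustration. It works from the file names actually on disk, which may not match the service description character for character. The removal defaults to a dry run so you can review the list first.

```python
import time
from pathlib import Path

# Perfdata location as given in the post above.
PERFDATA_ROOT = Path("/usr/local/nagios/share/perfdata")


def find_stale(root, max_age_days, now=None):
    """Return .rrd files under root/<host_name>/ not modified in max_age_days.

    Assumption: an RRD that has stopped receiving updates belongs to a
    retired or renamed service. Check the list by hand before deleting.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(root).glob("*/*.rrd") if p.stat().st_mtime < cutoff
    )


def remove_pair(rrd_path, dry_run=True):
    """Remove an .rrd file and its sibling .xml file.

    Returns the paths that were (or, in a dry run, would be) removed.
    """
    rrd_path = Path(rrd_path)
    targets = [p for p in (rrd_path, rrd_path.with_suffix(".xml")) if p.exists()]
    if not dry_run:
        for p in targets:
            p.unlink()
    return targets
```

To use it, call find_stale(PERFDATA_ROOT, max_age_days=30) and print the result, then pass each path you really want gone to remove_pair(path, dry_run=False), running as a user with write access to the perfdata directory. Taking a backup of the directory first is a good idea, since deleted RRD data cannot be recovered.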
Hope that helps!