Performance Graphs for some services are broken
Posted: Thu Sep 24, 2015 12:02 pm
by uidaho
Good morning
Most of our performance graphs are working fine, but for some services (all Windows Disk space usage) the performance graphs are not displayed in XI using the "Performance Graphs" tab. The same services generate graphs in Graph Explorer, and in an external tool that reads Nagios' RRD files (drraw). Data is being returned to the service checks.
Permissions look fine on the rrd and xml files for these. Here is a sample, with permissions on files for broken graphs first and working graphs second.
Code: Select all
-rwxrwxr-x 1 nagios nagios 384952 Sep 24 09:49 Disk_Usage_-_C_10_prcnt_free_win.rrd
-rw-rw-r-- 1 nagios nagios 2082 Sep 24 09:49 Disk_Usage_-_C_10_prcnt_free_win.xml
-rwxrwxr-x 1 nagios nagios 1151496 Sep 24 09:45 Perf_-_CPU_All_Usage_prcnt_win.rrd
-rw-rw-r-- 1 nagios nagios 2946 Sep 24 09:45 Perf_-_CPU_All_Usage_prcnt_win.xml
I have run
/usr/local/nagiosxi/scripts/reset_config_perms
Also, in XI, under Admin -> Monitoring Config -> Check File Permissions, we have green checkmarks.
We are running Nagios XI 2014R2.7.
Thank you for any help you can provide!
Re: Performance Graphs for some services are broken
Posted: Thu Sep 24, 2015 12:50 pm
by tgriep
Can you upload screen captures of Graph Explorer and of the Performance and Advanced tabs for one of the failing services?
One possible cause: if the service check was changed so that it returns a different number of performance data variables, the graphs can stop working.
Could that be it?
Re: Performance Graphs for some services are broken
Posted: Thu Sep 24, 2015 1:33 pm
by uidaho
Thank you for your reply. We've seen the problem where the number of collected data items changes and breaks the RRD files. Nobody admits to changing these services, but if there is a way to check the files without losing data I'd like to confirm.
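One read-only way to approach the question above is to compare how many data sources the XML file describes with how many the RRD file actually holds. The sketch below assumes the XML follows the PNP-style layout (one `<DATASOURCE>` element per perfdata value) and that the second input is the text printed by `rrdtool info <file>.rrd`, which only reads the file. The function names and the sample inputs are illustrative, not taken from this system.

```python
import re
import xml.etree.ElementTree as ET


def count_xml_datasources(xml_text: str) -> int:
    """Count <DATASOURCE> elements in a PNP-style perfdata XML document."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("DATASOURCE"))


def count_rrd_datasources(info_text: str) -> int:
    """Count distinct data source names in the text output of `rrdtool info`."""
    names = set()
    for line in info_text.splitlines():
        # Data source lines look like: ds[1].type = "GAUGE"
        match = re.match(r"ds\[([^\]]+)\]\.", line.strip())
        if match:
            names.add(match.group(1))
    return len(names)


# Illustrative inputs only; real files will contain many more fields.
SAMPLE_XML = """<NAGIOS>
  <DATASOURCE><DS>1</DS><NAME>free</NAME></DATASOURCE>
</NAGIOS>"""

SAMPLE_INFO = """filename = "example.rrd"
ds[1].type = "GAUGE"
ds[1].minimal_heartbeat = 8460
"""

if __name__ == "__main__":
    xml_count = count_xml_datasources(SAMPLE_XML)
    rrd_count = count_rrd_datasources(SAMPLE_INFO)
    print("XML data sources:", xml_count)
    print("RRD data sources:", rrd_count)
    print("match" if xml_count == rrd_count else "MISMATCH")
```

If the two counts differ for a broken service, that would point toward a changed check rather than a rendering problem. Neither function writes to the files.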
Here are the screen shots you requested.
One.jpg
Two.jpg
Three.jpg
Re: Performance Graphs for some services are broken
Posted: Thu Sep 24, 2015 2:14 pm
by tgriep
Looks like you have a broken link to the graph.
Can you run the following tail command and post its output here while you select the Performance tab for that service?
Re: Performance Graphs for some services are broken
Posted: Thu Sep 24, 2015 3:35 pm
by uidaho
Here is what appears in the apache error log when I open a page with the broken graph:
Code: Select all
ERROR: I don't understand ':)\% Free Space' ShowAll MinCrit=10\r' in command: 'COMMENT:Check Command check_nrpe_CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10\r'.
Looks like an invalid character in the service definition?
Here is $ARG1$ from the broken service in CCM:
Code: Select all
CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10
Thanks again for helping with this.
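For context on the error above: the point where the message breaks off (`:)\% Free Space' ...`) is consistent with rrdtool splitting the `COMMENT` argument at the unescaped colon in `C:`. In `rrdtool graph`, a colon inside `COMMENT` text has to be written as `\:`, and a trailing `\r` in legend text is a right-justify marker rather than a stray carriage return. The sketch below only illustrates that escaping rule; the function name is hypothetical, and where such escaping would need to be applied in XI (for example in a graph template) is an assumption the thread does not confirm.

```python
import re


def escape_rrd_colons(text: str) -> str:
    """Backslash-escape every colon that is not already preceded by a backslash."""
    return re.sub(r"(?<!\\):", r"\\:", text)


# $ARG1$ as posted in the thread.
command = r"CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10"

if __name__ == "__main__":
    print(escape_rrd_colons(command))
```

The negative lookbehind leaves an already escaped `\:` untouched, so running the function twice on the same string does not double the backslash.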
Re: Performance Graphs for some services are broken
Posted: Thu Sep 24, 2015 5:10 pm
by Box293
Can you follow the steps in this link:
https://support.nagios.com/wiki/index.p ... 14_Upgrade
Even though it talks about ping checks, it'll fix any rrd. Don't worry about doing the backups.
Does this fix the problem?
Re: Performance Graphs for some services are broken
Posted: Fri Sep 25, 2015 11:08 am
by uidaho
I ran the script as directed. Here are the last couple of lines of output:
Code: Select all
Batch job finished at Thu Sep 24 16:22:18 PDT 2015.
A total of 252 file(s) were updated with a total of 505 datasource(s).
Changes logged to the file /tmp/fix_rrd_ds.log
The Windows disk usage images still do not load, even after waiting overnight.
Re: Performance Graphs for some services are broken
Posted: Fri Sep 25, 2015 1:15 pm
by Box293
uidaho wrote:Here is what appears in the apache error log when I open a page with the broken graph:
Code: Select all
ERROR: I don't understand ':)\% Free Space' ShowAll MinCrit=10\r' in command: 'COMMENT:Check Command check_nrpe_CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10\r'.
Looks like an invalid character in the service definition?
Here is $ARG1$ from the broken service in CCM:
Code: Select all
CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10
Thanks again for helping with this.
Can I just confirm that this same error continues to appear in the apache error log?
Re: Performance Graphs for some services are broken
Posted: Fri Sep 25, 2015 2:11 pm
by uidaho
Yes - the same apache error occurs. The log entry doesn't have a timestamp, but tail -f shows these still occur when I attempt to view the affected graphs.
Code: Select all
ERROR: I don't understand ':)\% Free Space' ShowAll MinCrit=20\r' in command: 'COMMENT:Check Command check_nrpe_CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=20\r'.
Re: Performance Graphs for some services are broken
Posted: Sun Sep 27, 2015 4:23 pm
by tgriep
Can you delete the xml and rrd files for one of the services, check whether the graph starts to work, and report back either way?
The automatic tool may not have caught all of the changes, and deleting the files should fix it.
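Since losing history was a concern earlier in the thread, a cautious variant of the step above is to move the two files aside instead of deleting them; this is a substitution for the suggested delete, not the same operation. The sketch assumes the perfdata layout `<perfdata_dir>/<host>/<service>.rrd` and `.xml`; the function name, the directory paths, and the host name in the demo are hypothetical. As with a delete, new files are expected to be recreated when the next perfdata arrives.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside_perfdata(perfdata_dir, host, service, backup_dir):
    """Move <service>.rrd and <service>.xml for one host into backup_dir.

    Returns the list of new file locations. Files that do not exist are skipped.
    """
    source_dir = Path(perfdata_dir) / host
    target_dir = Path(backup_dir) / host
    target_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for suffix in (".rrd", ".xml"):
        source = source_dir / f"{service}{suffix}"
        if source.exists():
            target = target_dir / source.name
            shutil.move(str(source), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Demo against a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        perfdata = Path(tmp) / "perfdata"          # stand-in for the real perfdata directory
        (perfdata / "winhost01").mkdir(parents=True)  # hypothetical host name
        service = "Disk_Usage_-_C_10_prcnt_free_win"
        for suffix in (".rrd", ".xml"):
            (perfdata / "winhost01" / f"{service}{suffix}").write_text("placeholder")

        for path in set_aside_perfdata(perfdata, "winhost01", service, Path(tmp) / "backup"):
            print("moved to", path)
```

If the graph works with the freshly created files, the set-aside copies can be kept for reference or removed; if it does not, they can be moved back unchanged.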