Performance Graphs for some services are broken

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
uidaho
Posts: 89
Joined: Tue Feb 12, 2013 11:58 am

Performance Graphs for some services are broken

Post by uidaho »

Good morning

Most of our performance graphs are working fine, but for some services (all Windows Disk space usage) the performance graphs are not displayed in XI using the "Performance Graphs" tab. The same services generate graphs in Graph Explorer, and in an external tool that reads Nagios' RRD files (drraw). Data is being returned to the service checks.

Permissions look fine on the rrd and xml files for these. Here is a sample, with permissions on files for broken graphs first and working graphs second.

Code: Select all

-rwxrwxr-x 1 nagios nagios  384952 Sep 24 09:49 Disk_Usage_-_C_10_prcnt_free_win.rrd
-rw-rw-r-- 1 nagios nagios    2082 Sep 24 09:49 Disk_Usage_-_C_10_prcnt_free_win.xml
-rwxrwxr-x 1 nagios nagios 1151496 Sep 24 09:45 Perf_-_CPU_All_Usage_prcnt_win.rrd
-rw-rw-r-- 1 nagios nagios    2946 Sep 24 09:45 Perf_-_CPU_All_Usage_prcnt_win.xml

I have run /usr/local/nagiosxi/scripts/reset_config_perms.

Also, in XI, under Admin -> Monitoring Config -> Check File Permissions, we have green checkmarks.

We are running Nagios XI 2014R2.7.

Thank you for any help you can provide!
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Performance Graphs for some services are broken

Post by tgriep »

Can you provide screen captures from Graph Explorer and from the Performance and Advanced tabs for one of the failing services, and upload them here?
One possible cause: if the service check was changed so that it returns a different number of performance data variables, the graphs can stop working.
Could that be it?
Be sure to check out our Knowledgebase for helpful articles and solutions!
uidaho
Posts: 89
Joined: Tue Feb 12, 2013 11:58 am

Re: Performance Graphs for some services are broken

Post by uidaho »

Thank you for your reply. We've seen the problem where the number of collected data items changes and breaks the RRD files. Nobody admits to changing these services, but if there is a way to check the files without losing data I'd like to confirm.
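
On checking the files without losing data: a read-only comparison should be possible in principle. The sketch below is only an illustration. It assumes the text output of `rrdtool info` for the .rrd file (which lists datasources on lines starting with `ds[name].`) and a PNP4Nagios-style .xml file with one `<DATASOURCE>` element per performance data item. The function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET


def rrd_datasource_names(info_text):
    """Return datasource names found in `rrdtool info` text output, in first-seen order."""
    names = []
    for line in info_text.splitlines():
        # Datasource lines look like: ds[1].type = "GAUGE"
        match = re.match(r"ds\[([^\]]+)\]\.", line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def xml_datasource_count(xml_text):
    """Count <DATASOURCE> elements (assumed PNP4Nagios-style layout)."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//DATASOURCE"))


def counts_match(info_text, xml_text):
    """True when the RRD and the XML describe the same number of datasources."""
    return len(rrd_datasource_names(info_text)) == xml_datasource_count(xml_text)
```

In practice you would pass in the output of `rrdtool info` for the .rrd file and the contents of the matching .xml file. Neither step writes to the files, so a mismatch can be confirmed before anything is changed.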

Here are the screen shots you requested.
One.jpg
Two.jpg
Three.jpg
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Performance Graphs for some services are broken

Post by tgriep »

Looks like you have a broken link to the graph.
Can you run the following tail command while you select the Performance tab for that service, and post its output here?

Code: Select all

tail -f /var/log/httpd/error_log
uidaho
Posts: 89
Joined: Tue Feb 12, 2013 11:58 am

Re: Performance Graphs for some services are broken

Post by uidaho »

Here is what appears in the apache error log when I open a page with the broken graph:

Code: Select all

ERROR: I don't understand ':)\% Free Space' ShowAll MinCrit=10\r' in command: 'COMMENT:Check Command check_nrpe_CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10\r'.
Looks like an invalid character in the service definition?
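
For what it's worth, the part rrdtool says it does not understand begins at the colon in `C:`, and the rrdtool graph documentation says colons inside COMMENT text have to be escaped as `\:`. The trailing `\r` may also be a carriage return that ended up in the stored command. Below is a minimal sketch of that kind of clean-up. It is not the code Nagios XI or PNP4Nagios actually uses, and `sanitize_rrd_comment` is a hypothetical helper name.

```python
import re


def sanitize_rrd_comment(text):
    """Prepare free text for use after 'COMMENT:' in an rrdtool graph call.

    Simplified on purpose: drops carriage returns and escapes any colon
    that is not already preceded by a backslash.
    """
    without_cr = text.replace("\r", "")
    # (?<!\\): matches a colon with no backslash directly in front of it.
    return re.sub(r"(?<!\\):", r"\\:", without_cr)


raw = "Counter=\\LogicalDisk(C:)\\% Free Space' ShowAll MinCrit=10\r"
print(sanitize_rrd_comment(raw))
```

This only illustrates why the colon and the carriage return matter; whether the fix belongs in the service definition or in the graph template is a separate question.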

Here is $ARG1$ from the broken service in CCM:

Code: Select all

CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10

Thanks again for helping with this.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Performance Graphs for some services are broken

Post by Box293 »

Can you follow the steps in this link:
https://support.nagios.com/wiki/index.p ... 14_Upgrade
Even though it talks about ping checks, it'll fix any rrd. Don't worry about doing the backups.

Does this fix the problem?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
uidaho
Posts: 89
Joined: Tue Feb 12, 2013 11:58 am

Re: Performance Graphs for some services are broken

Post by uidaho »

I ran the script as directed. Here are the last couple of lines of output:

Code: Select all

Batch job finished at Thu Sep 24 16:22:18 PDT 2015.
A total of 252 file(s) were updated with a total of 505 datasource(s).
Changes logged to the file /tmp/fix_rrd_ds.log

The Windows disk usage images still do not load, even after waiting overnight.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Performance Graphs for some services are broken

Post by Box293 »

uidaho wrote: Here is what appears in the apache error log when I open a page with the broken graph:

Code: Select all

ERROR: I don't understand ':)\% Free Space' ShowAll MinCrit=10\r' in command: 'COMMENT:Check Command check_nrpe_CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10\r'.
Looks like an invalid character in the service definition?

Here is $ARG1$ from the broken service in CCM:

Code: Select all

CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=10
Thanks again for helping with this.

Can you confirm that this same error still appears in the Apache error log?
uidaho
Posts: 89
Joined: Tue Feb 12, 2013 11:58 am

Re: Performance Graphs for some services are broken

Post by uidaho »

Yes - the same apache error occurs. The log entry doesn't have a timestamp, but tail -f shows these still occur when I attempt to view the affected graphs.

Code: Select all

ERROR: I don't understand ':)\% Free Space' ShowAll MinCrit=20\r' in command: 'COMMENT:Check Command check_nrpe_CheckCounter -a 'Counter=\LogicalDisk(C:)\% Free Space' ShowAll MinCrit=20\r'.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Performance Graphs for some services are broken

Post by tgriep »

Can you delete the XML and RRD files for one of the affected services, check whether the graph starts working, and report back either way?
The automatic tool may not have caught all of the changes and deleting the files should fix it.
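
If losing the history is a concern, a cautious variation is to move the two files aside instead of deleting them outright. Here is a rough sketch, assuming the .rrd and .xml files share a base name as in the listing earlier in the thread. The function name and both directory arguments are hypothetical, so substitute the real perfdata directory for the host.

```python
import shutil
from pathlib import Path


def set_aside_graph_files(host_dir, service_stem, backup_dir):
    """Move <service_stem>.rrd and <service_stem>.xml from host_dir into backup_dir.

    Returns the list of new locations for the files that were actually moved.
    """
    host_dir = Path(host_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for suffix in (".rrd", ".xml"):
        source = host_dir / (service_stem + suffix)
        if source.exists():
            target = backup_dir / source.name
            shutil.move(str(source), str(target))
            moved.append(target)
    return moved


# Example call (paths are placeholders, not real locations):
# set_aside_graph_files("/path/to/perfdata/myhost",
#                       "Disk_Usage_-_C_10_prcnt_free_win",
#                       "/tmp/perfdata_backup")
```

With the originals out of the way, the files can be recreated on the next check, and the old copies remain available if the history needs to be recovered.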