GPU Temperature AMD RX 580


Postby prestonc » Thu Jun 08, 2017 7:29 am

Hi all

So I'm new to Nagios, and that may be why I'm struggling.

I'm looking to monitor GPU temperatures for Radeon RX 580s running on Ubuntu Server 16.04.

The plugin that looks like it might be correct is check_gputemp, which is listed at https://exchange.nagios.org/directory/P ... mp/details
It wants to use fglrx, which I see is no longer used in Ubuntu 16.04. I have amdgpu-pro installed, as that's the correct driver package for my distro.

Does anyone out there know if this plugin will work with my specs, or if there is a plugin out there that I should look at?
Or am I heading down a dead end?

Should I use a different product to monitor this? (Sorry, I know I'm in the Nagios forums, but...)

Hope you can help.


Preston

Re: GPU Temperature AMD RX 580

Postby mcapra » Thu Jun 08, 2017 10:48 am

Are you able to use lm-sensors to get the GPU temps? That's probably the way to go. This plugin looks like it can leverage the SNMP information provided by lm-sensors:
https://exchange.nagios.org/directory/P ... rs/details

Otherwise, do you currently have some way by which you can view the GPU temperatures via the CLI? If so, could you share that process step-by-step? Might be able to script it out and have Nagios Core get the data by leveraging an agent like NCPA or NRPE.

A good starting point if you wanted to do this yourself would be having a simple script that can send the temperatures to stdout. From there, it's not that complicated to alter the script to work in a way that Nagios Core likes:
https://nagios-plugins.org/doc/guidelines.html
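
For illustration, here is a minimal Python sketch of what "working in a way that Nagios Core likes" might look like, assuming the temperatures have already been collected into a dict. The function names, the `GPU_TEMP` prefix, the `gpu0`-style labels and the 80/90 °C thresholds are all hypothetical placeholders, not taken from any existing plugin. The guidelines linked above ask for one status line on stdout (optionally followed by `|` and performance data) and an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN.

```python
import sys

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(temps, warn, crit):
    """Return (exit_code, status_line) for a dict of {label: temp in °C}.

    The overall state is driven by the hottest reading.
    """
    if not temps:
        return UNKNOWN, "GPU_TEMP UNKNOWN - no temperature readings found"
    hottest = max(temps.values())
    if hottest >= crit:
        code = CRITICAL
    elif hottest >= warn:
        code = WARNING
    else:
        code = OK
    items = sorted(temps.items())
    summary = ", ".join(f"{name}={t:.1f}C" for name, t in items)
    # Performance data: label=value;warn;crit
    perfdata = " ".join(f"{name}={t:.1f};{warn};{crit}" for name, t in items)
    return code, f"GPU_TEMP {LABELS[code]} - {summary} | {perfdata}"


def main(temps, warn=80, crit=90):
    """Print the status line and exit with the matching code (thresholds are assumed)."""
    code, line = evaluate(temps, warn, crit)
    print(line)
    sys.exit(code)
```

Calling `main({"gpu0": 73.0})` from a wrapper script would print a single status line and exit with the matching code; Nagios Core, or an agent such as NRPE or NCPA, reads the state from that exit code.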
Former Nagios employee
http://www.mcapra.com/

Re: GPU Temperature AMD RX 580

Postby dwhitfield » Thu Jun 08, 2017 2:10 pm

Thanks for the assist @mcapra!
Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: GPU Temperature AMD RX 580

Postby prestonc » Tue Jun 13, 2017 4:56 am

Thanks mcapra

lm-sensors is installed, and when I run sensors I do get the required info:

acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)
temp2: +29.8°C (crit = +119.0°C)

amdgpu-pci-0100
Adapter: PCI adapter
fan1: 3496 RPM
temp1: +73.0°C (crit = +0.0°C, hyst = +0.0°C)

amdgpu-pci-0500
Adapter: PCI adapter
fan1: 3852 RPM
temp1: +70.0°C (crit = +0.0°C, hyst = +0.0°C)

amdgpu-pci-0900
Adapter: PCI adapter
fan1: 1531 RPM
temp1: +78.0°C (crit = +0.0°C, hyst = +0.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +28.0°C (high = +80.0°C, crit = +100.0°C)

So that's all cool.
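
Since `sensors` already prints the values, a rough Python sketch of pulling out just the amdgpu readings could look like the following. It assumes the output layout matches what is pasted above (a chip name line, then `label: value` lines, with blank lines between chips) and it only reads `temp1`. The function name `parse_amdgpu_temps` and the embedded sample are illustrative and not part of any existing plugin.

```python
import re

# Matches e.g. "temp1: +73.0°C (crit = ...)" and captures the signed number.
TEMP_RE = re.compile(r"^temp1:\s*([+-]?\d+(?:\.\d+)?)")


def parse_amdgpu_temps(sensors_output):
    """Return {chip_name: temp1 in °C} for every amdgpu chip in `sensors` output."""
    temps = {}
    chip = None
    for raw in sensors_output.splitlines():
        line = raw.strip()
        if not line:
            chip = None              # a blank line ends the current chip block
        elif chip is None and ":" not in line:
            chip = line              # first line of a block is the chip name
        elif chip and chip.startswith("amdgpu-"):
            match = TEMP_RE.match(line)
            if match:
                temps[chip] = float(match.group(1))
    return temps


# Small sample in the same layout as the output above.
SAMPLE = """\
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)

amdgpu-pci-0100
Adapter: PCI adapter
fan1: 3496 RPM
temp1: +73.0°C (crit = +0.0°C, hyst = +0.0°C)
"""

print(parse_amdgpu_temps(SAMPLE))
```

In a real check the text would come from running `sensors` (for example via `subprocess.run(["sensors"], capture_output=True, text=True)`) rather than from a hard-coded sample. Because the cards report `crit = +0.0°C`, thresholds would have to be supplied by the check itself.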

I looked at the check_snmp_lmsensors.pl script, but am failing miserably.
The script is looking for Nagios::Plugin, which I hear has been renamed Monitoring::Plugin.
I installed Monitoring::Plugin from here: http://search.cpan.org/dist/Monitoring-Plugin/
Now it complains that Params::Validate is not installed.
I can't seem to find solid info about this.

You think I've missed a prerequisite step?

Re: GPU Temperature AMD RX 580

Postby mcapra » Tue Jun 13, 2017 9:53 am

So the way Perl works is that it has a list of modules (typically at the start of the script) that are required. I don't think you missed a step so much as the documentation for this plugin is sort of lacking.

One thing worth mentioning is installing Monitoring::Plugin may be a mistake. While it is the "latest" version of the module, the script is explicitly requesting the Nagios::Plugin module on line 20:
Code:
use Nagios::Plugin ;

You could probably alter the script to reference Monitoring::Plugin instead, but I'm not sure what the broader impact of that would be. You'd probably need to refactor the code in check_snmp_lmsensors.pl.

But as you have discerned, yeah you're missing some dependencies. I don't have an Ubuntu system to test against right now, but you should be able to install them via apt and some combination of repositories. I was able to install Params::Validate via yum like so:

Code:
yum install perl-Params-Validate

Perhaps the same package name works with apt?

Re: GPU Temperature AMD RX 580

Postby lmiltchev » Tue Jun 13, 2017 12:40 pm

@prestonc, did mcapra's solution work for you? Is the issue resolved, or do you need more help?

Re: GPU Temperature AMD RX 580

Postby prestonc » Fri Jun 16, 2017 5:24 am

Hi all

I'm still working to get it running.
My head hurts, but I think I'm getting there.
I'll post back my success/failure asap.

Cheers


Preston

Re: GPU Temperature AMD RX 580

Postby dwhitfield » Fri Jun 16, 2017 9:34 am

Ok, great, just let us know!

