
GPU Temperature AMD RX 580

Posted: Thu Jun 08, 2017 7:29 am
by prestonc
Hi all

So I'm new to Nagios, and that may be why I'm struggling.

I'm looking to monitor GPU temperatures for Radeon RX 580s running on Ubuntu Server 16.04.

The plugin that looks like it might be right is check_gputemp, which is listed at https://exchange.nagios.org/directory/P ... mp/details
It relies on fglrx, which I see is no longer used in Ubuntu 16.04. I have amdgpu-pro installed, as that's the correct driver package for my distro.

Does anyone out there know if this plugin will work with my specs, or if there is a plugin out there that I should look at?
Or am I heading down a dead end?

Should I be using a different product for monitoring? (Sorry, I know I'm in the Nagios forums, but...)

Hope you can help.


Preston

Re: GPU Temperature AMD RX 580

Posted: Thu Jun 08, 2017 10:48 am
by mcapra
Are you able to use lm-sensors to get the GPU temps? That's probably the way to go. This plugin looks like it can read the lm-sensors data exposed over SNMP:
https://exchange.nagios.org/directory/P ... rs/details

Otherwise, do you currently have some way to view the GPU temperatures from the CLI? If so, could you share that process step by step? We might be able to script it and have Nagios Core collect the data through an agent like NCPA or NRPE.

A good starting point, if you want to do this yourself, would be a simple script that prints the temperatures to stdout. From there, it's not that complicated to adapt the script to the output and exit-code conventions Nagios Core expects:
https://nagios-plugins.org/doc/guidelines.html
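Roughly, those guidelines come down to: print one status line, optionally followed by | and performance data, and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Here is a minimal Python sketch of that last step, assuming the temperatures have already been collected into a dict of label to degrees Celsius. The function name evaluate, the status-line wording and the warn/crit thresholds are all hypothetical, and the syntax is kept to Python 3.5 since that is the python3 shipped with Ubuntu 16.04.

```python
from typing import Dict, Tuple

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(temps: Dict[str, float], warn: float, crit: float) -> Tuple[int, str]:
    """Turn {label: degrees C} into (exit_code, status_line).

    warn/crit are hypothetical thresholds: a reading >= crit is CRITICAL,
    a reading >= warn is WARNING, and no readings at all is UNKNOWN.
    """
    if not temps:
        return UNKNOWN, "GPU TEMP UNKNOWN - no temperature readings found"

    # The worst reading across all GPUs decides the overall state.
    code = OK
    for value in temps.values():
        if value >= crit:
            code = CRITICAL
        elif value >= warn and code == OK:
            code = WARNING

    items = sorted(temps.items())  # stable output order (dicts are unordered on 3.5)
    summary = ", ".join("{}={:.1f}C".format(label, value) for label, value in items)
    # Perfdata: 'label'=value;warn;crit with no unit, since the guidelines'
    # unit-of-measure list has no temperature unit.
    perfdata = " ".join(
        "'{}'={:.1f};{:g};{:g}".format(label, value, warn, crit) for label, value in items
    )
    return code, "GPU TEMP {} - {} | {}".format(STATE_NAMES[code], summary, perfdata)
```

In an actual check you would print the returned line and call sys.exit(code); Nagios decides the service state from the exit code, while the text is what shows up in the UI.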

Re: GPU Temperature AMD RX 580

Posted: Thu Jun 08, 2017 2:10 pm
by dwhitfield
Thanks for the assist @mcapra!

Re: GPU Temperature AMD RX 580

Posted: Tue Jun 13, 2017 4:56 am
by prestonc
Thanks, mcapra.

lm-sensors is installed, and when I run sensors I do get the required info:

acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)
temp2: +29.8°C (crit = +119.0°C)

amdgpu-pci-0100
Adapter: PCI adapter
fan1: 3496 RPM
temp1: +73.0°C (crit = +0.0°C, hyst = +0.0°C)

amdgpu-pci-0500
Adapter: PCI adapter
fan1: 3852 RPM
temp1: +70.0°C (crit = +0.0°C, hyst = +0.0°C)

amdgpu-pci-0900
Adapter: PCI adapter
fan1: 1531 RPM
temp1: +78.0°C (crit = +0.0°C, hyst = +0.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +28.0°C (high = +80.0°C, crit = +100.0°C)

So that's all cool.
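For output like the above, a minimal parsing sketch in Python might look like this, assuming the default sensors output format (not sensors -u) and that only chips named amdgpu-* matter. Note that the crit/hyst values reported for the amdgpu chips are all +0.0°C, so they don't look usable as thresholds; those would have to be supplied by the check itself. parse_gpu_temps is a hypothetical name, and the code sticks to Python 3.5 syntax.

```python
import re
from typing import Dict, Optional

# Matches lines such as "temp1:  +73.0°C  (crit = +0.0°C, hyst = +0.0°C)".
# The degree sign is optional in case a locale/encoding drops it.
TEMP_LINE = re.compile(r"^(temp\d+):\s+([+-]?\d+(?:\.\d+)?)\s*°?C\b")


def parse_gpu_temps(sensors_output: str, chip_prefix: str = "amdgpu-") -> Dict[str, float]:
    """Return {"<chip>/<label>": degrees C} for chips whose name starts with chip_prefix."""
    temps = {}  # type: Dict[str, float]
    chip = None  # type: Optional[str]
    for raw in sensors_output.splitlines():
        line = raw.strip()
        if not line:
            chip = None  # a blank line ends the current chip block
            continue
        if chip is None:
            chip = line  # the first line of each block is the chip name
            continue
        if not chip.startswith(chip_prefix):
            continue  # e.g. acpitz-virtual-0, coretemp-isa-0000
        match = TEMP_LINE.match(line)
        if match:
            temps["{}/{}".format(chip, match.group(1))] = float(match.group(2))
    return temps
```

The text itself can be captured with subprocess.check_output(["sensors"], universal_newlines=True), and the resulting dict fed into whatever threshold logic the check uses.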

I looked at the check_snmp_lmsensors.pl script, but I'm failing miserably.
The script looks for Nagios::Plugin, which I hear has been renamed Monitoring::Plugin.
I installed Monitoring::Plugin from here: http://search.cpan.org/dist/Monitoring-Plugin/
Now it moans about Params::Validate not being installed.
I can't seem to find solid info about this.

Do you think I've missed a prerequisite step?

Re: GPU Temperature AMD RX 580

Posted: Tue Jun 13, 2017 9:53 am
by mcapra
So the way Perl works is that a script declares the modules it requires, typically with use statements at the start of the script. I don't think you missed a step so much as the documentation for this plugin is a bit lacking.
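To see which modules a script will try to load, one rough approach is to scan it for use statements. The Python sketch below is a heuristic only: it skips lowercase pragmas such as strict and warnings and ignores require, POD and string contents. required_modules is a hypothetical helper, not part of any Nagios tooling.

```python
import re
from typing import List

# "use Some::Module ..." at the start of a line. Pragmas such as "use strict;"
# are lowercase by convention, so requiring an uppercase first letter skips them.
USE_STMT = re.compile(r"^\s*use\s+([A-Z]\w*(?:::\w+)*)", re.MULTILINE)


def required_modules(perl_source: str) -> List[str]:
    """Return the distinct modules loaded via 'use', in first-seen order (heuristic)."""
    modules = []  # type: List[str]
    for name in USE_STMT.findall(perl_source):
        if name not in modules:
            modules.append(name)
    return modules
```

Each name it returns can then be checked with perl -MModule::Name -e 1, which fails with a "Can't locate ..." error if the module isn't installed.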

One thing worth mentioning is that installing Monitoring::Plugin may be a mistake. While it is the "latest" version of the module, the script explicitly requests the Nagios::Plugin module on line 20:


use Nagios::Plugin ;

You could probably alter the script to reference Monitoring::Plugin instead, but I'm not sure what the broader impact of that would be. You'd probably need to refactor the code in check_snmp_lmsensors.pl.
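If you do try the switch, a mechanical first pass could look like the Python sketch below. It is an assumption on my part, worth checking against the Monitoring::Plugin documentation, that beyond the package names the main renames are nagios_exit to plugin_exit and nagios_die to plugin_die. Anything else that changed won't be caught, so review the diff and test the script afterwards. port_to_monitoring_plugin is a hypothetical helper.

```python
import re

# Hypothetical helper for a mechanical first pass only.
# Assumption (verify against the Monitoring::Plugin docs): apart from the
# package names, the exit helpers were renamed nagios_exit -> plugin_exit
# and nagios_die -> plugin_die.
RENAMES = [
    # Also rewrites submodules, e.g. Nagios::Plugin::Getopt.
    (re.compile(r"\bNagios::Plugin\b"), "Monitoring::Plugin"),
    (re.compile(r"\bnagios_exit\b"), "plugin_exit"),
    (re.compile(r"\bnagios_die\b"), "plugin_die"),
]


def port_to_monitoring_plugin(perl_source: str) -> str:
    """Apply the renames above to a Perl script's source text."""
    for pattern, replacement in RENAMES:
        perl_source = pattern.sub(replacement, perl_source)
    return perl_source
```

One way to use it: read check_snmp_lmsensors.pl, write the result to a new file, and diff the two before running anything.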

But as you've discerned, yes, you're missing some dependencies. I don't have an Ubuntu system to test against right now, but you should be able to install them via apt from some combination of repositories. I was able to install Params::Validate via yum like so:


yum install perl-Params-Validate

Perhaps the same package name works with apt?
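For what it's worth, distributions tend to follow naming conventions for packaged CPAN modules: Fedora/RHEL/CentOS use perl-Foo-Bar, while Debian and Ubuntu use libfoo-bar-perl. By that convention Params::Validate would be libparams-validate-perl on Ubuntu. The Python helper below is hypothetical, not an existing tool; it only applies those conventions, so treat its output as a guess and confirm with apt-cache search or yum search.

```python
from typing import Dict


def distro_package_names(module: str) -> Dict[str, str]:
    """Guess distro package names for a CPAN module from naming conventions.

    Assumed conventions: Fedora/RHEL/CentOS package Foo::Bar as perl-Foo-Bar
    (case kept); Debian/Ubuntu package it as libfoo-bar-perl (lowercased).
    These are conventions, not guarantees.
    """
    dashed = module.replace("::", "-")
    # Debian package names may not contain underscores or uppercase letters.
    debian_name = dashed.lower().replace("_", "-")
    return {
        "yum": "perl-" + dashed,
        "apt": "lib" + debian_name + "-perl",
    }
```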

Re: GPU Temperature AMD RX 580

Posted: Tue Jun 13, 2017 12:40 pm
by lmiltchev
@prestonc, did mcapra's solution work for you? Is the issue resolved, or do you need more help?

Re: GPU Temperature AMD RX 580

Posted: Fri Jun 16, 2017 5:24 am
by prestonc
Hi all

I'm still working to get it running.
My head hurts, but I think I'm getting there.
I'll post back with my success or failure ASAP.

Cheers


Preston

Re: GPU Temperature AMD RX 580

Posted: Fri Jun 16, 2017 9:34 am
by dwhitfield
Ok, great, just let us know!