GPU Temperature AMD RX 580

prestonc
Posts: 3
Joined: Thu Jun 08, 2017 7:19 am

Post by prestonc »

Hi all

So I'm new to Nagios, and that may be why I'm struggling.

I'm looking to monitor GPU temperatures for Radeon RX 580s running on Ubuntu Server 16.04.

The plugin that looks like it might be correct is check_gputemp which is listed at https://exchange.nagios.org/directory/P ... mp/details
It wants to use fglrx, which I see is no longer used in Ubuntu 16.04. I have amdgpu-pro installed, as that's the correct driver package for my distro.

Does anyone out there know if this plugin will work with my specs, or if there is a plugin out there that I should look at?
Or am I heading down a dead end?

Should I use a different product for monitoring? (Sorry, I know I'm in the Nagios forums, but...)

Hope you can help.


Preston
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: GPU Temperature AMD RX 580

Post by mcapra »

Are you able to use lm-sensors to get the GPU temps? That's probably the way to go. This plugin looks like it can leverage the SNMP information provided by lm-sensors:
https://exchange.nagios.org/directory/P ... rs/details

Otherwise, do you currently have some way by which you can view the GPU temperatures via the CLI? If so, could you share that process step-by-step? Might be able to script it out and have Nagios Core get the data by leveraging an agent like NCPA or NRPE.

A good starting point if you wanted to do this yourself would be having a simple script that can send the temperatures to stdout. From there, it's not that complicated to alter the script to work in a way that Nagios Core likes:
https://nagios-plugins.org/doc/guidelines.html
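
As a rough illustration of what "a way that Nagios Core likes" means: the guidelines ask for a single status line on stdout (optionally followed by `|` and performance data) and an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. The sketch below is only an assumption of how that could look in Python. The function name `evaluate`, the `gpu0`-style labels and the 80/90 thresholds are made up for the example and are not recommended values.

```python
# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(temps, warn=80.0, crit=90.0):
    """Return (exit_code, output_line) for a dict of label -> temperature in Celsius.

    The warn/crit defaults are placeholder assumptions, not vendor limits.
    """
    if not temps:
        return UNKNOWN, "GPU TEMP UNKNOWN - no temperature readings found"

    # The overall state is driven by the hottest reading.
    hottest = max(temps.values())
    if hottest >= crit:
        code = CRITICAL
    elif hottest >= warn:
        code = WARNING
    else:
        code = OK

    items = sorted(temps.items())
    summary = ", ".join(f"{label}={value:.1f}C" for label, value in items)
    # Performance data uses the label=value;warn;crit form.
    perfdata = " ".join(f"{label}={value:.1f};{warn:g};{crit:g}" for label, value in items)
    return code, f"GPU TEMP {STATUS_NAMES[code]} - {summary} | {perfdata}"
```

A real check would print the returned line and pass the returned code to `sys.exit()`, since Nagios Core reads the state from the exit code rather than from the text.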
Former Nagios employee
https://www.mcapra.com/
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: GPU Temperature AMD RX 580

Post by dwhitfield »

Thanks for the assist @mcapra!

Re: GPU Temperature AMD RX 580

Post by prestonc »

Thanks mcapra

lm-sensors is installed, and when I run sensors I do get the required info:

acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)
temp2: +29.8°C (crit = +119.0°C)

amdgpu-pci-0100
Adapter: PCI adapter
fan1: 3496 RPM
temp1: +73.0°C (crit = +0.0°C, hyst = +0.0°C)

amdgpu-pci-0500
Adapter: PCI adapter
fan1: 3852 RPM
temp1: +70.0°C (crit = +0.0°C, hyst = +0.0°C)

amdgpu-pci-0900
Adapter: PCI adapter
fan1: 1531 RPM
temp1: +78.0°C (crit = +0.0°C, hyst = +0.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +30.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +28.0°C (high = +80.0°C, crit = +100.0°C)

So that's all cool.
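
For anyone wanting to script against output in this shape, here is a minimal parsing sketch in Python. It assumes the layout shown above (a chip name line, then readings, then a blank line between chips) and only keeps `tempN` readings from chips whose name starts with `amdgpu-`. The function name `parse_amdgpu_temps` and the `chip/label` key format are made up for the example.

```python
import re

# Matches reading lines such as "temp1: +73.0°C (crit = +0.0°C, hyst = +0.0°C)".
TEMP_LINE = re.compile(r"^(temp\d+):\s*([+-]?\d+(?:\.\d+)?)")

# Shortened copy of the output posted above.
SAMPLE = """\
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)

amdgpu-pci-0100
Adapter: PCI adapter
fan1: 3496 RPM
temp1: +73.0°C (crit = +0.0°C, hyst = +0.0°C)

amdgpu-pci-0500
Adapter: PCI adapter
fan1: 3852 RPM
temp1: +70.0°C (crit = +0.0°C, hyst = +0.0°C)
"""


def parse_amdgpu_temps(sensors_output):
    """Return {"<chip>/<label>": temperature} for every amdgpu chip block."""
    temps = {}
    chip = None
    for raw in sensors_output.splitlines():
        line = raw.strip()
        if not line:
            chip = None              # a blank line ends the current chip block
        elif chip is None:
            chip = line              # the first line of a block is the chip name
        elif chip.startswith("amdgpu-"):
            match = TEMP_LINE.match(line)
            if match:
                temps[f"{chip}/{match.group(1)}"] = float(match.group(2))
    return temps
```

On a live system the text could come from `subprocess.run(["sensors"], capture_output=True, text=True).stdout` instead of `SAMPLE`.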

I looked at the check_snmp_lmsensors.pl script, but am failing miserably.
The script is looking for Nagios::Plugin, which I hear has been renamed Monitoring::Plugin.
I installed Monitoring::Plugin from here: http://search.cpan.org/dist/Monitoring-Plugin/
Now it moans about Params::Validate not being installed.
I can't seem to find solid info about this.

Do you think I've missed a prerequisite step?

Re: GPU Temperature AMD RX 580

Post by mcapra »

A Perl script declares the modules it requires, typically in a list of use statements at the start of the script. I don't think you missed a step so much as the documentation for this plugin is sort of lacking.

One thing worth mentioning is that installing Monitoring::Plugin may be a mistake. While it is the "latest" version of the module, the script is explicitly requesting the Nagios::Plugin module on line 20:

Code:

use Nagios::Plugin;

You could probably alter the script to reference Monitoring::Plugin instead, but I'm not sure what the broader impact of that would be. You'd probably need to refactor the code in check_snmp_lmsensors.pl.

But as you have discerned, yeah you're missing some dependencies. I don't have an Ubuntu system to test against right now, but you should be able to install them via apt and some combination of repositories. I was able to install Params::Validate via yum like so:

Code:

yum install perl-Params-Validate

Perhaps the same package name works with apt?
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: GPU Temperature AMD RX 580

Post by lmiltchev »

@prestonc, did mcapra's solution work for you? Is the issue resolved, or do you need more help?

Re: GPU Temperature AMD RX 580

Post by prestonc »

Hi all

I'm still working to get it running.
My head hurts, but I think I'm getting there.
I'll post back my success/failure asap.

Cheers


Preston

Re: GPU Temperature AMD RX 580

Post by dwhitfield »

Ok, great, just let us know!