Bug with NagiosXI Performance Grapher

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
wneville
Posts: 64
Joined: Wed Mar 31, 2021 3:35 pm

Bug with NagiosXI Performance Grapher

Post by wneville »

I am currently experiencing what I believe to be a bug in the NagiosXI performance grapher. I have a service set up to capture what we have defined as 'non-standard' disk mounts ('standard' mounts being /, /apps, /boot, /home, /opt, /tmp, and /var) with the following plugin:

Code: Select all

./check_snmp_storage_wizard.pl -H $HOSTADDRESS$ -G -m ^/tmp\|/boot\|/apps\|/dev/shm\|/home\|/run\|/var\|/opt\|/sys/fs/cgroup\|memory\|Memory\|Swap\|/mongoshare/.snapshot\|/emedia/.snapshot -e -2 -C <community_string> -w 96 -c 97 -o 20000 -f -S 0
This acts as a catch-all for db servers and unique application mounts.

It appears that performance graphs are not coming in correctly. On one host, all the mount points that are discovered show in the graph as 0 capacity (see attached photo dbdev3). Here are the check results for dbdev3, which correctly display perfdata:

Code: Select all

[root@nagiossrv1 libexec]# ./check_snmp_storage_wizard.pl -H dbdev3 -G -m ^/tmp\|/boot\|/apps\|/dev/shm\|/home\|/run\|/var\|/opt\|/sys/fs/cgroup\|memory\|Memory\|Swap\|/mongoshare/.snapshot\|/emedia/.snapshot -e -2 -C <community_string> -w 96 -c 97 -o 20000 -f -S 0
All selected storages (<96%) : OK | '/oradata/db07d'=17GB;31;31;0;32 '/oradata/db05d'=548GB;864;873;0;900 '/oradata/ppmdbt'=333GB;384;388;0;400 '/ck'=0GB;10;10;0;10 '/oradata/dmii'=15GB;24;24;0;25 '/oradata/mbtu'=156GB;259;262;0;270 '/oradata/db02xd'=187GB;287;290;0;299 '/oradata/db07dg'=98GB;192;194;0;200 '/oradata/tmii'=16GB;19;19;0;20 '/'=9GB;13;14;0;14 '/oradata/mbtdg'=201GB;355;359;0;370 '/oradata/ppmdbd'=337GB;383;387;0;399 '/oradata/db02dg'=635GB;671;678;0;699 '/oradata/db04d'=126GB;240;242;0;250 '/oradata/db04i'=97GB;192;194;0;200 '/orafra'=90GB;96;97;0;100 '/oradata/db07i'=13GB;31;31;0;32 '/orawork'=15GB;96;97;0;100 '/oradata/db05i'=240GB;360;364;0;375 '/orashare'=166804GB;1474560;1489920;0;1536000 '/ora'=113GB;259;262;0;270 '/oradata/db02d'=164GB;336;340;0;350 '/oradata/mbtd'=156GB;288;291;0;300 '/dev/vx'=0GB;0;0;0;0 '/oradata/ppmdba'=323GB;384;388;0;400 '/oradata/db04xd'=100GB;182;184;0;190 '/oradata/mbti'=156GB;240;242;0;250 '/oradata/ppmdbi'=318GB;384;388;0;400 '/oradata/db04dg'=229GB;288;291;0;300 '/oraarchive'=4GB;1056;1067;0;1100 '/oradata/db02i'=200GB;230;233;0;240
On another host (nagiossrv1), the performance data is not consistent, likely due to the order in which the mounts appear in the check results. Each time the check runs, the mounts come back in a different order, and it looks like the performance grapher keys on position rather than mount name.

Please let me know if this is intended behavior or if there is a fix planned.
Last edited by wneville on Mon May 06, 2024 2:53 pm, edited 2 times in total.
jmichaelson
Posts: 129
Joined: Wed Aug 23, 2023 1:02 pm

Re: Bug with NagiosXI Performance Grapher

Post by jmichaelson »

I don't have a quick solution, and this does indeed look buggy. I've opened an internal issue for the matter.
Please let us know if you have any other questions or concerns.

-Jason
jmichaelson
Posts: 129
Joined: Wed Aug 23, 2023 1:02 pm

Re: Bug with NagiosXI Performance Grapher

Post by jmichaelson »

I've been corrected. It appears that you are using the "-S 0" option for the check, which re-arranges the output of the plugin depending on what is in a Warning or Critical state:
Code: Select all

-S, --short=<type>[,<where>,<cut>]
  <type>: Make the output shorter :
    0 : only print the global result except the disk in warning or critical
        ex: "< 80% : OK"
    1 : Don't print all info for every disk
        ex : "/ : 66 %used (< 80) : OK"
  <where>: (optional) if = 1, put the OK/WARN/CRIT at the beginning
  <cut>: take the <n> first characters or <n> last if n<0
Please let us know if you have any other questions or concerns.

-Jason
wneville
Posts: 64
Joined: Wed Mar 31, 2021 3:35 pm

Re: Bug with NagiosXI Performance Grapher

Post by wneville »

That seems unrelated to me: this service has never alerted, and has only once been in a CRITICAL "Soft" state due to a timeout. The mount point perfdata is swapped almost every time the check runs. I ran these checks back-to-back on the command line:

Code: Select all

./check_snmp_storage_wizard.pl -H <hostname> -G -m ^/tmp\|/boot\|/apps\|/dev/shm\|/home\|/run\|/var\|/opt\|/sys/fs/cgroup\|memory\|Memory\|Swap\|/mongoshare/.snapshot\|/emedia/.snapshot -e -2 -C <community_string> -w 96 -c 97 -o 20000 -f -S 0
All selected storages (<96%) : OK | '/data'=51GB;163;165;0;170 '/data/nagiosramdisk'=0GB;0;0;0;0 '/'=29GB;50;50;0;52

Code: Select all

./check_snmp_storage_wizard.pl -H <hostname> -G -m ^/tmp\|/boot\|/apps\|/dev/shm\|/home\|/run\|/var\|/opt\|/sys/fs/cgroup\|memory\|Memory\|Swap\|/mongoshare/.snapshot\|/emedia/.snapshot -e -2 -C <community_string> -w 96 -c 97 -o 20000 -f -S 0
All selected storages (<96%) : OK | '/'=29GB;50;50;0;52 '/data'=51GB;163;165;0;170 '/data/nagiosramdisk'=0GB;0;0;0;0
Order of perfdata results in first check: /data, /data/nagiosramdisk, root (/)
Order of perfdata results in second check: root (/), /data, /data/nagiosramdisk
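For the curious, here's a throwaway sketch (Python, nothing to do with the plugin itself) confirming that the two runs above emit the exact same labels and differ only in ordering:

```python
import re

# Perfdata from the two back-to-back runs above (hostname redacted as in the post).
run1 = "'/data'=51GB;163;165;0;170 '/data/nagiosramdisk'=0GB;0;0;0;0 '/'=29GB;50;50;0;52"
run2 = "'/'=29GB;50;50;0;52 '/data'=51GB;163;165;0;170 '/data/nagiosramdisk'=0GB;0;0;0;0"

def labels(perfdata):
    # Pull the quoted label out of each 'label'=value;warn;crit;min;max datum.
    return re.findall(r"'([^']+)'=", perfdata)

print(labels(run1))   # ['/data', '/data/nagiosramdisk', '/']
print(labels(run2))   # ['/', '/data', '/data/nagiosramdisk']

# Same label set, different order -- exactly the symptom described above.
print(sorted(labels(run1)) == sorted(labels(run2)))  # True
```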
swolf
Developer
Posts: 312
Joined: Tue Jun 06, 2017 9:48 am

Re: Bug with NagiosXI Performance Grapher

Post by swolf »

Hi @wneville,

Right now our performance graphing backend is using a component called pnp4nagios, which does require that a check outputs the same labels in the same order, every time the check is run. We have ambitions to move away from that solution in the near future, but for now any plugin that doesn't keep a consistent label order will run into issues.
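To illustrate the requirement, here is a minimal sketch (Python, separate from the actual Perl plugin, assuming standard 'label'=value;warn;crit;min;max perfdata tokens) of normalizing a perfdata string so the same labels come out in the same order on every run:

```python
import re

def sort_perfdata(perfdata):
    """Return a perfdata string with its data sorted by label, so the
    grapher sees a stable label order across check runs."""
    tokens = re.findall(r"'[^']+'=\S+", perfdata)
    return " ".join(sorted(tokens, key=lambda t: re.match(r"'([^']+)'", t).group(1)))

# Two runs that differ only in ordering normalize to the same string.
a = sort_perfdata("'/data'=51GB;163;165;0;170 '/'=29GB;50;50;0;52")
b = sort_perfdata("'/'=29GB;50;50;0;52 '/data'=51GB;163;165;0;170")
print(a)       # '/'=29GB;50;50;0;52 '/data'=51GB;163;165;0;170
print(a == b)  # True
```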

This does look like it's from a plugin that we ship by default in XI - I've filed a bug for the plugin, and will try to get you a patch in advance of the next release.

-Sebastian
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy
wneville
Posts: 64
Joined: Wed Mar 31, 2021 3:35 pm

Re: Bug with NagiosXI Performance Grapher

Post by wneville »

Thanks so much! Any ideas on the behavior seen on the db host? The issue there is that the plugin run from the CLI shows perfdata being populated, but the graph in XI shows 0's for all the mount points.
swolf
Developer
Posts: 312
Joined: Tue Jun 06, 2017 9:48 am

Re: Bug with NagiosXI Performance Grapher

Post by swolf »

Hi @wneville,

For the graph showing all 0's, can you go into Home->Details->Service Status, click on the service, go into the Advanced tab, and copy the Performance Data entry into this thread? I think this could be caused either by a formatting error / difference between your CLI output and what the web interface sees, or it could be that the existing RRD isn't formatted properly for the number of entries you're currently seeing.

For the inconsistent label ordering, see attached for an updated plugin. Please:

1) Make a backup of your old plugin.
2) Copy the attached file to /usr/local/nagios/libexec/check_snmp_storage_wizard.pl .
3) Go into the Service Detail for the check with the bad graphs, then Configure->Re-configure this service->Monitoring. In "Monitor the service with this command", add the argument --sort-perfdata.
4) After saving, your performance data should show in a consistent order (but the graph labels may not match the returned performance data). Once that's confirmed, delete the file at /usr/local/nagios/share/perfdata/<HOST_NAME>/<SERVICE_DESCRIPTION>.rrd (or move it to a different location), and any new data should be correctly named.
wneville
Posts: 64
Joined: Wed Mar 31, 2021 3:35 pm

Re: Bug with NagiosXI Performance Grapher

Post by wneville »

My SFTP is not working at the moment, but I will test the attached version of the plugin once it is back up and working.

In the meantime, I was able to get perfdata to sort by changing line 452 in the original check_snmp_storage_wizard.pl from:

Code: Select all

foreach my $key ( keys %$resultat) {
to:

Code: Select all

foreach my $key (sort keys %$resultat) {
I don't have a use case where I wouldn't want them sorted, so with this change sorting simply becomes the default behavior, and it hasn't had any negative impact as far as I can tell.

Also, since I made that change, the service in question is no longer bringing in all 0's to the performance graph. I would've liked to get to the root of that issue, but it seems to be working now (and sorting correctly, and therefore graphing correctly). I don't think the sort had anything to do with the graph showing all 0's, but I have notified the DBA team that the graph is properly populating and am awaiting their confirmation that the NagiosXI perfdata lines up with their data.