Issue with Nagios XI Graphing
We have migrated from Nagios Core 4.4.2 to Nagios XI 5.8.7 and are having issues with graphing.
Some servers have services that graph as expected, but others do not. A typical example is two Linux 8 servers set up identically in Nagios XI. Both use the check-host-alive check command, and this graphs OK. However, at the service level, graphing works for one server, but the other doesn't create RRD or XML files for the perfdata.
Both use the same service, i.e. I took a service that was imported from our original Nagios installation and assigned both servers to it. Both servers have identical nrpe.cfg files and plugins. If I run the plugin on the hosts, it returns output in the same format, e.g.:
/dev/sda1 1038336 250388 787948 25% /boot | 25
When I run the check from Nagios XI, the output is different:
/dev/sda1 1038336 250388 787948 25% /boot (won't graph or create an RRD file)
DISK OK - free space: /boot 1786 MB (87% inode=99%): (graphs as expected)
I hope this all makes sense. It is probably something simple, but I have run out of ideas and have tried everything I could find in forums etc., including confirming directory/file ownership and permissions. I am a bit of a newbie with Nagios, so I am learning as I go.
Everything else appears to be working OK.
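One thing that may be worth checking here: the Nagios plugin guidelines describe performance data as label=value pairs after the "|", whereas the custom plugin output above ends in a bare "| 25". Whether that is the actual cause is only a guess, but a rough shell sketch like the one below could be used to test an output line. The function name has_labelled_perfdata is hypothetical, the second sample line is made up for illustration, and quoted labels containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if every perfdata token after "|"
# looks like label=value (optionally followed by a unit and thresholds).
has_labelled_perfdata() {
    local output="$1" perf token found=1
    local re='^[^=]+=-?[0-9]+(\.[0-9]+)?'
    local -a tokens
    # No "|" at all means the plugin emitted no perfdata.
    [[ "$output" == *"|"* ]] || return 1
    perf="${output#*|}"
    # Assumption: labels contain no spaces, so tokens split on whitespace.
    read -ra tokens <<< "$perf"
    for token in "${tokens[@]}"; do
        [[ "$token" =~ $re ]] || return 1
        found=0
    done
    return "$found"
}

# Sample from the post above, then an invented labelled line for comparison.
has_labelled_perfdata "/dev/sda1 1038336 250388 787948 25% /boot | 25" \
    && echo "labelled" || echo "unlabelled"
has_labelled_perfdata "DISK OK | /boot=1786MB;1900;2000;0;2048" \
    && echo "labelled" || echo "unlabelled"
```

If the custom plugin turns out to be the odd one out, changing its last field to something like a label=value pair would be the thing to try, though that is an assumption rather than a confirmed fix.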
Re: Issue with Nagios XI Graphing
Hello @robhgiham
Thanks for reaching out and providing the details. Do you see RRD and XML files created in '/usr/local/nagios/share/perfdata/<config name>'?
If so, please remove them and let me know if they are recreated within 30 minutes or so.
Code: Select all
rm -rf <service_name.rrd>
rm -rf <service_name.xml>
If they are recreated, look at the web console to see if it renders the graph. If the issue persists, please send a System Profile.
To send us your system profile:
- Log in to the Nagios XI GUI using a web browser.
- Click the "Admin" > "System Profile" menu.
- Click the "Download Profile" button.
- Save the profile.zip file and send it via Private Message.
Perry
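Before and after removing the files, it can help to confirm exactly which of the two files exist for a given service. The sketch below is one way to do that, assuming the layout seen elsewhere in this thread (one directory per host under the perfdata directory, with SERVICE.rrd and SERVICE.xml inside). The function name check_perf_files and the example service name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether both perfdata files exist for a service.
# Arguments: perfdata directory, host name, service name (as used in file names).
check_perf_files() {
    local perfdata_dir="$1" host="$2" service="$3" ext file missing=0
    for ext in rrd xml; do
        file="$perfdata_dir/$host/$service.$ext"
        if [[ -f "$file" ]]; then
            echo "present: $file"
        else
            echo "missing: $file"
            missing=1
        fi
    done
    # Non-zero exit status when at least one of the two files is absent.
    return "$missing"
}

# Example call (service name is illustrative):
# check_perf_files /usr/local/nagios/share/perfdata besra disk_boot
```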
Re: Issue with Nagios XI Graphing
Thanks Perry,
The host that is not graphing is not creating RRD and XML files for the services (it does have some RRD and XML files for swap).
On the similar host that is working, I deleted the RRD and XML files for this service; it recreated the files and resumed graphing the service.
Out of desperation, I had previously tried copying those files and modifying the XML file to reflect the name of the host that isn't graphing, but nothing happened, so I removed them.
I have IM'd you the profile.zip
Regards,
Rob
Re: Issue with Nagios XI Graphing
Hello @robhigiham
Thanks for following up. I'd like you to bump up the logging and PM us the host and service names along with the logs.
Increase the logging verbosity and restart npcd.service, then allow it to run for a while so we collect logging, and then run the following:
Code: Select all
tar -czvf /tmp/perflogs.tar.gz /usr/local/nagios/var/spool/perfdata/ /usr/local/nagios/var/spool/xidpe/ /usr/local/nagios/var/perfdata.log /usr/local/nagios/var/npcd.log
Please PM the host and service name that is not graphing and attach '/tmp/perflogs.tar.gz'.
Thanks,
Perry
Re: Issue with Nagios XI Graphing
Thanks Perry,
I have PM'd the info you requested. For reference, the host/service that is working ok is 'besra'
Regards,
Rob
Re: Issue with Nagios XI Graphing
Hello @robhigiman
Not sure what happened, but the compressed file extracted with errors. Could you please try again? Verify that you have sudo privileges when running it so that it can access and read all of the files.
Code: Select all
tar -czvf /tmp/perflogs.tar.gz /usr/local/nagios/var/spool/perfdata/ /usr/local/nagios/var/spool/xidpe/ /usr/local/nagios/var/perfdata.log /usr/local/nagios/var/npcd.log
I did test it from my test VM and verified it.
Thanks,
Perry
Re: Issue with Nagios XI Graphing
Hi Perry,
I have resent the file. It was OK on my server; I think the issue was that when I sftp'd it to my PC, I forgot to change the transfer method to binary. Hopefully it is OK this time.
Regards,
Rob
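A transfer problem like this can usually be caught before sending by testing the archive and comparing a checksum on both machines. The following is a minimal sketch of that idea, assuming gzip, tar and sha256sum are available; the function name verify_archive is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a .tar.gz is intact and print its checksum
# so the same value can be compared on both ends of a transfer.
verify_archive() {
    local archive="$1"
    # gzip -t tests the compressed stream without extracting anything.
    gzip -t "$archive" 2>/dev/null || { echo "corrupt gzip stream: $archive"; return 1; }
    # tar -tzf walks the member list, which fails on a damaged tar layer.
    tar -tzf "$archive" >/dev/null 2>&1 || { echo "unreadable tar: $archive"; return 1; }
    sha256sum "$archive"
}

# Example call, to be run on the server and again on the PC after the copy:
# verify_archive /tmp/perflogs.tar.gz
```

If the two checksums differ, the copy was altered in transit, which is what a non-binary transfer mode would do.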
Re: Issue with Nagios XI Graphing
Hello @robhgiham
Thanks for following up and resending the information. We see that data from the host is showing up in the perfdata.
I'd like you to run through this:
Code: Select all
cpan -i RRD::Simple
or apt-get install -y librrd-simple-perl #for Debian/Ubuntu
Download from:
Code: Select all
https://assets.nagios.com/downloads/nagiosxi/scripts/rrd_ds_fix.zip
or
Code: Select all
cd /tmp
wget https://assets.nagios.com/downloads/nagiosxi/scripts/rrd_ds_fix.zip
unzip rrd_ds_fix.zip
To run the script with RRD backups:
Code: Select all
./fix_ds_quantity.sh -d /usr/local/nagios/share/perfdata/
To run the script without RRD backups (if you have performed one of the suggested backup options above):
Code: Select all
./fix_ds_quantity.sh -i -d /usr/local/nagios/share/perfdata/
This process may take a considerable amount of time, depending on how many RRDs need to be updated. The script logs to /tmp/fix_rrd_ds.log. Once completed, it may take 5-10 minutes for the new datasources to appear in the performance graphs tab (longer if rrdcached is used).
Restart the npcd service:
Code: Select all
systemctl restart npcd
Please let us know how things look,
Perry
Re: Issue with Nagios XI Graphing
Hi Perry,
I went through the process and it didn't seem to make much difference. It didn't create any new RRD files for the ones that are missing. Here is the log:
Batch job confirmed by user.
Batch process started at Mon Dec 13 13:13:14 AEDT 2021
Populating list of RRDs from the dircetory: /usr/local/nagios/share/perfdata/
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_runtime.rrd
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_rows.rrd
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_errors.rrd
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_invalid.rrd
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_skipped.rrd
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_update.rrd
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_create.rrd
Ignoring pnp-internal RRD: /usr/local/nagios/share/perfdata/.pnp-internal/runtime_s_rows.rrd
Batch job finished at Mon Dec 13 13:13:22 AEDT 2021.
A total of 0 file(s) were updated with a total of 0 datasource(s).
Changes logged to the file /tmp/fix_rrd_ds.log
The existing host/service RRD files are working OK; it is just not creating new ones for the services that aren't graphing.
Regards,
Rob
Re: Issue with Nagios XI Graphing
Hello @robhgiham
Thanks for getting back to me on the results. Following up on your last statement, I went through the logs and we see that "RRDs::update" entries for the 'swap' and '_HOST_' data are written. We also see from the logged messages that it "Found Performance Data for.....". For some odd reason, it is either not able to write the data or not able to interpret it.
Could you check whether you can find an RRD for 'disk_root', 'memory', 'ingres_gateway', or 'disk_boot' and PM a copy so we can see what it looks like?
Code: Select all
RRDs::update /usr/local/nagios/share/perfdata/your_bpt_host/swap.rrd 1638995212:0.:0.:8264.:2.61644:0
/usr/local/nagios/share/perfdata/your_bpt_host/swap.rrd updated
Found Performance Data for your_bpt_host / disk_root (54)
Found Performance Data for your_bpt_host / memory (16)
Found Performance Data for your_bpt_host / ingres_gateway (1)
Found Performance Data for your_bpt_host / disk_boot (25)
Found Performance Data for your_bpt_host / _HOST_ (rta=1.521000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0)
RRDs::update /usr/local/nagios/share/perfdata/your_bpt_host/_HOST_.rrd 1638995428:1.521000:0
Thanks,
Perry
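The pattern described above (performance data found, but no RRD update) can be pulled out of a log mechanically. Below is a rough sketch that assumes the two line shapes shown in the excerpt, "Found Performance Data for HOST / SERVICE (DATA)" and "RRDs::update .../HOST/SERVICE.rrd ...". The function name find_unwritten_services is hypothetical, and which log file to pass in is left to the reader.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list HOST/SERVICE pairs for which the log shows
# "Found Performance Data" but no matching "RRDs::update" line.
find_unwritten_services() {
    local log="$1"
    awk '
        /Found Performance Data for / {
            # Keep only "HOST / SERVICE" by trimming the prefix and the "(DATA)" tail.
            line = $0
            sub(/.*Found Performance Data for /, "", line)
            sub(/ \(.*/, "", line)
            split(line, parts, " / ")
            found[parts[1] "/" parts[2]] = 1
        }
        /RRDs::update / {
            # Pick the field ending in .rrd and take its last two path segments.
            path = ""
            for (i = 1; i <= NF; i++) if ($i ~ /\.rrd$/) path = $i
            if (path == "") next
            n = split(path, seg, "/")
            svc = seg[n]
            sub(/\.rrd$/, "", svc)
            updated[seg[n-1] "/" svc] = 1
        }
        END { for (k in found) if (!(k in updated)) print k }
    ' "$log" | LC_ALL=C sort
}

# Example call (log path is an assumption):
# find_unwritten_services /usr/local/nagios/var/perfdata.log
```

Run against the excerpt above, this would be expected to list the disk_root, memory, ingres_gateway and disk_boot services and to skip _HOST_, which does have an update line.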