Custom check with perfdata but no graph

Posted: Fri Feb 05, 2021 9:48 am
by jvaira
Hello,
I have written a custom check that collects server temperature readings and everything looks to be working except for the performance graph. I am even seeing the performance data in the advanced tab of the service status detail page. Please see attached screenshots for details.
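
For context, a plugin's performance data is whatever it prints after a pipe character, in the form label=value;warn;crit. The check from this post is not shown, so the following is only a minimal sketch of what a temperature check's output line could look like; the function name, the "temp" label, and the thresholds are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not the poster's actual check): build one line of
# plugin output with performance data for an integer temperature reading.
# Usage: format_temp_output <celsius> <warn> <crit>
format_temp_output() {
    temp=$1
    warn=$2
    crit=$3
    if [ "$temp" -ge "$crit" ]; then
        status="CRITICAL"
    elif [ "$temp" -ge "$warn" ]; then
        status="WARNING"
    else
        status="OK"
    fi
    # Human-readable text, then "|", then label=value;warn;crit
    printf '%s - temperature is %s C | temp=%s;%s;%s\n' \
        "$status" "$temp" "$temp" "$warn" "$crit"
}
```

Everything to the right of the pipe is what ends up in the Performance Data field on the advanced tab.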

Re: Custom check with perfdata but no graph

Posted: Fri Feb 05, 2021 3:41 pm
by tgriep
First, go to the following folder

Code: Select all

/usr/local/nagios/share/perfdata/ads-lv-node-115
Delete the .rrd and .xml files named after the service; that should allow them to be recreated.
Wait for 15 to 30 minutes for them to update in the GUI and see if that allows the graphs to populate with data.
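
Before deleting anything, it can help to list which files actually exist for the service. This is a minimal sketch, assuming the files are named after the service description (Temp_Check is used as an example name); list_perfdata_files is a hypothetical helper, not a Nagios XI tool.

```shell
#!/bin/sh
# Hypothetical helper: print the .rrd and .xml files that exist for one
# service inside a host's perfdata directory. Prints nothing if none exist.
# Usage: list_perfdata_files <host_perfdata_dir> <service_name>
list_perfdata_files() {
    dir=$1
    service=$2
    for ext in rrd xml; do
        f="$dir/$service.$ext"
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example call (service name is an assumption):
# list_perfdata_files /usr/local/nagios/share/perfdata/ads-lv-node-115 Temp_Check
```

If the helper prints nothing, the files were never created, which points at the processing side rather than at stale files.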

If this does not work, let's enable debugging for performance graphing by doing the following.
Edit this file

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Change

Code: Select all

log_level = 0
To:

Code: Select all

log_level = 2
Save it

Edit this file

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Change

Code: Select all

LOG_LEVEL = 0
to

Code: Select all

LOG_LEVEL = 2
Save out the file and restart these services by running

Code: Select all

service npcd restart
service nagios restart

Let the system run for 20 to 30 minutes and post the following files here so we can see what the errors are for that Service check when it tries to update the files.

Code: Select all

/usr/local/nagios/var/perfdata.log
/usr/local/nagios/var/npcd.log
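
The two log-level edits above can also be scripted rather than made by hand. This is a minimal sketch, assuming each setting sits on its own line in "key = value" form; set_cfg_value is a hypothetical helper, and it keeps a .bak copy of the file it changes.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a config file, in place.
# The match is case-sensitive and a backup is written to <file>.bak.
# Usage: set_cfg_value <file> <key> <value>
set_cfg_value() {
    file=$1
    key=$2
    value=$3
    sed -i.bak "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
}

# Example calls (run as root, then restart npcd and nagios):
# set_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg log_level 2
# set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_LEVEL 2
```

Remember to set both values back to 0 once the logs have been collected, since debug logging grows the log files quickly.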

Re: Custom check with perfdata but no graph

Posted: Mon Feb 08, 2021 10:58 am
by jvaira
Hello Tom,
The rrd and xml files did not even exist so I went ahead and just enabled the logging that you mentioned. Attached are the log files.

Re: Custom check with perfdata but no graph

Posted: Mon Feb 08, 2021 1:36 pm
by tgriep
Thanks for the log files.
I did not see any errors for the Temp_Check service but I did see that the performance data applications were timing out.

When the Nagios XI server is under heavy load, it will stop graphing performance data to keep things running smoothly for its checks.
Those settings can be edited to keep that from happening. To do this, edit this file

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Find the TIMEOUT setting and change it to the following, or to a higher value if it is already set to 30.

Code: Select all

TIMEOUT = 30
Save the file
then edit this file

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Find the load_threshold setting and change it to the following, or to a higher value if it is already set to 50.0.

Code: Select all

load_threshold = 50.0
Save out the file and restart these services by running

Code: Select all

service npcd restart
service nagios restart
Let it run for 20 to 30 minutes and check that service again.

Re: Custom check with perfdata but no graph

Posted: Tue Feb 09, 2021 9:45 am
by jvaira
Hello Tom,
After making these changes I am still not seeing data in the graphs. One thing I did notice is that it is not limited to just this check: performance graphing for all checks seems to have stopped around Thursday last week. I am seeing unusually high user CPU usage (screenshot 1) and an Apache process that says it is using 650% CPU (screenshot 2). I have already rebooted the machine to see if it would clear that process, but it just immediately popped back up. Any ideas?

Re: Custom check with perfdata but no graph

Posted: Tue Feb 09, 2021 2:46 pm
by tgriep
Could you post your Nagios XI System Profile so we can review it to see if we can find out why the load is so high?
To get your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and upload it to the forum post, or PM it to me if you do not want to post it.

Re: Custom check with perfdata but no graph

Posted: Wed Feb 10, 2021 11:45 am
by jvaira
Tom,
I was able to resolve the issue with the high user CPU but the load still seems fairly high and is hovering around 9 - 10. I have sent you a PM with the system profile.

Thanks

Re: Custom check with perfdata but no graph

Posted: Wed Feb 10, 2021 3:38 pm
by tgriep
Is it still the Apache process showing the highest consistent load?
In the Apache error log file, I saw this script running and causing errors.

Code: Select all

/tmp/123.sh
Is this what you found and fixed?

Also, something is running a curl command but the log does not show what it is.


Other than users that are connecting to the XI interface and a Fusion server, I did not see anything that stands out for Apache load.

You can increase the PHP limits outlined in this article to see if it helps.
https://support.nagios.com/kb/article/n ... e-611.html


The only big thing I see is that the I/O wait is very high. This means that the system is spending a lot of time waiting to write to disk and that causes issues and slowness.
If the system is hosted in a virtual environment, move it to a faster disk subsystem and that will help a lot.
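
To put a rough number on the I/O wait, the aggregate "cpu" line in /proc/stat can be used. This is a minimal sketch, assuming a Linux host; iowait_percent is a hypothetical helper, and the figure it prints is an average since boot rather than the current rate that tools like top display.

```shell
#!/bin/sh
# Hypothetical helper: given the aggregate "cpu ..." line from /proc/stat,
# print iowait as a percentage of all CPU time counted so far.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
iowait_percent() {
    printf '%s\n' "$1" | awk '{
        total = 0
        # Sum user through steal (fields 2..9); guest time is already in user
        for (i = 2; i <= NF && i <= 9; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $6 / total
    }'
}

# Example call on a live system:
# iowait_percent "$(head -n 1 /proc/stat)"
```

A value that stays in the double digits would be consistent with the disk subsystem being the bottleneck.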

You can add a RAMDisk to the system to move some of the disk I/O to memory and help the performance of the server. It is not a cure, but it should help speed things up.
https://assets.nagios.com/downloads/nag ... giosXI.pdf


When the Nagios XI server is under heavy load, it will stop graphing performance data to keep things running smoothly for its checks.
Those settings can be edited to keep that from happening. To do this, edit this file

Code: Select all

/usr/local/nagios/etc/pnp/process_perfdata.cfg
Find the TIMEOUT setting and change it to the following, or to a higher value if it is already set to 30.

Code: Select all

TIMEOUT = 30
Save the file
then edit this file

Code: Select all

/usr/local/nagios/etc/pnp/npcd.cfg
Find the load_threshold setting and change it to the following, or to a higher value if it is already set to 50.0.

Code: Select all

load_threshold = 50.0
Save out the file and restart these services by running

Code: Select all

systemctl restart npcd
systemctl restart nagios

That should keep the graphing function from stopping on the server when it is loaded.
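
To sanity-check the new threshold against the load reported earlier in the thread (around 9 to 10), the two numbers can be compared directly. This is a minimal sketch, assuming the 1-minute load average is the value being compared against load_threshold; load_over_threshold is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 when the load is above the threshold,
# exit 1 otherwise. awk is used because sh cannot compare decimals.
# Usage: load_over_threshold <load> <threshold>
load_over_threshold() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# Example call on a live Linux system:
# if load_over_threshold "$(cut -d' ' -f1 /proc/loadavg)" 50.0; then
#     echo "load is above load_threshold; graphing may be paused"
# fi
```

With a load near 10 and load_threshold at 50.0, the comparison fails, so the load alone should no longer pause graphing.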