load average not correct in GUI
Posted: Wed Dec 16, 2020 12:50 am
by RebeccaIlene
Hi Team,
When I check the load average from the command line, the output is correct.
However, the GUI shows OK while the command line shows Warning.
Please suggest what is causing this.
Is this because the version of Nagios is not yet updated?
We are on 5.7.3.
Re: load average not correct in GUI
Posted: Wed Dec 16, 2020 5:52 pm
by vtrac
Hi RebeccaIlene,
Can you post the full check command as run from the CLI so we can try to replicate this on our test system? Are you receiving false notifications as a result of this discrepancy?
Also, please send me the "profile.zip" and the exact name of the host and services sending notifications and we can review the command there.
To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.
Regards,
Vinh
Re: load average not correct in GUI
Posted: Sun Dec 20, 2020 8:22 pm
by RebeccaIlene
Thanks for your reply.
I have PMed you a copy of the profile.
Yes, the alerts stay in OK even when the load goes to critical.
[root@hostname ~]# /usr/local/nagios/libexec/check_load -r -w 3.6,2.8,2.0 -c 4.0,3.2,2.8
OK - load average per CPU: 0.14, 0.14, 0.15|load1=0.140;3.600;4.000;0; load5=0.138;2.800;3.200;0; load15=0.152;2.000;2.800;0;
Re: load average not correct in GUI
Posted: Mon Dec 21, 2020 12:51 pm
by vtrac
Hi RebeccaIlene,
I noticed from a couple of logs in your "profile.zip" that you are having DB connection issues (below):
Code: Select all
.....................
PROCESSED 27 COMMANDS
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>PROCESSING COMMAND ID 776575...
---------
DONE. Processed 0 files.
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>Outbound data DISABLED Thu, 17 Dec 2020 16:16:15 +1100
Please run the below commands to fix your DB issue:
Code: Select all
cd /usr/local/nagiosxi/scripts/
./repair_databases.sh
Please note that the "-r" option for "check_load" means "percpu": the load averages are divided by the number of CPUs. You might want to run without that option since your system does have multiple CPUs. The load average format is the same as that used by the "uptime" and "w" commands.
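To illustrate why the same load can give different states with and without "-r", here is a rough sketch of the threshold comparison. It is not the plugin's actual code: the function name load_state is hypothetical, the load values are made up, and it assumes a strict "greater than" comparison against each of the three thresholds.

```shell
#!/bin/sh
# load_state "L1 L5 L15" "W1,W5,W15" "C1,C5,C15"
# Hypothetical helper, NOT taken from check_load itself.
# Assumption: a value is a problem only when it is strictly
# greater than the matching warning/critical threshold.
load_state() {
    # Join loads and thresholds into 9 space-separated fields:
    # fields 1-3 = loads, 4-6 = warning, 7-9 = critical.
    echo "$1 $2 $3" | tr ',' ' ' | awk '{
        state = "OK"
        for (i = 1; i <= 3; i++) {
            if (($i + 0) > ($(i + 6) + 0)) { state = "CRITICAL"; break }
            if (($i + 0) > ($(i + 3) + 0)) state = "WARNING"
        }
        print state
    }'
}

# Made-up raw load averages, as "uptime" would show them:
load_state "3.80 2.10 1.50" "3.6,2.8,2.0" "4.0,3.2,2.8"
# The same load divided by 4 CPUs (CPU count is an assumption):
load_state "0.95 0.53 0.38" "3.6,2.8,2.0" "4.0,3.2,2.8"
```

With these example numbers the first call reports WARNING and the second reports OK, which is the kind of difference you would see between a check with "-r" and one without it.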
Also, check_load runs every five minutes (default). Once an issue is detected, check_load will run every minute for up to five minutes before sending out a notification. This is to prevent false alarm notifications.
If you are going to compare the CLI with the GUI, please remember to allow for the five-minute interval mentioned above.
If you are still experiencing the check_load issue after the DB has been repaired, please give me the service name and hostname of the one you want me to look at (and please upload pictures, if you can).
Best Regards,
Vinh
Re: load average not correct in GUI
Posted: Mon Jan 04, 2021 7:36 pm
by RebeccaIlene
Thanks for that. We did have other issues with Nagios that day. However, the output is still incorrect even after the repair.
We do not see the database repair error anymore but the load output is still incorrect.
Please let me know how we can fix this.
When I run top on the server, it shows load average: 1.30, 1.05, 0.96, but the output in Nagios is OK - load average per CPU: 0.35, 0.27, 0.24.
Re: load average not correct in GUI
Posted: Tue Jan 05, 2021 2:51 pm
by vtrac
Hi RebeccaIlene,
I have discussed this issue with my teammates and was told that Nagios XI gets the load average from the "/proc/loadavg" file.
Could you please try the following commands, each in a separate xterm window?
Code: Select all
top
nproc
watch '/usr/local/nagios/libexec/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.8'
watch '/usr/local/nagios/libexec/check_load -r -w 3.6,2.8,2.0 -c 4.0,3.2,2.8'
watch 'cat /proc/loadavg'
Here is the picture from my test (below). I'm running the check_load command without a "-r" option since I only have one CPU.
load-avg.png
For example, if your system has 2 CPUs:
Code: Select all
$ cat /proc/loadavg
1.20 1.00 0.88
$ check_load -r [...]
OK - load average per CPU: 0.60, 0.50, 0.44
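If you want to do the same division by hand, the following is a minimal sketch. The function name per_cpu_load is hypothetical, and this only mimics the arithmetic of the two-CPU example above; it is not how the plugin is implemented.

```shell
#!/bin/sh
# per_cpu_load "L1 L5 L15" NCPU
# Hypothetical helper: divides the three load averages by the
# CPU count and prints them with two decimals, comma-separated.
per_cpu_load() {
    # Refuse a missing or non-positive CPU count to avoid dividing by zero.
    if [ "${2:-0}" -le 0 ] 2>/dev/null; then
        echo "per_cpu_load: CPU count must be a positive integer" >&2
        return 1
    fi
    echo "$1" | awk -v n="$2" '{
        printf "%.2f, %.2f, %.2f\n", $1 / n, $2 / n, $3 / n
    }'
}

# Live usage on a Linux host (results depend on your system):
#   per_cpu_load "$(cut -d' ' -f1-3 /proc/loadavg)" "$(nproc)"
```

Comparing this result with what "check_load -r" prints at the same moment should show whether the difference you see is just the per-CPU division.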
Please upload the results once you have had a chance to test this.
Regards,
Vinh
Re: load average not correct in GUI
Posted: Tue Jan 05, 2021 3:00 pm
by vtrac
Also, you can run "nproc" to print the number of processing units (CPUs) available.
Regards,