Hi Team,
When I check the load average from the command line, the output is correct.
However, the GUI shows OK while the command line shows Warning.
Please suggest what is causing this.
Is this because the version of Nagios is not yet updated?
We are on 5.7.3.
load average not correct in GUI
Re: load average not correct in GUI
Hi RebeccaIlene,
Can you post the full check command as run from the CLI so we can try to replicate this on our test system? Are you receiving false notifications as a result of this discrepancy?
Also, please send me the "profile.zip" and the exact names of the host and services sending notifications so we can review the command there.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.
Regards,
Vinh
-
RebeccaIlene
- Posts: 164
- Joined: Tue Apr 02, 2019 8:38 pm
Re: load average not correct in GUI
Thanks for your reply.
I have PMed you a copy of the profile.
Yes, the alerts show OK even when the load goes to critical.
[root@hostname ~]# /usr/local/nagios/libexec/check_load -r -w 3.6,2.8,2.0 -c 4.0,3.2,2.8
OK - load average per CPU: 0.14, 0.14, 0.15|load1=0.140;3.600;4.000;0; load5=0.138;2.800;3.200;0; load15=0.152;2.000;2.800;0;
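One possible reason for a CLI/GUI mismatch is that one check runs with "-r" and the other without it, so the same thresholds are compared against different numbers. The sketch below is only an illustration: load_state is a hypothetical helper, not part of check_load, it looks at a single value rather than all three averages, and it assumes a value strictly above a threshold trips it. The 3.8 load and the 4-CPU host are made-up figures.

```shell
#!/bin/sh
# Hypothetical helper: classify one load value against a warning and a
# critical threshold. Assumption: a value strictly above a threshold trips it.
load_state() {
    # $1 = load value, $2 = warning threshold, $3 = critical threshold
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v > c)      print "CRITICAL"
        else if (v > w) print "WARNING"
        else            print "OK"
    }'
}

# 1-minute value from the output above, with -w 3.6 and -c 4.0
load_state 0.14 3.6 4.0    # prints OK

# Made-up raw 1-minute load of 3.8 on an assumed 4-CPU host
load_state 3.8 3.6 4.0     # raw value (no -r): prints WARNING
load_state 0.95 3.6 4.0    # same load per CPU (3.8 / 4): prints OK
```

If the service definition in the GUI and the command typed at the CLI differ only in "-r", they can legitimately report different states for the same machine.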
Re: load average not correct in GUI
Hi RebeccaIlene,
I noticed from a couple of logs in your "profile.zip" that you are having DB connection issues (below):
Code: Select all
.....................
PROCESSED 27 COMMANDS
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>PROCESSING COMMAND ID 776575...
---------
DONE. Processed 0 files.
<h3>Database Error</h3>A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.<p>Run the following from the CLI as root to attempt to repair the DB:<br><pre>/usr/local/nagiosxi/scripts/repair_databases.sh</pre></p>Outbound data DISABLED Thu, 17 Dec 2020 16:16:15 +1100
Please run the commands below to fix your DB issue:
Code: Select all
cd /usr/local/nagiosxi/scripts/
./repair_databases.sh
Please note that the "-r" option for "check_load" means "percpu". You might want to run without that option, since your system has multiple CPUs. The load average format is the same one used by the "uptime" and "w" commands.
Also, check_load runs every five minutes (default). Once an issue is detected, check_load will run every minute for up to five minutes before sending out a notification. This is to prevent false alarm notifications.
If you are going to compare the CLI with the GUI, please remember to allow for the five-minute interval mentioned above.
If you are still experiencing the check_load issue after the DB has been repaired, please give me the service name and hostname of the one you want me to look at (please upload pictures if you can).
Best Regards,
Vinh
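The retry behaviour described in this reply (re-check every minute, notify only after repeated failures) can be sketched roughly as follows. This is a simplified model, not how Nagios itself is implemented: first_notification is a hypothetical helper, and the limit of 5 attempts is inferred from "up to five minutes" rather than read from any configuration. In a real installation the scheduling is controlled by the check_interval, retry_interval and max_check_attempts directives on the service.

```shell
#!/bin/sh
# Hypothetical helper: print the number of the check at which a notification
# would fire, or "none" if the problem never persists long enough.
# $1 = consecutive non-OK results required (assumed 5 here)
# remaining arguments = successive check results (OK, WARNING, CRITICAL)
first_notification() {
    max=$1
    shift
    bad=0   # consecutive non-OK results seen so far
    n=0     # index of the current check
    for state in "$@"; do
        n=$((n + 1))
        if [ "$state" = "OK" ]; then
            bad=0                      # recovery resets the counter
        else
            bad=$((bad + 1))
            if [ "$bad" -ge "$max" ]; then
                echo "$n"
                return 0
            fi
        fi
    done
    echo "none"
}

# A short spike that recovers never notifies
first_notification 5 OK WARNING WARNING OK WARNING        # prints none
# Five consecutive non-OK results trigger a notification on the fifth check
first_notification 5 WARNING WARNING WARNING WARNING WARNING   # prints 5
```

This is why a value seen once at the CLI may not match the state shown in the GUI: the GUI state only changes for good after the problem has persisted across the retries.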
Re: load average not correct in GUI
Thanks for that. We did have other issues with Nagios that day. However, the output is still incorrect even after the repair.
We do not see the database repair error anymore, but the load output is still incorrect.
Please let me know how we can fix this.
When I check top on the server, the output is load average: 1.30, 1.05, 0.96, but the output in Nagios is OK - load average per CPU: 0.35, 0.27, 0.24.
Re: load average not correct in GUI
Hi RebeccaIlene,
I have discussed this issue with my teammates and was told that Nagios XI gets the load average from the "/proc/loadavg" file.
Could you please try the following commands, each in a separate xterm window?
Code: Select all
1. top
2. nproc
3. watch '/usr/local/nagios/libexec/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.8'
4. watch '/usr/local/nagios/libexec/check_load -r -w 3.6,2.8,2.0 -c 4.0,3.2,2.8'
5. watch 'cat /proc/loadavg'
Here is the picture from my test (attached). I'm running the check_load command without the "-r" option since I only have one CPU.
If your system has 2 CPUs, for example:
Code: Select all
$ cat /proc/loadavg
1.20 1.0 0.88
$ check_load -r [...]
OK - load average per CPU: 0.60, 0.5, 0.44
Please upload the results after you have had a chance to test this.
Regards,
Vinh
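The two-CPU example above is just the raw load divided by the CPU count. A minimal sketch of that arithmetic, assuming a Linux host for the live part (per_cpu_load is a hypothetical helper, not a Nagios command):

```shell
#!/bin/sh
# Hypothetical helper: divide a raw load average by a CPU count, which is
# the kind of per-CPU figure "check_load -r" reports.
per_cpu_load() {
    # $1 = raw load average, $2 = number of CPUs
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# The two-CPU example from the post: raw values 1.20 1.0 0.88
per_cpu_load 1.20 2   # prints 0.60
per_cpu_load 1.0 2    # prints 0.50
per_cpu_load 0.88 2   # prints 0.44

# On a live host, compare raw and per-CPU values side by side.
# Assumes /proc/loadavg is readable and nproc is installed.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r l1 l5 l15 _ < /proc/loadavg
    cpus=$(nproc)
    echo "raw:     $l1 $l5 $l15"
    echo "per CPU: $(per_cpu_load "$l1" "$cpus") $(per_cpu_load "$l5" "$cpus") $(per_cpu_load "$l15" "$cpus") ($cpus CPUs)"
fi
```

By the same arithmetic, the figures reported earlier in the thread (1.30, 1.05, 0.96 from top against 0.35, 0.27, 0.24 per CPU) differ by a factor of between about 3.7 and 4, which would be consistent with a four-CPU host sampled at slightly different moments. That is only an inference, since the nproc result is not given in the thread.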
Re: load average not correct in GUI
Also, you can run "nproc" to print the number of processing units (CPUs) available:
Code: Select all
# nproc
Regards,