Nagios server getting over load


Nagios server getting over load

Post by manimurugesan »

Hello Team,

The Nagios XI server is frequently getting overloaded. I have attached the system profile, and a screenshot is included below as well.

Please advise; I suspect MRTG may be the issue.



Output of the top command:

top
top - 15:54:40 up 38 days, 4:51, 2 users, load average: 160.69, 134.33, 135.83
Tasks: 1116 total, 180 running, 936 sleeping, 0 stopped, 0 zombie
Cpu(s): 70.8%us, 28.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.8%si, 0.0%st
Mem: 65978484k total, 13197912k used, 52780572k free, 297036k buffers
Swap: 33554428k total, 2916k used, 33551512k free, 10255616k cached
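A load average only means something relative to the number of CPUs. On Linux it counts tasks that are running or waiting to run (plus tasks in uninterruptible I/O wait). Here top also shows 0.0%id, 0.0%wa and 180 running tasks, which points at CPU contention rather than disk I/O. A minimal sketch for putting the 1-minute load in proportion, assuming a Linux host with /proc/loadavg; the function names are hypothetical, and the CPU count of this server is not stated in the thread:

```shell
#!/usr/bin/env bash
# Sketch: compare the 1-minute load average with the number of online CPUs.

# Print LOAD / CPUS with two decimals, e.g. load_per_cpu 160.69 16.
load_per_cpu() {
    awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

# Read the live values: first field of /proc/loadavg and the online CPU count.
report_load() {
    local load1 cpus
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    cpus=$(nproc 2> /dev/null || getconf _NPROCESSORS_ONLN)
    echo "1-min load: $load1, online CPUs: $cpus, load per CPU: $(load_per_cpu "$load1" "$cpus")"
}

report_load
```

For example, `load_per_cpu 160.69 16` prints `10.04`; 16 is an illustrative CPU count, not this server's. A ratio well above 1 means tasks are queuing for CPU time.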
Last edited by benjaminsmith on Tue Jul 30, 2019 4:50 pm, edited 1 time in total.
Reason: saved profile

Re: Nagios server getting over load

Post by benjaminsmith »

Hi @manimurugesan,

Have you made any changes to the server or Apache configuration recently? After looking over the system profile, I see a number of issues.

1. The error you're seeing in the uploaded image, specifically "Error: Could not parse XML output from https://server", could be caused by an SSL setting on the server. Follow the document below to make sure all the settings are correct.

How to Configure SSL/TLS
./nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
2. Run the database repair script. Log in as root and run the following command:

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
3. Post the output of the following command to check the size of your database tables.

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
Couldn't resolve host 'api.nagios.com'
4. The server can't resolve the hostname of the license server, so you may have a DNS issue. What is the output of the following command?

Code:

nslookup api.nagios.com
5. Lastly, run the following commands to restart Nagios and clear the message queue.

Code:

# Stop cron and the Nagios services
service crond stop
service npcd stop
service nagios stop
service ndo2db stop
# Kill any processes still running as the nagios user
pkill -9 -u nagios
# Remove System V message queues that belong to nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
# Remove stale lock files
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
# Restart MySQL, then start the services again
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
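The ipcrm loop in step 5 removes every System V message queue whose `ipcs -q` line mentions nagios. To see those queues and how many messages are waiting in each (before the restart, or afterwards to check whether a backlog builds up again), here is a sketch assuming the util-linux `ipcs -q` column order: key, msqid, owner, perms, used-bytes, messages. The function name is hypothetical; unlike `grep nagios`, it matches only the owner column:

```shell
#!/usr/bin/env bash
# Sketch: list nagios-owned message queues and their pending message counts.

# Read `ipcs -q` output on stdin and print "msqid messages" for owner nagios.
nagios_queues() {
    awk '$3 == "nagios" { print $2, $6 }'
}

if command -v ipcs > /dev/null; then
    ipcs -q | nagios_queues
fi
```

A messages count that keeps growing between runs suggests the process reading the queue (in Nagios XI typically ndo2db) is not keeping up; that is an inference to verify, not something established in this thread.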
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Nagios server getting over load

Post by manimurugesan »

Hello Benjamin,

Please find the output below. I have also attached the output of the command that checks the size of the database tables.


nslookup api.nagios.com
Server: Name server (the server listed in /etc/resolv.conf)
Address: Name server
** server can't find api.nagios.com: NXDOMAIN

I ran the database repair, but the server load is still high.

Could you please suggest what action needs to be taken?
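One way to confirm whether the repair cleared the crashed-table error reported in the system profile is to look for the same message in the MySQL/MariaDB error log. A sketch; the log path `/var/log/mysqld.log` is an assumption (the real location is the `log-error` setting in my.cnf, and MariaDB packages often log elsewhere), and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list tables the database server has reported as crashed.

# Assumed default path; override with MYSQL_ERROR_LOG=/path/to/log.
MYSQL_ERROR_LOG=${MYSQL_ERROR_LOG:-/var/log/mysqld.log}

# Read log lines on stdin and print each distinct crashed table name,
# e.g. "Table './nagios/nagios_logentries' is marked as crashed ..."
crashed_tables() {
    grep -o "'[^']*' is marked as crashed" \
        | sed -e "s/^'//" -e "s/' is marked as crashed\$//" \
        | sort -u
}

if [ -r "$MYSQL_ERROR_LOG" ]; then
    crashed_tables < "$MYSQL_ERROR_LOG"
fi
```

Entries written before the repair will still be listed, so compare their timestamps in the log with the time you ran repair_databases.sh.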

Re: Nagios server getting over load

Post by benjaminsmith »

Hello @manimurugesan,

After further review, you have SELinux enabled on the server, and this is preventing MRTG from functioning properly.
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd
See: Disabling SELinux
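`sestatus` and `getenforce` report the running SELinux mode, while the setting applied at boot lives in `/etc/selinux/config`; a change to that file only takes effect after a reboot. A sketch that prints both, plus the current label on `/var/lib/mrtg`, assuming the standard `SELINUX=` line in that file; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: compare the running SELinux mode with the boot-time setting.

# Print the value of the SELINUX= line from an SELinux config file.
selinux_config_mode() {
    awk -F= '$1 == "SELINUX" { print $2 }' "${1:-/etc/selinux/config}"
}

# getenforce prints Enforcing, Permissive or Disabled.
echo "Running mode: $(getenforce 2> /dev/null || echo unknown)"
echo "Boot setting: $(selinux_config_mode 2> /dev/null)"
# -Z prints the security context; -d shows the directory itself, not its contents.
ls -Zd /var/lib/mrtg 2> /dev/null || echo "/var/lib/mrtg not found"
```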

We also noticed that you have GNOME installed on the server, and this may result in decreased performance. We recommend a clean, minimal installation for Nagios XI.

Also, your DNS is set up for internal use only; you should configure an external DNS server, because Nagios XI cannot call out to the licensing server (api.nagios.com).
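`nslookup` queries a DNS server directly, whereas most programs go through the system resolver (NSS), which `getent` exercises. A sketch that lists the configured nameservers and checks whether `api.nagios.com` resolves through that path; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: show the configured resolvers and check a name via the system resolver.

# Print the nameserver addresses from a resolv.conf-style file (default /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed if NAME resolves via NSS (DNS plus /etc/hosts, per /etc/nsswitch.conf).
resolves() {
    getent hosts "$1" > /dev/null
}

echo "Configured nameservers:"
list_nameservers
if resolves api.nagios.com; then
    echo "api.nagios.com resolves"
else
    echo "api.nagios.com does not resolve through the system resolver"
fi
```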

If you continue to experience high load, please run the following top command and post the full output so we can review the processes.

Code:

top -n 1
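When saving top output to attach to a post, batch mode (`-b`) writes plain text without screen control codes. A sketch that captures one iteration of top plus the busiest processes from ps, sorted by CPU with full command lines; the output path is hypothetical, and `--sort` assumes the procps ps shipped with most Linux distributions:

```shell
#!/usr/bin/env bash
# Sketch: save a one-shot process snapshot as plain text for attaching to a post.
OUT=${OUT:-/tmp/top_snapshot.txt}   # hypothetical output path

{
    # -b: batch mode (no screen control codes); -n 1: a single iteration.
    top -b -n 1 | head -n 60
    echo
    # Header plus the 20 busiest processes by CPU, with full command lines.
    ps -eo pid,user,pcpu,pmem,etime,args --sort=-pcpu | head -n 21
} > "$OUT"

echo "Snapshot written to $OUT"
```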
Thanks.

Re: Nagios server getting over load

Post by manimurugesan »

Hello Benjamin,

We have tried all the commands you provided, but the issue still persists. I have also checked the SELinux status, and it is disabled. Please find the output below.

# sestatus
SELinux status: disabled

I have attached the output of the top -n 1 command. Please let us know what action we need to take on our end.

Re: Nagios server getting over load

Post by tgriep »

The output of the top command shows that the four highest processes at that time were all MRTG processes, so we need to look at MRTG.

Can you run the following commands as root and post the /tmp/mrtg.txt file here?

Code:

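# Check the configuration with debug output (overwrites /tmp/mrtg.txt)
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log &> /tmp/mrtg.txt
# Run mrtg without a configuration file
LANG=C LC_ALL=C /usr/bin/mrtg &>> /tmp/mrtg.txt
# Run with the lock and confcache files, as user and group nagios
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios &>> /tmp/mrtg.txt
# Time a full run; only the timing report is appended to the file
{ time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg 2>&1 ; } 2>> /tmp/mrtg.txt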
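Since several MRTG processes were at the top of the list at the same moment, one possibility worth ruling out is overlapping runs: a scheduled run starting before the previous one has finished. A sketch that counts running MRTG processes and compares a measured run time (for example the `real` value from the `time` command above, in seconds) with the scheduling interval. The 300-second default assumes a five-minute cron schedule, so check the actual entry (on RHEL/CentOS typically `/etc/cron.d/mrtg`); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: look for overlapping MRTG runs.

# Count processes whose command line contains /usr/bin/mrtg.
# pgrep -c prints 0 and exits 1 when nothing matches; ignore that status.
mrtg_count() {
    pgrep -c -f /usr/bin/mrtg 2> /dev/null || true
}

# Succeed if a run of RUNTIME seconds lasts at least INTERVAL seconds
# (assumed default 300, i.e. a 5-minute cron schedule).
overlaps() {
    local runtime=$1 interval=${2:-300}
    [ "$runtime" -ge "$interval" ]
}

echo "MRTG processes running now: $(mrtg_count)"
```

For example, if a timed run took 420 seconds (an illustrative figure), `overlaps 420` succeeds, meaning the next scheduled run would start while the previous one is still going and the processes would pile up.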
The following entries in the /var/log/messages file

Code:

Aug  9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug  9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd
Aug  9 10:20:27 MPHSCRLS0739 python: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg.#012#012*****  Plugin restorecon (94.8 confidence) suggests   ************************#012#012If you want to fix the label. #012/var/lib/mrtg default label should be mrtg_var_lib_t.#012Then you can run restorecon.#012Do#012# /sbin/restorecon -v /var/lib/mrtg#012#012*****  Plugin catchall_labels (5.21 confidence) suggests   *******************#012#012If you want to allow perl to have write access on the mrtg directory#012Then you need to change the label on /var/lib/mrtg#012Do#012# semanage fcontext -a -t FILE_TYPE '/var/lib/mrtg'#012where FILE_TYPE is one of the following: httpd_sys_content_t, mrtg_lock_t, mrtg_log_t, mrtg_var_lib_t, var_lock_t, var_log_t, var_run_t.#012Then execute:#012restorecon -v '/var/lib/mrtg'#012#012#012*****  Plugin catchall (1.44 confidence) suggests   **************************#012#012If you believe that perl should be allowed write access on the mrtg directory by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'mrtg' --raw | audit2allow -M my-mrtg#012# semodule -i my-mrtg.pp#012
are coming from the setroubleshootd daemon that is running on your server:

Code:

setroub+ 20792     1 13 10:24 ?        00:00:01 /usr/bin/python -Es /usr/sbin/setroubleshootd -f
You should configure it so that the MRTG process is allowed to access the /var/lib/mrtg folder; the denied access could be the cause of the high load from the MRTG process.
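The log entries above already contain setroubleshoot's highest-confidence suggestion: restore the default label (mrtg_var_lib_t) on /var/lib/mrtg with restorecon. A sketch that applies it only when SELinux is not disabled (sestatus reported it disabled earlier in the thread); run it as root. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: relabel /var/lib/mrtg as setroubleshoot suggests, only if SELinux is on.
fix_mrtg_label() {
    local mode
    # getenforce prints Enforcing, Permissive or Disabled; treat a missing
    # getenforce binary as Disabled.
    mode=$(getenforce 2> /dev/null) || mode=Disabled
    if [ "$mode" = "Disabled" ]; then
        echo "SELinux is disabled; no relabel needed"
        return 0
    fi
    # Reset the directory to its default context (mrtg_var_lib_t per the log).
    /sbin/restorecon -v /var/lib/mrtg
}

fix_mrtg_label
```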