Nagios server getting over load
Posted: Tue Jul 30, 2019 10:10 am
by manimurugesan
Hello Team,
Our Nagios XI server is frequently overloaded. I have attached the system profile; please find the screenshot below as well.
Please advise. My guess is that MRTG is the issue.
TOP command output:
top
top - 15:54:40 up 38 days, 4:51, 2 users, load average: 160.69, 134.33, 135.83
Tasks: 1116 total, 180 running, 936 sleeping, 0 stopped, 0 zombie
Cpu(s): 70.8%us, 28.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.8%si, 0.0%st
Mem: 65978484k total, 13197912k used, 52780572k free, 297036k buffers
Swap: 33554428k total, 2916k used, 33551512k free, 10255616k cached
Re: Nagios server getting over load
Posted: Wed Jul 31, 2019 12:55 pm
by benjaminsmith
Hi @manimurugesan,
Have you made any changes to the server or Apache configuration recently? After looking over the system profile, I see a number of issues.
1. The error you're seeing in the uploaded image, specifically "Error: Could not parse XML output from https://server", could be caused by an SSL setting on the server. Follow the document below to make sure all the settings are correct.
How to Configure SSL/TLS
./nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
2. Run the database repair script. Log in as root and run the following command:
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
3. Post the output of the following command to check the size of your database tables.
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table
Couldn't resolve host 'api.nagios.com'
4. Nagios XI can't resolve the hostname of the license server, so you may have a DNS issue. What is the output of nslookup api.nagios.com?
5. Lastly, run the following to re-start Nagios and clear the message queue.
Code: Select all
service crond stop
service npcd stop
service nagios stop
service ndo2db stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
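Before removing the message queues in step 5, it may help to see which queue IDs the loop would act on. The following is only a rough sketch: the function name queue_ids_for_user is made up for illustration, and it assumes the usual ipcs -q column layout (key, msqid, owner, ...). It matches the owner column exactly, instead of grepping for "nagios" anywhere on the line.

```shell
#!/bin/sh
# Print the msqid (column 2) of every message queue whose owner (column 3)
# is exactly the given user. Reads `ipcs -q` style output on stdin.
# queue_ids_for_user is a hypothetical helper name, not part of Nagios XI.
queue_ids_for_user() {
    awk -v user="$1" '$3 == user && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Hypothetical dry run: print the ipcrm commands instead of executing them.
#   ipcs -q | queue_ids_for_user nagios | while read -r id; do
#       echo ipcrm -q "$id"
#   done
```

Dropping the echo in the dry run would perform the same removal as the one-line loop in step 5.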
Re: Nagios server getting over load
Posted: Mon Aug 12, 2019 10:17 am
by manimurugesan
Hello Benjamin,
Please find the output below. I have also attached the output of the command that checks the size of the database tables.
nslookup api.nagios.com
Server: Name server (the server listed in /etc/resolv.conf)
Address: Name server
** server can't find api.nagios.com: NXDOMAIN
I ran the database repair, but the server load is still high.
Could you please suggest what action needs to be taken?
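For rechecking the DNS side after any resolver change, a small scripted check could look like the sketch below. It is an illustration only: the function name resolves is invented here, and it assumes getent is available, which consults the system resolver configuration (including /etc/hosts) in the way most applications do.

```shell
#!/bin/sh
# Return success if the system resolver can look up the given hostname.
# resolves is a hypothetical helper name; getent uses the same name service
# configuration (files, DNS) that ordinary applications use.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical usage:
#   if resolves api.nagios.com; then
#       echo "api.nagios.com resolves"
#   else
#       echo "api.nagios.com does not resolve, check /etc/resolv.conf"
#   fi
```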
Re: Nagios server getting over load
Posted: Mon Aug 12, 2019 2:27 pm
by benjaminsmith
Hello @manimurugesan,
After further review, you have SELinux enabled on the server and this is preventing mrtg from functioning properly.
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd
See:
Disabling SELinux
We also noticed that you have GNOME installed on the server, and this may result in decreased performance. We recommend a clean, minimal installation for Nagios XI.
Also, your DNS is set up internally; you should configure an external DNS server, since Nagios XI cannot call out to the licensing server (api.nagios.com).
If you continue to experience high load, please run the following top command and post the full output so we can review the processes.
Thanks.
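If the raw process list is too long to read at a glance (the first post shows 1116 tasks), a summary grouped by command name can make the heaviest consumers stand out. This is a rough sketch, not an official Nagios tool: cpu_by_command is a made-up name, and it assumes a procps-style ps that accepts -eo pcpu=,comm=.

```shell
#!/bin/sh
# Sum %CPU per command name, highest first. Expects two columns on stdin:
# pcpu and comm, as printed by `ps -eo pcpu=,comm=` (the trailing "="
# suppresses the header line). Only the first word of the command is used.
# cpu_by_command is a hypothetical helper name.
cpu_by_command() {
    awk '{ cpu[$2] += $1 }
         END { for (c in cpu) printf "%.1f %s\n", cpu[c], c }' |
        sort -rn
}

# Hypothetical usage on a live system:
#   ps -eo pcpu=,comm= | cpu_by_command | head -n 10
```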
Re: Nagios server getting over load
Posted: Tue Aug 13, 2019 10:20 am
by manimurugesan
Hello Benjamin,
We have tried all the commands you gave, but the issue persists. I also checked the SELinux status, and it is disabled. Please find the output below.
# sestatus
SELinux status: disabled
I have attached the output of the top -n 1 command. Please let us know what action we need to take.
Re: Nagios server getting over load
Posted: Tue Aug 13, 2019 3:56 pm
by tgriep
The output of the top command shows that the four highest processes at that time were MRTG processes, so we need to look at that.
Can you run the following commands as root and post the /tmp/mrtg.txt file here?
Code: Select all
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log &> /tmp/mrtg.txt
LANG=C LC_ALL=C /usr/bin/mrtg &>> /tmp/mrtg.txt
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios &>> /tmp/mrtg.txt
{ time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg 2>&1 ; } 2>> /tmp/mrtg.txt
The following entries in the /var/log/messages file
Code: Select all
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd
Aug 9 10:20:27 MPHSCRLS0739 python: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg.#012#012***** Plugin restorecon (94.8 confidence) suggests ************************#012#012If you want to fix the label. #012/var/lib/mrtg default label should be mrtg_var_lib_t.#012Then you can run restorecon.#012Do#012# /sbin/restorecon -v /var/lib/mrtg#012#012***** Plugin catchall_labels (5.21 confidence) suggests *******************#012#012If you want to allow perl to have write access on the mrtg directory#012Then you need to change the label on /var/lib/mrtg#012Do#012# semanage fcontext -a -t FILE_TYPE '/var/lib/mrtg'#012where FILE_TYPE is one of the following: httpd_sys_content_t, mrtg_lock_t, mrtg_log_t, mrtg_var_lib_t, var_lock_t, var_log_t, var_run_t.#012Then execute:#012restorecon -v '/var/lib/mrtg'#012#012#012***** Plugin catchall (1.44 confidence) suggests **************************#012#012If you believe that perl should be allowed write access on the mrtg directory by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'mrtg' --raw | audit2allow -M my-mrtg#012# semodule -i my-mrtg.pp#012
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd
Aug 9 10:20:27 MPHSCRLS0739 python: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg.#012#012***** Plugin restorecon (94.8 confidence) suggests ************************#012#012If you want to fix the label. #012/var/lib/mrtg default label should be mrtg_var_lib_t.#012Then you can run restorecon.#012Do#012# /sbin/restorecon -v /var/lib/mrtg#012#012***** Plugin catchall_labels (5.21 confidence) suggests *******************#012#012If you want to allow perl to have write access on the mrtg directory#012Then you need to change the label on /var/lib/mrtg#012Do#012# semanage fcontext -a -t FILE_TYPE '/var/lib/mrtg'#012where FILE_TYPE is one of the following: httpd_sys_content_t, mrtg_lock_t, mrtg_log_t, mrtg_var_lib_t, var_lock_t, var_log_t, var_run_t.#012Then execute:#012restorecon -v '/var/lib/mrtg'#012#012#012***** Plugin catchall (1.44 confidence) suggests **************************#012#012If you believe that perl should be allowed write access on the mrtg directory by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'mrtg' --raw | audit2allow -M my-mrtg#012# semodule -i my-mrtg.pp#012
are coming from the setroubleshootd daemon that is running on your server.
Code: Select all
setroub+ 20792 1 13 10:24 ? 00:00:01 /usr/bin/python -Es /usr/sbin/setroubleshootd -f
You should configure SELinux so that it allows the MRTG process to access the /var/lib/mrtg folder. The denied access could be the cause of the high load from the MRTG process.
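The python: entries above are hard to read because syslog escapes each line feed as #012. A small filter like the sketch below (decode_syslog_newlines is just an illustrative name) expands them again, so the suggestions from setroubleshoot can be read as separate lines. In the entries quoted here, the suggestion with the highest confidence (94.8) is /sbin/restorecon -v /var/lib/mrtg; whether that applies depends on whether SELinux is actually active on the server.

```shell
#!/bin/sh
# Expand syslog's "#012" escapes (octal 012 is a line feed) into real
# newlines. decode_syslog_newlines is a hypothetical helper name.
decode_syslog_newlines() {
    awk '{ gsub(/#012/, "\n"); print }'
}

# Hypothetical usage:
#   grep 'SELinux is preventing' /var/log/messages | tail -n 1 | decode_syslog_newlines
```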