Nagios server getting over load

manimurugesan · Post by **manimurugesan** » Tue Jul 30, 2019 10:10 am

Hello Team,

The Nagios XI server is frequently overloaded. I have attached the system profile; please find the screenshot below as well.

Please advise. I suspect MRTG is the cause.

TOP command output:

top
top - 15:54:40 up 38 days, 4:51, 2 users, load average: 160.69, 134.33, 135.83
Tasks: 1116 total, 180 running, 936 sleeping, 0 stopped, 0 zombie
Cpu(s): 70.8%us, 28.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.8%si, 0.0%st
Mem: 65978484k total, 13197912k used, 52780572k free, 297036k buffers
Swap: 33554428k total, 2916k used, 33551512k free, 10255616k cached
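
For context, a load average is easier to judge relative to the number of CPU cores. The following is a minimal sketch and not part of the original post; the helper name `load_per_core` is hypothetical, and the thread does not state how many cores this server has, so the core count must be supplied.

```shell
# load_per_core LINE CORES
# Hypothetical helper: pulls the 1-minute load average out of a top/uptime
# header line and divides it by the given core count.
load_per_core() {
  printf '%s\n' "$1" | awk -v cores="$2" '
    match($0, /load average: [0-9.]+/) {
      # "load average: " is 14 characters long; skip past it.
      load = substr($0, RSTART + 14, RLENGTH - 14)
      printf "%.2f\n", load / cores
    }'
}

# Usage on the real server (nproc reports the number of available cores):
# load_per_core "$(uptime)" "$(nproc)"
```

A result well above 1.00 suggests that more tasks are waiting to run than the cores can serve.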

benjaminsmith · Post by **benjaminsmith** » Wed Jul 31, 2019 12:55 pm

Hi @manimurugesan,

Have you made any changes to the server or Apache configuration recently? After looking over the system profile, I see a number of issues.

1. The error you're seeing in the uploaded image, specifically "Error: Could not parse XML output from https://server", could be caused by an SSL setting on the server. Follow the document below to make sure all the settings are correct.

How to Configure SSL/TLS

./nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed

2. Run the database repair script. Log in as root and run the following command:

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh

3. Post the output of the following command to check the size of your database tables.

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -uroot -pnagiosxi --table

Couldn't resolve host 'api.nagios.com'

4. The server can't resolve the license server's hostname, so you may have a DNS issue. What is the output of the following command?

Code: Select all

nslookup api.nagios.com

5. Lastly, run the following to re-start Nagios and clear the message queue.

Code: Select all

service crond stop
service npcd stop
service nagios stop
service ndo2db stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
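
The queue cleanup loop in step 5 selects queues with `grep nagios`, which matches the string anywhere on the line. As a hedged alternative, the sketch below matches only on the owner column of the `ipcs -q` output. The helper name `nagios_queue_ids` is hypothetical, and it assumes the usual Linux column order (key, msqid, owner, ...).

```shell
# nagios_queue_ids
# Hypothetical helper: reads `ipcs -q` output on stdin and prints the msqid
# of every queue whose owner column is exactly "nagios".
nagios_queue_ids() {
  awk '$3 == "nagios" { print $2 }'
}

# Usage (as root), equivalent in intent to the loop in step 5:
# for id in $(ipcs -q | nagios_queue_ids); do ipcrm -q "$id"; done
```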

manimurugesan · Post by **manimurugesan** » Mon Aug 12, 2019 10:17 am

Hello benjamin,

Please find the output below. I have also attached the output of the command that checks the size of the database tables.

nslookup api.nagios.com
Server: Name server (the server listed in /etc/resolv.conf)
Address: Name server
** server can't find api.nagios.com: NXDOMAIN
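
The NXDOMAIN result above can be re-checked after any resolver change without reading nslookup output by eye. This is a minimal sketch, assuming `getent` is available (it is part of glibc); the helper name `check_dns` is hypothetical.

```shell
# check_dns HOST
# Hypothetical helper: prints "OK <host>" if the name resolves through the
# system resolver (getent consults /etc/hosts and DNS per nsswitch.conf),
# otherwise prints "FAIL <host>" and returns a non-zero exit status.
check_dns() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "OK $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Usage:
# check_dns api.nagios.com
```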

I ran the database repair, but the server load is still high.

Could you please suggest what action needs to be taken?

benjaminsmith · Post by **benjaminsmith** » Mon Aug 12, 2019 2:27 pm

Hello @manimurugesan,

After further review, you have SELinux enabled on the server and this is preventing mrtg from functioning properly.

Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug 9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd

See: Disabling SELinux

We also noticed that you have GNOME installed on the server, and this may result in decreased performance. We recommend a clean, minimal installation for Nagios XI.

Also, your DNS is set up internally, and you should configure an external DNS server. Nagios XI cannot call out to the licensing server (api.nagios.com).
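
To see which resolvers the server is currently using, the nameserver entries can be listed directly. This is a minimal sketch; the helper name `list_nameservers` is hypothetical, and it assumes the standard resolv.conf layout of one `nameserver <address>` entry per line.

```shell
# list_nameservers [FILE]
# Hypothetical helper: prints the address of each "nameserver" entry in a
# resolv.conf-style file (default /etc/resolv.conf). Comment lines are
# skipped because their first field is not "nameserver".
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage:
# list_nameservers
```

If only internal addresses are listed and they do not forward external queries, lookups for api.nagios.com would be expected to fail.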

If you continue to experience high load, please run the following top command and post the full output so we can review the processes.

Code: Select all

top -n 1

Thanks.

manimurugesan · Post by **manimurugesan** » Tue Aug 13, 2019 10:20 am

Hello benjamin,

We have tried all the commands you provided, but the issue persists. I have also checked the SELinux status, and it is disabled. Please find the output below.

# sestatus
SELinux status: disabled

I have attached the output of the top -n 1 command. Please let us know what action we need to take.

tgriep · Post by **tgriep** » Tue Aug 13, 2019 3:56 pm

The output of the top command shows that the four busiest processes at that time were MRTG processes, so we need to look at that.

Can you run the following commands as root and post the /tmp/mrtg.txt file here?

Code: Select all

LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log &> /tmp/mrtg.txt
LANG=C LC_ALL=C /usr/bin/mrtg &>> /tmp/mrtg.txt
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios &>> /tmp/mrtg.txt
{ time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg 2>&1 ; } 2>> /tmp/mrtg.txt
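
The last command appends the run time to /tmp/mrtg.txt. One way to read that figure is to compare it with the interval at which MRTG is scheduled: if a single run takes longer than the interval, runs can pile up. The sketch below assumes bash's built-in `time` output format (`real 0m0.000s`) and a five-minute (300 second) schedule, neither of which this thread confirms. The helper name `real_seconds` is hypothetical.

```shell
# real_seconds
# Hypothetical helper: reads bash `time` output on stdin and prints each
# "real" (wall-clock) duration as whole seconds, truncating the fraction.
real_seconds() {
  awk '$1 == "real" { split($2, t, /[ms]/); printf "%d\n", t[1] * 60 + t[2] }'
}

# Usage: compare the last recorded run time with an assumed 300 s interval.
# secs=$(real_seconds < /tmp/mrtg.txt | tail -n 1)
# [ "$secs" -ge 300 ] && echo "MRTG run (${secs}s) exceeds the interval"
```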

The following entries in the /var/log/messages file

Code: Select all

Aug  9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug  9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd
Aug  9 10:20:27 MPHSCRLS0739 python: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg.#012#012*****  Plugin restorecon (94.8 confidence) suggests   ************************#012#012If you want to fix the label. #012/var/lib/mrtg default label should be mrtg_var_lib_t.#012Then you can run restorecon.#012Do#012# /sbin/restorecon -v /var/lib/mrtg#012#012*****  Plugin catchall_labels (5.21 confidence) suggests   *******************#012#012If you want to allow perl to have write access on the mrtg directory#012Then you need to change the label on /var/lib/mrtg#012Do#012# semanage fcontext -a -t FILE_TYPE '/var/lib/mrtg'#012where FILE_TYPE is one of the following: httpd_sys_content_t, mrtg_lock_t, mrtg_log_t, mrtg_var_lib_t, var_lock_t, var_log_t, var_run_t.#012Then execute:#012restorecon -v '/var/lib/mrtg'#012#012#012*****  Plugin catchall (1.44 confidence) suggests   **************************#012#012If you believe that perl should be allowed write access on the mrtg directory by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'mrtg' --raw | audit2allow -M my-mrtg#012# semodule -i my-mrtg.pp#012
Aug  9 10:20:27 MPHSCRLS0739 setroubleshoot: failed to retrieve rpm info for /var/lib/mrtg
Aug  9 10:20:27 MPHSCRLS0739 setroubleshoot: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg. For complete SELinux messages. run sealert -l 64b1ed28-d309-4034-aac4-eaa7bff2a4dd
Aug  9 10:20:27 MPHSCRLS0739 python: SELinux is preventing /usr/bin/perl from write access on the directory /var/lib/mrtg.#012#012*****  Plugin restorecon (94.8 confidence) suggests   ************************#012#012If you want to fix the label. #012/var/lib/mrtg default label should be mrtg_var_lib_t.#012Then you can run restorecon.#012Do#012# /sbin/restorecon -v /var/lib/mrtg#012#012*****  Plugin catchall_labels (5.21 confidence) suggests   *******************#012#012If you want to allow perl to have write access on the mrtg directory#012Then you need to change the label on /var/lib/mrtg#012Do#012# semanage fcontext -a -t FILE_TYPE '/var/lib/mrtg'#012where FILE_TYPE is one of the following: httpd_sys_content_t, mrtg_lock_t, mrtg_log_t, mrtg_var_lib_t, var_lock_t, var_log_t, var_run_t.#012Then execute:#012restorecon -v '/var/lib/mrtg'#012#012#012*****  Plugin catchall (1.44 confidence) suggests   **************************#012#012If you believe that perl should be allowed write access on the mrtg directory by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'mrtg' --raw | audit2allow -M my-mrtg#012# semodule -i my-mrtg.pp#012

are coming from the setroubleshootd daemon that is running on your server.

Code: Select all

setroub+ 20792     1 13 10:24 ?        00:00:01 /usr/bin/python -Es /usr/sbin/setroubleshootd -f

You should configure SELinux to allow the MRTG process to access the /var/lib/mrtg folder; the blocked access could be the cause of the high load from the MRTG process.
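
Since `sestatus` was reported as disabled earlier in the thread while the log entries above are dated Aug 9, it may help to check whether such denials are still being logged. This is a minimal sketch; the helper name `selinux_denials_per_day` is hypothetical, and it assumes the traditional syslog timestamp format shown in the entries above.

```shell
# selinux_denials_per_day [FILE]
# Hypothetical helper: counts "SELinux is preventing" lines per syslog date
# (month and day) in the given log file (default /var/log/messages).
selinux_denials_per_day() {
  awk '/SELinux is preventing/ { n[$1 " " $2]++ }
       END { for (d in n) print d, n[d] }' "${1:-/var/log/messages}"
}

# Usage (as root):
# selinux_denials_per_day /var/log/messages
```

If no dates after the point at which SELinux was disabled appear in the output, the entries are historical and the MRTG load likely has another cause.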
