Getting frequent Swap and Memory warning/critical message

Posted: Sat Oct 24, 2020 2:34 am
by pratikmehta003
server stats.PNG
free cmd.PNG
Hi Team,
We are getting frequent Swap and Memory warning/critical messages for the Nagios XI VM, and the monitoring engine status often shows red. When we click Start in the console for the monitoring engine, it turns green again, but we are not sure why this is happening.

The output of the free command is attached.

We are monitoring 33 SAN switches with about 3400 services. A screenshot of the server statistics is attached too.
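
For anyone following along, the free output only shows totals. A minimal sketch for seeing which processes are actually holding swap, assuming a Linux host where /proc/<pid>/status exposes the VmSwap field (the function name top_swap and its optional directory argument are made up for illustration):

```shell
#!/bin/sh
# Sketch: list the ten processes holding the most swap, largest first.
# Reads VmSwap from /proc/<pid>/status; run as root to see every process.
top_swap() {
    proc_root="${1:-/proc}"   # overridable so the function can be tried on a copy
    for f in "$proc_root"/[0-9]*/status; do
        [ -r "$f" ] || continue
        # Kernel threads have no VmSwap line and are skipped by the swap > 0 test
        awk '/^Name:/ {name=$2} /^VmSwap:/ {swap=$2}
             END {if (swap > 0) printf "%d kB\t%s\n", swap, name}' "$f" 2>/dev/null
    done | sort -rn | head -n 10
}

top_swap
```

If mysqld or httpd sits at the top of that list rather than nagios, the swap pressure is probably coming from the database or web tier rather than from the monitoring engine itself.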

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Mon Oct 26, 2020 3:26 pm
by benjaminsmith
Hi,

Thanks for the screenshot. There are a couple of issues to work out here: the monitoring engine stopping and the swap space. Let's resolve the monitoring engine stopping first, as that is more critical than the swap space.

1. How often does this occur? Is there any correlation between memory usage and the monitoring engine stopping?

2. If you haven't done so already, please run the Nagios XI Server Wizard so you'll receive a notification if there's an issue with any of the required services.

3. Lastly, please send over a system profile and I can check the logs for errors. Thanks, Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.
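
On the correlation question in point 1, a rough way to collect evidence is to log memory, swap and engine state together at regular intervals. This is only a sketch under a few assumptions: the log path is hypothetical, the engine process is assumed to be named nagios, and MemAvailable needs a reasonably recent kernel.

```shell
#!/bin/sh
# Sketch: log one timestamped sample of memory, swap and engine state.
# LOG is a hypothetical location; change it to suit your server.
LOG="${LOG:-/tmp/nagios_mem_watch.log}"

sample() {
    meminfo="${1:-/proc/meminfo}"
    # MemAvailable and SwapFree are reported by the kernel in kB
    mem=$(awk '/^MemAvailable:/ {print $2}' "$meminfo" 2>/dev/null)
    swp=$(awk '/^SwapFree:/ {print $2}' "$meminfo" 2>/dev/null)
    # pgrep -x matches the exact process name (assumed here to be "nagios")
    if pgrep -x nagios >/dev/null 2>&1; then state=running; else state=stopped; fi
    printf '%s mem_available_kb=%s swap_free_kb=%s nagios=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$mem" "$swp" "$state"
}

sample >> "$LOG"
```

Run from cron once a minute, the log should show whether the lines with nagios=stopped are preceded by swap_free_kb dropping towards zero.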

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Wed Oct 28, 2020 5:18 am
by pratikmehta003
sure, will send the details soon..

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Wed Oct 28, 2020 4:45 pm
by benjaminsmith
pratikmehta003 wrote: sure, will send the details soon..

Sounds good. Please reply to the thread once you send over the profile.

Thanks,
Benjamin

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Thu Oct 29, 2020 12:52 pm
by pratikmehta003
Hi Benjamin,

I have sent the info privately. I am also seeing some other errors now for other components. We had the server rebooted, and I restarted the nagios service again today. Something seems to be really wrong...

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Fri Oct 30, 2020 1:33 pm
by benjaminsmith
Hi,

I got your message. Can you send the entire profile.zip file over? It has the full set of logs and configurations for troubleshooting system issues. Normally you can download it from the web interface using the following steps:

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.

If you're not able to download it that way, run the following commands to generate it from the command line.

Code: Select all

rm -rf /usr/local/nagiosxi/var/components/profile.zip
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send or attach the resulting /usr/local/nagiosxi/var/components/profile.zip file.

Lastly, if you're experiencing critical issues with your production server, then I would recommend opening a support ticket for faster resolution.

https://support.nagios.com/tickets/

In the meantime, please try doing a full restart of the system and let us know if you notice any improvement.

Code: Select all

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl restart httpd
systemctl start nagios
systemctl start npcd
systemctl start crond
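
After the restart it is worth confirming that everything came back. A minimal sketch, assuming the same systemd unit names used in the commands above (the function name report_services is made up for illustration):

```shell
#!/bin/sh
# Sketch: print the systemd state of each service touched by the restart.
report_services() {
    for svc in "$@"; do
        # is-active prints active, inactive, failed, etc.; empty if systemctl is unavailable
        state=$(systemctl is-active "$svc" 2>/dev/null)
        printf '%-8s %s\n' "$svc" "${state:-unknown}"
    done
}

report_services mariadb httpd nagios npcd crond
```

Anything other than active on a line is the service to look at first.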

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Mon Nov 02, 2020 5:05 am
by pratikmehta003
Hi Benjamin,
From the console it's not allowing me to download the profile.

Do I need to run the 2 commands below to get the same output? I hope they don't make any changes, as this is a live instance and devices are being monitored:
rm -rf /usr/local/nagiosxi/var/components/profile.zip
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT

I will check with the customer on the full restart, since it's in production.

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Mon Nov 02, 2020 12:21 pm
by benjaminsmith
Hi,
pratikmehta003 wrote: From the console it's not allowing me to download the profile.

What error message are you getting? Typically, this is the result of an incorrect sudoers file, which can cause other issues as well. Take a look at the following KB article and let me know if that's the issue.

Nagios XI - Profile Build Failed

Those commands will just remove the old profile saved on the system and then generate a new one. They won't make any changes to the server. You can also run just this one command and pick up the latest profile from the directory /usr/local/nagiosxi/var/components/:

Code: Select all

/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Wed Nov 04, 2020 8:09 am
by pratikmehta003
Hi Benjamin,

Yes, I'm getting the same error as shown in the KB.

The customer where we have this running has pretty strict rules, so is it possible to know what edits will be needed in sudoers?
I can give you an extract if you can have a look at it.

Re: Getting frequent Swap and Memory warning/critical messag

Posted: Wed Nov 04, 2020 5:08 pm
by benjaminsmith
Hi,
pratikmehta003 wrote: The customer where we have this running has pretty strict rules, so is it possible to know what edits will be needed in sudoers? I can give you an extract if you can have a look at it.

Sounds good. You can post it to the thread or send it over in a PM.

The default sudoers should be as follows:

Code: Select all

## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)
#includedir /etc/sudoers.d
User_Alias      NAGIOSXI=nagios
User_Alias      NAGIOSXIWEB=apache
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios status
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios checkconfig
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd status
NAGIOSXI ALL = NOPASSWD:/usr/bin/php /usr/local/nagiosxi/scripts/components/autodiscover_new.php *
NAGIOSXI ALL = NOPASSWD:/usr/bin/php /usr/local/nagiosxi/scripts/send_to_nls.php *
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/components/getprofile.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/upgrade_to_latest.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/change_timezone.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/reset_config_perms.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_ssl_config.sh *
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/backup_xi.sh *
NAGIOSXIWEB ALL = NOPASSWD:/etc/init.d/snmptt restart
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/messages
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/httpd/error_log
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/mysqld.log
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/php /usr/local/nagiosxi/scripts/components/autodiscover_new.php *
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/scripts/components/getprofile.sh
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/scripts/repair_databases.sh
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *
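
Since the profile download depends on the getprofile.sh entries above, a quick read-only check before sending an extract might look like the sketch below. It assumes the rules live in /etc/sudoers itself and are written in the same form as the defaults listed here; the function name check_profile_rules is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether a sudoers file carries the getprofile.sh rule
# for each of the two aliases. Reads the file only; changes nothing.
check_profile_rules() {
    file="${1:-/etc/sudoers}"
    for alias in NAGIOSXI NAGIOSXIWEB; do
        if grep -Eq "^${alias}[[:space:]]+ALL[[:space:]]*=[[:space:]]*NOPASSWD:.*getprofile\.sh" "$file" 2>/dev/null; then
            echo "$alias: getprofile.sh rule present"
        else
            echo "$alias: getprofile.sh rule MISSING"
        fi
    done
}

# Needs root, since /etc/sudoers is normally readable only by root
check_profile_rules /etc/sudoers
```

If the site keeps its rules in drop-in files, pass the relevant file under /etc/sudoers.d as the argument instead. Running visudo -c as root will also syntax-check the configuration without modifying it, and sudo -l -U apache lists what the web server account is currently allowed to run.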