Nagios Support Forum

Posted: **Fri Dec 06, 2019 4:45 pm**

I have now logged in to my monitoring instance a couple of times and found the "Monitoring Engine Process" stopped, even though I did not stop the engine.

Does Nagios auto-shutdown if load is too high?
What are the possible scenarios?

Posted: **Fri Dec 06, 2019 5:23 pm**

There is no auto-shutdown built in. Are you seeing any segfaults or anything else in your /usr/local/nagios/var/nagios.log?

Please PM a copy of your profile; you can download it from Admin > System Profile > Download Profile.

As root, please send the output of these commands (ideally once it has stopped on its own and before you've taken any corrective action):
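
One way to check the log, as a rough sketch: the function below greps for lines that suggest the engine was signalled or crashed. The function name `find_shutdown_events` and the search patterns are assumptions made for illustration, not an exhaustive list of what Nagios writes.

```shell
#!/bin/sh
# Sketch: list log lines that suggest the engine was signalled or crashed.
# The patterns are illustrative guesses; widen or narrow them as needed.
find_shutdown_events() {
    log_file="${1:-/usr/local/nagios/var/nagios.log}"
    grep -E -i 'SIGTERM|SIGSEGV|segfault|shutting down' "$log_file"
}

# Usage (as root or the nagios user):
#   find_shutdown_events /usr/local/nagios/var/nagios.log | tail -n 20
```

If this prints nothing, the log holds no lines matching those patterns, which by itself does not rule out an external stop.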

ps aux
ipcs -q

Please include the output of these commands as well (run as root):

sysctl -p
ulimit -a
chage -l nagios
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql

If you have these files, please attach:

/etc/init.d/npcd
/etc/init.d/nagios

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

Then run this command:

grep mysql /usr/local/nagiosxi/html/config.inc.php | wc -l

If it outputs the number 2, run the command below as well and include the output. If it outputs anything other than 2, don't run the command. (Some XI systems use both MySQL and PostgreSQL if they were installed prior to XI 5.0 and then upgraded from there.)

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi

Posted: **Mon Dec 09, 2019 11:04 am**

Profile received. Please send the rest of the requested information as well. The only things I need to see while the issue is occurring are these:

ps aux
ipcs -q
top -n3
df -h
df -i
tail /var/log/mariadb/mariadb.log

The rest of the information shouldn't change and could point us in a direction sooner, so please send it and attach the files.

Looking at your profile, I'm wondering if it was your ramdisk that filled up. Grab the output above once the issue occurs and BEFORE you do any remediation.
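
To look at the ramdisk specifically, something like the sketch below may help. The function name `fs_usage` is made up for illustration, and /var/nagiosramdisk is only an assumed mount point, so substitute whatever path your system actually uses. It relies on GNU `df` output columns.

```shell
#!/bin/sh
# Sketch: print block and inode usage for one mount point.
# /var/nagiosramdisk is an assumed path, not taken from this thread.
fs_usage() {
    mount_point="${1:-/var/nagiosramdisk}"
    # Column 5 of the second line is the Use% (or IUse%) figure.
    df -P -h "$mount_point" | awk 'NR==2 {print "space used:  " $5}'
    df -P -i "$mount_point" | awk 'NR==2 {print "inodes used: " $5}'
}

# Usage:
#   fs_usage /var/nagiosramdisk
```

A ramdisk can run out of either space or inodes, which is why both figures are printed.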

Additionally, do you see anything in /var/log/mariadb/mariadb.log now that could indicate an issue?

Thank you!

Posted: **Tue Dec 10, 2019 11:23 am**

I don't have it in an error state currently.
As soon as I catch it again, I will be glad to do this.

I did find that my test instance of Nagios failed this morning, though. Attached is a screenshot showing the event log catching a "SIGTERM".

I will send you the profile for that machine, as well as the output of the commands you specified in PM.

Thank you.

Posted: **Tue Dec 10, 2019 4:57 pm**

It will probably be best to keep the two machines separate, because we don't know that the two instances are failing for the same reason. We'll keep this thread open and wait to hear back on the original instance of Nagios.

Posted: **Thu Dec 12, 2019 5:06 pm**

Please send the output of these commands now though:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
sysctl -p
ulimit -a
chage -l nagios
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql

Only provide these if it occurs again (before any remediation):

ps aux
ipcs -q
top -n3
df -h
df -i
tail /var/log/mariadb/mariadb.log

Posted: **Fri Dec 20, 2019 5:20 pm**

I need both instances to get attention. The 2nd instance exhibits the problem more often than the primary instance (the primary instance has not shut down since this ticket was opened). Can you please still respond to the data I provided relating to the 2nd instance?

Posted: **Mon Dec 23, 2019 12:49 pm**

Absolutely. If the 2nd instance is the larger offender, let's tackle that one. ssax posted some information gathering commands above. Can you check out his post and let us know the results?

Posted: **Thu Jan 02, 2020 10:11 am**

I believe I figured out what the problem was. After performing OS updates, my Linux team would reboot the server.
After reboot, the Nagios Monitoring Engine would be in a stopped state.

It appears that the default setting is to NOT start the Nagios engine at boot.
After running "systemctl enable nagios", things work as expected after a reboot.
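
For anyone hitting the same thing, here is a minimal sketch of that check, assuming a systemd-based host and run as root. The function name `ensure_enabled_at_boot` is made up for illustration; the unit name nagios is the one from the command above.

```shell
#!/bin/sh
# Sketch: report whether a unit starts at boot and enable it if it does not.
ensure_enabled_at_boot() {
    unit="${1:-nagios}"
    if systemctl is-enabled --quiet "$unit"; then
        echo "$unit is already enabled at boot"
    else
        systemctl enable "$unit" && echo "$unit is now enabled at boot"
    fi
}

# Usage (as root):
#   ensure_enabled_at_boot nagios
```

The same function could be pointed at other units that need to survive a reboot, for example npcd if it runs as a systemd unit on your system (an assumption, since only its init script is mentioned in this thread).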

Posted: **Thu Jan 02, 2020 10:35 am**

Excellent, I'm glad you were able to get this resolved! Also thank you for posting the solution back to the forums! I will go ahead and close this thread.

Nagios Monitoring Engine Stops On Its Own
