Nagios monitoring engine stopped
Posted: Mon Apr 12, 2021 4:32 am
by jmonn
Hi,
We upgraded Nagios XI a few weeks ago from 5.7 to 5.8 (we are now on 5.8.3). It worked like a charm at first, but the Nagios monitoring engine service now stops from time to time (a few times over the last few weeks, which is a lot for a monitoring system). I haven't been able to find any cause for it; there are no errors in nagios.log, for example.
Where should I look for errors about this service stopping?
Thanks,
Jeremy
Re: Nagios monitoring engine stopped
Posted: Mon Apr 12, 2021 1:46 pm
by gsmith
Hi Jeremy,
Sorry to hear about the frequent crashes. Please PM me your system profile. To do so:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file, share it in a private message, and then reply to this post to bring it up in the queue.
You can look at this document for a list of the logs and their descriptions:
https://assets.nagios.com/downloads/nagiosxi/docs/Nagios-XI-Log-Locations-And-Descriptions.pdf
How did you know the system went down?
Did the outages occur randomly or at approximately the same time of day/night?
Did any other non-Nagios systems go down at the same time?
Thanks
Re: Nagios monitoring engine stopped
Posted: Tue Apr 13, 2021 8:36 am
by jmonn
Hi,
We noticed the crash because we stopped receiving emails from the monitoring service. No other system is impacted (AFAIK). As far as I can tell, the crashes occur randomly. For the last 30 days we can visualize them with the CPU graph from Nagios itself, as you can see on the attached graph.
There are no specific errors in the logs other than "Caught SIGSEGV" followed by "Caught SIGTERM". I read it could be a memory leak from a plugin, but that is kind of hard to debug... :-/
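To line the "Caught SIGSEGV" / "Caught SIGTERM" entries up against the CPU graph, it can help to pull just those lines out of the log with readable timestamps. The following is only a rough sketch: it assumes the usual Nagios Core log layout of "[epoch-seconds] message", and the function name find_signal_events is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed log layout: "[1618300800] Caught SIGSEGV, shutting down..."
SIGNAL_LINE = re.compile(r"^\[(\d+)\]\s+Caught (SIG[A-Z0-9]+)")


def find_signal_events(lines):
    """Return (UTC datetime, signal name) for every 'Caught SIG...' line."""
    events = []
    for line in lines:
        match = SIGNAL_LINE.match(line)
        if match:
            when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            events.append((when, match.group(2)))
    return events
```

It accepts any iterable of lines, so an open file handle on nagios.log (commonly /usr/local/nagios/var/nagios.log, which is an assumption to check against the log locations document) can be passed in directly.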
Regards
Re: Nagios monitoring engine stopped
Posted: Tue Apr 13, 2021 4:57 pm
by benjaminsmith
Hi,
Please send us the system profile and we'll review the logs for any errors. In the meantime, let's run a tail command on the database log.
If there are any errors (e.g. crashed database tables), then go ahead and run the repair script as root and let us know if you notice any improvement.
/usr/local/nagiosxi/scripts/repair_databases.sh
Also, do you have a test server set up in your environment, and have you made any performance modifications to this system? If so, which ones?
Thanks, Benjamin
Re: Nagios monitoring engine stopped
Posted: Wed Apr 28, 2021 3:46 am
by jmonn
Hello,
It was indeed crashed MySQL tables, but I had to run myisamchk on the tables (with MariaDB stopped). As for why Nagios and MariaDB get stopped so abruptly, it is probably a plugin leaking memory, but that is hard to track down...
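For anyone repeating the myisamchk step, here is a small sketch that builds one command per MyISAM index file so they can be reviewed before running anything. It only assembles the argument lists and does not execute them. The helper name myisamchk_commands and the example paths are made up; --check and --recover are standard myisamchk options, and MariaDB should be stopped before the commands are run, as above.

```python
def myisamchk_commands(paths, recover=False):
    """Build a myisamchk argument list for each MyISAM index file (*.MYI).

    paths:   iterable of file paths from the database data directory
    recover: False -> read-only --check, True -> --recover
    """
    mode = "--recover" if recover else "--check"
    return [
        ["myisamchk", mode, path]
        for path in sorted(paths)
        if path.endswith(".MYI")
    ]
```

Each returned list can be handed to subprocess.run once it looks right. Starting with the default check-only mode shows which tables are actually damaged before any repair is attempted.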
Regards
Re: Nagios monitoring engine stopped
Posted: Wed Apr 28, 2021 1:12 pm
by benjaminsmith
Hi,
Thanks for the update. The new backend database application may have stopped, causing the nagios process to quit. You can keep tabs on the nagios process by running the Nagios Server Wizard on this system.
If you continue to have trouble with corrupt tables, I would recommend converting the tables to InnoDB.
We have a guide in our knowledge base on how to do this:
Database Storage Engine and High CPU usage in Nagios XI
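As a rough illustration of what the conversion involves (the knowledge base guide remains the reference procedure), the sketch below turns a list of (schema, table, engine) rows, such as the result of querying information_schema.tables, into ALTER TABLE statements for the tables still on MyISAM. The function name and the sample table names are hypothetical, and a backup should be taken before running any generated statement.

```python
def innodb_alter_statements(tables):
    """Return ALTER TABLE statements for every table whose engine is MyISAM.

    tables: iterable of (schema, table, engine) tuples; engine may be None
            (information_schema reports no engine for views).
    """
    statements = []
    for schema, table, engine in tables:
        if engine and engine.lower() == "myisam":
            statements.append(
                f"ALTER TABLE `{schema}`.`{table}` ENGINE=InnoDB;"
            )
    return statements
```

Generating the statements first, rather than altering tables directly, leaves room to review the list and convert the large tables during a quiet period.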
Let us know if you need further assistance.
Benjamin