Nagios Support Forum

Posted: **Mon Apr 12, 2021 4:32 am**

Hi,

We upgraded Nagios XI from 5.7 to 5.8 a few weeks ago (we are now on 5.8.3). It worked like a charm at first, but the Nagios monitoring engine service now stops from time to time (a few times over the last few weeks, which is a lot for a monitoring system). I haven't been able to find any cause for it; there is no error in nagios.log, for example.

Where should I look for errors about this service stopping?

Thanks,

Jeremy

Posted: **Mon Apr 12, 2021 1:46 pm**

Hi Jeremy,

Sorry to hear about the frequent crashes. Please PM me your system profile. To do so:

Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file, share it in a private message, and then reply to this post to bring it back up in the queue.

You can look at this document for a list of the logs and their descriptions:

https://assets.nagios.com/downloads/nagiosxi/docs/Nagios-XI-Log-Locations-And-Descriptions.pdf

How did you know the system went down?
Did the outages occur randomly or at approximately the same time of day/night?
Did any other non-Nagios systems go down at the same time?

Thanks

Posted: **Tue Apr 13, 2021 8:36 am**

Hi,

We noticed the crash because we stopped receiving emails from the monitoring service. No other system is impacted (AFAIK). The crashes occur randomly as far as I can tell. For the last 30 days they are visible on the CPU graph from Nagios itself, as you can see in the attached graph.

No specific errors in the logs other than "Caught SIGSEGV" followed by "Caught SIGTERM". I read it could be a memory leak from a plugin, but that is kind of hard to debug... :-/
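To line those signals up against the gaps in the CPU graph, something like the sketch below can list them with readable timestamps. It assumes nagios.log lives at `/usr/local/nagios/var/nagios.log` and that every line starts with an epoch timestamp in square brackets; `signal_events` and `NAGIOS_LOG` are illustrative names, and `date -d` requires GNU date.

```shell
#!/usr/bin/env bash
# Print "Caught SIG..." lines from a Nagios log with UTC timestamps.
# Assumption: each log line looks like "[<epoch>] message".
signal_events() {
  grep -E 'Caught SIG[A-Z]+' | while IFS= read -r line; do
    epoch=${line#\[}      # drop the leading "["
    epoch=${epoch%%\]*}   # keep only the digits before the first "]"
    printf '%s %s\n' \
      "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" \
      "${line#*\] }"
  done
}

# Assumed default log location; override with NAGIOS_LOG=/path/to/nagios.log
LOG=${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}
if [ -r "$LOG" ]; then
  signal_events < "$LOG"
fi
```

If a memory leak is suspected, the kernel log (for example `dmesg -T | grep -i 'out of memory'`) is another place worth checking around the same timestamps.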

Regards

Posted: **Tue Apr 13, 2021 4:57 pm**

Hi,

Please send us the system profile and we'll review the logs for any errors. In the meantime, let's run a tail command on the database log.

Code:

tail /var/log/mariadb/mariadb.log

If there are any errors (e.g. crashed database tables), then go ahead and run the repair script as root and let us know if you notice any improvement.

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh
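Before running the repair script, the tail step can be narrowed down to just the affected tables. The sketch below assumes the database log is at `/var/log/mariadb/mariadb.log` and that crashed tables are reported with the usual "is marked as crashed" message; `crashed_tables` and `DB_LOG` are illustrative names.

```shell
#!/usr/bin/env bash
# List the distinct tables that the database log reports as crashed.
# Assumption: errors contain "Table '<path>' is marked as crashed".
crashed_tables() {
  grep -oE "Table '[^']+' is marked as crashed" \
    | cut -d"'" -f2 \
    | sort -u
}

# Assumed log location; override with DB_LOG=/path/to/log
DB_LOG=${DB_LOG:-/var/log/mariadb/mariadb.log}
if [ -r "$DB_LOG" ]; then
  crashed_tables < "$DB_LOG"
fi
```

An empty result only means that this particular message was not found in that file; it does not rule out other database errors.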

Also, do you have a test server set up in your environment, and have you made any performance modifications to this system? If so, which ones?

Thanks, Benjamin

Posted: **Wed Apr 28, 2021 3:46 am**

Hello,

It was indeed crashed MySQL tables, but I had to run myisamchk on the tables (with MariaDB stopped). As for why Nagios and MariaDB get stopped so abruptly, it is probably a plugin leaking memory, but that is hard to track down...
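For anyone landing here with the same problem, the offline repair can be sketched roughly as follows. This is not the exact procedure used in this thread: the data directory `/var/lib/mysql/nagios`, the `mariadb` systemd unit name, and the `repair_myisam` helper are assumptions, and the script only prints the commands unless `RUN` is set to an empty string.

```shell
#!/usr/bin/env bash
# Sketch: repair MyISAM tables with the database stopped.
# Dry run by default: commands are echoed, not executed.
# Assumption: table files live under DATADIR (check with SELECT @@datadir;).
DATADIR=${DATADIR:-/var/lib/mysql/nagios}
RUN=${RUN-echo}   # export RUN= (empty) to really run the commands

repair_myisam() {
  local dir=$1
  $RUN systemctl stop mariadb          # assumed unit name
  for idx in "$dir"/*.MYI; do
    [ -e "$idx" ] || continue          # no MyISAM index files found
    $RUN myisamchk --recover "$idx"
  done
  $RUN systemctl start mariadb
}
```

Calling `repair_myisam "$DATADIR"` prints the plan; taking a copy of the data directory before running it for real is a sensible precaution.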

Regards

Posted: **Wed Apr 28, 2021 1:12 pm**

Hi,

Thanks for the update. The backend database application may have stopped, causing the nagios process to quit. You can keep tabs on the nagios process by running the Nagios Server Wizard on this system.

If you continue to have trouble with corrupt tables, I would recommend converting the tables to InnoDB.

We have a guide on our knowledgebase on how to do this.

Database Storage Engine and High CPU usage in Nagios XI
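As a rough idea of what such a conversion involves (this is a generic sketch, not the procedure from the knowledgebase article), the statements can be generated from a list of table names. The schema name `nagios` and the `to_innodb_sql` helper are assumptions; review the generated SQL and take a backup before applying anything.

```shell
#!/usr/bin/env bash
# Turn table names (one per line on stdin) into ALTER TABLE statements
# that switch the storage engine to InnoDB for the given schema.
to_innodb_sql() {
  local schema=$1
  while IFS= read -r table; do
    if [ -n "$table" ]; then
      printf 'ALTER TABLE `%s`.`%s` ENGINE=InnoDB;\n' "$schema" "$table"
    fi
  done
}

# Example pipeline (not run here; assumes root access to the database):
#   mysql -N -B -e "SELECT table_name FROM information_schema.tables
#                   WHERE table_schema='nagios' AND engine='MyISAM'" \
#     | to_innodb_sql nagios > convert.sql
```

Writing the statements to a file first keeps the conversion reviewable and lets it be applied during a maintenance window.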

Let us know if you need further assistance.

Benjamin

Nagios monitoring engine stopped
