Nagios crash - iocache_read()...Bad address

Posted: Mon Jun 15, 2020 4:38 am
by dariusz.nalazek
Hello,

Recently Nagios XI has been crashing with the following in the logs:

Code:

Jun 14 19:16:45 xxxxx nagios: wproc: iocache_read() from Core Worker 15212 returned -1: Bad address
Jun 14 19:16:45 xxxxx nagios: wproc: iocache_read() from Core Worker 15212 returned -1: Bad address
Jun 14 19:16:45 xxxxx nagios: wproc: iocache_read() from Core Worker 15212 returned -1: Bad address
It is not caused by the recent upgrade to 5.7.x: it started a few days before the upgrade (on 5.6.14), and upgrading to 5.7.1 didn't solve it...
We have expanded our Nagios monitoring a lot recently, so maybe it's a matter of the number of checks or similar...
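
To see how often the error shows up on each day, something like the following could help. It is only a minimal sketch, assuming syslog-format lines like the ones quoted above; the function name `iocache_errors_per_day` is made up for illustration.

```shell
# Count lines mentioning iocache_read() per day in a syslog-format file.
# Assumes the first two fields of each line are month and day ("Jun 14").
iocache_errors_per_day() {
    awk 'index($0, "iocache_read()") { count[$1 " " $2]++ }
         END { for (day in count) print day, count[day] }' "$1"
}
```

It would be called as `iocache_errors_per_day /var/log/messages`, where the log path is an assumption and depends on how syslog is set up on the server. The order of the output lines is not sorted.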

We made some minor changes as a workaround, not sure if it's the right direction...
1) Changed nagios.service from Type=forking with "-d" to Type=simple, to let systemd handle the service in the regular way (with the options Restart=always and RestartSec=30).
2) Raised the OS open files limit from 10k to 256k.
3) As a typical stopgap until a real solution is applied, we built a "self-healing" service that restarts nagios.service when Nagios fails the "hard way".
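
For change 2, it may be worth confirming that the raised limit actually reaches the running process, since a limit set at the OS level is not necessarily the one a systemd-started service gets. A minimal sketch, assuming a Linux /proc filesystem; the function name `soft_nofile_limit` is hypothetical:

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# The relevant line looks like:
#   Max open files            262144               262144               files
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Example (read-only): limit of the current shell process.
# soft_nofile_limit "/proc/$$/limits"
```

To check Nagios itself, pass `/proc/<pid>/limits` with the PID of the main nagios process; how that PID is looked up is left open here.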

Nagios XI and the OS (RHEL 7) are up to date.


Darek.

Re: Nagios crash - iocache_read()...Bad address

Posted: Mon Jun 15, 2020 3:45 pm
by benjaminsmith
Hi Darek,

That's an error message coming from the monitoring engine. How often do you have to restart the nagios service to clear the issue? Also, I'd like to review the logs in the system profile to help troubleshoot the issue.

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.

Thank you,
Benjamin

Re: Nagios crash - iocache_read()...Bad address

Posted: Tue Jun 16, 2020 5:29 am
by dariusz.nalazek
Three times in a short period (08.06, 10.06, 14.06).
It never happened before.

Yesterday we had to roll back our server to its state from 12.06 because of an issue with 5.7.1 and BPI monitoring, but that's a different topic.
So the logs on the Nagios XI server may be inconsistent. Luckily we have all the logs in Nagios LS for debugging purposes, in case they're needed...
Profile sent by PM.

Darek.

Re: Nagios crash - iocache_read()...Bad address

Posted: Tue Jun 16, 2020 5:13 pm
by ssax
PHP Fatal error: Call to undefined function get_backend_xml_data() in /usr/local/nagiosxi/html/includes/components/historytab/historytab_do_stuff.php on line 49
The historytab component doesn't work in XI 5.7+.

You can remove the component:

Code:

rm -rf /usr/local/nagiosxi/html/includes/components/historytab
Or edit this file:

Code:

/usr/local/nagiosxi/html/includes/components/historytab/historytab_do_stuff.php
Comment out line 49 to stop it from failing on the comments:

Code:

#$xml_nagios_comments = get_backend_xml_data($args_nagios_comments);
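
Before commenting anything out, it may be worth checking that line 49 in your copy of the file really is that call, since the line number comes from the error message and a modified file could differ. A minimal read-only sketch; the helper name `show_line` is made up for illustration:

```shell
# Print a single line of a file by its line number; changes nothing on disk.
# Usage: show_line <file> <line-number>
show_line() {
    sed -n "${2}p" "$1"
}
```

For example, `show_line /usr/local/nagiosxi/html/includes/components/historytab/historytab_do_stuff.php 49` should print the get_backend_xml_data() line if the file matches the one in the error.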
I'm not sure if changing the nagios.service unit file from forking will allow it to work properly during apply configurations/etc. when the nagios service restarts, so you may want to test that.

Did you see any improvement when you increased the open files limit?

I'm wondering if going from 5.6.14 directly to 5.7.1 (skipping 5.7.0) would resolve that issue.

I'm not really seeing anything else that stands out from your profile.

Re: Nagios crash - iocache_read()...Bad address

Posted: Mon Jun 22, 2020 8:02 am
by dariusz.nalazek
So far it's OK, and the service is set to forking again; simple was somehow unstable...

Darek.

Re: Nagios crash - iocache_read()...Bad address

Posted: Mon Jun 22, 2020 5:49 pm
by ssax
Ok, glad it's stable. Keep an eye on it and let us know when we're okay to lock this thread up and mark it as resolved.

Thank you!