Recently Nagios XI has been crashing with the following messages in the logs:
Jun 14 19:16:45 xxxxx nagios: wproc: iocache_read() from Core Worker 15212 returned -1: Bad address
Jun 14 19:16:45 xxxxx nagios: wproc: iocache_read() from Core Worker 15212 returned -1: Bad address
Jun 14 19:16:45 xxxxx nagios: wproc: iocache_read() from Core Worker 15212 returned -1: Bad address
We have expanded our Nagios monitoring a lot recently, so maybe it is a matter of the number of checks or something similar...
We made some minor changes as a workaround, but we are not sure if this is the right direction...
1) Changed nagios.service from Type=forking with "-d" to Type=simple, so that systemd handles the service in the regular way (with options Restart=always and RestartSec=30).
2) Raised the OS open files limit from 10k to 256k.
3) As a typical stopgap until a real solution is applied, we built a "self-healing" service that restarts nagios.service when Nagios fails the "hard way".
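For reference, changes 1) and 2) could look roughly like the drop-in below. This is only a sketch, not our exact file: the drop-in path and the ExecStart paths are assumptions based on a default Nagios XI install, and it assumes the open files limit is applied on the unit itself via LimitNOFILE (256k taken as 262144).

```ini
# /etc/systemd/system/nagios.service.d/override.conf  (assumed drop-in path)
[Service]
# Run in the foreground so systemd tracks the main process directly
Type=simple
# Clear the original ExecStart, then start without the "-d" (daemonize) flag.
# Binary and config paths are assumed defaults; adjust to the local install.
ExecStart=
ExecStart=/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
# Restart the service automatically 30 seconds after any exit
Restart=always
RestartSec=30
# Open files limit for the service (256k)
LimitNOFILE=262144
```

After editing, "systemctl daemon-reload" followed by "systemctl restart nagios" should apply it, and "systemctl show nagios -p LimitNOFILE" can be used to check that the limit is really in effect for the service.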
Nagios XI and the OS (RHEL 7) are up to date.
Darek.