nagios: wproc: iocache_read() ...returned -1: Bad address

idemia-cl · Post by **idemia-cl** » Sun Jun 14, 2020 9:10 pm

I currently have the same problem as the one reported here: https://support.nagios.com/forum/viewto ... 6&p=305969
I applied the settings suggested in https://support.nagios.com/kb/article.php?id=139, but they did not correct the problem.
Was a solution found in ticket 820717?

idemia-cl · Post by **idemia-cl** » Sun Jun 14, 2020 9:13 pm

What version of Nagios XI are you using?
5.6.6

Linux Distribution and version?
CentOS 6.10

32 or 64bit?
64 bit

VMware Image or Manual Install of XI?
Manual

Are there special configurations on your system, e.g., is GNOME installed? Are you using a proxy? Are you using SSL?
No

**If you are encountering multiple issues that may not be related, start a thread for each issue.**

jbrunkow · Post by **jbrunkow** » Mon Jun 15, 2020 3:54 pm

Yes, we were able to resolve that issue in the subsequent ticket that was created for it. It appears that, after some fine-tuning of the kernel message queue settings in /etc/sysctl.conf, setting max_concurrent_checks to 0 in /usr/local/nagios/etc/nagios.cfg, and running the /usr/local/nagiosxi/scripts/repair_databases.sh script, they

...found that the issue was at the VM host level. Resources were being pulled by other systems, causing strain at random times and memory issues. Since the VM configuration was updated to remove the memory and CPU limitations, Nagios XI has been stable...

It is not exactly clear which one of those steps resolved the problem for them.
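To confirm the current max_concurrent_checks value without opening the whole file, a small POSIX sh sketch like the one below could help. It assumes nagios.cfg uses plain `key=value` lines with `#` comments; the function name `get_cfg_value` is hypothetical and is not part of Nagios XI.

```shell
# Hypothetical helper (not part of Nagios XI): print the value of KEY from a
# nagios.cfg-style file made of "key=value" lines. Commented lines start with
# "#" and never match. If KEY appears more than once, the last value wins;
# if it is absent (or empty), nothing is printed.
get_cfg_value() {
    cfg_file=$1
    cfg_key=$2
    awk -v key="$cfg_key" '
        index($0, key "=") == 1 { val = substr($0, length(key) + 2) }
        END { if (val != "") print val }
    ' "$cfg_file"
}
```

For example, `get_cfg_value /usr/local/nagios/etc/nagios.cfg max_concurrent_checks` should print `0` if the setting described above is in place.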

Are you having stability issues, or are you getting errors/warnings about ndo2db in your /var/log/messages?

idemia-cl · Post by **idemia-cl** » Tue Jun 16, 2020 11:26 am

Hi. We are having stability issues: Nagios XI stops monitoring at random times.
So far we have applied the settings suggested in https://support.nagios.com/kb/article.php?id=139, but they did not correct the problem.

This is the error:
-------------------------------------------------------------------------------------------------------------------------------------------
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 rsyslogd-2177: imuxsock begins to drop messages from pid 54348 due to rate-limiting
Jun 16 11:14:43 s1t2mon01 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 256000 of 512000 messages and 262144000 of 262144000 bytes in the queue. See README for kernel tuning options.
Jun 16 11:14:44 s1t2mon01 ndo2db: Message sent to queue.
Jun 16 11:14:44 s1t2mon01 ndo2db: Warning: queue send error, retrying...
Jun 16 11:14:44 s1t2mon01 ndo2db: Message sent to queue.
Jun 16 11:14:49 s1t2mon01 rsyslogd-2177: imuxsock lost 49417 messages from pid 54348 due to rate-limiting
Jun 16 11:14:49 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:49 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
-------------------------------------------------------------------------------------------------------------------------------------------
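The ndo2db warning reports both a message count and a byte count for the queue. In the excerpt above the message count is at half of its limit while the byte count has reached its ceiling. A minimal POSIX sh sketch that extracts those figures from such a warning line; the function name `queue_usage` is hypothetical, and the pattern assumes the exact wording shown in this log.

```shell
# Hypothetical helper: given one ndo2db "Retrying message send" log line,
# print how full the queue was, as whole percentages of each limit.
# Assumes the exact "using X of Y messages and A of B bytes" wording shown above;
# lines without that wording produce no output.
queue_usage() {
    printf '%s\n' "$1" |
        sed -n 's/.*using \([0-9][0-9]*\) of \([0-9][0-9]*\) messages and \([0-9][0-9]*\) of \([0-9][0-9]*\) bytes.*/\1 \2 \3 \4/p' |
        while read -r msg_used msg_max byte_used byte_max; do
            echo "messages: $((msg_used * 100 / msg_max))% bytes: $((byte_used * 100 / byte_max))%"
        done
}
```

For the warning above this prints `messages: 50% bytes: 100%`. To check the most recent warning: `queue_usage "$(grep 'Retrying message send' /var/log/messages | tail -n 1)"`.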

The max_concurrent_checks parameter is currently set to 0.
I have not yet run the /usr/local/nagiosxi/scripts/repair_databases.sh script.

Please advise on the steps to follow.

jbrunkow · Post by **jbrunkow** » Tue Jun 16, 2020 4:43 pm

It appears that your message queue is overloading.

Please send me a system profile from your XI system so that I can make more informed suggestions about this system.

Code:

sh /usr/local/nagiosxi/scripts/components/getprofile.sh

ssax · Post by **ssax** » Tue Jun 16, 2020 4:52 pm

Please PM me a copy of your profile, you can download it from Admin > System Profile > Download Profile button.

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the first command if your DB is offloaded to another server and/or you've changed the root MySQL password.

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

Then run this command:

Code:

grep mysql /usr/local/nagiosxi/html/config.inc.php | wc -l

If it outputs the number 2, run the command below as well and include its output; if it outputs anything other than 2, don't run it. (Some XI systems use both MySQL and PostgreSQL if they were installed prior to XI 5.0 and then upgraded.)

Code:

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
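The "only run the PostgreSQL query if the count is 2" rule can also be scripted. A minimal sketch, assuming the default config path; `needs_pgsql_check` is a hypothetical name, and it only reports the decision rather than running psql itself.

```shell
# Hypothetical helper mirroring the check above: succeed (exit status 0) only
# when the XI config file has exactly two lines mentioning "mysql".
# It does not run psql; it only reports whether the size query is needed.
needs_pgsql_check() {
    config=${1:-/usr/local/nagiosxi/html/config.inc.php}
    # grep -c counts matching lines, like "grep mysql FILE | wc -l";
    # a missing file yields an empty string, which is not "2".
    [ "$(grep -c mysql "$config" 2>/dev/null)" = "2" ]
}
```

Usage: `needs_pgsql_check && echo "run the psql size query as well"`.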

idemia-cl · Post by **idemia-cl** » Wed Jun 17, 2020 8:40 am

Hi,

I sent you the requested files via PM.
Thank you

ssax · Post by **ssax** » Wed Jun 17, 2020 3:26 pm

Is this a physical system or a VM?

That info looks proper.

Please send me the output of these commands as root:

Code:

sysctl -p
ulimit -a
su -s /bin/bash -c 'ulimit -a' mysql
su -s /bin/bash -c 'ulimit -a' nagios
dmesg

Please attach these files as well:

Code:

/etc/php.ini
/etc/my.cnf
/etc/security/limits.conf
/etc/security/limits.d/*         <- Include all files from here

Thank you

idemia-cl · Post by **idemia-cl** » Thu Jun 18, 2020 11:41 am

Hi,

Nagios XI runs on a VM (ESX 6.0).
I have attached the requested information.

Thank you

ssax · Post by **ssax** » Thu Jun 18, 2020 5:09 pm

Please edit this file:

Code:

/etc/security/limits.d/90-nproc.conf

And change this:

Code:

*          soft    nproc     1024

To this:

Code:

*          soft    nproc     4096
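If you prefer to make this edit non-interactively, a sketch using GNU sed (as shipped with CentOS) could look like the following. The function name is hypothetical; it changes only the catch-all `*  soft  nproc  1024` line and leaves a `.bak` copy of the original file.

```shell
# Hypothetical helper: raise the catch-all soft nproc limit from 1024 to 4096
# in a limits.d file. Only the "*  soft  nproc  1024" line is changed, other
# entries (e.g. for root) are left alone, and the original is kept as FILE.bak.
# Assumes GNU sed (for in-place editing with -i.bak).
raise_soft_nproc() {
    limits_file=${1:-/etc/security/limits.d/90-nproc.conf}
    # \1 keeps the original "*  soft  nproc  " spacing; only 1024 becomes 4096.
    sed -i.bak 's/^\(\*[[:space:]]\{1,\}soft[[:space:]]\{1,\}nproc[[:space:]]\{1,\}\)1024[[:space:]]*$/\14096/' "$limits_file"
}
```

Run it as root (`raise_soft_nproc`), then compare the result against the `.bak` file before continuing with the steps below.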

Then edit this file:

Code:

/etc/sysctl.conf

Change these:

Code:

kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernel.msgmni = 512000

To these:

Code:

kernel.msgmnb = 524288000
kernel.msgmax = 524288000
kernel.msgmni = 1000000
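These values double kernel.msgmnb and kernel.msgmax from 262144000 bytes (250 MiB) to 524288000 bytes (500 MiB) and raise kernel.msgmni from 512000 to 1000000. A sketch that applies them to a sysctl.conf-style file, assuming GNU sed; the helper names `set_sysctl_value` and `update_msg_queue_limits` are hypothetical. Existing `key = value` lines are rewritten in place and missing keys are appended.

```shell
# Hypothetical helpers: update kernel message-queue limits in a
# sysctl.conf-style file. Assumes GNU sed (for in-place -i).

# Set KEY to VALUE: rewrite an existing "KEY = ..." line, or append one.
set_sysctl_value() {
    file=$1
    key=$2
    value=$3
    # Escape the dots in the key so they match literally in the regexes.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$key_re[[:space:]]*=" "$file"; then
        sed -i "s/^[[:space:]]*$key_re[[:space:]]*=.*/$key = $value/" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Apply the three values suggested above, keeping FILE.bak as a backup.
update_msg_queue_limits() {
    conf=${1:-/etc/sysctl.conf}
    cp "$conf" "$conf.bak" || return 1
    set_sysctl_value "$conf" kernel.msgmnb 524288000
    set_sysctl_value "$conf" kernel.msgmax 524288000
    set_sysctl_value "$conf" kernel.msgmni 1000000
}
```

After running `update_msg_queue_limits` as root, `sysctl -p` loads /etc/sysctl.conf immediately (the reboot below also applies it), and `sysctl kernel.msgmnb kernel.msgmax kernel.msgmni` shows the values the running kernel is using.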

Then reboot the system, wait 15 minutes, then send me a fresh copy of your profile and fresh output of this command:

Code:

dmesg

Let us know if it's running any better after those changes.

Thank you!

Nagios Support Forum
