nagios: wproc: iocache_read() ...returned -1: Bad address

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
idemia-cl
Posts: 13
Joined: Fri Jun 12, 2020 12:19 pm

nagios: wproc: iocache_read() ...returned -1: Bad address

Post by idemia-cl »

Right now I have the same problem as the one reported here https://support.nagios.com/forum/viewto ... 6&p=305969
I applied the same settings suggested here https://support.nagios.com/kb/article.php?id=139, but that did not correct the problem.
Was the solution found in ticket 820717?
idemia-cl
Posts: 13
Joined: Fri Jun 12, 2020 12:19 pm

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by idemia-cl »

What version of Nagios XI are you using?
5.6.6

Linux Distribution and version?
CentOS 6.10

32 or 64bit?
64 bit

VMware Image or Manual Install of XI?
Manual

Are there special configurations on your system, e.g., is GNOME installed? Are you using a proxy? Are you using SSL?
No

**If you are encountering multiple issues that may not be related, start a thread for each issue
jbrunkow
Posts: 441
Joined: Fri Mar 13, 2020 10:45 am

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by jbrunkow »

Yes, we were able to resolve that topic in the subsequent ticket that was created for it. After some fine-tuning of the kernel message queue settings in /etc/sysctl.conf, setting max_concurrent_checks to 0 in /usr/local/nagios/etc/nagios.cfg, and running the /usr/local/nagiosxi/scripts/repair_databases.sh script, they
...found that the issue was at the VM host level. The resources were being pulled by other systems, causing strain at random times and causing memory issues. Once the VM configuration was updated to remove the memory and CPU limitations, Nagios XI has been stable...
It is not exactly clear which of those steps resolved the problem for them.

Are you having stability issues, or are you just getting errors/warnings about ndo2db in your /var/log/messages?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
idemia-cl
Posts: 13
Joined: Fri Jun 12, 2020 12:19 pm

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by idemia-cl »

Hi. We have stability issues: Nagios XI stops monitoring at random times.
So far we have applied the settings suggested here https://support.nagios.com/kb/article.php?id=139, but that has not corrected the problem.

This is the error:
-------------------------------------------------------------------------------------------------------------------------------------------
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:43 s1t2mon01 rsyslogd-2177: imuxsock begins to drop messages from pid 54348 due to rate-limiting
Jun 16 11:14:43 s1t2mon01 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 256000 of 512000 messages and 262144000 of 262144000 bytes in the queue. See README for kernel tuning options.
Jun 16 11:14:44 s1t2mon01 ndo2db: Message sent to queue.
Jun 16 11:14:44 s1t2mon01 ndo2db: Warning: queue send error, retrying...
Jun 16 11:14:44 s1t2mon01 ndo2db: Message sent to queue.
Jun 16 11:14:49 s1t2mon01 rsyslogd-2177: imuxsock lost 49417 messages from pid 54348 due to rate-limiting
Jun 16 11:14:49 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
Jun 16 11:14:49 s1t2mon01 nagios: wproc: iocache_read() from Core Worker 54352 returned -1: Bad address
-------------------------------------------------------------------------------------------------------------------------------------------
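The most telling line in that excerpt is the ndo2db warning: it reports 262144000 of 262144000 bytes in use, so the byte limit is exhausted while only half of the message slots are taken. A minimal sketch for pulling those numbers out of the log over time, assuming the warning wording stays exactly as shown above; `ndo2db_queue_usage` is a hypothetical helper name, not a Nagios tool.

```shell
# Hypothetical helper: read syslog lines on stdin and, for every ndo2db
# "Retrying message send" warning, print how full the message-count and
# byte limits were. Assumes the exact wording seen in /var/log/messages.
ndo2db_queue_usage() {
    # Keep only the four numbers: used_msgs max_msgs used_bytes max_bytes
    sed -n 's/.*You are currently using \([0-9]*\) of \([0-9]*\) messages and \([0-9]*\) of \([0-9]*\) bytes.*/\1 \2 \3 \4/p' |
    awk '{ printf "messages: %d%%  bytes: %d%%\n", $1 * 100 / $2, $3 * 100 / $4 }'
}

# Example usage (as root on the XI server):
#   grep 'ndo2db: Warning: Retrying message send' /var/log/messages | ndo2db_queue_usage
```

For the warning in the excerpt this prints `messages: 50%  bytes: 100%`, which points at the byte limit rather than the message count.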

The parameter max_concurrent_checks is currently set to 0.
I have not yet run the /usr/local/nagiosxi/scripts/repair_databases.sh script.
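To confirm which value Nagios Core actually reads, here is a small sketch that prints a `key=value` directive from nagios.cfg while skipping commented-out lines. It assumes directives start at the beginning of the line with no spaces around `=`, as in the stock nagios.cfg; `cfg_value` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of a directive from a Nagios Core
# style "key=value" config file. Lines starting with "#" never match
# because their first field is not the bare key.
cfg_value() {
    # $1 = config file, $2 = directive name
    awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Example usage:
#   cfg_value /usr/local/nagios/etc/nagios.cfg max_concurrent_checks
```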


Please advise on the steps to follow.
jbrunkow
Posts: 441
Joined: Fri Mar 13, 2020 10:45 am

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by jbrunkow »

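It appears that your kernel message queue is overloading: the ndo2db warning in your log shows 262144000 of 262144000 bytes in use, so the queue's byte limit is completely full.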
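To watch the queue directly rather than through the log, the System V queues can be listed with `ipcs -q`. A minimal sketch that totals the used bytes and messages across all listed queues, assuming the util-linux `ipcs -q` column order (key, msqid, owner, perms, used-bytes, messages); `summarize_msg_queues` is a hypothetical helper name.

```shell
# Hypothetical helper: sum used-bytes and messages over every queue row
# printed by `ipcs -q`. Data rows are recognised by a hex key in column 1.
summarize_msg_queues() {
    awk '$1 ~ /^0x/ { bytes += $5; msgs += $6; n++ }
         END { printf "queues=%d used-bytes=%.0f messages=%.0f\n", n, bytes, msgs }'
}

# Example usage (as root):
#   ipcs -q | summarize_msg_queues
#   ipcs -q -l    # current kernel message queue limits
```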

Please send me a system profile from your XI system so that I can make more informed suggestions about this system.

Code: Select all

sh /usr/local/nagiosxi/scripts/components/getprofile.sh
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by ssax »

Please PM me a copy of your profile; you can download it from Admin > System Profile > Download Profile.

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
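When that query returns many rows, a sketch like the following can pick out only the large tables. It assumes the boxed layout the mysql client prints with the `--table` option used above; `large_tables` is a hypothetical helper, and the 100 MB threshold in the example is arbitrary.

```shell
# Hypothetical helper: read `mysql --table` output on stdin and print
# "table size" for rows whose "Size in MB" column exceeds $1.
# Border lines have no "|" and the header's size text converts to 0.
large_tables() {
    awk -F'|' -v min="$1" 'NF >= 4 && $3 + 0 > min + 0 {
        gsub(/ /, "", $2); gsub(/ /, "", $3); print $2, $3
    }'
}

# Example usage:
#   echo "SELECT ..." | mysql -h 127.0.0.1 -uroot -pnagiosxi --table | large_tables 100
```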
Then run this command:

Code: Select all

grep mysql /usr/local/nagiosxi/html/config.inc.php | wc -l
If it outputs the number 2, run the command below as well and include its output; if it outputs anything other than 2, don't run the command. (Some XI systems use both MySQL and PostgreSQL if they were installed prior to XI 5.0 and then upgraded from there.)

Code: Select all

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
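The "count the mysql lines, then decide" check can also be scripted. A minimal sketch, assuming the same config.inc.php path; `grep -c` counts matching lines just like `grep mysql ... | wc -l`. The helper name `needs_pgsql_check` is hypothetical.

```shell
# Hypothetical helper: succeed only when exactly 2 lines of the given file
# mention "mysql", i.e. when the PostgreSQL size query should also be run.
needs_pgsql_check() {
    # grep -c prints the number of matching lines; nothing if the file is missing
    count=$(grep -c mysql "$1" 2>/dev/null)
    [ -n "$count" ] && [ "$count" -eq 2 ]
}

# Example usage (as root):
#   if needs_pgsql_check /usr/local/nagiosxi/html/config.inc.php; then
#       echo "Run the psql size query as well"
#   fi
```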
idemia-cl
Posts: 13
Joined: Fri Jun 12, 2020 12:19 pm

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by idemia-cl »

Hi,

I sent you the requested files via PM.
Thank you
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by ssax »

Is this a physical system or a VM?

That info looks fine.

Please send me the output of these commands as root:

Code: Select all

sysctl -p
ulimit -a
su -s /bin/bash -c 'ulimit -a' mysql
su -s /bin/bash -c 'ulimit -a' nagios
dmesg
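If it is easier to send all of that in one attachment, here is a sketch of a hypothetical `collect_outputs` helper that runs each command and appends its output under a header line. Run it as root so `dmesg` and `su` work; the output path in the example is arbitrary.

```shell
# Hypothetical helper: run each command (passed as a quoted string) and
# write "===== command =====" followed by its stdout/stderr into one file.
collect_outputs() {
    out=$1; shift
    : > "$out"    # start with an empty file
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        sh -c "$cmd" >> "$out" 2>&1
    done
}

# Example usage (as root):
#   collect_outputs /tmp/xi-diag.txt 'sysctl -p' 'ulimit -a' \
#       "su -s /bin/bash -c 'ulimit -a' mysql" \
#       "su -s /bin/bash -c 'ulimit -a' nagios" 'dmesg'
```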
Please attach these files as well:

Code: Select all

/etc/php.ini
/etc/my.cnf
/etc/security/limits.conf
/etc/security/limits.d/*         <- Include all files from here
Thank you
idemia-cl
Posts: 13
Joined: Fri Jun 12, 2020 12:19 pm

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by idemia-cl »

Hi,

Nagios XI is a VM (ESX 6.0).
I have attached the requested information.

Thank you
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios: wproc: iocache_read() ...returned -1: Bad address

Post by ssax »

Please edit this file:

Code: Select all

/etc/security/limits.d/90-nproc.conf
And change this:

Code: Select all

*          soft    nproc     1024
To this:

Code: Select all

*          soft    nproc     4096
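If you prefer to make that edit non-interactively, a minimal sketch using sed is below. It assumes the wildcard line looks like the one shown above (fields separated by spaces or tabs) and keeps a `.bak` copy of the file; `raise_soft_nproc` is a hypothetical helper name.

```shell
# Hypothetical helper: change "*  soft  nproc  1024" to 4096 in place,
# preserving the original spacing and writing a .bak backup first.
raise_soft_nproc() {
    # $1 = limits file, e.g. /etc/security/limits.d/90-nproc.conf
    sed -i.bak 's/^\(\*[[:space:]]\{1,\}soft[[:space:]]\{1,\}nproc[[:space:]]\{1,\}\)1024[[:space:]]*$/\14096/' "$1"
}

# Example usage (as root):
#   raise_soft_nproc /etc/security/limits.d/90-nproc.conf
```

pam_limits applies these limits when a new session starts, so processes that are already running keep their old limits.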
Then edit this file:

Code: Select all

/etc/sysctl.conf 
Change these:

Code: Select all

kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernel.msgmni = 512000
To these:

Code: Select all

kernel.msgmnb = 524288000
kernel.msgmax = 524288000
kernel.msgmni = 1000000
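For reference, 262144000 bytes is 250 × 1024 × 1024 (250 MiB), so the new 524288000 value doubles kernel.msgmnb and kernel.msgmax to 500 MiB, while kernel.msgmni rises from 512000 to 1000000. The old 262144000 is also exactly the byte total ndo2db reported as fully used in the earlier log. A minimal sketch for applying these edits by script instead of by hand, assuming GNU sed and the `key = value` layout shown above; `set_sysctl_value` is a hypothetical helper, not part of Nagios XI.

```shell
# Hypothetical helper: set "key = value" in a sysctl.conf-style file,
# replacing any uncommented line for that key or appending one if absent.
set_sysctl_value() {
    # $1 = file, $2 = key (e.g. kernel.msgmnb), $3 = new value
    # Escape dots so the key matches literally in the regexes below.
    k=$(printf '%s' "$2" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$k[[:space:]]*=" "$1"; then
        # Rewrite the existing entry in place, keeping a .bak copy
        sed -i.bak "s|^[[:space:]]*$k[[:space:]]*=.*|$2 = $3|" "$1"
    else
        # Key not present yet: append it
        printf '%s = %s\n' "$2" "$3" >> "$1"
    fi
}

# Example usage (as root):
#   for kv in kernel.msgmnb=524288000 kernel.msgmax=524288000 kernel.msgmni=1000000; do
#       set_sysctl_value /etc/sysctl.conf "${kv%%=*}" "${kv#*=}"
#   done
#   sysctl -p
```

`sysctl -p` reloads /etc/sysctl.conf and prints the values it applied, which is a quick way to confirm the edit before rebooting.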
Then reboot the system, wait 15 minutes, then send me a fresh copy of your profile and fresh output of this command:

Code: Select all

dmesg
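Since the earlier ticket traced the problem to memory limits on the VM host, it may help to scan the dmesg output for memory-pressure messages. A sketch; the patterns are assumptions based on common Linux OOM-killer wording and may not cover every kernel version, and `memory_pressure_lines` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only kernel messages that commonly indicate
# memory pressure (OOM killer activity or failed page allocations).
memory_pressure_lines() {
    grep -iE 'out of memory|oom-killer|killed process|page allocation failure'
}

# Example usage (as root):
#   dmesg | memory_pressure_lines
```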
Let us know if it's running any better after those changes.


Thank you!