NDO2DB Kernel Queue issue

Posted: Thu Dec 02, 2021 1:23 pm
by vaadaud
Hi Team,

We have a strange issue related to the kernel message queue. We are using NDO2DB and have increased the message limits, but the issue persists. There are no other errors or service issues, yet monitoring suddenly stops. Please suggest a fix. (NDO3 had the issue, so we downgraded to NDO2DB six months ago.)

Error:
ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
ndo2db: Warning: queue send error, retrying...

Present limits (we also tested with other values):
kernel.msgmnb = 99662144000
kernel.msgmax = 99662144000
kernel.shmmax = 42949672950
kernel.shmall = 2684354560
kernel.msgmni = 91120000
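
Values written to a sysctl configuration file are not necessarily the values the running kernel is using, so it may be worth comparing the list above with what the kernel reports. The following is a minimal sketch, not an official Nagios tool: the function name `show_msg_limits` is made up for illustration, and it assumes the standard `/proc/sys/kernel` location unless another directory is passed.

```shell
# Print the effective value of each System V message-queue limit.
# $1 is the directory to read from (assumed default: /proc/sys/kernel).
# show_msg_limits is a hypothetical helper name, not part of Nagios or NDO2DB.
show_msg_limits() {
    dir="${1:-/proc/sys/kernel}"
    for name in msgmnb msgmax msgmni; do
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf 'kernel.%s = (unreadable)\n' "$name"
        fi
    done
}

# Example call on a live system:
#   show_msg_limits
```

If the numbers printed differ from the configured ones, the kernel may not have accepted the configured values.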

------ Message Queues --------
key msqid owner perms used-bytes messages
0xb8000002 7897089 nagios 600 877895680 857320
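
The queue line above can be summarized to show how large the backlog is and the average message size. This is only a sketch: `queue_stats` is a hypothetical helper name, and it assumes the six-column `ipcs -q` layout shown above (key, msqid, owner, perms, used-bytes, messages).

```shell
# Summarize message queues owned by a user from `ipcs -q` output on stdin.
# $1 is the owner to match (default: nagios). queue_stats is a made-up name.
queue_stats() {
    owner="${1:-nagios}"
    awk -v owner="$owner" '
        $3 == owner {
            # Average bytes per queued message; 0 when the queue is empty.
            avg = ($6 > 0) ? $5 / $6 : 0
            printf "msqid=%s messages=%s bytes=%s avg_bytes=%d\n", $2, $6, $5, avg
        }'
}

# Example call on a live system:
#   ipcs -q | queue_stats nagios
```

Running it repeatedly while the problem occurs would show whether the message count keeps growing, which would suggest the queue is being filled faster than it is drained.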

Nagios user limit:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 160158
max locked memory (kbytes, -l) 128
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 20480
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Kernel:
3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.7 (Maipo)
PHP 5.4.16 (cli) (built: Jun 19 2018 13:09:01)

IO:
avg-cpu: %user %nice %system %iowait %steal %idle
10.16 0.00 3.42 0.07 0.00 86.35

XI Details:
Version 5.8.5, hosts=899, services=9658, DB=offloaded, HW=no resource crunch, No other OS/DB log/error

Thanks,
Vaibhav

Re: NDO2DB Kernel Queue issue

Posted: Fri Dec 03, 2021 10:54 am
by benjaminsmith
Hi Vaibhav,

Thanks for contacting the support team at Nagios.

Normally, increasing those limits is very helpful. What is the total check load of this server, host + service checks?

Try performing a full restart and clear the message queues.

Code:

systemctl stop nagios
systemctl stop ndo2db
systemctl stop crond
pkill -9 -u nagios
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --all-databases --use-frm
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then systemctl stop postgresql; fi;
systemctl restart mariadb
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
rm -f /usr/local/nagiosxi/tmp/*.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
pkill python
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql start; fi;
systemctl restart httpd
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
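
The queue-clearing loop in the script above matches any `ipcs -q` line that contains the text "nagios". As a rough sketch of a stricter variant, the helper below matches the owner column only and prints the `ipcrm` commands instead of running them, so they can be reviewed first. The name `queue_cleanup_cmds` is hypothetical, and the column positions are assumed to match the `ipcs -q` output posted earlier in this thread.

```shell
# Print one `ipcrm -q <id>` command per message queue owned by $1
# (default: nagios). Reads `ipcs -q` output on stdin and runs nothing itself.
# queue_cleanup_cmds is a made-up helper name, not part of Nagios XI.
queue_cleanup_cmds() {
    owner="${1:-nagios}"
    awk -v owner="$owner" '
        # Column 3 is the owner, column 2 the numeric queue ID.
        $3 == owner && $2 ~ /^[0-9]+$/ { print "ipcrm -q " $2 }'
}

# Example: review the commands, then pipe them to sh to execute.
#   ipcs -q | queue_cleanup_cmds nagios
#   ipcs -q | queue_cleanup_cmds nagios | sh
```
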
If that does not resolve the issue, please send us a current system profile to review.

Thanks,
Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.

Re: NDO2DB Kernel Queue issue

Posted: Mon Dec 06, 2021 4:21 pm
by vaadaud
Hi,

We have already performed these steps and also rebooted the servers, but the issue remains. We cannot share the profile due to security restrictions.

Thanks,
Vaibhav

Re: NDO2DB Kernel Queue issue

Posted: Mon Dec 06, 2021 4:38 pm
by benjaminsmith
Hi Vaibhav,

Due to security restrictions, let's open a ticket for this issue as it will make it easier to privately share log files.

To open a ticket, please visit:

https://support.nagios.com/tickets

Please provide the following information in the ticket:

Code:

ipcs -a
top -b -n 1
tail -n 500 /usr/local/nagios/var/nagios.log
# Database log
/var/log/mariadb/mariadb.log
/var/log/mysqld.log
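
The items above could be gathered into one directory before attaching them to the ticket. This is a sketch under stated assumptions: `collect_diag` and the default output directory are made-up names, and taking the last 500 lines of the database logs is an assumption (the list above only specifies 500 lines for nagios.log).

```shell
# Collect the requested diagnostics into a directory.
# $1 is the output directory (hypothetical default: ./ndo2db-diag).
collect_diag() {
    out="${1:-./ndo2db-diag}"
    mkdir -p "$out" || return 1
    # Run each command only if it is installed; ignore its exit status.
    if command -v ipcs >/dev/null 2>&1; then
        ipcs -a > "$out/ipcs.txt" 2>&1 || true
    fi
    if command -v top >/dev/null 2>&1; then
        top -b -n 1 > "$out/top.txt" 2>&1 || true
    fi
    # Save the tail of each log that exists and is readable.
    for log in /usr/local/nagios/var/nagios.log \
               /var/log/mariadb/mariadb.log \
               /var/log/mysqld.log; do
        if [ -r "$log" ]; then
            tail -n 500 "$log" > "$out/$(basename "$log").tail"
        fi
    done
    return 0
}

# Example call:
#   collect_diag /tmp/ndo2db-diag
```

The resulting files should be reviewed for sensitive data before they are shared.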
Thanks,
Benjamin

Re: NDO2DB Kernel Queue issue

Posted: Mon Dec 06, 2021 4:42 pm
by vaadaud
A ticket has already been opened for this issue by Luca.

Re: NDO2DB Kernel Queue issue

Posted: Mon Dec 06, 2021 4:59 pm
by benjaminsmith
Hi,

Okay, would that be ticket #382026?

Re: NDO2DB Kernel Queue issue

Posted: Tue Dec 07, 2021 7:27 am
by vaadaud
Yes.

Re: NDO2DB Kernel Queue issue

Posted: Tue Dec 07, 2021 4:25 pm
by benjaminsmith
Hi,

Okay, we'll close out this forum post then, and we'll move this over to that ticket to avoid duplicates.

Thanks,
Benjamin