ndo2DB errors and IPC

Posted: Fri Aug 24, 2018 10:41 am
by vazudevan
Hey,

We frequently see ndo2db errors in /var/log/messages:

Code: Select all

ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
The IPC queue is full when this happens:

Code: Select all

ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    
0x97010080 1212416    nagios     600        262144000    256000    
Here are the IPC settings in /etc/sysctl.conf:

Code: Select all

kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
kernel.msgmni = 512000
Things return to normal when we clear the IPC queue with for i in `ipcs -q | grep nagios | awk '{print $2}'`; do ipcrm -q $i; done and then stop and start nagios and ndo2db.

This recurs roughly every 6 hours. How should we handle it?
FYI: our setup is federated, with 3603 hosts and 15053 services, all PASSIVE, and MariaDB hosted on a separate server. There is no load, CPU, or memory contention on the DB server.
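
For anyone who wants to watch how full the queue gets between these episodes, here is a minimal sketch. It assumes the ipcs -q column layout shown above (msqid in column 2, owner in column 3, used-bytes in column 5) and that the queue is owned by nagios. The function name queue_fill is hypothetical, not part of any Nagios tooling.

```shell
#!/bin/sh
# Hypothetical helper: report how full each nagios-owned message queue is.
# Reads `ipcs -q` style output on stdin; $1 is the per-queue byte limit
# (the kernel.msgmnb value).
queue_fill() {
    awk -v limit="$1" '
        # Data rows start with a hex key; header and separator rows are skipped.
        $1 ~ /^0x/ && $3 == "nagios" {
            printf "%s %d%%\n", $2, ($5 * 100) / limit
        }'
}
```

On a live system this could be run as ipcs -q | queue_fill "$(sysctl -n kernel.msgmnb)". With the numbers posted above, used-bytes equals kernel.msgmnb, so the queue reports as 100% full.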

Re: ndo2DB errors and IPC

Posted: Fri Aug 24, 2018 12:55 pm
by cdienger
Have any adjustments been made to the check_result_reaper_frequency or max_check_result_reaper_time options in nagios.cfg? You can check these options in the GUI under Configure > CCM Admin > Core Configs > General. The defaults are 10 and 30 respectively. Try setting them to 3 and 10 instead so that check results are processed more frequently.

Re: ndo2DB errors and IPC

Posted: Fri Aug 24, 2018 2:02 pm
by vazudevan
They were already set to the higher frequency. The problem occurs in spite of these settings.

Code: Select all

[root@phlprcnagnxi001 etc]# grep reaper nagios.cfg 
# check_result_reaper_frequency=10
check_result_reaper_frequency=3
# max_check_result_reaper_time=30
max_check_result_reaper_time=10

Re: ndo2DB errors and IPC

Posted: Fri Aug 24, 2018 2:49 pm
by cdienger
Does the queue fluctuate at all, or does it stay full pretty much all the time once it fills up? Sometimes it is necessary to increase the queue size beyond even what the KB article recommends. This shouldn't be a problem as long as the system isn't showing other symptoms, such as check information not being updated. I would try doubling the current kernel.msgmnb: https://support.nagios.com/kb/article/n ... d-139.html
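
As a rough sketch of what doubling would look like with the value posted earlier in this thread: the helper name double_setting is hypothetical and only does the arithmetic. It does not change any kernel setting.

```shell
#!/bin/sh
# Hypothetical helper: print a sysctl-style line with the given value doubled.
# $1 = setting name, $2 = current value
double_setting() {
    echo "$1 = $(( $2 * 2 ))"
}

# Value taken from the sysctl settings posted above.
double_setting kernel.msgmnb 262144000
```

The resulting line would replace the existing kernel.msgmnb entry in the sysctl configuration file, and sysctl -p (as root) reloads that file. As far as I know, an existing queue keeps the byte limit it was created with, so nagios and ndo2db would likely need to be restarted for the queue to pick up the larger size.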