ndo2db eventually fails

Post by **onegative** » Mon May 20, 2019 7:08 pm

G'day Support,

I am having an issue with my Nagios XI installation and cannot figure out what is going on. I recently implemented nagiosramdisk via your script, and all seemed to be working well. This morning I found it in a bad state, with the checkresults directory holding hundreds of thousands of files. Obviously, check results are not being processed. I've tried everything I can think of, and I cannot get the system stabilized; it eventually craters every time.

It starts out with the "Monitoring Engine Event Queue" basically not showing any activity and then this error shows up, "May 20 17:03:25 dcom-nagiosxi-p1 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 512000 of 512000 messages and 524288000 of 524288000 bytes in the queue. See README for kernel tuning options."
I really need help working out how to troubleshoot this issue; it is the first time I have encountered it. I increased the kernel parameters, but that only prolongs the time until the next failure. Then I have to restart ndo2db and we are back to square one. Please advise, and thanks.

Danny
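
For anyone reading along who hits the same warning: the four numbers in that log line say how full the queue is. Below is a minimal shell sketch that pulls them out and prints percentages. The function name `queue_usage` is made up for illustration, and the pattern assumes the exact message wording quoted in the post above.

```shell
# Hypothetical helper: read log lines on stdin, find ndo2db's
# "Retrying message send" warning, and print how full the queue is.
# Assumes the message wording quoted in this thread.
queue_usage() {
  sed -n 's/.*using \([0-9][0-9]*\) of \([0-9][0-9]*\) messages and \([0-9][0-9]*\) of \([0-9][0-9]*\) bytes.*/\1 \2 \3 \4/p' |
  awk '$2 > 0 && $4 > 0 { printf "messages=%d%% bytes=%d%%\n", $1 * 100 / $2, $3 * 100 / $4 }'
}
```

One way to use it would be `grep ndo2db /var/log/messages | queue_usage | tail -n 1`, assuming syslog writes to `/var/log/messages` on the system in question.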

Post by **onegative** » Tue May 21, 2019 9:20 am

Well, I took my latest backup and brought XI up on my DR instance, and it is running as expected, so I figure the configuration itself is OK. I am now trying to uninstall everything on my Primary server and then re-install. I am concerned about how to remove the nagiosramdisk. Or will the uninstall script take care of that as well? Let me know, and thanks.

Any suggestions?

Danny

Post by **cdienger** » Tue May 21, 2019 2:34 pm

What version of XI is this and what OS version is it installed on? It looks like the uninstall script will take care of it but I would still recommend reviewing the manual steps in https://assets.nagios.com/downloads/nag ... giosXI.pdf afterwards to make sure they all get removed.

When/if you reinstall the ramdisk, monitor the system closely at first to make sure files are not queuing up and if you do notice it start to queue up, get a profile from the system so we can review. It would also be good to gather any files that the ramdisk script touches - see the PDF for OS specific files.
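
A simple way to watch for files queuing up is to count what is sitting in the check-results spool every so often. This is only a sketch: `count_checkresults` is a made-up name, and the default path is an assumption about where the ramdisk setup places the spool, so pass the real `check_result_path` from nagios.cfg if it differs.

```shell
# Hypothetical helper: count regular files directly inside a
# check-results spool directory.
# The default path is an assumption; pass your own as the first argument.
count_checkresults() {
  find "${1:-/var/nagiosramdisk/spool/checkresults}" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

If the number keeps climbing between runs instead of hovering near zero, results are arriving faster than they are being reaped.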

Post by **onegative** » Wed May 29, 2019 12:57 pm

@cdienger

So I started investigating this more deeply and determined that the kernel message queue was hitting its limits. I see that the MySQL database was using too much CPU trying to keep up with the number of messages being sent. So I modified the kernel IPC (message queue) settings as recommended, but that wasn't enough. I therefore enabled backendcache, which appears to have helped. I am still concerned that the system is overloaded...

Resources available:

# uname -a
Linux dcom-nagiosxi-p3.xxxx.xxxxxxxxxx.xxx 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# top
top - 10:53:34 up 28 days, 23:03, 2 users, load average: 3.94, 3.55,
Tasks: 439 total, 2 running, 436 sleeping, 0 stopped, 1 zombie
%Cpu(s): 11.9 us, 2.9 sy, 0.0 ni, 85.0 id, 0.0 wa, 0.0 hi, 0.2 si,
KiB Mem : 16266124 total, 2233284 free, 2126932 used, 11905908 buff/c
KiB Swap: 2097148 total, 1243928 free, 853220 used. 13410624 avail

# grep MemTotal /proc/meminfo
MemTotal: 16266124 kB

# grep processor /proc/cpuinfo | wc -l
8

# df -kh /var/nagiosramdisk/
Filesystem Size Used Avail Use% Mounted on
tmpfs 1000M 152M 849M 16% /var/nagiosramdisk

System_Status_Statistics.png

Monitoring_Engine_Status_Statistics.png

Does anyone have any suggested limitations on the number of monitoring objects that a single instance of Nagios XI can manage?

Let me know and thanks,
Danny
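
To see which message queue limits the kernel is actually enforcing after a change like the one described above, the values can be read straight from `/proc/sys/kernel`. A minimal sketch follows; `msg_limits` is a made-up name, and the optional directory argument exists only so the function can be tried against sample files.

```shell
# Hypothetical helper: print the System V message queue limits.
#   msgmnb = max bytes per queue
#   msgmax = max size of a single message
#   msgmni = max number of queue identifiers
# Optional first argument: directory to read from (default /proc/sys/kernel).
msg_limits() {
  dir=${1:-/proc/sys/kernel}
  for name in msgmnb msgmax msgmni; do
    if [ -r "$dir/$name" ]; then
      printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
    fi
  done
}
```

Running `msg_limits` followed by `ipcs -q` shows the limits next to the bytes and message counts currently held in each queue.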

Post by **cdienger** » Wed May 29, 2019 4:10 pm

30,000 total checks is usually a pretty good cutoff, but the number can vary widely depending on the frequency of checks, the types of checks, tweaks, etc. https://assets.nagios.com/downloads/nag ... ios-XI.pdf covers the tweaking possibilities. In my own testing, I found changing the reaper settings and lowering the frequency of the checks to be pretty beneficial.
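
Before changing the reaper settings it helps to see what they are set to now. The sketch below assumes "reaper settings" refers to the nagios.cfg directives `check_result_reaper_frequency` and `max_check_result_reaper_time`, and that nagios.cfg lives at the usual source-install path. The post does not confirm either, and `reaper_settings` is a made-up name.

```shell
# Hypothetical helper: show the active (uncommented) check-result reaper
# directives in a nagios.cfg.
# The default path is an assumption; pass your own as the first argument.
reaper_settings() {
  cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
  grep -E '^[[:space:]]*(check_result_reaper_frequency|max_check_result_reaper_time)[[:space:]]*=' "$cfg"
}
```

Commented-out lines are skipped, so the output is what the monitoring engine reads at startup.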

Post by **onegative** » Fri May 31, 2019 12:55 pm

@cdienger

Well I think I have finally tweaked it enough to keep it running. I ended up creating the /var/nagiosramdisk and /usr/local/nagiosxi/tmp/backendcache ramdisks...

Currently the system is operating as expected with regard to processing events, which seems acceptable. You can lock this entry.

Thanks for your help/input,
Danny
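
A quick check that both directories really are RAM-backed, and not plain directories on disk, is to look them up in the mount table. This is a sketch under assumptions: `is_tmpfs` is a made-up name, and the second argument exists only so the function can be run against a sample mount table.

```shell
# Hypothetical helper: succeed (exit 0) if the given mount point is tmpfs.
# Second argument: mount table to read (default /proc/mounts).
# If a mount point appears more than once, the last entry wins, since
# later mounts shadow earlier ones.
is_tmpfs() {
  awk -v mp="$1" '$2 == mp { fstype = $3 } END { exit (fstype == "tmpfs" ? 0 : 1) }' "${2:-/proc/mounts}"
}
```

For example, `is_tmpfs /var/nagiosramdisk && echo ok`, and the same for `/usr/local/nagiosxi/tmp/backendcache`.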

Post by **scottwilkerson** » Fri May 31, 2019 1:52 pm

onegative wrote: @cdienger

Well I think I have finally tweaked it enough to keep it running. I ended up creating the /var/nagiosramdisk and /usr/local/nagiosxi/tmp/backendcache ramdisks...

Currently the system is operating as expect with regard to processing events...seems to be acceptable...you can lock this entry.

Thanks for your help/input,
Danny

great!

Locking

Nagios Support Forum
