chicjo01 wrote:
So my guess, after all is said and done with both Windows and Linux, would be:
Hosts: 2192
Services: 40000
Total: 42192 (ballpark)
Generally speaking, Nagios XI runs into bottlenecks once you go past 20,000 objects. If your system is running out of RAM or CPU, that can cause a lot of issues.
We are currently in the process of developing some KB articles that explain exactly where these bottlenecks can occur and how to address them; they may not be available for a couple of months yet.
Honestly, I would look at implementing 3 Nagios XI servers to break things up; that will give you a more stable and reliable monitoring solution.
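As a rough check on the three-server suggestion, the arithmetic can be sketched in shell. This assumes "objects" means hosts plus services, as in the ballpark figures quoted above, that the load can be split evenly, and that 20,000 is the rough threshold mentioned in this thread rather than a hard limit; the variable names are illustrative.

```shell
#!/bin/sh
# Rough per-server object count when the monitoring load is split evenly.
# Assumption: objects = hosts + services (the ballpark figures above).
# Assumption: 20000 is the rough threshold from this thread, not a hard limit.
hosts=2192
services=40000
servers=3
threshold=20000

total=$((hosts + services))
# Integer division rounded up, so no server is undercounted.
per_server=$(( (total + servers - 1) / servers ))

echo "total objects:      $total"
echo "objects per server: $per_server"
if [ "$per_server" -le "$threshold" ]; then
  echo "each server is under the ~$threshold object threshold"
else
  echo "each server is still over the ~$threshold object threshold"
fi
```

With the quoted numbers, an even split comes to 14064 objects per server, comfortably below the threshold; a real split will be uneven, so this is only a starting estimate.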
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Is this bottleneck related only to active checks, or does it also apply to passive checks, or to service checks offloaded to remote servers using DNX or something similar? Since you are currently writing up the KB articles, you may already have solutions for these bottlenecks. If you do, can you let me know what they are?
Going back to the original problem: do you have any suggestions on what can be done to correct this, or do you know why it is happening?
You are hitting a Core/NDOUtils bug that has been fixed in later versions. I talked with our C developer, and he recommends that you upgrade to the just-released versions:
They BOTH need to be upgraded for this to work properly. If you need help with this, it would be best handled in a ticket: please send an email to [email protected] with a descriptive subject and a detailed body that links back to this thread, and we can go from there.
Thank you for the information. Going back to the original problem, what additional information do you need from me? Or are you saying the "Unable to run check for service" problem I am having is caused by this bottleneck?
The command given before did not fix the issue. If I upgrade the specific components (Nagios Core / NDOUtils), will that cause problems with Nagios XI when I need to update it?
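# Stop Nagios, then kill any nagios processes that are still running
service nagios stop
killall -9 nagios
# Stop the NDOUtils daemon and restart MySQL
service ndo2db stop
service mysqld restart
# Remove the stale external command pipe (a single FIFO, so -r is not needed)
rm -f /usr/local/nagios/var/rw/nagios.cmd
# Remove any System V message queues still owned by nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
# Start the services again, NDOUtils first
service ndo2db start
service nagios start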
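Before running the queue-removal loop, a read-only look at the nagios-owned message queues can show whether they are actually backing up. This is a sketch that assumes the util-linux "ipcs -q" column layout (key, msqid, owner, perms, used-bytes, messages) and that the queues belong to the nagios user; the function name nagios_queues is hypothetical.

```shell
#!/bin/sh
# Print "msqid used-bytes messages" for each System V message queue owned
# by the given user (default: nagios). Reads "ipcs -q" output on stdin.
# Assumption: util-linux column layout: key msqid owner perms used-bytes messages
nagios_queues() {
  owner=${1:-nagios}
  # Requiring a numeric msqid in column 2 skips the banner and header lines.
  awk -v owner="$owner" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2, $5, $6 }'
}

# Usage on a live system (read-only, removes nothing):
#   ipcs -q | nagios_queues nagios
```

A queue whose message count keeps climbing between runs suggests ndo2db is not draining it, which is the kind of backlog the cleanup loop clears.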
Upgrading Core and NDOUtils should help fix the kernel message queue issue you are having.
Note that if you later upgrade the Nagios XI server software, it would downgrade Core and NDOUtils.
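Because an XI upgrade can roll Core back, a quick version check afterwards can catch the downgrade. This is a sketch under assumptions: it relies on GNU coreutils "sort -V", REQUIRED_CORE is a hypothetical placeholder for whatever Core release support recommended, and "nagios -V" is assumed to print a line of the form "Nagios Core X.Y.Z".

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU coreutils "sort -V" (version-aware sort).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical usage after an XI upgrade (REQUIRED_CORE is a placeholder):
#   REQUIRED_CORE="x.y.z"
#   installed=$(/usr/local/nagios/bin/nagios -V | awk '/^Nagios Core/ {print $3; exit}')
#   version_ge "$installed" "$REQUIRED_CORE" || echo "Core was downgraded to $installed"
```

Version sorting is used instead of a plain string comparison so that, for example, 4.10.0 correctly ranks above 4.9.0.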
There are other fixes you can try; if you post the following file, I can give you some suggestions.
I will work on getting the two components updated; I will have to submit a change control request for it. Are there any plans to integrate the updated components into Nagios XI?