Number of "nagios.cfg" instances failing.

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
luczynj
Posts: 88
Joined: Wed Dec 03, 2014 6:47 pm


Post by luczynj »

Hello all,

We are running Nagios XI 5.6.5 on CentOS: Linux nagios-b 2.6.32-754.17.1.el6.centos.plus.x86_64 #1 SMP Tue Jul 2 20:09:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Due to the massive growth and use of Nagios XI, we recently replaced our servers and added 64GB RAM to these new servers.

We have 326 hosts and nearly 29,000 services.

Nagios XI was very stable until we hit around 25K services. I have no idea what to do or even where to troubleshoot.

I have a script that checks that "ps -eaf | grep nagios.cfg | grep -v grep" returns exactly 2 lines. If it does not, the script restarts the nagios service.
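For readers who want to reproduce the check, here is a minimal sketch of such a watchdog in Python. It is not the poster's actual script. The function names, the `dry_run` flag, and the `service nagios restart` command are assumptions made for illustration; only the `ps -eaf` call and the expected count of 2 come from the post.

```python
import subprocess

EXPECTED_INSTANCES = 2  # the count the poster's check compares against


def count_instances(ps_output: str, needle: str = "nagios.cfg") -> int:
    """Count the lines of `ps -eaf` output that mention the needle."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def check_and_restart(dry_run: bool = True) -> bool:
    """Return True if a restart was (or, in dry-run mode, would be) triggered."""
    ps_output = subprocess.run(
        ["ps", "-eaf"], capture_output=True, text=True, check=True
    ).stdout
    if count_instances(ps_output) == EXPECTED_INSTANCES:
        return False
    if not dry_run:
        # Assumed restart command; adjust it for the init system in use.
        subprocess.run(["service", "nagios", "restart"], check=True)
    return True
```

Because the process list is filtered in Python rather than through a `grep` pipeline, there is no `grep` process to exclude. Keeping `dry_run=True` while testing avoids restarting a production service by accident.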

[root@nagios-b ]# free
                      total       used       free     shared    buffers     cached
Mem:               66067412   59720752    6346660      70844     601948   56079904
-/+ buffers/cache:             3038900   63028512
Swap:              33038332      22436   33015896
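As a reading aid for the output above, the "-/+ buffers/cache" row can be reproduced from the "Mem:" row. The sketch below assumes the values are in KiB (the default unit for `free`) and uses variable names chosen for illustration. It suggests that most of the "used" memory is reclaimable cache, so memory is probably not the constraint here.

```python
# Values in KiB, copied from the `free` output in the post.
total = 66067412
used = 59720752
buffers = 601948
cached = 56079904

# Memory held by applications, excluding reclaimable buffers and page cache.
app_used = used - buffers - cached
app_used_pct = 100 * app_used / total

print(f"application memory: {app_used} KiB ({app_used_pct:.1f}% of total)")
# prints: application memory: 3038900 KiB (4.6% of total)
```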


Please help!

Regards,
JLu
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Number of "nagios.cfg" instances failing.

Post by lmiltchev »

There are a number of things you could do to tune your Nagios XI system and improve performance. You can find information on the topic here:

https://assets.nagios.com/downloads/nag ... p#boosting

Having said that, with so many services, you should really consider purchasing a second Nagios XI server. Tweaking the settings and throwing more hardware at the server can only go so far.
Be sure to check out our Knowledgebase for helpful articles and solutions!
luczynj
Posts: 88
Joined: Wed Dec 03, 2014 6:47 pm

Re: Number of "nagios.cfg" instances failing.

Post by luczynj »

Thanks for the recommendation. However, we have had similar issues since we started using Nagios in 2014, and we did all the fine-tuning that we could. We've just deployed two new beefed-up servers to run this Nagios. The bulk of the 29K services are passive.

We ended up brainstorming about this issue and discovered that we were getting flooded by a node in Amsterdam. We experienced this a year or two ago and found we were getting thousands of SNMP messages per hour, and at times per minute, and Nagios couldn't cope with that. I doubt anyone's hardware could handle that.

Is there a custom service out there that could detect this kind of flooding? I've already got a script in my head of how I would write my own. But why reinvent the wheel?

Thanks again for your help. Now that we've updated the IP tables to prevent future flooding, the system is running just fine on the new hardware.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Number of "nagios.cfg" instances failing.

Post by lmiltchev »

It makes a big difference that the majority of your checks are passive. In this case, you may be fine, especially with the new hardware.

luczynj wrote: "Is there a custom service out there that could detect this kind of flooding? I've already got a script in my head of how I would write my own. But why reinvent the wheel?"

I don't think there is anything included in XI that would help you with that. You are on the right track, though, as you could use your firewall to throttle these messages. If you prefer to use your custom script, I would recommend that you try it in a test environment before using it in production. Thanks!
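As a starting point for such a custom script, here is a rough sketch of the detection logic in Python. Everything in it is an assumption made for illustration: the `(timestamp, source)` event format, the function names, the 60-second window, and the threshold of 100 messages. How the events are collected (for example, by parsing a trap log) is left out. The return codes follow the standard Nagios plugin convention of 0 for OK and 2 for CRITICAL.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str]  # (unix timestamp, source address); assumed format


def find_flooders(
    events: Iterable[Event],
    window_start: float,
    window_seconds: float = 60.0,
    threshold: int = 100,
) -> Dict[str, int]:
    """Return {source: count} for sources above the threshold in the window."""
    window_end = window_start + window_seconds
    counts = Counter(
        src for ts, src in events if window_start <= ts < window_end
    )
    return {src: n for src, n in counts.items() if n > threshold}


def plugin_result(flooders: Dict[str, int]) -> Tuple[int, str]:
    """Map the result to a Nagios-style (exit code, status line) pair."""
    if not flooders:
        return 0, "OK - no source above threshold"
    worst = max(flooders, key=flooders.get)
    return 2, (
        f"CRITICAL - {len(flooders)} source(s) flooding, "
        f"worst is {worst} with {flooders[worst]} messages"
    )
```

A wrapper script would print the status line and exit with the returned code so that Nagios can treat it like any other check. As advised above, it should be tried in a test environment first.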