NagiosXI performance issue

anoop · Post by **anoop** » Thu Oct 10, 2013 1:10 pm

Hi Team,
We have currently added 1,385 hosts with 7,800 services, of which 3,900 are SNMP-related services for network devices.

We expect to grow to 4,000 hosts and 40,000 services, of which 15,000 will be active checks and 25,000 will be passive checks.

Please advise on how to meet this requirement.

How would a RAM disk and rrdcached help in this scenario?
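
To put the target numbers in perspective, a rough estimate of the average check rate can help. The sketch below assumes a 5-minute (300-second) check interval, which is not stated in the thread, and uses a hypothetical helper named `checks_per_second`.

```shell
#!/bin/sh
# checks_per_second COUNT INTERVAL_SECONDS
# Average number of checks that must start each second if COUNT checks
# are spread evenly over INTERVAL_SECONDS (the interval is an assumption).
checks_per_second() {
    awk -v n="$1" -v interval="$2" 'BEGIN { printf "%.1f\n", n / interval }'
}

# Example: 15,000 active checks on an assumed 5-minute interval
#   checks_per_second 15000 300
```

Under that assumption, 15,000 active checks average out to roughly 50 check executions per second; a longer interval lowers the figure accordingly.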

slansing · Post by **slansing** » Thu Oct 10, 2013 1:25 pm

I would recommend starting with a local installation of mod_gearman and then moving to remote workers when the need arises. Even a local installation can improve performance significantly. For an installation of that size you may need to look at 3 or more remote worker systems, which can be used to divvy up the processing of checks.

anoop · Post by **anoop** » Thu Oct 10, 2013 1:49 pm

Hi Team,

Thank you very much for replying. We will try a local mod_gearman installation.

We have the following queries:

1. The "Monitoring Engine Process" on our Nagios XI server stops on its own.
2. Sometimes, because of this process issue, graphs are not generated. How do we regenerate the missing graphs?
3. I/O wait sometimes rises above 25%.

Thanks in advance.

slansing · Post by **slansing** » Fri Oct 11, 2013 10:24 am

It looks like your system may have some deeper issues that should be resolved before adding something that is less crucial at the moment, such as mod_gearman. Is the nagios process running?

service nagios status

service ndo2db status

service crond status
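
The three status checks can also be run in one pass. The following is a minimal sketch, not part of Nagios XI: `check_services` and the `STATUS_CMD` override are hypothetical names introduced here so the loop can be dry-run without touching real services.

```shell
#!/bin/sh
# check_services NAME...
# Runs "<status command> NAME status" for each service and prints one line
# per service. Returns non-zero if any service is reported as not running.
# STATUS_CMD is a hypothetical override; it defaults to the "service" tool.
check_services() {
    rc=0
    for svc in "$@"; do
        if ${STATUS_CMD:-service} "$svc" status >/dev/null 2>&1; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   check_services nagios ndo2db crond
```

The non-zero return value makes the function usable from cron or another script, for instance to log a warning whenever one of the three services is down.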

anoop · Post by **anoop** » Fri Oct 11, 2013 11:23 am

Hi Team,

Yes, the nagios, ndo2db and crond services are running. However, when I check the status or restart the services, I sometimes see "ndo2db lock" and "nagios lock" messages, which clear up after some time. Sometimes graphs are not generated for a long time, around 3 to 4 hours, and they only come back after I restart the monitoring engine.

Please let us know the resolution. Thanks.

sreinhardt · Post by **sreinhardt** » Fri Oct 11, 2013 1:22 pm

As we discussed earlier, the largest issue you are facing at present is mrtg in relation to your network SNMP monitoring. There is a relatively easy route to take, which slansing mentioned on page two: split the configuration so that it runs these checks separately rather than in a single go. Until you make these changes, there is nothing else we can do to help you, as the massive increase in load and the performance issues are very much due to this.

anoop · Post by **anoop** » Sat Oct 12, 2013 11:08 am

Hi Team,

As per your suggestion, I split the mrtg.cfg file into 4 files, and performance is somewhat better. But yesterday I configured VMware devices using the VMware monitoring wizard for 33 base machines and 430 guest machines, with 2,500 service checks. Today the load on my XI server has increased and performance has dropped again. When I apply the configuration, it takes hours and hours.

My graphs are still not being generated. I dug into some of my log files and found that "npcd.log" shows the following error:

NPCD: ERROR: Executed command exits with return code '7'
[10-12-2013 19:11:17] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1381557654.perfdata.service.

I also tried fixing the Nagios XI permissions, but the error still exists.

Please suggest a better solution.
Thanks in advance.

slansing · Post by **slansing** » Mon Oct 14, 2013 10:14 am

You will want to open:

/usr/local/nagios/etc/pnp/npcd.cfg

And set "load_threshold =" to something higher than your current system load, which is astronomical right now.

Then restart npcd:

service npcd restart
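
Before picking a new value, it can help to compare the configured threshold with the current load. This is a sketch under the assumption that npcd.cfg holds a single `load_threshold = <number>` line; `get_load_threshold` and `load_exceeds` are hypothetical helper names, not part of PNP4Nagios.

```shell
#!/bin/sh
# get_load_threshold FILE
# Prints the value of the first "load_threshold = N" line in FILE.
get_load_threshold() {
    awk -F= '/^[ \t]*load_threshold[ \t]*=/ {
        gsub(/[ \t]/, "", $2)
        print $2
        exit
    }' "$1"
}

# load_exceeds LOAD THRESHOLD
# Succeeds (exit status 0) when LOAD is numerically greater than THRESHOLD.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !((l + 0) > (t + 0)) }'
}

# Example on Linux, where the 1-minute load average is the first field
# of /proc/loadavg:
#   threshold=$(get_load_threshold /usr/local/nagios/etc/pnp/npcd.cfg)
#   load=$(cut -d" " -f1 /proc/loadavg)
#   load_exceeds "$load" "$threshold" && echo "load $load is above load_threshold $threshold"
```

Raising the threshold only lets npcd keep processing perfdata under load; it does not reduce the load itself.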

"its taking hours and hours.."

This is not really possible, given the timeout and memory limits in your php.ini file.

Have you looked into mod_gearman yet? You need to get these performance issues resolved before you continue to add more objects and slow your system down again.

anoop · Post by **anoop** » Tue Oct 15, 2013 7:48 am

Hi Team,

We have started installing mod_gearman locally on our Nagios XI server, as we don't have spare resources at present. Once we get the resources, I'll install another worker on a remote server.

I installed it using the script and configured neb.conf and worker.conf, providing the Nagios XI IP address in the worker installation.

I left hosts=yes and services=yes as they are in the default file.

Is there anything else we need to configure in mod_gearman apart from these steps?

Please let us know if anything else is required.

Thanks

2:

Hi Team,

We are planning to configure a RAM disk.

We are thinking of using a separate 1 GB of SAS storage for the RAM disk, since our status.dat and objects.cache files already take up 20 MB and will grow in future. We are also planning to use an ext4 filesystem instead of tmpfs.

How much impact would a configuration like this have?

Please suggest a better solution.

Thanks in advance.

slansing · Post by **slansing** » Tue Oct 15, 2013 10:31 am

How much RAM do you have available on the XI system? It looks like you have already looked at your .dat files, but keep in mind that you will also need extra room for things like performance data. Depending on how much comes through, you may need a larger RAM disk configuration. As far as mod_gearman goes, that looks good. Just be sure you followed the steps in the documentation and made the necessary changes that are posted there.
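
A rough way to size the RAM disk is to total the files that would live on it and add headroom for growth and performance data. The sketch below is illustrative only: `total_bytes`, `suggest_ramdisk_mb`, the headroom factor, and the example paths are assumptions, not values from the Nagios documentation.

```shell
#!/bin/sh
# total_bytes FILE...
# Prints the combined size in bytes of the given files.
total_bytes() {
    cat -- "$@" | wc -c | tr -d ' '
}

# suggest_ramdisk_mb BYTES FACTOR
# Multiplies BYTES by an assumed headroom FACTOR and rounds up to whole MiB.
suggest_ramdisk_mb() {
    awk -v b="$1" -v f="$2" 'BEGIN {
        mb = b * f / 1048576
        r = int(mb)
        if (r < mb) r++
        print r
    }'
}

# Example (paths are assumptions; adjust them to your installation):
#   bytes=$(total_bytes /usr/local/nagios/var/status.dat /usr/local/nagios/var/objects.cache)
#   suggest_ramdisk_mb "$bytes" 4
```

For example, 20 MiB of files with an assumed factor of 4 gives 80 MiB. Whatever size is chosen has to fit comfortably within the free RAM on the XI system, since a RAM disk is backed by memory rather than by disk storage.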

Nagios Support Forum
