System performance

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
deepavaidya
Posts: 80
Joined: Sun Oct 06, 2013 8:23 am

System performance

Post by deepavaidya »

Hi,

We currently monitor around 30 hosts and 2,700 services, and we still need to add about 20 more hosts and 2,000 more services. We are running the latest version of Nagios XI on a server with one 4-core CPU. The average system load is around 23, which is very high, and we are seeing problems such as delayed mail notifications and the Apply Configuration page stalling. Could you suggest ways to improve performance? Please also let us know the maximum number of hosts and services that a system with one 4-core CPU can comfortably monitor, so that we can pass that recommendation to the client.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: System performance

Post by slansing »

The processor certainly plays a role in your performance gains and losses, but so does the other hardware. What are the rest of the specs on the server: memory, storage type, etc.? Is this a VM? If so, what other VMs are sharing that processor? I'd recommend duplicating your configuration on a test server and integrating mod_gearman there to see what effect it has:

http://assets.nagios.com/downloads/nagi ... ios_XI.pdf

In the very near future, Nagios XI 2014 will be released. It includes Nagios Core 4 and its new worker processes, which should reduce load by quite a bit. In addition to answering my questions above, what does top currently show? Are these spikes in load, or the average over a 24-hour period?

Code: Select all

top
deepavaidya
Posts: 80
Joined: Sun Oct 06, 2013 8:23 am

Re: System performance

Post by deepavaidya »

Please find the output below. It is not a VM. We have now reduced the number of monitored services to nearly 2k and see better performance.

Code: Select all

top - 17:01:11 up 68 days, 14 min, 3 users, load average: 6.87, 8.08, 7.90
Tasks: 348 total, 1 running, 347 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.7%us, 3.6%sy, 0.0%ni, 81.2%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16294180k total, 15231852k used, 1062328k free, 699484k buffers
Swap: 8216568k total, 4500k used, 8212068k free, 7379384k cached

Please let us know the maximum number of services that Nagios XI can comfortably monitor on a system with one 4-core CPU. We expected at least 10k services, but performance was already bad at nearly 3k.
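As a rough way to read the top header quoted above, the load averages can be divided by the core count. The sketch below is only an illustration: the function names are made up for this example, and the "more than 1.0 per core suggests queuing" rule is a common rule of thumb, not a Nagios limit. On Linux the load average also counts processes in uninterruptible I/O wait, so it is not a pure CPU measure.

```python
import re

def parse_load_averages(top_header: str) -> tuple:
    """Extract the 1-, 5-, and 15-minute load averages from top's first line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found in header line")
    return tuple(float(value) for value in match.groups())

def load_per_core(load: float, cores: int) -> float:
    """Load average divided by core count (rule of thumb: > 1.0 means queuing)."""
    return load / cores

# Header line copied from the top output posted in this thread.
HEADER = "top - 17:01:11 up 68 days, 14 min, 3 users, load average: 6.87, 8.08, 7.90"
CORES = 4  # one CPU with 4 cores, as described in the first post

for label, value in zip(("1 min", "5 min", "15 min"), parse_load_averages(HEADER)):
    print(f"{label}: {load_per_core(value, CORES):.2f} per core")
```

By the same arithmetic, the load of 23 reported in the first post works out to 5.75 per core.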
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: System performance

Post by abrist »

deepavaidya wrote: We expected at least 10k services but the performance is bad when it reached nearly 3k.
It all depends on the type of checks. Do you run any Oracle, SNMP, or VMware checks? Can you give us a rough breakdown of the types and quantities of checks you are running?
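One possible way to get that kind of breakdown is to count services per check command in the object configuration. The following is a minimal sketch, not an official Nagios tool: the sample definitions, host names, and command names are made up for illustration, and it only sees a check_command written directly in each service block, so services that inherit the command from a template would be missed.

```python
import re
from collections import Counter

def count_check_commands(config_text: str) -> Counter:
    """Count services per check command in Nagios object-config text."""
    counts = Counter()
    # Object definitions do not nest, so a non-greedy match up to "}" is enough.
    for block in re.findall(r"define\s+service\s*\{(.*?)\}", config_text, re.DOTALL):
        # Keep only the command name, dropping the "!"-separated arguments.
        match = re.search(r"^\s*check_command\s+([^!\s;]+)", block, re.MULTILINE)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample configuration, for illustration only.
SAMPLE = """
define service {
    host_name            switch01
    service_description  Port 1 Status
    check_command        check_xi_service_ifoperstatus!public!1
}
define service {
    host_name            switch01
    service_description  Port 2 Status
    check_command        check_xi_service_ifoperstatus!public!2
}
define service {
    host_name            switch01
    service_description  Ping
    check_command        check_xi_service_ping!3000.0!80%!5000.0!100%
}
"""

for command, total in count_check_commands(SAMPLE).most_common():
    print(f"{command}: {total}")
```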
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
deepavaidya
Posts: 80
Joined: Sun Oct 06, 2013 8:23 am

Re: System performance

Post by deepavaidya »

We do SNMP polling to check the state of the ports on switches, routers, etc.

Is offloading the MySQL database a feasible solution in this case? Could you give us some information on that, and suggest any better solution?

Please find attached the Nagios screenshot, which shows the time taken for active checks and service checks.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: System performance

Post by abrist »

The execution time is not bad at all. The 9-second checks are most likely large SNMP queries or slow responses from the remote host. Which processes have the highest load?

Code: Select all

ps aux --sort=-%cpu
Run top, press '1', and post the output.
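When the ps listing is long, it can help to total %CPU per executable so that many short-lived plugin processes show up as one line. This is a minimal sketch that assumes the standard `ps aux` column layout; the sample rows below are made up for illustration and are not taken from this server.

```python
import os
from collections import defaultdict

def cpu_by_command(ps_output: str) -> dict:
    """Sum the %CPU column of `ps aux` output per executable name."""
    totals = defaultdict(float)
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # ps aux has 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        executable = os.path.basename(fields[10].split()[0])
        totals[executable] += float(fields[2])  # third column is %CPU
    return dict(totals)

# Hypothetical sample rows, for illustration only.
SAMPLE_PS = """
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    1201  2.5  0.1  10000  2000 ?        S    08:50   0:00 /usr/local/nagios/libexec/check_ifoperstatus -H 192.0.2.10 -k 1
nagios    1202  3.0  0.1  10000  2000 ?        S    08:50   0:00 /usr/local/nagios/libexec/check_ifoperstatus -H 192.0.2.10 -k 2
mysql      900  1.5  2.0 500000 90000 ?        Sl   Jan01  10:00 /usr/libexec/mysqld
"""

totals = cpu_by_command(SAMPLE_PS)
for name in sorted(totals, key=totals.get, reverse=True):
    print(f"{name}: {totals[name]:.1f}%")
```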
kgopiramesh

Re: System performance

Post by kgopiramesh »

Hi,

Please find the top output below; the ps output is attached.


top - 08:53:38 up 69 days, 16:07, 3 users, load average: 6.71, 12.12, 11.59
Tasks: 348 total, 1 running, 347 sleeping, 0 stopped, 0 zombie
Cpu0 : 11.6%us, 2.9%sy, 0.0%ni, 79.6%id, 5.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 11.9%us, 2.8%sy, 0.0%ni, 84.5%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 10.6%us, 2.9%sy, 0.0%ni, 86.3%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 10.2%us, 2.5%sy, 0.0%ni, 87.1%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16294180k total, 14776912k used, 1517268k free, 707748k buffers
Swap: 8216568k total, 6732k used, 8209836k free, 6514944k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29574 root 20 0 15160 1448 948 R 1.4 0.0 0:00.03 top
23275 postgres 20 0 210m 5596 3644 S 0.7 0.0 0:00.04 postmaster

Please let us know whether offloading the database will help.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: System performance

Post by slansing »

It looks like most of the usage is coming from ifoperstatus SNMP checks. I'd recommend using mod_gearman to spread that load out and take the pressure off your single-CPU system. Offloading the database probably won't reduce your resource usage much, at least at this point.

http://assets.nagios.com/downloads/nagi ... ios_XI.pdf
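To get a feel for how much check work lands on one box, a back-of-the-envelope estimate can multiply the check rate by the average execution time. This is only a sketch under stated assumptions: the 5-minute check interval and the 1-second average execution time are placeholders, not figures from this thread, and the function names are made up for this example. Only the service counts come from the posts above.

```python
def checks_per_second(services: int, interval_seconds: float) -> float:
    """Average number of checks that must start each second."""
    return services / interval_seconds

def average_concurrency(services: int, interval_seconds: float,
                        avg_exec_seconds: float) -> float:
    """Average number of checks in flight (start rate times execution time)."""
    return checks_per_second(services, interval_seconds) * avg_exec_seconds

INTERVAL = 300.0  # ASSUMPTION: every service is checked once per 5 minutes
AVG_EXEC = 1.0    # ASSUMPTION: average plugin run time in seconds

# Service counts mentioned in the thread: reduced, original, planned, hoped for.
for services in (2000, 2700, 4700, 10000):
    rate = checks_per_second(services, INTERVAL)
    in_flight = average_concurrency(services, INTERVAL, AVG_EXEC)
    print(f"{services} services: {rate:.1f} checks/s, ~{in_flight:.1f} running at once")
```

Slow checks, such as the 9-second SNMP queries mentioned earlier, raise the in-flight count proportionally, which is one reason the type of check matters as much as the raw service count.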
kgopiramesh

Re: System performance

Post by kgopiramesh »

Thanks, slansing, for your reply. We are mostly monitoring switches and routers in our environment, so distributed monitoring may not be possible. One clarification: do we need Nagios installed on the server where we install the mod_gearman worker process?

We are using Nagios XI 2012R2.9. Is it feasible to install mod_gearman and the worker processes on this version?

Our version details: Nagios XI 2012R2.9 Copyright © 2008-2014 Nagios Enterprises, LLC.

Do we need to assign the service checks to each worker process, or will the workers pick them up automatically?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: System performance

Post by slansing »

By default, mod_gearman gives jobs to any worker that contacts it, whether local or remote. You do not need to install Nagios on each worker server. As the documentation notes, the plugins you expect those workers to run checks with must be present on the worker servers. If the remote hosts are expected to communicate with those plugins (through NRPE, for example), they will also need the worker servers' addresses listed in their configuration files.
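The default behaviour can be pictured with a toy model: all checks sit in one shared queue, and each worker takes the next job when it asks for one, with no per-worker assignment. This is a deliberately simplified sketch, not mod_gearman code. The worker names are hypothetical, and the strict round-robin order is an assumption made to keep the example deterministic; in practice whichever worker is free first gets the next job.

```python
from collections import Counter, deque
from itertools import cycle

def distribute(jobs, workers) -> Counter:
    """Let workers take turns pulling the next job from one shared queue."""
    queue = deque(jobs)
    taken = Counter()
    for worker in cycle(workers):
        if not queue:
            break  # nothing left to hand out
        queue.popleft()  # the worker picks up the next pending check
        taken[worker] += 1
    return taken

# Hypothetical worker names; nothing is assigned to a worker in advance.
WORKERS = ["local", "remote-1", "remote-2"]
result = distribute(range(10), WORKERS)
for worker in WORKERS:
    print(f"{worker}: {result[worker]} jobs")
```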