
NagiosXI performance issues

Posted: Thu Nov 24, 2016 8:44 am
by Fred Kroeger
We have been running NagiosXI 5.2.9 for a while now with no performance issues.
We've added another NagiosXI server to the site and are sending its events to the master server via Outbound Transfers (NRDP); Inbound Transfers are working OK on the master server.
As we added servers to the remote Nagios server, the load on the master server started to increase, to the point where the load average is now 20+ (compared to 3-5 beforehand). To date, ~100 passive hosts and 600 passive services have been added.
When I look at the "Scheduled events over time" graph (Monitoring Engine Status) the column for Now seems to be permanently capped at just over 500.
It's almost as if Nagios has reached a limit and won't process more than 500 events.
I went through the performance tuning docs some time ago and am pretty happy that it was running at optimum before we started the Inbound Transfers.
We are currently monitoring 950 active hosts and 8,500 active services (half of these are run via gearmand on another server).

Looking at top it appears that the mysqld process is consistently consuming the most resources.

Code:

Tasks: 262 total,   8 running, 254 sleeping,   0 stopped,   0 zombie
Cpu(s): 90.1%us,  9.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.1%hi,  0.7%si,  0.0%st
Mem:   3947696k total,  3064476k used,   883220k free,    70680k buffers
Swap:  2359288k total,    70032k used,  2289256k free,  1462800k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2275 mysql     20   0 1697m  66m 3772 S 47.2  1.7  49:15.73 mysqld
33949 apache    20   0  456m  40m 5188 R 20.9  1.0   0:12.67 httpd
30074 apache    20   0  456m  40m 5324 S 13.0  1.0   0:16.34 httpd
15605 apache    20   0  456m  40m 5348 S 12.3  1.0   0:28.20 httpd
24063 nagios    20   0 56960 8116 1068 S 12.3  0.2   2:55.78 ndo2db
37604 apache    20   0  454m  37m 4888 S 11.3  1.0   0:07.91 httpd
13734 apache    20   0  459m  43m 5232 S 10.6  1.1   0:31.50 httpd
26248 apache    20   0  458m  41m 5332 S 10.3  1.1   0:21.26 httpd
34095 apache    20   0  455m  39m 5192 S 10.3  1.0   0:10.93 httpd
44067 apache    20   0  445m  29m 4468 S 10.3  0.8   0:00.94 httpd
53473 apache    20   0  458m  42m 5472 S 10.3  1.1   0:54.72 httpd
27918 apache    20   0  455m  40m 5200 S 10.0  1.0   0:16.61 httpd
30071 apache    20   0  446m  31m 5212 S 10.0  0.8   0:14.44 httpd
31553 apache    20   0  459m  43m 5528 S 10.0  1.1   1:19.72 httpd
21243 apache    20   0  457m  42m 5544 S  9.6  1.1   1:31.44 httpd
29962 apache    20   0  447m  31m 5112 S  9.6  0.8   0:17.70 httpd
30072 apache    20   0  456m  40m 5184 S  9.0  1.0   0:14.78 httpd
35068 apache    20   0  456m  39m 5116 R  8.0  1.0   0:12.08 httpd
26000 apache    20   0  456m  39m 5208 R  7.3  1.0   0:17.28 httpd
22135 apache    20   0  457m  41m 5228 S  5.6  1.1   0:22.90 httpd
43649 apache    20   0  456m  39m 4556 S  5.6  1.0   0:01.60 httpd
27363 apache    20   0  457m  41m 5232 S  5.3  1.1   0:22.19 httpd
35319 apache    20   0  453m  37m 4872 S  5.3  1.0   0:10.90 httpd
30024 apache    20   0  457m  41m 5216 S  5.0  1.1   0:17.96 httpd
23955 gearmand  20   0  533m 4712 1160 S  3.7  0.1   0:46.43 gearmand
56063 apache    20   0  457m  42m 5484 R  2.7  1.1   1:58.71 httpd
44066 apache    20   0  448m  31m 4488 R  2.3  0.8   0:01.18 httpd
regards... Fred

Re: NagiosXI performance issues

Posted: Fri Nov 25, 2016 7:08 pm
by Fred Kroeger
I've been able to reduce some of the load by pushing more checks to a Mod-Gearman server; however, it is still too high, and GUI responsiveness slows down noticeably.
The top CPU consumers from ps are below.

Code:

# ps -eo pid,comm,%cpu,pcpu,user,nice,cpu,pid,args | sort -rk 3|head -n 30
  PID COMMAND         %CPU %CPU USER      NI CPU   PID COMMAND
 7224 httpd            9.2  9.2 apache     0   -  7224 /usr/sbin/httpd
 4179 httpd            8.5  8.5 apache     0   -  4179 /usr/sbin/httpd
60953 httpd            8.3  8.3 apache     0   - 60953 /usr/sbin/httpd
20039 httpd            8.1  8.1 apache     0   - 20039 /usr/sbin/httpd
42305 httpd            8.0  8.0 apache     0   - 42305 /usr/sbin/httpd
25400 httpd            8.0  8.0 apache     0   - 25400 /usr/sbin/httpd
60950 httpd            7.9  7.9 apache     0   - 60950 /usr/sbin/httpd
35191 httpd            7.9  7.9 apache     0   - 35191 /usr/sbin/httpd
60996 httpd            7.8  7.8 apache     0   - 60996 /usr/sbin/httpd
 3959 httpd            7.8  7.8 apache     0   -  3959 /usr/sbin/httpd
15887 httpd            7.8  7.8 apache     0   - 15887 /usr/sbin/httpd
 4177 httpd            7.7  7.7 apache     0   -  4177 /usr/sbin/httpd
55889 httpd            7.5  7.5 apache     0   - 55889 /usr/sbin/httpd
26816 httpd            7.4  7.4 apache     0   - 26816 /usr/sbin/httpd
 2566 httpd            7.2  7.2 apache     0   -  2566 /usr/sbin/httpd
 4813 ndo2db           7.1  7.1 nagios     0   -  4813 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
64430 httpd            6.9  6.9 apache     0   - 64430 /usr/sbin/httpd
 4102 httpd            6.9  6.9 apache     0   -  4102 /usr/sbin/httpd
62322 httpd            6.7  6.7 apache     0   - 62322 /usr/sbin/httpd
 4178 httpd            6.5  6.5 apache     0   -  4178 /usr/sbin/httpd
 4176 httpd            6.0  6.0 apache     0   -  4176 /usr/sbin/httpd
10861 php              5.0  5.0 nagios     0   - 10861 /usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_event.php --handler-type=host --host=rhev02.ops --hostaddress=X.X.X.X --hoststate=DOWN --hoststateid=1 --lasthoststate=UP --lasthoststateid=0 --hoststatetype=SOFT --currentattempt=1 --maxattempts=5 --hosteventid=1823164 --hostproblemid=811987 --hostoutput=CRITICAL - X.X.X.X: Time to live exceeded in transit @ X.X.X.X. rta nan, lost 100% --longhostoutput= --hostdowntime=1
10594 check_snmp_stor  4.5  4.5 nagios     0   - 10594 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_storage_wizard.pl -H 1X.X.X.X -C X--v2c -m ^/var/log$ -w 80 -c 95 -t 10 -f$
51569 gearmand         4.2  4.2 gearmand   0   - 51569 /usr/sbin/gearmand -d
 2275 mysqld          33.6 33.6 mysql      0   -  2275 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
10834 check_nac.py     3.0  3.0 nagios     0   - 10834 /usr/bin/python -tt /usr/local/nagios/local/check_nac.py -H X.X.X.X -t nodes -u X-p X
 4805 nagios           2.9  2.9 nagios     0   -  4805 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
 9821 httpd           17.8 17.8 apache     0   -  9821 /usr/sbin/httpd
10836 check_brocade.p  1.0  1.0 nagios     0   - 10836 /usr/bin/perl -w /usr/local/nagios/libexec/check_brocade.pl -H X.X.X.X -c X-t traffic
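
A note on the listing above: `sort -rk 3` compares the %CPU column as text rather than as a number, which is why mysqld (33.6) and one httpd process (17.8) land below single-digit entries. A minimal Python sketch of a numeric sort over the same kind of rows follows; the function name `top_by_cpu` and the shortened sample rows are illustrative only.

```python
def top_by_cpu(ps_lines, limit=30):
    """Return ps rows sorted by the numeric value of the third column (%CPU)."""
    rows = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            cpu = float(fields[2])
        except ValueError:
            continue  # header row or malformed line: skip it
        rows.append((cpu, line))
    # Sort on the parsed float so 33.6 ranks above 9.2.
    rows.sort(key=lambda r: r[0], reverse=True)
    return [line for _, line in rows[:limit]]


# Shortened rows modelled on the listing above (PID, COMMAND, %CPU only).
sample = [
    "  PID COMMAND         %CPU",
    " 7224 httpd            9.2",
    " 2275 mysqld          33.6",
    " 9821 httpd           17.8",
    " 4805 nagios           2.9",
]
for row in top_by_cpu(sample):
    print(row)
```

On the shell side, adding the `-n` flag (`sort -rnk 3`) should give the same numeric ordering.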
The VM has 3 CPUs and 4 GB RAM; I am also using a RAM disk.

Code:

top - 08:05:31 up 1 day, 12:48,  2 users,  load average: 9.33, 8.85, 8.66
Tasks: 260 total,  11 running, 249 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.8%us, 10.1%sy,  0.0%ni, 12.5%id, 25.4%wa,  0.3%hi,  0.8%si,  0.0%st
Mem:   3947696k total,  3229848k used,   717848k free,    62000k buffers
Swap:  2359288k total,   244280k used,  2115008k free,  1552148k cached
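
To put the load average in the header above in context, it can help to divide it by the number of CPUs (3 on this VM, per the post). On Linux the load average counts tasks waiting on I/O as well as runnable ones, so a per-CPU value well above 1 combined with the 25.4%wa figure hints at queueing. The sketch below is only a rough reading aid; `load_per_cpu` is a hypothetical helper name.

```python
import re


def load_per_cpu(top_header, cpu_count):
    """Extract the 1, 5 and 15 minute load averages and divide each by cpu_count."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found in header")
    return tuple(round(float(v) / cpu_count, 2) for v in match.groups())


# Header line copied from the top output above; cpu_count=3 comes from the post.
header = "top - 08:05:31 up 1 day, 12:48,  2 users,  load average: 9.33, 8.85, 8.66"
print(load_per_cpu(header, cpu_count=3))
```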

Re: NagiosXI performance issues

Posted: Sun Nov 27, 2016 9:34 pm
by Fred Kroeger
More Info:
It would appear that the load increase can be attributed to the extra https sessions when receiving the outbound data from the other Nagios servers.
We are receiving outbound data from 2 Nagios servers with a total of ~450 hosts and ~2500 services.
Is there any way that this can be managed better?

regards... Fred
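
One possible angle on the question above, offered as an assumption rather than a documented XI setting: NRDP's `submitcheck` command takes an XML document that can hold several `<checkresult>` elements, so a sender that batches results needs fewer HTTP requests than one that posts each result separately. Whether Outbound Transfers already batches this way is not stated in this thread. The sketch below only builds such a payload; the helper name `build_nrdp_batch` and the sample service names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_nrdp_batch(results):
    """Build one NRDP <checkresults> document from a list of result dicts.

    Each dict needs 'host', 'state' and 'output'; a 'service' key marks it
    as a service result, otherwise it is treated as a host result.
    """
    root = ET.Element("checkresults")
    for r in results:
        is_service = "service" in r
        node = ET.SubElement(
            root,
            "checkresult",
            type="service" if is_service else "host",
            checktype="1",  # 1 = passive check
        )
        ET.SubElement(node, "hostname").text = r["host"]
        if is_service:
            ET.SubElement(node, "servicename").text = r["service"]
        ET.SubElement(node, "state").text = str(r["state"])
        ET.SubElement(node, "output").text = r["output"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample results; only the host name appears in this thread.
batch = [
    {"host": "rhev02.ops", "state": 0, "output": "OK - host is up"},
    {"host": "rhev02.ops", "service": "Disk /var/log", "state": 1,
     "output": "WARNING - 82% used"},
    {"host": "rhev02.ops", "service": "Ping", "state": 0,
     "output": "OK - rta 1.2ms"},
]
payload = build_nrdp_batch(batch)
print(payload)
```

The payload would then go to the NRDP endpoint in a single POST together with the token and `cmd=submitcheck`; the exact form field name for the XML depends on the NRDP version, so check the server's documentation before relying on it.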

Re: NagiosXI performance issues

Posted: Mon Nov 28, 2016 10:29 am
by dwhitfield
5.3.0 added some MySQL tuning options, but if you've already tuned the database, that's probably no help. I don't see that listed in your data though. If you do decide to upgrade, the latest is 5.3.3...no reason to stick with 5.3.0 even though that's where the MySQL stuff was added.

We are working on an overhaul that will really increase performance. If I had to guess, that's a 6.0 feature, not a 5.4.0 feature, but it's just a guess. I don't have an ETA on either of those at the moment.

I can tell you that 20k services is the low end of our cutoff for the number of services, so you're well below that. That said, the number of services your server can handle is going to depend on a variety of factors.

You may have already looked at https://assets.nagios.com/downloads/nag ... ios-XI.pdf, but that's the place to start if you don't want to upgrade.

Lastly, can you PM me your profile? You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner. Thanks!

UPDATE: Profile received and shared with techs.

Re: NagiosXI performance issues

Posted: Tue Nov 29, 2016 9:23 pm
by Fred Kroeger
Thanks, I have sent the profile.
For the benefit of this thread, Nagios VM has been upgraded to NagiosXI 5.3.3 and provisioned with 8GB RAM.
No real improvement in performance.
And yes I've previously done all the performance tuning that I could find documented.

Regards... Fred

Re: NagiosXI performance issues

Posted: Wed Nov 30, 2016 11:27 am
by dwhitfield
Fred Kroeger wrote: For the benefit of this thread, Nagios VM has been upgraded to NagiosXI 5.3.3 and provisioned with 8GB RAM.
That's on the low end of the suggested RAM at https://assets.nagios.com/downloads/nag ... ements.pdf. Also, I notice you mentioned the VM has 3 CPUs, but we suggest 4 or more for the amount of load you have.

Both of those things said, considering the MySQL load, you should probably look into off-loading your MySQL: https://assets.nagios.com/downloads/nag ... Server.pdf

As with any change, make sure you have a backup before off-loading the MySQL server.

Please let us know if you have any additional questions.

Re: NagiosXI performance issues

Posted: Wed Dec 14, 2016 7:22 am
by bsivavani
Hi Fred Kroeger,

I tried to contact you via PM, but you are not able to receive messages. Apologies for posting here.

I saw your post in the forum thread below about integrating SNS with Nagios.

https://support.nagios.com/forum/viewto ... 9&p=139624

Kindly let us know how you achieved this functionality. Please PM me if you have any information.

I appreciate your help in this regard.


Thanks,
Siva vani

Re: NagiosXI performance issues

Posted: Wed Dec 14, 2016 10:06 am
by dwhitfield
@bsivavani please open a new thread for new issues.

@Fred Kroeger, are we ready to lock this up?