Gearman Bottleneck
Posted: Tue Jul 07, 2015 9:44 pm
Hi Nagios!
We are using mod_gearman to handle about 37,000 service checks, and we are running into a problem where the service check latency is really high.
I have a gearmand server defined in /usr/local/etc/mod_gearman2/module.conf:
Code: Select all
server=192.168.249.37:4730
It is configured to start with 10 threads (8GB RAM, 16 Cores). We tried offloading it to a separate server, but the performance was the same.
We also have three worker servers pointed at that gearmand server (4GB RAM, 8 Cores).
The workers are configured with the following:
Code: Select all
max-worker=1500
max-jobs=1000
spawn-rate=100
Each worker instance is consuming about 300 connections, but when I have only one worker, it goes up to 1000 connections.
And with that, the gearmand server is maxing out at about 1000-1200 connections:
Code: Select all
# netstat -anp | grep 4730 | wc -l
1029
The gearadmin --status command on the gearmand server stops responding once there are about 500 connections, so it's not too useful for us, but I can see that there's activity on the server from "top":
Code: Select all
top - 21:33:04 up 1:12, 2 users, load average: 0.96, 0.63, 0.69
Tasks: 277 total, 2 running, 275 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.5%us, 4.0%sy, 0.0%ni, 93.5%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8058952k total, 219040k used, 7839912k free, 8684k buffers
Swap: 2064380k total, 0k used, 2064380k free, 36404k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3479 naemon 20 0 840m 17m 720 R 105.9 0.2 2:48.30 gearmand
Am I correct in assuming that the gearmand server is the bottleneck? The workers can scale up their connections as needed, so for some reason we can't get gearmand to keep up with the mod_gearman broker service on XI. Or is the mod_gearman broker the bottleneck?
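In case it helps with diagnosis, here is a rough sketch of how the connection count above could be broken down by TCP state. A plain grep for 4730 piped to wc -l also counts the listening socket, TIME_WAIT entries, and any line that merely contains the string 4730, so the raw total may overstate the number of live connections. The function name count_by_state is my own, and it assumes the Linux "netstat -ant" column layout (local address in column 4, foreign address in column 5, state in column 6).

```shell
#!/bin/sh
# Sketch: count TCP connections on one port, grouped by state.
# Reads "netstat -ant" style output on stdin; assumes the Linux column
# layout: $4 = local address, $5 = foreign address, $6 = state.
count_by_state() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ {
            # The port is whatever follows the last ":" in each address.
            n = split($4, local_parts, ":")
            m = split($5, remote_parts, ":")
            if (local_parts[n] == port || remote_parts[m] == port)
                count[$6]++
        }
        END { for (state in count) print state, count[state] }
    ' | sort
}

# Usage on the gearmand host:
#   netstat -ant | count_by_state 4730
```

If most of the entries turn out to be ESTABLISHED, the 1000-1200 figure really is live connections; a large TIME_WAIT share would instead point to connections being opened and closed rapidly.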
Any ideas are very much appreciated. I tried uploading our profile, but it's too big, so I've attached the summary details in a .txt file.
Thanks!
- Ian