We are using mod_gearman to handle about 37,000 service checks, and we are running into a problem where the service check latency is really high.
I have a gearmand server defined in /usr/local/etc/mod_gearman2/module.conf
server=192.168.249.37:4730
(8GB RAM, 16 Cores)
We tried offloading it to a separate server, but the performance was the same.
And we have multiple worker servers (3) pointed to that gearmand server.
(4GB RAM, 8 Cores)
The workers are configured with the following:
max-worker=1500
max-jobs=1000
spawn-rate=100
And with that, the gearmand server is maxing out at about 1000-1200 connections:
# netstat -anp | grep 4730 | wc -l
1029
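One caveat on the count above: grep 4730 matches every line containing that string, so the total includes the LISTEN socket and any sockets in TIME_WAIT or CLOSE_WAIT, not just established worker connections. A rough sketch of a per-state breakdown is below. It assumes the Linux net-tools netstat column layout (local address in column 4, foreign address in column 5, state in column 6), and count_by_state is just a hypothetical helper name, not something shipped with mod_gearman or gearmand.

```shell
# Count TCP sockets on a given port, grouped by connection state.
# Assumption: input is Linux net-tools `netstat -an` output, where
# column 4 = local address, column 5 = foreign address, column 6 = state.
# count_by_state is a hypothetical helper, not part of mod_gearman.
count_by_state() {
    port="$1"
    awk -v port="$port" '
        # Only look at TCP lines whose local or foreign address ends in :PORT
        $1 ~ /^tcp/ && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) {
            states[$6]++
        }
        END { for (s in states) print s, states[s] }
    ' | sort
}

# Usage on the gearmand host:
#   netstat -an | count_by_state 4730
```

If most of the ~1000 sockets turn out to be ESTABLISHED, the number reflects live worker connections; a large share in other states would point somewhere else.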
top - 21:33:04 up 1:12, 2 users, load average: 0.96, 0.63, 0.69
Tasks: 277 total, 2 running, 275 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.5%us, 4.0%sy, 0.0%ni, 93.5%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8058952k total, 219040k used, 7839912k free, 8684k buffers
Swap: 2064380k total, 0k used, 2064380k free, 36404k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3479 naemon 20 0 840m 17m 720 R 105.9 0.2 2:48.30 gearmand
Any ideas are very much appreciated. I tried uploading our profile, but it's too big, so I've put the summary details in the attached .txt file.
Thanks!
- Ian