Re: [Nagios-devel] Nagios and Gearman - huge environment
Posted: Fri Aug 19, 2011 8:44 pm
Thanks, Daniel, but I don't think my problem is hardware. I created
the ramdisk and the problem is the same:
- nagios eats 100% of CPU all the time;
- nagios does not distribute the active checks smoothly. It waits
a long time and then runs the active checks in a burst. I can see this with
gearman_top. The gearmand job waiting queue is empty almost all the
time, but sometimes there is a burst of jobs in the queue. I can't
understand this behavior.
Any help would be great. Thanks everybody.
=========
Top result
=========
top - 18:40:59 up 106 days, 16:56, 4 users, load average: 8.52, 6.09, 5.42
Tasks: 215 total, 2 running, 213 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.5%us, 0.1%sy, 0.0%ni, 87.1%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4916356k total, 1974976k used, 2941380k free, 163240k buffers
Swap: 4194296k total, 22092k used, 4172204k free, 745100k cached
  PID USER     PR NI VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
 2189 nagios   25  0 492m 255m 1668 R 100.1  5.3 66:54.59 nagios
24658 nagios   15  0 561m 116m  676 S   0.7  2.4 62:00.96 gearmand
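A note on reading this snapshot: the summary line reports 12.5%us across all CPUs, while the nagios process alone shows 100.1% in the per-process column. In top's default mode a per-process figure of 100% means one full core, so the two numbers are consistent with a single nagios thread saturating one core on a machine with roughly eight cores. The sketch below only does that arithmetic. The function name is hypothetical, and the eight-core reading is an inference from these two figures, not something stated in the thread.

```python
def estimate_core_count(process_cpu_pct: float, total_user_pct: float) -> int:
    """Rough core-count estimate.

    Assumes one process accounts for nearly all user CPU time, so that
    (per-process %CPU) / (system-wide %us) approximates the number of cores.
    """
    if total_user_pct <= 0:
        raise ValueError("total_user_pct must be positive")
    return round(process_cpu_pct / total_user_pct)


# Figures taken from the top output: nagios %CPU and the Cpu(s) %us value.
cores = estimate_core_count(100.1, 12.5)
print(f"approximate core count: {cores}")
```

If that reading holds, the high system-wide idle percentage and the 100% nagios figure do not contradict each other: the other cores are idle while the main nagios process is CPU-bound on one.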
On Fri, Aug 19, 2011 at 1:31 PM, Daniel Wittenberg wrote:
> Well but look at your bi and bo, and then the wa column. So looks like
> you have some IO Wait which probably means it's waiting on disk activity to
> get things done, and lots of writing to disk. Have you looked at adding a
> ramdisk for your checkresults, status.dat, and temp_file? That should help
> eliminate most of the heavy disk i/o from the nagios perspective. Since it
> doesn't look like you are swapping memory you should be able to throw some
> at a ramdisk. You can probably start with 64MB and watch it, might have to
> go higher depending on your workload.
>
> Dan
>
> From: Rodney Ramos [mailto:[email protected]]
> Sent: Friday, August 19, 2011 11:27 AM
> To: Nagios Developers List
> Subject: Re: [Nagios-devel] Nagios and Gearman - huge environment
> performance problem
>
> Hi, Daniel,
>
> As we can see below, I think it is not a hardware problem. The idle CPU is
> between 60 and 80%, very good.
>
> Thank you very much.
>
>
> $ vmstat 5
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b  swpd    free   buff   cache si so   bi    bo   in   cs us sy id wa st
>  1  2 22092 3046788 189640  890940  0  0  295  1053    0    0  4  3 83 10  0
>  1  2 22092 3032992 189664  904600  0  0 2733  7550 3498 7477 12  1 69 18  0
>  1  2 22092 3018240 189668  918632  0  0 2720  4070 2484 5114 13  1 72 15  0
>  1  0 22092 3008312 189668  930336  0  0 2332  1534 1932 3825 13  1 73 14  0
>  1 18 22092 2979292 189724  945780  0  0 1486 13974 2460 8446 16  2 72 10  0
>  1  2 22092 2965244 189736  959228  0  0 2570  9094 3290 7204 13  1 67 19  0
>  1  2 22092 2949064 189748  973100  0  0 2820  3040 2798 6639 13  2 68 17  0
>  1  6 22092 2936060 189768  987788  0  0 2894  3620 2474 5443 13  1 70 16  0
>  1  1 22092 2923320 189780  999708  0  0 2377  2618 2285 4794 13  1 70 16  0
>  1  0 22092 2923428 189780  999964  0  0    0  4575 1732 2317 12  1 86  1  0
>  1  9 22092 2912192 189784 1005260  0  0  402  4544 1541 3889 14  1 82  3  0
>  1  7 22092 2891692 189808 1023020  0  0 2534 13969 3232 9421 14  2 66 17  0
>  3  2 22092 2868908 189836 1037064  0  0 2797  4115 3002 7055 30  2 54 14  0
>  2  2 22092 2860712 189860 1050376  0  0 2646  3352 2448 5416 16  1 67 17  0
>  1  8 22092 2847052 189872 1064036  0  0 2748  3970 2616 5487 13  1 69 17  0
>  1  0 22092 3469576 189876  462624  0  0  825  1245 1379 2098 12  1 83  5  0
>  1  0 22092 3469248 189884  462720  0  0    4  2631 1552 2599 13  0 86  0  0
>  1  2
...[email truncated]...
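The bi, bo and wa columns discussed in the thread are easier to compare once summarized. The following is a minimal sketch, assuming the 17 complete vmstat rows quoted in the thread are pasted in as text. The helper names are hypothetical, and the first row is dropped from the interval statistics because vmstat's first line reports averages since boot.

```python
from statistics import mean

COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

# Complete rows copied from the quoted vmstat output; the truncated last row is omitted.
VMSTAT_ROWS = """\
1 2 22092 3046788 189640 890940 0 0 295 1053 0 0 4 3 83 10 0
1 2 22092 3032992 189664 904600 0 0 2733 7550 3498 7477 12 1 69 18 0
1 2 22092 3018240 189668 918632 0 0 2720 4070 2484 5114 13 1 72 15 0
1 0 22092 3008312 189668 930336 0 0 2332 1534 1932 3825 13 1 73 14 0
1 18 22092 2979292 189724 945780 0 0 1486 13974 2460 8446 16 2 72 10 0
1 2 22092 2965244 189736 959228 0 0 2570 9094 3290 7204 13 1 67 19 0
1 2 22092 2949064 189748 973100 0 0 2820 3040 2798 6639 13 2 68 17 0
1 6 22092 2936060 189768 987788 0 0 2894 3620 2474 5443 13 1 70 16 0
1 1 22092 2923320 189780 999708 0 0 2377 2618 2285 4794 13 1 70 16 0
1 0 22092 2923428 189780 999964 0 0 0 4575 1732 2317 12 1 86 1 0
1 9 22092 2912192 189784 1005260 0 0 402 4544 1541 3889 14 1 82 3 0
1 7 22092 2891692 189808 1023020 0 0 2534 13969 3232 9421 14 2 66 17 0
3 2 22092 2868908 189836 1037064 0 0 2797 4115 3002 7055 30 2 54 14 0
2 2 22092 2860712 189860 1050376 0 0 2646 3352 2448 5416 16 1 67 17 0
1 8 22092 2847052 189872 1064036 0 0 2748 3970 2616 5487 13 1 69 17 0
1 0 22092 3469576 189876 462624 0 0 825 1245 1379 2098 12 1 83 5 0
1 0 22092 3469248 189884 462720 0 0 4 2631 1552 2599 13 0 86 0 0
"""


def parse_vmstat_rows(text):
    """Turn whitespace-separated vmstat rows into dicts keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        values = [int(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            continue  # skip incomplete rows
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def summarize(rows, column):
    """Return (min, mean, max) of one column."""
    values = [row[column] for row in rows]
    return min(values), mean(values), max(values)


all_rows = parse_vmstat_rows(VMSTAT_ROWS)
samples = all_rows[1:]  # drop the since-boot averages in the first row

for column in ("bi", "bo", "wa", "id"):
    low, avg, high = summarize(samples, column)
    print(f"{column}: min={low} mean={avg:.1f} max={high}")
```

Looking at the minimum and maximum of bo next to wa shows how uneven the disk writes are from one 5-second interval to the next, which is the pattern the reply points at.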
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]