Nagios invoked oom-killer

estebanmonge
Posts: 50
Joined: Mon Feb 06, 2012 11:13 pm

Nagios invoked oom-killer

Post by estebanmonge »

Hello, I am having performance problems with Nagios Core.

The NSCA process explodes into a large number of child processes and exhausts memory. I can no longer SSH into or ping the host, and other services are unreachable as well.

The logs show the following message:

Code:

Jan 28 03:49:32 opencontrolit kernel: [136330.758415] nagios invoked oom-killer: gfp_mask=0x4d0, order=2, oom_adj=0, oom_score_adj=0
Jan 28 03:49:32 opencontrolit kernel: [136330.758420] nagios cpuset=/ mems_allowed=0
Jan 28 03:49:32 opencontrolit kernel: [136330.758424] Pid: 13035, comm: nagios Not tainted 3.2.0-0.bpo.4-686-pae #1 Debian 3.2.35-2~bpo60+1
Jan 28 03:49:32 opencontrolit kernel: [136330.758426] Call Trace:
Jan 28 03:49:32 opencontrolit kernel: [136330.758434]  [<c109fc3c>] ? dump_header+0x5c/0x142
Jan 28 03:49:32 opencontrolit kernel: [136330.758437]  [<c109ff2c>] ? oom_kill_process+0x34/0x223
Jan 28 03:49:32 opencontrolit kernel: [136330.758440]  [<c10a02b5>] ? T.670+0xad/0x101
Jan 28 03:49:32 opencontrolit kernel: [136330.758442]  [<c10a0497>] ? out_of_memory+0xed/0x15c
Jan 28 03:49:32 opencontrolit kernel: [136330.758445]  [<c10a2e32>] ? __alloc_pages_nodemask+0x4e3/0x61b
Jan 28 03:49:32 opencontrolit kernel: [136330.758451]  [<c10cb176>] ? ____cache_alloc+0x25d/0x406
Jan 28 03:49:32 opencontrolit kernel: [136330.758454]  [<c10cbc7a>] ? __kmalloc_track_caller+0x6f/0xb1
Jan 28 03:49:32 opencontrolit kernel: [136330.758459]  [<c1227b3c>] ? sock_alloc_send_pskb+0xd0/0x27d
Jan 28 03:49:32 opencontrolit kernel: [136330.758463]  [<c122b0ad>] ? __alloc_skb+0x4b/0xe8
Jan 28 03:49:32 opencontrolit kernel: [136330.758465]  [<c1227b3c>] ? sock_alloc_send_pskb+0xd0/0x27d
Jan 28 03:49:32 opencontrolit kernel: [136330.758472]  [<c102f559>] ? __wake_up_sync_key+0x36/0x4b
Jan 28 03:49:32 opencontrolit kernel: [136330.758476]  [<c11736e9>] ? _copy_from_user+0x2b/0x10e
Jan 28 03:49:32 opencontrolit kernel: [136330.758479]  [<c1227cf5>] ? sock_alloc_send_skb+0xc/0xf
Jan 28 03:49:32 opencontrolit kernel: [136330.758483]  [<c1291744>] ? unix_stream_sendmsg+0x11c/0x2d9
Jan 28 03:49:32 opencontrolit kernel: [136330.758488]  [<c1224500>] ? __sock_sendmsg_nosec+0x3e/0x45
Jan 28 03:49:32 opencontrolit kernel: [136330.758491]  [<c1224909>] ? sock_aio_write+0xbf/0xc8
Jan 28 03:49:32 opencontrolit kernel: [136330.758495]  [<c10d6ef3>] ? do_sync_write+0xa9/0xde
Jan 28 03:49:32 opencontrolit kernel: [136330.758499]  [<c1059176>] ? ktime_get_ts+0x76/0x7d
Jan 28 03:49:32 opencontrolit kernel: [136330.758502]  [<c10d70cd>] ? rw_verify_area+0xc7/0xe8
Jan 28 03:49:32 opencontrolit kernel: [136330.758505]  [<c10e3650>] ? poll_select_copy_remaining+0xbb/0xd7
Jan 28 03:49:32 opencontrolit kernel: [136330.758508]  [<c10d783c>] ? vfs_write+0x8b/0xd9
Jan 28 03:49:32 opencontrolit kernel: [136330.758511]  [<c10d7920>] ? sys_write+0x3c/0x63
Jan 28 03:49:32 opencontrolit kernel: [136330.758516]  [<c12db29f>] ? sysenter_do_call+0x12/0x28
Jan 28 03:49:32 opencontrolit kernel: [136330.758518] Mem-Info:
Jan 28 03:49:32 opencontrolit kernel: [136330.758520] DMA per-cpu:
Jan 28 03:49:32 opencontrolit kernel: [136330.758522] CPU    0: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758543] CPU    1: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758544] CPU    2: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758546] CPU    3: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758548] CPU    4: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758550] CPU    5: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758552] CPU    6: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758554] CPU    7: hi:    0, btch:   1 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758555] Normal per-cpu:
Jan 28 03:49:32 opencontrolit kernel: [136330.758557] CPU    0: hi:  186, btch:  31 usd:  27
Jan 28 03:49:32 opencontrolit kernel: [136330.758559] CPU    1: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758561] CPU    2: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758563] CPU    3: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758564] CPU    4: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758568] CPU    6: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758569] CPU    7: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758571] HighMem per-cpu:
Jan 28 03:49:32 opencontrolit kernel: [136330.758572] CPU    0: hi:  186, btch:  31 usd:  27
Jan 28 03:49:32 opencontrolit kernel: [136330.758574] CPU    1: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758576] CPU    2: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758578] CPU    3: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758580] CPU    4: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758582] CPU    5: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758584] CPU    6: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758585] CPU    7: hi:  186, btch:  31 usd:   0
Jan 28 03:49:32 opencontrolit kernel: [136330.758590] active_anon:280302 inactive_anon:40208 isolated_anon:0
Jan 28 03:49:32 opencontrolit kernel: [136330.758591]  active_file:163511 inactive_file:12411 isolated_file:0
Jan 28 03:49:32 opencontrolit kernel: [136330.758592]  unevictable:16005 dirty:0 writeback:0 unstable:0
Jan 28 03:49:32 opencontrolit kernel: [136330.758593]  free:792801 slab_reclaimable:10751 slab_unreclaimable:62202
Jan 28 03:49:32 opencontrolit kernel: [136330.758594]  mapped:5973 shmem:9488 pagetables:64747 bounce:0
Jan 28 03:49:32 opencontrolit kernel: [136330.758600] DMA free:4208kB min:784kB low:980kB high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:148kB slab_unreclaimable:4076kB kernel_stack:2464kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 28 03:49:32 opencontrolit kernel: [136330.758605] lowmem_reserve[]: 0 867 6071 6071
Jan 28 03:49:32 opencontrolit kernel: [136330.758613] Normal free:98060kB min:44116kB low:55144kB high:66172kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:208kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:887976kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:42856kB slab_unreclaimable:244732kB kernel_stack:126752kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:4802 all_unreclaimable? yes
Jan 28 03:49:32 opencontrolit kernel: [136330.758617] lowmem_reserve[]: 0 0 41638 41638
Jan 28 03:49:32 opencontrolit kernel: [136330.758625] HighMem free:3068936kB min:512kB low:66712kB high:132912kB active_anon:1121208kB inactive_anon:160832kB active_file:654044kB inactive_file:49436kB unevictable:64020kB isolated(anon):0kB isolated(file):0kB present:5329748kB mlocked:64020kB dirty:0kB writeback:0kB mapped:23888kB shmem:37952kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:258988kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 28 03:49:32 opencontrolit kernel: [136330.758630] lowmem_reserve[]: 0 0 0 0
Jan 28 03:49:32 opencontrolit kernel: [136330.758633] DMA: 48*4kB 2*8kB 0*16kB 1*32kB 2*64kB 6*128kB 4*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4208kB
Jan 28 03:49:32 opencontrolit kernel: [136330.758640] Normal: 18172*4kB 2666*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 0*4096kB = 98112kB
Jan 28 03:49:32 opencontrolit kernel: [136330.758646] HighMem: 288*4kB 19265*8kB 19800*16kB 7864*32kB 1494*64kB 212*128kB 99*256kB 47*512kB 12*1024kB 5*2048kB 525*4096kB = 3068808kB
Jan 28 03:49:32 opencontrolit kernel: [136330.758674] 185372 total pagecache pages
Jan 28 03:49:32 opencontrolit kernel: [136330.758675] 0 pages in swap cache
Jan 28 03:49:32 opencontrolit kernel: [136330.758677] Swap cache stats: add 0, delete 0, find 0/0
Jan 28 03:49:32 opencontrolit kernel: [136330.758679] Free swap  = 1951740kB
Jan 28 03:49:32 opencontrolit kernel: [136330.758680] Total swap = 1951740kB
Jan 28 03:49:32 opencontrolit kernel: [136330.795684] 1834992 pages RAM
Jan 28 03:49:32 opencontrolit kernel: [136330.795686] 1607170 pages HighMem
Jan 28 03:49:32 opencontrolit kernel: [136330.795688] 278222 pages reserved
Jan 28 03:49:32 opencontrolit kernel: [136330.795689] 2383375 pages shared
Jan 28 03:49:32 opencontrolit kernel: [136330.795690] 573232 pages non-shared
Jan 28 03:49:32 opencontrolit kernel: [136330.795720] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Jan 28 03:49:32 opencontrolit kernel: [136330.795731] [  345]     0   345      607      207   0     -17         -1000 udevd
Jan 28 03:49:32 opencontrolit kernel: [136330.795734] [  432]     0   432      577      147   5     -17         -1000 udevd
Jan 28 03:49:32 opencontrolit kernel: [136330.795737] [  433]     0   433      577      107   6     -17         -1000 udevd
Jan 28 03:49:32 opencontrolit kernel: [136330.795741] [  911]     0   911      482      144   2       0             0 vnstatd
Jan 28 03:49:32 opencontrolit kernel: [136330.795744] [  917]     0   917     7828     1080   6       0             0 rsyslogd
Jan 28 03:49:32 opencontrolit kernel: [136330.795747] [  955]     0   955      439      153   3       0             0 acpid
Jan 28 03:49:32 opencontrolit kernel: [136330.795750] [  991]     0   991     5845      700   0       0             0 vmtoolsd
Jan 28 03:49:32 opencontrolit kernel: [136330.795754] [ 1038]     0  1038      966      248   5       0             0 cron
Jan 28 03:49:32 opencontrolit kernel: [136330.795757] [ 1049]   105  1049      658       97   5       0             0 dbus-daemon
Jan 28 03:49:32 opencontrolit kernel: [136330.795760] [ 1253]     0  1253     1387      246   3     -17         -1000 sshd
Jan 28 03:49:32 opencontrolit kernel: [136330.795764] [ 1657]     0  1657      440      139   2       0             0 getty
Jan 28 03:49:32 opencontrolit kernel: [136330.795767] [ 1658]     0  1658      440      140   6       0             0 getty
Jan 28 03:49:32 opencontrolit kernel: [136330.795770] [ 1659]     0  1659      440      139   4       0             0 getty
Jan 28 03:49:32 opencontrolit kernel: [136330.795772] [ 1660]     0  1660      440      141   7       0             0 getty
Jan 28 03:49:32 opencontrolit kernel: [136330.795776] [ 1661]     0  1661      440      139   7       0             0 getty
Jan 28 03:49:32 opencontrolit kernel: [136330.795779] [ 2156]     0  2156      440      141   6       0             0 getty
Jan 28 03:49:32 opencontrolit kernel: [136330.795783] [31280]  1001 31280      539      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795787] [31311]  1001 31311      539      126   1       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795790] [12253]     0 12253    23567     3524   7       0             0 apache2
Jan 28 03:49:32 opencontrolit kernel: [136330.795794] [12356]     0 12356      450      137   5       0             0 mysqld_safe
Jan 28 03:49:32 opencontrolit kernel: [136330.795797] [12515]   101 12515    39145     5375   2       0             0 mysqld
Jan 28 03:49:32 opencontrolit kernel: [136330.795800] [12516]     0 12516      804      149   2       0             0 logger
Jan 28 03:49:32 opencontrolit kernel: [136330.795803] [12949]  1001 12949     9251      248   7       0             0 npcd
Jan 28 03:49:32 opencontrolit kernel: [136330.795806] [12997]  1001 12997      539      180   0       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795809] [13028]  1001 13028     9570     2531   3       0             0 nagios
Jan 28 03:49:32 opencontrolit kernel: [136330.795815] [12761]     0 12761     1469      471   5       0             0 master
Jan 28 03:49:32 opencontrolit kernel: [136330.795818] [12768]   104 12768     1508      504   3       0             0 qmgr
Jan 28 03:49:32 opencontrolit kernel: [136330.795822] [ 1161]    33  1161     7944     1148   2       0             0 apache2
Jan 28 03:49:32 opencontrolit kernel: [136330.795825] [ 5615]  1001  5615     9571     2212   0       0             0 nagios
Jan 28 03:49:32 opencontrolit kernel: [136330.795828] [15509]  1001 15509      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795831] [15510]  1001 15510      540      126   1       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795834] [15511]  1001 15511      540      126   1       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795837] [15517]  1001 15517      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795840] [15518]  1001 15518      540      126   0       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795843] [15519]  1001 15519      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795845] [15520]  1001 15520      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795848] [15521]  1001 15521      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795851] [15522]  1001 15522      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795854] [15523]  1001 15523      540      126   1       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795857] [15524]  1001 15524      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795860] [15525]  1001 15525      540      126   1       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795863] [15530]  1001 15530      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795866] [15531]  1001 15531      540      126   4       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795869] [15532]  1001 15532      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795872] [15533]  1001 15533      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795884] [15534]  1001 15534      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795887] [15535]  1001 15535      540      126   4       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795890] [15536]  1001 15536      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795893] [15537]  1001 15537      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795896] [15538]  1001 15538      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795899] [15539]  1001 15539      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795902] [15540]  1001 15540      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795905] [15554]  1001 15554      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795908] [15555]  1001 15555      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795911] [15556]  1001 15556      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795914] [15581]  1001 15581      540      126   1       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795917] [15582]  1001 15582      540      126   4       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795920] [15583]  1001 15583      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795923] [15596]  1001 15596      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795926] [15597]  1001 15597      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795929] [15599]  1001 15599      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795932] [15600]  1001 15600      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795935] [15601]  1001 15601      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795938] [15602]  1001 15602      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795941] [15609]  1001 15609      540      126   4       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795944] [15613]  1001 15613      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795946] [15614]  1001 15614      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795949] [15615]  1001 15615      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795952] [15616]  1001 15616      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795955] [15637]  1001 15637      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795958] [15647]  1001 15647      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795961] [15652]  1001 15652      540      126   0       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795964] [15653]  1001 15653      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795967] [15654]  1001 15654      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795969] [15655]  1001 15655      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795972] [15656]  1001 15656      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795975] [15657]  1001 15657      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795978] [15659]  1001 15659      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795981] [15660]  1001 15660      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795984] [15661]  1001 15661      540      126   0       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795987] [15662]  1001 15662      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795989] [15667]  1001 15667      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795992] [15668]  1001 15668      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795995] [15669]  1001 15669      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.795998] [15670]  1001 15670      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796001] [15671]  1001 15671      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796004] [15672]  1001 15672      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796006] [15673]  1001 15673      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796009] [15678]  1001 15678      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796012] [15683]  1001 15683      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796015] [15714]  1001 15714      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796018] [15725]  1001 15725      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796028] [15726]  1001 15726      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796031] [15731]  1001 15731      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796034] [15732]  1001 15732      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796037] [15733]  1001 15733      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796040] [15734]  1001 15734      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796043] [15735]  1001 15735      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796046] [15740]  1001 15740      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796049] [15746]  1001 15746      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796052] [15763]  1001 15763      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796054] [15774]  1001 15774      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796058] [15780]  1001 15780      540      126   5       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796060] [15788]  1001 15788      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796063] [15789]  1001 15789      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796066] [15790]  1001 15790      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796069] [15791]  1001 15791      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796072] [15792]  1001 15792      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796075] [15793]  1001 15793      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796078] [15798]  1001 15798      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796080] [15817]  1001 15817      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796083] [15842]  1001 15842      540      126   6       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796086] [15846]  1001 15846      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796089] [15847]  1001 15847      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796092] [15852]  1001 15852      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796095] [15855]  1001 15855      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796098] [15856]  1001 15856      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796101] [15857]  1001 15857      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796104] [15858]  1001 15858      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796107] [15863]  1001 15863      540      126   3       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796110] [15864]  1001 15864      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796113] [15865]  1001 15865      540      126   7       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796115] [15868]  1001 15868      540      126   2       0             0 nsca
Jan 28 03:49:32 opencontrolit kernel: [136330.796118] [15869]  1001 15869      540      126   6       0             0 nsca
[... many more nsca process entries of the same form; the remainder of the pasted log was garbled and is omitted ...]

What is wrong?
Last edited by mguthrie on Thu Jan 31, 2013 10:43 am, edited 1 time in total.
Reason: Put log data into code tags
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios invoked oom-killer

Post by slansing »

Can you let us know what versions of Core, Plugins, and NSCA you have installed? The OOM killer is a Linux kernel mechanism that kills processes to free up memory when the system runs out. How many passive checks are you hitting the Nagios server with?
estebanmonge
Posts: 50
Joined: Mon Feb 06, 2012 11:13 pm

Re: Nagios invoked oom-killer

Post by estebanmonge »

Hello, I have:
Nagios 3.3.1
NSCA 2.9.1
Nagios Plugins 1.4.15

Program-Wide Performance Information

Code:

Services Actively Checked:
	
Time Frame	Services Checked
<= 1 minute:	48 (14.6%)
<= 5 minutes:	317 (96.4%)
<= 15 minutes:	329 (100.0%)
<= 1 hour:	329 (100.0%)
Since program start:  	329 (100.0%)
	
Metric	Min.	Max.	Average
Check Execution Time:  	0.01 sec	18.11 sec	0.544 sec
Check Latency:	0.00 sec	0.28 sec	0.131 sec
Percent State Change:	0.00%	20.00%	0.11%
Services Passively Checked:
	
Time Frame	Services Checked
<= 1 minute:	127 (19.7%)
<= 5 minutes:	619 (95.8%)
<= 15 minutes:	646 (100.0%)
<= 1 hour:	646 (100.0%)
Since program start:  	646 (100.0%)
	
Metric	Min.	Max.	Average
Percent State Change:  	0.00%	10.66%	0.03%
Hosts Actively Checked:
	
Time Frame	Hosts Checked
<= 1 minute:	80 (63.5%)
<= 5 minutes:	94 (74.6%)
<= 15 minutes:	94 (74.6%)
<= 1 hour:	94 (74.6%)
Since program start:  	94 (74.6%)
	
Metric	Min.	Max.	Average
Check Execution Time:  	0.00 sec	5.13 sec	2.992 sec
Check Latency:	0.00 sec	0.31 sec	0.125 sec
Percent State Change:	0.00%	7.89%	0.06%
Hosts Passively Checked:
	
Time Frame	Hosts Checked
<= 1 minute:	35 (33.7%)
<= 5 minutes:	101 (97.1%)
<= 15 minutes:	104 (100.0%)
<= 1 hour:	104 (100.0%)
Since program start:  	104 (100.0%)
	
Metric	Min.	Max.	Average
Percent State Change:  	0.00%	11.05%	0.11%
Check Statistics:
	
Type	Last 1 Min	Last 5 Min	Last 15 Min
Active Scheduled Host Checks	86	435	1300
Active On-Demand Host Checks	5	27	78
Parallel Host Checks	86	435	1300
Serial Host Checks	0	0	0
Cached Host Checks	5	27	78
Passive Host Checks	16	17	17
Active Scheduled Service Checks	61	337	1005
Active On-Demand Service Checks	0	0	0
Cached Service Checks	0	0	0
Passive Service Checks	60	62	62
External Commands	210	946	2779
Buffer Usage:
	
Type	In Use	Max Used	Total Available
External Commands 	0	22	4096

Using MK Livestatus, I queried all services that accept passive checks: 648
Hosts that accept passive checks: 105

Thanks
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios invoked oom-killer

Post by abrist »

NSCA has a number of issues at the moment.

You may want to downgrade NSCA to 2.7.2 by running the following:

Code:

    wget http://assets.nagios.com/downloads/nagiosxi/agents/nsca-downgrade.tar.gz
    tar xvf nsca-downgrade.tar.gz
    cd nsca
    ./upgrade
Former Nagios employee
estebanmonge
Posts: 50
Joined: Mon Feb 06, 2012 11:13 pm

Re: Nagios invoked oom-killer

Post by estebanmonge »

Using NSCA 2.7.2 isn't an option for us, because it cannot accept checks from hosts whose clocks are not synchronized with the central server. Last time, we had to apply a user-contributed patch to make it accept non-synchronized checks. We upgraded to 2.9.2 because of its stability changes and because it offers an official way to accept non-synchronized checks.

Also, with NSCA 2.7.2 we have the same problem: a lot of nsca child processes.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios invoked oom-killer

Post by slansing »

Is this a patch that your team created? I would try to turn on your debug logging to see what NSCA is actually doing after those forks. How many passive checks do you have hitting your Nagios system per minute?
estebanmonge
Posts: 50
Joined: Mon Feb 06, 2012 11:13 pm

Re: Nagios invoked oom-killer

Post by estebanmonge »

I applied this modification to NSCA 2.7.2 so that it accepts checks with future timestamps:
/* check the timestamp in the packet */
packet_time=(time_t)ntohl(receive_packet.timestamp);
time(&current_time);
//if(packet_time>current_time){
//    syslog(LOG_ERR,"Dropping packet with future timestamp.");
//    /*return;*/
//    close(sock);
//    if(mode==SINGLE_PROCESS_DAEMON)
//        return;
//    else
//        do_exit(STATE_OK);
//    }
//else{
    packet_age= abs ( (unsigned long)(current_time-packet_time) );
    syslog(LOG_ERR,"Time diff: %lu seconds", packet_age);
    if(max_packet_age>0 && (packet_age>max_packet_age)){
        syslog(LOG_ERR,"Dropping packet with stale timestamp - packet was %lu seconds old or new.",packet_age);
        /*return;*/
        close(sock);
        if(mode==SINGLE_PROCESS_DAEMON)
            return;
        else
            do_exit(STATE_OK);
        }
//    }
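
One thing to watch in the modification above: abs() takes an int, so handing it an unsigned long difference may not give the intended result when the packet is newer than the server clock. A minimal sketch of one way to compute the same "old or new" age without abs(), assuming both timestamps are time_t values in seconds. The helper names packet_age_seconds and packet_is_acceptable are hypothetical and are not part of NSCA:

```c
#include <time.h>

/* Hypothetical helper (not part of NSCA): absolute difference in seconds
 * between the server clock and the packet timestamp, whichever is later. */
unsigned long packet_age_seconds(time_t current_time, time_t packet_time)
{
    if (current_time >= packet_time)
        return (unsigned long)(current_time - packet_time);
    return (unsigned long)(packet_time - current_time);
}

/* Hypothetical helper: returns 1 if the packet should be accepted.
 * A max_packet_age of 0 disables the check, mirroring the
 * "max_packet_age>0 &&" condition in the modification above. */
int packet_is_acceptable(time_t current_time, time_t packet_time,
                         unsigned long max_packet_age)
{
    unsigned long age = packet_age_seconds(current_time, packet_time);
    return max_packet_age == 0 || age <= max_packet_age;
}
```

Comparing the two timestamps before subtracting keeps the result non-negative, so the same limit applies whether the sender's clock is ahead of or behind the server.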
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios invoked oom-killer

Post by slansing »

Do you have another core system where you can test the current supported version of NSCA and then your modified version? When did this issue start to occur? And how many passive checks are you sending into the box and at what interval?
estebanmonge
Posts: 50
Joined: Mon Feb 06, 2012 11:13 pm

Re: Nagios invoked oom-killer

Post by estebanmonge »

OK.

I have three systems running unmodified 2.9.1. In past installations I had modified 2.7.2 to accept checks with non-synchronized timestamps.

How can I measure passive checks per second?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios invoked oom-killer

Post by slansing »

Just look at your services list and determine how many checks you have set up as passive, then measure from the last check time of the first one to the last one and see how many go through in one minute.
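
If you would rather count from the log than from the services list, something along these lines might work. It is only a sketch, and it assumes log_passive_checks=1 is set in nagios.cfg so that nagios.log contains lines of the form "[<epoch>] PASSIVE SERVICE CHECK: ..." and "[<epoch>] PASSIVE HOST CHECK: ...". The function names parse_passive_check and passive_checks_per_minute are made up for this example:

```c
#include <stdio.h>
#include <string.h>

/* Returns non-zero if s begins with prefix. */
static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: returns 1 and stores the timestamp if the line looks
 * like a passive check result from nagios.log (assumed format, see above);
 * returns 0 for any other line. */
int parse_passive_check(const char *line, long *epoch)
{
    long t;
    int consumed = 0;

    /* %n is only stored if "[<number>] " matched completely. */
    if (sscanf(line, "[%ld] %n", &t, &consumed) != 1 || consumed == 0)
        return 0;
    if (!starts_with(line + consumed, "PASSIVE SERVICE CHECK:") &&
        !starts_with(line + consumed, "PASSIVE HOST CHECK:"))
        return 0;
    *epoch = t;
    return 1;
}

/* Hypothetical helper: average results per minute over the observed span.
 * Returns 0.0 when fewer than two results were seen or the timestamps
 * do not advance. */
double passive_checks_per_minute(long count, long first_epoch, long last_epoch)
{
    if (count < 2 || last_epoch <= first_epoch)
        return 0.0;
    return (double)count * 60.0 / (double)(last_epoch - first_epoch);
}
```

A caller would read the log line by line (for example with fgets), pass each line to parse_passive_check, remember the first and last timestamps along with the number of matches, and then call passive_checks_per_minute once at the end.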