[LU-2046] SWL - OSS hits OOM killer Created: 28/Sep/12  Updated: 01/Oct/12  Resolved: 01/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Cliff White (Inactive) Assignee: Oleg Drokin
Resolution: Duplicate Votes: 0
Labels: None
Environment:

LLNL/Hyperon


Severity: 3
Rank (Obsolete): 4257

 Description   

While running miranda-io, the OSS dies with the oom-killer:

Sep 28 07:26:07 hyperion-dit29 kernel: Lustre: 5836:0:(lustre_log.h:474:llog_group_set_export()) Skipped 13 previous similar messages
Sep 28 07:26:07 hyperion-dit29 kernel: Lustre: 5836:0:(llog_net.c:162:llog_receptor_accept()) changing the import ffff880254a4c800 - ffff88011c83e000
Sep 28 07:26:07 hyperion-dit29 kernel: Lustre: 5836:0:(llog_net.c:162:llog_receptor_accept()) Skipped 13 previous similar messages
Sep 28 07:30:09 hyperion-dit29 cfengine:hyperion-dit29[7360]: stat: No such file or directory
Sep 28 07:30:09 hyperion-dit29 cfengine:hyperion-dit29[7360]: stat: No such file or directory
Sep 28 07:30:09 hyperion-dit29 cfengine:hyperion-dit29[7360]: stat: No such file or directory
Sep 28 07:30:11 hyperion-dit29 syslog-ng[3472]: EOF occurred while idle; fd='10'
Sep 28 07:30:14 hyperion-dit29 cfengine:hyperion-dit29[7360]: stat: No such file or directory
Sep 28 07:30:14 hyperion-dit29 cfengine:hyperion-dit29[7360]: stat: No such file or directory
Sep 28 11:34:02 hyperion-dit29 LDAPOTP-AUTH[8025]: root@ehyperion576 as root: cmd='/usr/bin/pdcp -p -y -z /etc'
Sep 28 11:43:08 hyperion-dit29 kernel: ll_ost_io02_036 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_adj=-17, oom_score_adj=0
Sep 28 11:43:08 hyperion-dit29 kernel: ll_ost_io02_036 cpuset=/ mems_allowed=1
Sep 28 11:43:08 hyperion-dit29 kernel: Pid: 6335, comm: ll_ost_io02_036 Tainted: P           ---------------    2.6.32-279.5.1.el6_lustre.x86_64 #1
Sep 28 11:43:08 hyperion-dit29 kernel: Call Trace:
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff810c4aa1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff81117210>] ? dump_header+0x90/0x1b0
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff810c58b1>] ? cpuset_mems_allowed_intersects+0x21/0x30
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff81117692>] ? oom_kill_process+0x82/0x2a0 
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff8111758e>] ? select_bad_process+0x9e/0x120
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff81117ad0>] ? out_of_memory+0x220/0x3c0
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff811277ee>] ? __alloc_pages_nodemask+0x89e/0x940
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff81114617>] ? __page_cache_alloc+0x87/0x90
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff8111541f>] ? find_or_create_page+0x4f/0xb0
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0e361b5>] ? filter_get_page+0x35/0x70 [obdfilter]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0e378a8>] ? filter_preprw_write+0x12b8/0x2340 [obdfilter]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa04d3edb>] ? lnet_ni_send+0x4b/0x110 [lnet]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0a1ce6b>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0a0a024>] ? sptlrpc_svc_alloc_rs+0x74/0x2d0 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0e39730>] ? filter_preprw+0x80/0xa0 [obdfilter]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0b0f81c>] ? obd_preprw+0x12c/0x3d0 [ost]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0b1698a>] ? ost_brw_write+0x87a/0x1600 [ost]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09d36de>] ? ptlrpc_send_reply+0x28e/0x860 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09db1dc>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09db338>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa0b1d02c>] ? ost_handle+0x360c/0x4850 [ost]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09db8fc>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09e26fb>] ? ptlrpc_update_export_timer+0x4b/0x470 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09eab3c>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa042665e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa043813f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09e1f37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09ec111>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffffa09eb520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Sep 28 11:43:08 hyperion-dit29 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Sep 28 11:43:08 hyperion-dit29 kernel: Mem-Info:
Sep 28 11:43:08 hyperion-dit29 kernel: Node 1 Normal per-cpu:
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    0: hi:  186, btch:  31 usd:   7
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    1: hi:  186, btch:  31 usd:  18
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    2: hi:  186, btch:  31 usd:   7
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    3: hi:  186, btch:  31 usd:  29
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    4: hi:  186, btch:  31 usd: 179
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    5: hi:  186, btch:  31 usd: 178
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    6: hi:  186, btch:  31 usd:  23
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    7: hi:  186, btch:  31 usd:  91
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    8: hi:  186, btch:  31 usd:  26
Sep 28 11:43:08 hyperion-dit29 kernel: CPU    9: hi:  186, btch:  31 usd:  11
Sep 28 11:43:08 hyperion-dit29 kernel: CPU   10: hi:  186, btch:  31 usd:   7
Sep 28 11:43:08 hyperion-dit29 kernel: CPU   11: hi:  186, btch:  31 usd:  30
Sep 28 11:43:08 hyperion-dit29 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Sep 28 11:43:08 hyperion-dit29 kernel: CPU   13: hi:  186, btch:  31 usd:   2
Sep 28 11:43:08 hyperion-dit29 kernel: CPU   14: hi:  186, btch:  31 usd:  23
Sep 28 11:43:08 hyperion-dit29 kernel: CPU   15: hi:  186, btch:  31 usd:  11
Sep 28 11:43:08 hyperion-dit29 kernel: active_anon:4402 inactive_anon:7450 isolated_anon:0
Sep 28 11:43:08 hyperion-dit29 kernel: active_file:17703 inactive_file:27090 isolated_file:64
Sep 28 11:43:08 hyperion-dit29 kernel: unevictable:0 dirty:69 writeback:0 unstable:0
Sep 28 11:43:08 hyperion-dit29 kernel: free:38023 slab_reclaimable:5697 slab_unreclaimable:5547440
Sep 28 11:43:08 hyperion-dit29 kernel: mapped:490 shmem:8175 pagetables:439 bounce:0
Sep 28 11:43:08 hyperion-dit29 kernel: Node 1 Normal free:52084kB min:45096kB low:56368kB high:67644kB active_anon:4372kB inactive_anon:4064kB active_file:52884kB inactive_file:86880kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:12410880kB mlocked:0kB dirty:272kB writeback:0kB mapped:1952kB shmem:1192kB slab_reclaimable:10456kB slab_unreclaimable:11372900kB kernel_stack:3024kB pagetables:912kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:384 all_unreclaimable? no
Sep 28 11:43:08 hyperion-dit29 kernel: lowmem_reserve[]: 0 0 0 0
Sep 28 11:43:08 hyperion-dit29 kernel: Node 1 Normal: 2366*4kB 1236*8kB 443*16kB 200*32kB 103*64kB 37*128kB 10*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 52872kB
Sep 28 11:43:08 hyperion-dit29 kernel: 53304 total pagecache pages
Sep 28 11:43:08 hyperion-dit29 kernel: 0 pages in swap cache
Sep 28 11:43:08 hyperion-dit29 kernel: Swap cache stats: add 0, delete 0, find 0/0
Sep 28 11:43:08 hyperion-dit29 kernel: Free swap  = 0kB
Sep 28 11:43:08 hyperion-dit29 kernel: Total swap = 0kB
Sep 28 11:43:08 hyperion-dit29 kernel: 6291440 pages RAM
Sep 28 11:43:08 hyperion-dit29 kernel: 174434 pages reserved
Sep 28 11:43:08 hyperion-dit29 kernel: 65273 pages shared
Sep 28 11:43:08 hyperion-dit29 kernel: 5987243 pages non-shared
Sep 28 11:43:08 hyperion-dit29 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Sep 28 11:43:08 hyperion-dit29 kernel: [ 2308]     0  2308     2791      224   0     -17         -1000 udevd
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3471]     0  3471     6600       51   0       0             0 syslog-ng
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3472]     0  3472    14198      414   6       0             0 syslog-ng
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3473]     0  3473     2821       47   4       0             0 shutdown-hot
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3521]     0  3521     2284      121   3       0             0 irqbalance
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3535]    32  3535     4739       75   6       0             0 rpcbind
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3553]    29  3553     5832      120   0       0             0 rpc.statd
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3574]     0  3574     6842       70  10       0             0 rpc.idmapd
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3666]    81  3666     5342      105  10       0             0 dbus-daemon
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3745]     0  3745    55130      228   0       0             0 munged
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3757]     0  3757    29122      639  14       0             0 snmpd
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3768]     0  3768    16009      165  13     -17         -1000 sshd
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3778]     0  3778     5519      131  14       0             0 xinetd
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3797]    38  3797     6481      224   4       0             0 ntpd
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3849]     0  3849     5093      255  10       0             0 crond
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3858]     0  3858     3532      974  13       0             0 cerebrod
Sep 28 11:43:08 hyperion-dit29 kernel: [ 3906]     0  3906   352127      342  12       0             0 opensm
Sep 28 11:43:09 hyperion-dit29 kernel: [ 3936]     0  3936    12412       74   3       0             0 srp_daemon
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4185]     0  4185     1015       21   7       0             0 agetty
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4190]     0  4190     1012       20   9       0             0 mingetty
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4192]     0  4192     1012       20  11       0             0 mingetty
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4195]     0  4195     1012       20  13       0             0 mingetty
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4198]     0  4198     1012       20  14       0             0 mingetty
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4201]     0  4201     1012       20  14       0             0 mingetty
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4203]     0  4203     1012       20  13       0             0 mingetty
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4244]     0  4244     2790      222   8     -17         -1000 udevd
Sep 28 11:43:09 hyperion-dit29 kernel: [ 4245]     0  4245     2790      222   1     -17         -1000 udevd
Sep 28 11:43:09 hyperion-dit29 kernel: Out of memory: Kill process 3471 (syslog-ng) score 1 or sacrifice child
Sep 28 11:43:09 hyperion-dit29 kernel: Killed process 3472, UID 0, (syslog-ng) total-vm:56792kB, anon-rss:888kB, file-rss:768kB
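
The telling figure above is slab_unreclaimable:5547440: at 4 kB per page that is roughly 21 GB of unreclaimable slab on a node with 6291440 pages (about 24 GB) of RAM, so reclaim has essentially nothing to free and the order-0 page allocation under filter_preprw_write() falls through to the OOM killer.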


 Comments   
Comment by Cliff White (Inactive) [ 28/Sep/12 ]

Disabled read_cache on all OSTs; this happened one minute later:

Sep 28 12:27:14 hyperion-dit32 kernel: ll_ost_io01_009 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_adj=-17, oom_score_adj=0
Sep 28 12:27:14 hyperion-dit32 kernel: ll_ost_io01_009 cpuset=/ mems_allowed=0
Sep 28 12:27:14 hyperion-dit32 kernel: Pid: 6158, comm: ll_ost_io01_009 Tainted: P           ---------------    2.6.32-279.5.1.el6_lustre.x86_64 #1
Sep 28 12:27:15 hyperion-dit32 kernel: Call Trace:
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff810c4aa1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff81117210>] ? dump_header+0x90/0x1b0
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff81117692>] ? oom_kill_process+0x82/0x2a0
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff8111758e>] ? select_bad_process+0x9e/0x120
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff81117ad0>] ? out_of_memory+0x220/0x3c0
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff811277ee>] ? __alloc_pages_nodemask+0x89e/0x940
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff81114617>] ? __page_cache_alloc+0x87/0x90
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff8111541f>] ? find_or_create_page+0x4f/0xb0
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0e2b1b5>] ? filter_get_page+0x35/0x70 [obdfilter]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0e2de1d>] ? filter_preprw_read+0x4ed/0xd80 [obdfilter]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0a19e6b>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0a07024>] ? sptlrpc_svc_alloc_rs+0x74/0x2d0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0e2e708>] ? filter_preprw+0x58/0xa0 [obdfilter]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0b0c81c>] ? obd_preprw+0x12c/0x3d0 [ost]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0b1283f>] ? ost_brw_read+0xd0f/0x12c0 [ost]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09980e0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff81271d39>] ? cpumask_next_and+0x29/0x50
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09d81dc>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09d8338>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa0b19550>] ? ost_handle+0x2b30/0x4850 [ost]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09d88fc>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09df6fb>] ? ptlrpc_update_export_timer+0x4b/0x470 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09e7b3c>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa041765e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa042913f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09def37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09e9111>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
Sep 28 12:27:15 hyperion-dit32 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Sep 28 12:27:15 hyperion-dit32 kernel: Mem-Info:
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 DMA per-cpu:
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    4: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    5: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    6: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    7: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    8: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    9: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   10: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   11: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   12: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   13: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   14: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   15: hi:    0, btch:   1 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 DMA32 per-cpu:
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    1: hi:  186, btch:  31 usd:  30
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    4: hi:  186, btch:  31 usd:  35
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 Normal per-cpu:
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Sep 28 12:27:15 hyperion-dit32 kernel: active_anon:4356 inactive_anon:7516 isolated_anon:0
Sep 28 12:27:15 hyperion-dit32 kernel: active_file:9338 inactive_file:12794 isolated_file:927
Sep 28 12:27:15 hyperion-dit32 kernel: unevictable:0 dirty:62 writeback:0 unstable:0
Sep 28 12:27:15 hyperion-dit32 kernel: free:36077 slab_reclaimable:6747 slab_unreclaimable:5569792
Sep 28 12:27:15 hyperion-dit32 kernel: mapped:501 shmem:8182 pagetables:440 bounce:0
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 DMA free:15632kB min:52kB low:64kB high:76kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15224kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 28 12:27:15 hyperion-dit32 kernel: lowmem_reserve[]: 0 2991 12081 12081
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 DMA32 free:47324kB min:11132kB low:13912kB high:16696kB active_anon:0kB inactive_anon:4kB active_file:1532kB inactive_file:1432kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:3063520kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:4kB slab_reclaimable:1140kB slab_unreclaimable:2552260kB kernel_stack:16kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:4635 all_unreclaimable? yes
Sep 28 12:27:15 hyperion-dit32 kernel: lowmem_reserve[]: 0 0 9090 9090
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 Normal free:33768kB min:33824kB low:42280kB high:50736kB active_anon:12996kB inactive_anon:26004kB active_file:16100kB inactive_file:15984kB unevictable:0kB isolated(anon):0kB isolated(file):256kB present:9308160kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:31476kB slab_reclaimable:9304kB slab_unreclaimable:8362348kB kernel_stack:6456kB pagetables:860kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:54121 all_unreclaimable? yes
Sep 28 12:27:15 hyperion-dit32 kernel: lowmem_reserve[]: 0 0 0 0
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 DMA: 2*4kB 1*8kB 0*16kB 2*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15632kB
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 DMA32: 460*4kB 360*8kB 278*16kB 182*32kB 130*64kB 60*128kB 32*256kB 5*512kB 0*1024kB 1*2048kB 1*4096kB = 47888kB
Sep 28 12:27:15 hyperion-dit32 kernel: Node 0 Normal: 6756*4kB 323*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 33768kB
Sep 28 12:27:15 hyperion-dit32 kernel: 28269 total pagecache pages
Sep 28 12:27:15 hyperion-dit32 kernel: 0 pages in swap cache
Sep 28 12:27:15 hyperion-dit32 kernel: Swap cache stats: add 0, delete 0, find 0/0
Sep 28 12:27:15 hyperion-dit32 kernel: Free swap  = 0kB
Sep 28 12:27:15 hyperion-dit32 kernel: Total swap = 0kB
Sep 28 12:27:15 hyperion-dit32 kernel: 6291440 pages RAM
Sep 28 12:27:15 hyperion-dit32 kernel: 174434 pages reserved
Sep 28 12:27:15 hyperion-dit32 kernel: 23533 pages shared
Sep 28 12:27:15 hyperion-dit32 kernel: 6005264 pages non-shared
Comment by Cliff White (Inactive) [ 28/Sep/12 ]

On dit29, the oom-killer was unable to free memory; the only processes left were OOM-exempt (oom_adj -17), so the kernel panicked and the node rebooted.

2012-09-28 12:27:53 active_anon:2055 inactive_anon:6722 isolated_anon:0
2012-09-28 12:27:53  active_file:8506 inactive_file:13146 isolated_file:5600
2012-09-28 12:27:53  unevictable:0 dirty:0 writeback:0 unstable:0
2012-09-28 12:27:53  free:35276 slab_reclaimable:6978 slab_unreclaimable:5570526
2012-09-28 12:27:54  mapped:89 shmem:8182 pagetables:87 bounce:0
2012-09-28 12:27:54 Node 1 Normal free:44760kB min:45096kB low:56368kB high:67644kB active_anon:1212kB inactive_anon:880kB active_file:13772kB inactive_file:30116kB unevictable:0kB isolated(anon):0kB isolated(file):23680kB present:12410880kB mlocked:0kB dirty:0kB writeback:0kB mapped:352kB shmem:1248kB slab_reclaimable:18764kB slab_unreclaimable:11367632kB kernel_stack:4264kB pagetables:136kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:226279 all_unreclaimable? no
2012-09-28 12:27:54 lowmem_reserve[]: 0 0 0 0
2012-09-28 12:27:54 Node 1 Normal: 1369*4kB 985*8kB 629*16kB 292*32kB 82*64kB 8*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 45180kB
2012-09-28 12:27:54 35999 total pagecache pages
2012-09-28 12:27:54 0 pages in swap cache
2012-09-28 12:27:54 Swap cache stats: add 0, delete 0, find 0/0
2012-09-28 12:27:54 Free swap  = 0kB
2012-09-28 12:27:54 Total swap = 0kB
2012-09-28 12:27:54 6291440 pages RAM
2012-09-28 12:27:54 174434 pages reserved
2012-09-28 12:27:54 41489 pages shared
2012-09-28 12:27:54 6001480 pages non-shared
2012-09-28 12:27:54 [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
2012-09-28 12:27:54 [ 2254]     0  2254     2809      221   4     -17         -1000 udevd
2012-09-28 12:27:54 [ 3714]     0  3714    16009      165  13     -17         -1000 sshd
2012-09-28 12:27:54 [ 4189]     0  4189     2808      220   5     -17         -1000 udevd
2012-09-28 12:27:54 [ 4190]     0  4190     2808      220   5     -17         -1000 udevd
2012-09-28 12:27:54 Kernel panic - not syncing: Out of memory and no killable processes...
2012-09-28 12:27:54
2012-09-28 12:27:54 Pid: 6324, comm: ll_ost_io02_038 Tainted: P           ---------------    2.6.32-279.5.1.el6_lustre.x86_64 #1
2012-09-28 12:27:54 Call Trace:
2012-09-28 12:27:54  [<ffffffff814fd58a>] ? panic+0xa0/0x168
2012-09-28 12:27:54  [<ffffffff81117281>] ? dump_header+0x101/0x1b0 
2012-09-28 12:27:54  [<ffffffff81117c3f>] ? out_of_memory+0x38f/0x3c0 
2012-09-28 12:27:54  [<ffffffff811277ee>] ? __alloc_pages_nodemask+0x89e/0x940
2012-09-28 12:27:54  [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
2012-09-28 12:27:54  [<ffffffff81114617>] ? __page_cache_alloc+0x87/0x90
2012-09-28 12:27:54  [<ffffffff8111541f>] ? find_or_create_page+0x4f/0xb0
2012-09-28 12:27:54  [<ffffffffa0e2b1b5>] ? filter_get_page+0x35/0x70 [obdfilter]
2012-09-28 12:27:54  [<ffffffffa0e2c8a8>] ? filter_preprw_write+0x12b8/0x2340 [obdfilter]
2012-09-28 12:27:54  [<ffffffffa04c4edb>] ? lnet_ni_send+0x4b/0x110 [lnet]
2012-09-28 12:27:54  [<ffffffffa0a19e6b>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa0a07024>] ? sptlrpc_svc_alloc_rs+0x74/0x2d0 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa0e2e730>] ? filter_preprw+0x80/0xa0 [obdfilter]
2012-09-28 12:27:54  [<ffffffffa0b0c81c>] ? obd_preprw+0x12c/0x3d0 [ost]
2012-09-28 12:27:54  [<ffffffffa0b1398a>] ? ost_brw_write+0x87a/0x1600 [ost]
2012-09-28 12:27:54  [<ffffffffa09d06de>] ? ptlrpc_send_reply+0x28e/0x860 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa09d81dc>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa09d8338>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa0b1a02c>] ? ost_handle+0x360c/0x4850 [ost]
2012-09-28 12:27:54  [<ffffffff81054ce5>] ? select_idle_sibling+0x95/0x150
2012-09-28 12:27:54  [<ffffffffa09d88fc>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa09df6fb>] ? ptlrpc_update_export_timer+0x4b/0x470 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa09e7b3c>] ? ptlrpc_server_handle_request+0x41c/0xe00 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa041765e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2012-09-28 12:27:54  [<ffffffffa042913f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
2012-09-28 12:27:54  [<ffffffffa09def37>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
2012-09-28 12:27:54  [<ffffffff810533f3>] ? __wake_up+0x53/0x70
2012-09-28 12:27:54  [<ffffffffa09e9111>] ? ptlrpc_main+0xbf1/0x19e0 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
2012-09-28 12:27:54  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
2012-09-28 12:27:54  [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
2012-09-28 12:27:54  [<ffffffffa09e8520>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
2012-09-28 12:27:54  [<ffffffff8100c140>] ? child_rip+0x0/0x20
Comment by Cliff White (Inactive) [ 28/Sep/12 ]

slabtop output from dit30, which has not yet hit the oom-killer:

 Active / Total Objects (% used)    : 525716406 / 525794146 (100.0%)
 Active / Total Slabs (% used)      : 4779745 / 4779745 (100.0%)
 Active / Total Caches (% used)     : 120 / 243 (49.4%)
 Active / Total Size (% used)       : 16881340.17K / 16893819.62K (99.9%)
 Minimum / Average / Maximum Object : 0.02K / 0.03K / 4096.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
524564992 524564706   1%    0.03K 4683616      112  18734464K size-32
827460 826304  99%    0.19K  41373       20    165492K size-192
 89680  71633  79%    0.06K   1520       59      6080K size-64
 41588  19255  46%    0.10K   1124       37      4496K buffer_head
 34060  33994  99%    0.19K   1703       20      6812K dentry
 31680  31011  97%    0.12K   1056       30      4224K size-128
 22736  16710  73%    0.55K   3248        7     12992K radix_tree_node
 21892  20992  95%    1.00K   5473        4     21892K size-1024
 19980  19908  99%    0.14K    740       27      2960K sysfs_dir_cache
 18022  18022 100%    8.00K  18022        1    144176K size-8192
 11968  11862  99%    0.50K   1496        8      5984K size-512
  8745   8342  95%    0.07K    165       53       660K selinux_inode_security
  7260   1743  24%    0.12K    242       30       968K interval_node
  6853   1836  26%    0.05K     89       77       356K anon_vma_chain
  5172   5167  99%    0.58K    862        6      3448K inode_cache
  4901   4901 100%    4.00K   4901        1     19604K size-4096
  4821   4798  99%    1.02K   1607        3      6428K nfs_inode_cache
  4760   4739  99%    2.00K   2380        2      9520K size-2048
  4435   4410  99%    0.78K    887        5      3548K shmem_inode_cache
  4293   3732  86%    0.07K     81       53       324K Acpi-Operand
  4140   1371  33%    0.04K     45       92       180K anon_vma
  3810   3222  84%    0.25K    254       15      1016K skbuff_head_cache
  3800   1970  51%    0.20K    200       19       800K vm_area_struct
  3672   1704  46%    0.31K    306       12      1224K ldlm_resources
  3630   3616  99%    0.62K    605        6      2420K proc_inode_cache
  3231   3231 100%   16.00K   3231        1     51696K size-16384
  3176   3088  97%    0.44K    397        8      1588K ib_mad
  2955   1957  66%    0.25K    197       15       788K size-256
  2856   1840  64%    0.50K    408        7      1632K ldlm_locks
  2466   2420  98%    1.03K    822        3      3288K ldiskfs_inode_cache
  2140    602  28%    0.19K    107       20       428K filp
  2080   1219  58%    0.19K    104       20       416K cred_jar
  2070   2048  98%    0.12K     69       30       276K ioat2
  2040   1525  74%    0.11K     60       34       240K ldiskfs_prealloc_space
  1326   1255  94%    0.11K     39       34       156K task_delay_info
  1320   1246  94%    0.12K     44       30       176K pid
  1242   1240  99%    2.59K    414        3      3312K task_struct
  1232   1218  98%    1.00K    308        4      1232K signal_cache
  1218   1214  99%    2.06K    406        3      3248K sighand_cache
  1104    951  86%    0.04K     12       92        48K Acpi-Namespace
  1003    665  66%    0.06K     17       59        68K fs_cache
   768    722  94%    0.50K     96        8       384K task_xstate
   528    352  66%    0.08K     11       48        44K blkdev_ioc
   442     30   6%    0.11K     13       34        52K jbd2_journal_head
   404     14   3%    0.02K      2      202         8K jbd2_revoke_table
   392    380  96%    0.53K     56        7       224K idr_layer_cache
   288    256  88%    0.02K      2      144         8K dm_target_io
   276    256  92%    0.04K      3       92        12K dm_io
   242    242 100%   32.12K    242        1     15488K kmem_cache
   224     10   4%    0.03K      2      112         8K dnotify_struct
   222    118  53%    0.10K      6       37        24K vvp_session_kmem
   210    155  73%    0.69K     42        5       168K sock_inode_cache
   210    113  53%    0.26K     14       15        56K lov_thread_kmem
   189    144  76%    0.18K      9       21        36K ccc_session_kmem
   187     62  33%    0.69K     17       11       136K files_cache
   187    113  60%    0.33K     17       11        68K ccc_thread_kmem
   177     14   7%    0.06K      3       59        12K tcp_bind_bucket
   170     50  29%    0.11K      5       34        20K inotify_inode_mark_entry
   160    113  70%    0.45K     20        8        80K osc_thread_kmem
   160    113  70%    0.46K     20        8        80K vvp_thread_kmem
   150    127  84%    0.38K     15       10        60K lov_session_kmem
   140     70  50%    0.38K     14       10        56K ip_dst_cache
   135    126  93%    0.41K     15        9        60K osc_session_kmem
   133    133 100%   64.00K    133        1      8512K size-65536
   120    104  86%    0.25K      8       15        32K arp_cache
   118     24  20%    0.06K      2       59         8K fib6_nodes
   112      2   1%    0.03K      1      112         4K sd_ext_cdb
   106     20  18%    0.07K      2       53         8K ip_fib_hash
   106      8   7%    0.07K      2       53         8K fscache_cookie_jar
   105     61  58%    1.38K     21        5       168K mm_struct
   104    104 100%   32.00K    104        1      3328K size-32768
    99     74  74%    0.34K      9       11        36K blkdev_requests

/proc/meminfo:

MemTotal:       24468024 kB
MemFree:          873680 kB
Buffers:           77080 kB
Cached:          2271248 kB
SwapCached:            0 kB
Active:            93068 kB
Inactive:        2272304 kB
Active(anon):      21316 kB
Inactive(anon):    28540 kB
Active(file):      71752 kB
Inactive(file):  2243764 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         17084 kB
Mapped:             4708 kB
Shmem:             32792 kB
Slab:           19260772 kB
SReclaimable:      40484 kB
SUnreclaim:     19220288 kB
KernelStack:        9904 kB
PageTables:         2284 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12234012 kB
Committed_AS:     295960 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      573664 kB
VmallocChunk:   34345848776 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        5632 kB
DirectMap2M:     2082816 kB
DirectMap1G:    23068672 kB
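
For scale: the size-32 cache alone is 4683616 slabs * 4 kB = 18734464 kB, so those 524 million 32-byte objects account for nearly all of the 19260772 kB Slab (19220288 kB SUnreclaim) reported above, on a node with 24468024 kB of RAM. Whatever is leaking is an allocation of 32 bytes or less.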
Comment by Cliff White (Inactive) [ 28/Sep/12 ]

Okay, after some discussion I have a systemtap script to gather some more info; improvements welcome:

global malls

probe vm.kmalloc {
        if (bytes_req < 64) {
                malls[caller_function, bytes_req]++
        }
}

probe timer.ms(10000)
{
        foreach ([caller, bytes] in malls) {
                printf("caller: %s\t  bytes: %d\n", caller, bytes);
        }
        delete malls
}

The probe generates a bit too much overhead, so it frequently dies, but what I see looks like this:

caller: cfs_alloc         bytes: 16
caller: 0xffffffff812417e3        bytes: 20
caller: cfs_alloc         bytes: 40
caller: cfs_alloc         bytes: 32
caller: 0xffffffffa0001b80        bytes: 16
caller: 0xffffffff8129401a        bytes: 32
caller: cfs_alloc         bytes: 24
caller: ldiskfs_ext_truncate      bytes: 56

Again, this is just a quick hack, so please advise if there is a better way to gather this data.
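
To run it: save the script (the file name kmalloc-count.stp below is arbitrary), install the kernel-debuginfo matching the running 2.6.32-279.5.1.el6_lustre kernel, and start it as root with something like:

stap -v kmalloc-count.stp

stap's -o FILE option can redirect the periodic output to a log file instead of the terminal.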

Comment by Andreas Dilger [ 29/Sep/12 ]

Some changes to the script:

global malls

probe vm.kmalloc {
        if (bytes_req <= 32) {
                malls[caller_function, bytes_req]++
        }
}

probe timer.ms(1000000)
{
        foreach([caller, bytes] in malls) {
                if (malls[caller, bytes] > 1000) {
                        printf("caller: %s\tbytes: %d\tcount: %d\n",
                               caller, bytes, malls[caller, bytes]);
                }
        }
        delete malls
}

We only need to track allocations of 32 bytes or less, since anything larger won't land in the size-32 slab. Printing the actual count is also important, to show which caller is the culprit. The changes are intended to skip the low-frequency allocation sites and reduce the output so that we get a better statistical sample, but the interval and the 1000-count threshold might need to be changed to get good data.
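
If the fixed threshold proves awkward, one further tweak (a sketch only, not tested here) is to let systemtap sort the aggregate and print just the largest entries; the trailing "-" on the array name iterates in decreasing order of the stored count, and "limit" caps the number of entries printed:

probe timer.ms(1000000)
{
        foreach ([caller, bytes] in malls- limit 20) {
                printf("caller: %s\tbytes: %d\tcount: %d\n",
                       caller, bytes, malls[caller, bytes])
        }
        delete malls
}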

Comment by Peter Jones [ 01/Oct/12 ]

Duplicate of LU-2053
