[LU-2636] Interop 2.1.3<->2.4 failure on test suite sanityn test_18: client out of memory Created: 17/Jan/13  Updated: 26/Feb/13  Resolved: 26/Feb/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Jinshan Xiong (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: LB
Environment:

server: 2.4
client: 2.1.3


Severity: 3
Rank (Obsolete): 6168

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/495572f2-5b58-11e2-b205-52540035b04c.

The sub-test test_18 failed with the following error:

test failed to respond and timed out

17:14:39:Lustre: DEBUG MARKER: == sanityn test 18: mmap sanity check =================================== 17:14:30 (1357348470)
17:14:39:mmap_sanity invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0, oom_score_adj=0
17:14:39:mmap_sanity cpuset=/ mems_allowed=0
17:14:39:Pid: 23247, comm: mmap_sanity Not tainted 2.6.32-279.2.1.el6.x86_64 #1
17:14:39:Call Trace:
17:14:39: [<ffffffff810c4aa1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
17:14:39: [<ffffffff81117210>] ? dump_header+0x90/0x1b0
17:14:39: [<ffffffff8121482c>] ? security_real_capable_noaudit+0x3c/0x70
17:14:40: [<ffffffff81117692>] ? oom_kill_process+0x82/0x2a0
17:14:41: [<ffffffff811175d1>] ? select_bad_process+0xe1/0x120
17:14:41: [<ffffffff81117ad0>] ? out_of_memory+0x220/0x3c0
17:14:41: [<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50
17:14:41: [<ffffffff81117d35>] ? pagefault_out_of_memory+0xc5/0x110
17:14:41: [<ffffffff81044082>] ? mm_fault_error+0xb2/0x1a0
17:14:41: [<ffffffff8104467b>] ? __do_page_fault+0x33b/0x480
17:14:41: [<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0
17:14:41: [<ffffffff81500755>] ? page_fault+0x25/0x30
17:14:41: [<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0
17:14:41: [<ffffffff81500755>] ? page_fault+0x25/0x30
17:14:41:Mem-Info:
17:14:41:Node 0 DMA per-cpu:
17:14:41:CPU    0: hi:    0, btch:   1 usd:   0
17:14:41:Node 0 DMA32 per-cpu:
17:14:41:CPU    0: hi:  186, btch:  31 usd: 152
17:14:42:active_anon:3651 inactive_anon:2924 isolated_anon:0
17:14:42: active_file:10648 inactive_file:56778 isolated_file:0
17:14:42: unevictable:0 dirty:38 writeback:9 unstable:0
17:14:42: free:355178 slab_reclaimable:8287 slab_unreclaimable:18868
17:14:42: mapped:1412 shmem:38 pagetables:986 bounce:0
17:14:42:Node 0 DMA free:15732kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
17:14:42:lowmem_reserve[]: 0 2003 2003 2003
17:14:42:Node 0 DMA32 free:1404980kB min:44720kB low:55900kB high:67080kB active_anon:14604kB inactive_anon:11696kB active_file:42592kB inactive_file:227112kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:152kB writeback:36kB mapped:5648kB shmem:152kB slab_reclaimable:33148kB slab_unreclaimable:75472kB kernel_stack:1560kB pagetables:3944kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
17:14:42:lowmem_reserve[]: 0 0 0 0
17:14:42:Node 0 DMA: 1*4kB 2*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15732kB
17:14:42:Node 0 DMA32: 4115*4kB 6270*8kB 3041*16kB 1447*32kB 618*64kB 83*128kB 19*256kB 241*512kB 188*1024kB 132*2048kB 147*4096kB = 1404972kB
17:14:42:67024 total pagecache pages
17:14:42:0 pages in swap cache
17:14:42:Swap cache stats: add 0, delete 0, find 0/0
17:14:42:Free swap  = 4128760kB
17:14:42:Total swap = 4128760kB
17:14:42:524284 pages RAM
17:14:42:43628 pages reserved
17:14:42:53282 pages shared
17:14:42:71106 pages non-shared
17:14:42:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
17:14:42:[  457]     0   457     2764      261   0     -17         -1000 udevd
17:14:42:[ 1161]     0  1161    62271      308   0       0             0 rsyslogd
17:14:42:[ 1203]    32  1203     4743      143   0       0             0 rpcbind
17:14:42:[ 1215]     0  1215    45434      298   0       0             0 sssd
17:14:42:[ 1218]     0  1218    51095      852   0       0             0 sssd_be
17:14:42:[ 1233]    29  1233     6354      204   0       0             0 rpc.statd
17:14:42:[ 1235]     0  1235    42948      445   0       0             0 sssd_nss
17:14:42:[ 1236]     0  1236    42892      271   0       0             0 sssd_pam
17:14:42:[ 5436]    81  5436     5868      117   0       0             0 dbus-daemon
17:14:42:[ 5469]     0  5469     1019       98   0       0             0 acpid
17:14:42:[ 5478]    68  5478     6774      375   0       0             0 hald
17:14:42:[ 5479]     0  5479     4525      167   0       0             0 hald-runner
17:14:42:[ 5508]     0  5508     5054      150   0       0             0 hald-addon-inpu
17:14:42:[ 5518]    68  5518     4450      157   0       0             0 hald-addon-acpi
17:14:42:[ 5537]     0  5537   151110      719   0       0             0 automount
17:14:42:[ 5578]     0  5578    26826       51   0       0             0 rpc.rquotad
17:14:42:[ 5582]     0  5582     5413      204   0       0             0 rpc.mountd
17:14:42:[ 5631]     0  5631     6290      102   0       0             0 rpc.idmapd
17:14:42:[ 5676]   498  5676    23941      219   0       0             0 munged
17:14:42:[ 5706]     0  5706    16017      234   0     -17         -1000 sshd
17:14:42:[ 5714]     0  5714     5522      165   0       0             0 xinetd
17:14:42:[ 5722]    38  5722     7004      276   0       0             0 ntpd
17:14:42:[ 5741]     0  5741    22182      516   0       0             0 sendmail
17:14:43:[ 5749]    51  5749    19529      447   0       0             0 sendmail
17:14:43:[ 5771]     0  5771    27015      114   0       0             0 abrt-dump-oops
17:14:43:[ 5779]     0  5779    29302      235   0       0             0 crond
17:14:43:[ 5790]     0  5790     5363       93   0       0             0 atd
17:14:43:[ 5818]     0  5818     1015       73   0       0             0 mingetty
17:14:43:[ 5820]     0  5820     1015       73   0       0             0 mingetty
17:14:43:[ 5822]     0  5822     1015       74   0       0             0 mingetty
17:14:43:[ 5824]     0  5824     1015       73   0       0             0 mingetty
17:14:43:[ 5826]     0  5826     1018       78   0       0             0 agetty
17:14:43:[ 5827]     0  5827     1015       73   0       0             0 mingetty
17:14:43:[ 5829]     0  5829     1015       74   0       0             0 mingetty
17:14:43:[ 5834]     0  5834     2763      268   0     -17         -1000 udevd
17:14:43:[ 5835]     0  5835     2763      233   0     -17         -1000 udevd
17:14:43:[ 5853]     0  5853     6915      165   0     -17         -1000 auditd
17:14:43:[ 7851]     0  7851    15351      189   0       0             0 in.rshd
17:14:43:[ 7852]     0  7852    14970      340   0       0             0 ssh
17:14:43:[ 7873]     0  7873    25492      383   0       0             0 sshd
17:14:43:[ 7875]     0  7875    26517      132   0       0             0 run_test.sh
17:14:43:[ 8086]     0  8086    27011      734   0       0             0 bash
17:14:43:[ 8792]     0  8792    27011      704   0       0             0 bash
17:14:43:[ 8793]     0  8793    25228      154   0       0             0 tee
17:14:43:[ 8942]     0  8942    27021      870   0       0             0 bash
17:14:43:[23094]     0 23094    27054      725   0       0             0 bash
17:14:43:[23095]     0 23095    25228      156   0       0             0 tee
17:14:43:[23241]     0 23241      985      119   0       0             0 mmap_sanity
17:14:43:[23243]     0 23243     1058       84   0       0             0 mmap_sanity
17:14:43:[23245]     0 23245     1020       84   0       0             0 mmap_sanity
17:14:43:[23247]     0 23247      985       30   0       0             0 mmap_sanity
17:14:43:Out of memory: Kill process 1161 (rsyslogd) score 1 or sacrifice child
17:14:43:Killed process 1161, UID 0, (rsyslogd) total-vm:249084kB, anon-rss:676kB, file-rss:556kB
17:14:43:mmap_sanity invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0, oom_score_adj=0
17:14:43:mmap_sanity cpuset=/ mems_allowed=0


 Comments   
Comment by Oleg Drokin [ 21/Jan/13 ]

Node 0 DMA32 free:1404980kB
So we have tons of free pages; why did the allocation fail, then?

Comment by Jodi Levi (Inactive) [ 21/Jan/13 ]

Jinshan,
Can you have a look at this one and comment on whether this is a blocker?

Comment by Peter Jones [ 26/Feb/13 ]

This has been corrected in more recent 2.1.x releases.

Generated at Sat Feb 10 01:26:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.