  Lustre / LU-4033

Failure on test suite parallel-scale-nfsv4 test_iorssf: MDS oom

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.0
    • Component/s: None
    • Environment: server and client: lustre-master build # 1687;
      client is running SLES11 SP2
    • Severity: 3
    • Rank: 10835

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/960b8b64-2915-11e3-b598-52540035b04c.

      The sub-test test_iorssf failed with the following error:

      test failed to respond and timed out

      MDS console

      17:14:54:ptlrpcd_0: page allocation failure. order:1, mode:0x40
      17:14:55:Pid: 2780, comm: ptlrpcd_0 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
      17:14:56:Call Trace:
      17:14:57: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
      17:14:58: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
      17:14:59: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
      17:14:59: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
      17:14:59: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
      17:14:59: [<ffffffffa0538ed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
      17:14:59: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
      17:14:59: [<ffffffffa0538ed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
      17:15:00: [<ffffffffa0771b35>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc]
      17:15:00: [<ffffffffa0773a12>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc]
      17:15:00: [<ffffffff81281b74>] ? snprintf+0x34/0x40
      17:15:01: [<ffffffffa0488761>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      17:15:01: [<ffffffffa07685f4>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc]
      17:15:02: [<ffffffffa076c368>] ? ptlrpc_check_set+0x888/0x1b40 [ptlrpc]
      17:15:02: [<ffffffffa079801b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
      17:15:03: [<ffffffffa079853b>] ? ptlrpcd+0x20b/0x370 [ptlrpc]
      17:15:03: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      17:15:03: [<ffffffffa0798330>] ? ptlrpcd+0x0/0x370 [ptlrpc]
      17:15:03: [<ffffffff81096a36>] ? kthread+0x96/0xa0
      17:15:03: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      17:15:04: [<ffffffff810969a0>] ? kthread+0x0/0xa0
      17:15:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      17:15:06:Mem-Info:
      17:15:06:Node 0 DMA per-cpu:
      17:15:06:CPU    0: hi:    0, btch:   1 usd:   0
      17:15:06:Node 0 DMA32 per-cpu:
      17:15:06:CPU    0: hi:  186, btch:  31 usd:  42
      17:15:06:active_anon:2345 inactive_anon:2732 isolated_anon:0
      17:15:07: active_file:110430 inactive_file:238985 isolated_file:0
      17:15:07: unevictable:0 dirty:3 writeback:0 unstable:0
      17:15:07: free:14257 slab_reclaimable:7260 slab_unreclaimable:76976
      17:15:07: mapped:2551 shmem:41 pagetables:794 bounce:0
      17:15:08:Node 0 DMA free:8264kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:272kB inactive_file:5444kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:36kB slab_unreclaimable:1700kB kernel_stack:16kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      17:15:08:lowmem_reserve[]: 0 2003 2003 2003
      17:15:09:Node 0 DMA32 free:48764kB min:44720kB low:55900kB high:67080kB active_anon:9380kB inactive_anon:10928kB active_file:441448kB inactive_file:950496kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:12kB writeback:0kB mapped:10204kB shmem:164kB slab_reclaimable:29004kB slab_unreclaimable:306204kB kernel_stack:1984kB pagetables:3176kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      17:15:09:lowmem_reserve[]: 0 0 0 0
      17:15:10:Node 0 DMA: 58*4kB 104*8kB 102*16kB 42*32kB 6*64kB 2*128kB 2*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 8264kB
      17:15:11:Node 0 DMA32: 10659*4kB 2*8kB 2*16kB 2*32kB 2*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 48764kB
      17:15:11:269122 total pagecache pages
      17:15:11:28 pages in swap cache
      17:15:11:Swap cache stats: add 62, delete 34, find 18/22
      17:15:11:Free swap  = 4128648kB
      17:15:12:Total swap = 4128760kB
      17:15:12:524284 pages RAM
      17:15:12:43669 pages reserved
      17:15:13:282260 pages shared
      17:15:13:194054 pages non-shared
      17:15:14:LNetError: 2780:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/server/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.4.93/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208)
      17:15:14:LNetError: 2780:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: 55064047 total bytes allocated by lnet
      17:15:15:LustreError: 2780:0:(niobuf.c:376:ptlrpc_register_bulk()) lustre-OST0002-osc-ffff88006f296400: LNetMDAttach failed x1447417177531472/0: rc = -12
      


          Activity

            sarah Sarah Liu added a comment -

            Hit this bug in interop testing between a 2.6 server and a 2.5 server:

            https://maloo.whamcloud.com/test_sets/f380a6a4-5beb-11e3-8bdd-52540035b04c


            bzzz Alex Zhuravlev added a comment -

            Andreas, have a look at the prototype: http://review.whamcloud.com/#/c/8003/ - the idea is to signal the client that the file has been removed, so the client can reset nlink and let the kernel drop the inode and its pages. It is not intended for landing yet, but I'd like to hear your opinion on the approach.

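            For illustration only, a minimal sketch of the general idea (not the actual change in http://review.whamcloud.com/#/c/8003/), assuming a hypothetical client-side callback ll_handle_object_removed() that runs when the server signals the file is gone: clear nlink and drop the cached pages, so the VFS evicts the inode instead of flushing pages that would trigger layout re-enqueues.

                /* Hypothetical sketch -- ll_handle_object_removed() is not an existing
                 * Lustre function; it stands in for whatever hook the prototype adds
                 * when the server reports the file as removed. */
                #include <linux/fs.h>
                #include <linux/mm.h>

                static void ll_handle_object_removed(struct inode *inode)
                {
                        /* Mark the inode as unlinked so the VFS evicts it (and the
                         * objects backing it) on the final iput(). */
                        clear_nlink(inode);

                        /* Drop any cached pages right away; once the OST objects are
                         * destroyed these pages can never be written back. */
                        truncate_inode_pages(inode->i_mapping, 0);
                }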

            bzzz Alex Zhuravlev added a comment -

            To clarify a bit: given that OST_DESTROY isn't executed immediately (only after commit on the MDS, at the earliest), there is a window in which the MDT object is already destroyed (in cache) but the OST objects are not. If the kernel decides to flush dirty pages at that point, the client code will try to revalidate the layout (which was invalidated by the UNLINK). This is how we end up with lu-objects in memory. They are supposed to be purged at some point (given that no access can be made after OST_DESTROY). I don't think this is a big issue, but it would be nice if we could use the layout lock for this purpose as well.


            bzzz Alex Zhuravlev added a comment -

            The root cause seems to be dirty pages remaining on the client, which cause ENQUEUEs for the layouts, which in turn populate the MDS cache with lu-objects for already-removed files.


            bzzz Alex Zhuravlev added a comment -

            Sure, I'll try to reproduce with dbench.


            adilger Andreas Dilger added a comment -

            If the VM were just retaining the slabs, then they would not be marked active, I think. Also, near the end of my 1h dbench run there was starting to be considerable memory pressure on the other slabs, so these should have been shrunk at that time if they were just in the per-cpu cache.

            It may be that a workload different from createmany/unlinkmany is needed? For example, neither of these operations does a lookup, readdir, or stat, or any number of other combinations. I don't think a 3600s dbench run is needed; I was just doing that to see if there is a long-term increase in memory use (which there is). Probably even a short run with full +malloc tracing would be enough.

            bzzz Alex Zhuravlev added a comment -

            I did a simple test:

            1. grep mdt_obj /proc/slabinfo
               mdt_obj 28 28 280 14 1 : tunables 32 16 8 : slabdata 2 2 0 : globalstat 19938 5040 1413 23
            2. ./createmany -o /mnt/lustre/d0/f 10000
               total: 10000 creates in 2.97 seconds: 3366.59 creates/second
            3. grep mdt_obj /proc/slabinfo
               mdt_obj 10038 10038 280 14 1 : tunables 32 16 8 : slabdata 717 717 0 : globalstat 29948 10038 2128 23
            4. ./unlinkmany /mnt/lustre/d0/f 10000
               unlinked 0 (time 1380765784 ; total 0 ; last 0)
            5. grep mdt_obj /proc/slabinfo
               mdt_obj 206 280 280 14 1 : tunables 32 16 8 : slabdata 17 20 128 : globalstat 39958 10038 2843 32 0 4 0 0 0 : cpustat 37165 2866 37516 2487

            Then, a few dozen seconds later:

            6. grep mdt_obj /proc/slabinfo
               mdt_obj 28 28 280 14 1 : tunables 32 16 8 : slabdata 2 2 0 : globalstat 39958 10038 2843 41

            Probably the MM just retains lots of actually-free pieces in per-cpu caches or something like that?

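            One way to tell "freed but still cached" slab objects from genuinely pinned ones is to force the registered shrinkers to run and then re-read /proc/slabinfo: objects that are only parked in per-cpu or partial slab pages (or on a shrinker's LRU) go away, while objects pinned by leaked references do not. A minimal sketch, assuming the standard /proc/sys/vm/drop_caches interface and run as root on the MDS:

                /* drop_slab.c - ask the kernel to reclaim slab caches, then compare
                 * "grep mdt_obj /proc/slabinfo" before and after running this. */
                #include <stdio.h>
                #include <stdlib.h>

                int main(void)
                {
                        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");

                        if (f == NULL) {
                                perror("open /proc/sys/vm/drop_caches");
                                return EXIT_FAILURE;
                        }
                        /* "2" drops reclaimable slab objects (dentries, inodes, and
                         * anything released via registered shrinkers). */
                        fputs("2\n", f);
                        fclose(f);
                        return EXIT_SUCCESS;
                }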

            adilger Andreas Dilger added a comment -

            This bug is intended to track the problem of MDS-side objects not being freed (the mdd_obj, lod_obj, and mdt_obj slabs). The LU-4053 ticket tracks the client-side CLIO objects not being freed.

            I think there is just something wrong in the MDS stack such that it is not destroying the whole lu_obj (or whatever?) when an object is unlinked, and it is only freed at unmount time or possibly very slowly under memory pressure. It doesn't make any sense to keep objects in memory for FIDs that have been deleted.

            green Oleg Drokin added a comment -

            I think it's still not related to any real object leakage, otherwise we would have noticed it by other means.
            Most likely it's just like LU-4053: we stopped proactively clearing inodes from the cache in 2.3.


            bogl Bob Glossman (Inactive) added a comment -

            I'm wondering if there's a low-rate flaw in the reference counting of objects. It occurs to me that all it would take is one seldom-used code path with one too many xxx_object_get() calls, or one too few xxx_object_put() calls, and the reference count would never drop to 0. If the reference count is wrong, that could explain the objects never being freed. Over time the population of unreleased objects would grow.

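            As a toy illustration of that failure mode (hypothetical code, not from the Lustre tree): a single rarely taken error path that skips its matching put leaves the count permanently above zero, so the object is never freed and the population of stale objects grows over time.

                /* refleak.c - minimal example of an unbalanced get/put pair. */
                #include <stdio.h>
                #include <stdlib.h>

                struct obj {
                        int refcount;
                };

                static struct obj *obj_get(struct obj *o)
                {
                        o->refcount++;
                        return o;
                }

                static void obj_put(struct obj *o)
                {
                        if (--o->refcount == 0) {
                                printf("object freed\n");
                                free(o);
                        }
                }

                static void rare_path(struct obj *o, int error)
                {
                        obj_get(o);
                        if (error)
                                return;         /* BUG: early return skips obj_put() */
                        /* ... normal processing ... */
                        obj_put(o);
                }

                int main(void)
                {
                        struct obj *o = obj_get(calloc(1, sizeof(*o)));  /* creation ref */

                        rare_path(o, 1);        /* seldom-used error path leaks a ref */
                        obj_put(o);             /* drops the creation ref only */
                        printf("refcount = %d -- object leaked\n", o->refcount);
                        return 0;
                }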

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9
