[LU-3598] Failure on test suite sanity-benchmark test_iozone: page allocation failure Created: 16/Jul/13 Updated: 13/Feb/14 Resolved: 13/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0, Lustre 2.5.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client: lustre-master build #1560 |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9130 | ||||||||||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/02803404-ed9b-11e2-8e3a-52540035b04c. The sub-test test_iozone failed with the following error:
client console: 12:52:41:Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone == 12:52:37 (1373831557) 12:52:41:Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1850592kB available, using 3844936kB file size 12:52:41:Lustre: DEBUG MARKER: min OST has 1850592kB available, using 3844936kB file size 12:54:03:ptlrpcd_1: page allocation failure. order:1, mode:0x40 12:54:03:Pid: 28900, comm: ptlrpcd_1 Not tainted 2.6.32-358.11.1.el6.x86_64 #1 12:54:03:Call Trace: 12:54:03: [<ffffffff8112c157>] ? __alloc_pages_nodemask+0x757/0x8d0 12:54:03: [<ffffffffa06627df>] ? ptlrpc_set_add_new_req+0xcf/0x150 [ptlrpc] 12:54:03: [<ffffffff81166b02>] ? kmem_getpages+0x62/0x170 12:54:03: [<ffffffff8116771a>] ? fallback_alloc+0x1ba/0x270 12:54:03: [<ffffffff8116716f>] ? cache_grow+0x2cf/0x320 12:54:03: [<ffffffff81167499>] ? ____cache_alloc_node+0x99/0x160 12:54:03: [<ffffffffa048fd57>] ? LNetMDAttach+0x157/0x5a0 [lnet] 12:54:03: [<ffffffff81168269>] ? __kmalloc+0x189/0x220 12:54:03: [<ffffffffa048fd57>] ? LNetMDAttach+0x157/0x5a0 [lnet] 12:54:03: [<ffffffffa0670b55>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc] 12:54:03: [<ffffffffa069ea48>] ? sptlrpc_import_sec_ref+0x28/0x50 [ptlrpc] 12:54:03: [<ffffffffa0671fa2>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc] 12:54:03: [<ffffffffa06a65b4>] ? sptlrpc_req_refresh_ctx+0x164/0x970 [ptlrpc] 12:54:03: [<ffffffffa0890971>] ? osc_io_unplug+0x11/0x20 [osc] 12:54:03: [<ffffffffa06676bb>] ? ptlrpc_send_new_req+0x45b/0x7a0 [ptlrpc] 12:54:03: [<ffffffff81055ab3>] ? __wake_up+0x53/0x70 12:54:03: [<ffffffffa066b3a8>] ? ptlrpc_check_set+0x878/0x1b20 [ptlrpc] 12:54:03: [<ffffffffa06976cb>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc] 12:54:03: [<ffffffff8109705c>] ? remove_wait_queue+0x3c/0x50 12:54:03: [<ffffffffa0697b50>] ? ptlrpcd+0x190/0x380 [ptlrpc] 12:54:03: [<ffffffff81063310>] ? default_wake_function+0x0/0x20 12:54:03: [<ffffffffa06979c0>] ? ptlrpcd+0x0/0x380 [ptlrpc] 12:54:03: [<ffffffff81096936>] ? kthread+0x96/0xa0 12:54:03: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20 12:54:03: [<ffffffff810968a0>] ? kthread+0x0/0xa0 12:54:03: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 12:54:03:Mem-Info: 12:54:03:Node 0 DMA per-cpu: 12:54:03:CPU 0: hi: 0, btch: 1 usd: 0 12:54:03:Node 0 DMA32 per-cpu: 12:54:03:CPU 0: hi: 186, btch: 31 usd: 159 |
| Comments |
| Comment by Keith Mannthey (Inactive) [ 16/Jul/13 ] |
|
The failure is not fatal is is how the rest of the code deals with it. 12:54:03:128789 pages non-shared 12:54:03:LNetError: 28900:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/client/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.4.52/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208) 12:54:03:LNetError: 28900:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: 10894919 total bytes allocated by lnet 12:54:03:BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8 12:54:03:IP: [<ffffffffa0670d5a>] ptlrpc_register_bulk+0x46a/0x9d0 [ptlrpc] 12:54:03:PGD 5e7df067 PUD 38cce067 PMD 0 12:54:03:Oops: 0000 [#1] SMP This looks to be the same as: |
| Comment by Keith Mannthey (Inactive) [ 16/Jul/13 ] |
|
Same Lnet error message. |
| Comment by Keith Mannthey (Inactive) [ 16/Jul/13 ] |
|
Lets just track this in the earlier LU. |
| Comment by Sarah Liu [ 27/Sep/13 ] |
|
Hit this error again in tag-2.4.93 testing 22:09:10:Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone == 22:09:03 (1380172143) 22:09:10:Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1850320kB available, using 3844920kB file size 22:09:11:Lustre: DEBUG MARKER: min OST has 1850320kB available, using 3844920kB file size 22:11:24:ptlrpcd_1: page allocation failure. order:1, mode:0x40 22:11:25:Pid: 2459, comm: ptlrpcd_1 Not tainted 2.6.32-358.18.1.el6.x86_64 #1 22:11:25:Call Trace: 22:11:26: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0 22:11:27: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170 22:11:27: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270 22:11:27: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320 22:11:27: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160 22:11:28: [<ffffffffa048ced7>] ? LNetMDAttach+0x157/0x5a0 [lnet] 22:11:28: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220 22:11:28: [<ffffffffa048ced7>] ? LNetMDAttach+0x157/0x5a0 [lnet] 22:11:28: [<ffffffffa066dae5>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc] 22:11:29: [<ffffffffa069a608>] ? sptlrpc_import_sec_ref+0x28/0x50 [ptlrpc] 22:11:30: [<ffffffffa066ef42>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc] 22:11:30: [<ffffffffa06a1ebc>] ? sptlrpc_req_refresh_ctx+0x15c/0x930 [ptlrpc] 22:11:30: [<ffffffffa0664674>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc] 22:11:30: [<ffffffffa06683e8>] ? ptlrpc_check_set+0x888/0x1b40 [ptlrpc] 22:11:31: [<ffffffffa069322b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc] 22:11:31: [<ffffffffa069374b>] ? ptlrpcd+0x20b/0x370 [ptlrpc] 22:11:32: [<ffffffff81063410>] ? default_wake_function+0x0/0x20 22:11:33: [<ffffffffa0693540>] ? ptlrpcd+0x0/0x370 [ptlrpc] 22:11:33: [<ffffffff81096a36>] ? kthread+0x96/0xa0 22:11:33: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20 22:11:33: [<ffffffff810969a0>] ? kthread+0x0/0xa0 22:11:33: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 22:11:33:Mem-Info: 22:11:33:Node 0 DMA per-cpu: 22:11:34:CPU 0: hi: 0, btch: 1 usd: 0 22:11:34:Node 0 DMA32 per-cpu: 22:11:35:CPU 0: hi: 186, btch: 31 usd: 33 22:11:35:active_anon:6072 inactive_anon:6123 isolated_anon:0 22:11:35: active_file:29973 inactive_file:288526 isolated_file:0 22:11:35: unevictable:0 dirty:36357 writeback:16384 unstable:0 22:11:35: free:16750 slab_reclaimable:9827 slab_unreclaimable:110251 22:11:36: mapped:3569 shmem:38 pagetables:1027 bounce:0 22:11:36:Node 0 DMA free:8272kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:5748kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:40kB slab_unreclaimable:1672kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no 22:11:36:lowmem_reserve[]: 0 2003 2003 2003 22:11:37:Node 0 DMA32 free:58728kB min:44720kB low:55900kB high:67080kB active_anon:24288kB inactive_anon:24492kB active_file:119892kB inactive_file:1148356kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:145428kB writeback:65536kB mapped:14276kB shmem:152kB slab_reclaimable:39268kB slab_unreclaimable:439332kB kernel_stack:1648kB pagetables:4108kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no 22:11:38:lowmem_reserve[]: 0 0 0 0 22:11:38:Node 0 DMA: 2*4kB 3*8kB 1*16kB 1*32kB 4*64kB 4*128kB 5*256kB 2*512kB 1*1024kB 2*2048kB 0*4096kB = 8272kB 22:11:38:Node 0 DMA32: 13044*4kB 25*8kB 11*16kB 3*32kB 3*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 58728kB 22:11:38:318554 total pagecache pages 22:11:38:0 pages in swap cache 22:11:38:Swap cache stats: add 0, delete 0, find 0/0 22:11:38:Free swap = 4128760kB 22:11:39:Total swap = 4128760kB 22:11:39:524284 pages RAM 22:11:40:43669 pages reserved 22:11:40:342147 pages shared 22:11:41:154367 pages non-shared 22:11:41:LNetError: 2459:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/client/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.4.93/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208) 22:11:41:LNetError: 2459:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: 10962695 total bytes allocated by lnet 22:11:41:LustreError: 2459:0:(niobuf.c:376:ptlrpc_register_bulk()) lustre-OST0004-osc-ffff88005dc12000: LNetMDAttach failed x1447210203467200/0: rc = -12 |
| Comment by Sarah Liu [ 01/Oct/13 ] |
|
similar error in SLES11 SP2 client https://maloo.whamcloud.com/test_sets/0194542a-2915-11e3-b598-52540035b04c |
| Comment by Jian Yu [ 09/Jan/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/9/ The same failure occurred: |
| Comment by Jian Yu [ 07/Feb/14 ] |
|
More instance on Lustre b2_5 branch: |
| Comment by Andreas Dilger [ 13/Feb/14 ] |
|
Also shows "mode:0x40" when allocation fails. |