[LU-4357] page allocation failure. mode:0x40 caused by missing __GFP_WAIT flag Created: 06/Dec/13 Updated: 17/Mar/14 Resolved: 26/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB, mn4, patch | ||
| Environment: |
server: lustre-master build # 1791 RHEL6 ldiskfs |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 11936 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/5efd6668-5e3c-11e3-ae30-52540035b04c. The sub-test test_iorssf failed with the following error:
Not sure if this is a dup of
12:02:03:Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf == 11:54:49 (1386273289)
12:02:03:ptlrpcd_0: page allocation failure. order:1, mode:0x40
12:02:03:IOR: page allocation failure. order:1, mode:0x40
12:02:03:Pid: 22725, comm: IOR Not tainted 2.6.32-358.18.1.el6.x86_64 #1
12:02:03:Call Trace:
12:02:03: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
12:02:03: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
12:02:03: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
12:02:03: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
12:02:03: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
12:02:03: [<ffffffffa0967c08>] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
12:02:03: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
12:02:03: [<ffffffffa0967c08>] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
12:02:03: [<ffffffffa0967e98>] ? ptlrpc_prep_bulk_imp+0x58/0x190 [ptlrpc]
12:02:03: [<ffffffffa03ca054>] ? osc_brw_prep_request+0x294/0x11e0 [osc]
12:02:03: [<ffffffffa03de4ac>] ? osc_req_attr_set+0x16c/0x5b0 [osc]
12:02:04: [<ffffffffa088dd21>] ? cl_req_attr_set+0xd1/0x230 [obdclass]
12:02:04: [<ffffffffa03d01d0>] ? osc_build_rpc+0x870/0x1850 [osc]
12:02:04: [<ffffffffa03ea1a7>] ? osc_io_unplug0+0x1257/0x1f00 [osc]
12:02:04: [<ffffffffa0881285>] ? cl_page_slice_add+0x55/0x140 [obdclass]
12:02:04: [<ffffffffa03ecbd1>] ? osc_io_unplug+0x11/0x20 [osc]
12:02:04: [<ffffffffa03ecdb0>] ? osc_queue_sync_pages+0x1d0/0x360 [osc]
12:02:04: [<ffffffffa03def80>] ? osc_io_submit+0x340/0x4b0 [osc]
12:02:04: [<ffffffffa088d68c>] ? cl_io_submit_rw+0x6c/0x160 [obdclass]
12:02:04: [<ffffffffa04ba3c2>] ? lov_io_submit+0x3a2/0xbb0 [lov]
12:02:04: [<ffffffffa088d68c>] ? cl_io_submit_rw+0x6c/0x160 [obdclass]
12:02:04: [<ffffffffa088fc9e>] ? cl_io_read_page+0xae/0x170 [obdclass]
12:02:04: [<ffffffffa0883a77>] ? cl_page_assume+0xf7/0x220 [obdclass]
12:02:05: [<ffffffffa0768176>] ? ll_readpage+0x96/0x1a0 [lustre]
12:02:05: [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
12:02:05: [<ffffffff8111b7ec>] ? generic_file_aio_read+0x1fc/0x700
12:02:05: [<ffffffffa0797797>] ? vvp_io_read_start+0x257/0x470 [lustre]
12:02:05: [<ffffffffa088d7ea>] ? cl_io_start+0x6a/0x140 [obdclass]
12:02:05: [<ffffffffa0891ef4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
12:02:05: [<ffffffffa0739f9f>] ? ll_file_io_generic+0x33f/0x610 [lustre]
12:02:05: [<ffffffffa073a3af>] ? ll_file_aio_read+0x13f/0x2c0 [lustre]
12:02:05: [<ffffffffa073ac4c>] ? ll_file_read+0x16c/0x2a0 [lustre]
12:02:05: [<ffffffff81181a95>] ? vfs_read+0xb5/0x1a0
12:02:06: [<ffffffff81181bd1>] ? sys_read+0x51/0x90
12:02:06: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
12:02:06: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
12:02:06:Mem-Info:
12:02:06:Node 0 DMA per-cpu:
12:02:06:CPU 0: hi: 0, btch: 1 usd: 0
12:02:06:CPU 1: hi: 0, btch: 1 usd: 0
12:02:07:Node 0 DMA32 per-cpu:
12:02:07:CPU 0: hi: 186, btch: 31 usd: 0
12:02:07:CPU 1: hi: 186, btch: 31 usd: 4
12:02:07:active_anon:3193 inactive_anon:6004 isolated_anon:0
12:02:07: active_file:8566 inactive_file:234838 isolated_file:0
12:02:07: unevictable:0 dirty:1 writeback:0 unstable:0
12:02:07: free:13356 slab_reclaimable:3430 slab_unreclaimable:197269
12:02:07: mapped:4138 shmem:39 pagetables:1255 bounce:0
12:02:07:Node 0 DMA free:8236kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:5824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:1620kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:07:lowmem_reserve[]: 0 2003 2003 2003
12:02:07:Node 0 DMA32 free:45188kB min:44720kB low:55900kB high:67080kB active_anon:12772kB inactive_anon:24016kB active_file:34264kB inactive_file:933528kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:4kB writeback:0kB mapped:16552kB shmem:156kB slab_reclaimable:13668kB slab_unreclaimable:787456kB kernel_stack:1904kB pagetables:5020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:07:lowmem_reserve[]: 0 0 0 0
12:02:07:Node 0 DMA: 3*4kB 2*8kB 3*16kB 3*32kB 4*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8236kB
12:02:08:Node 0 DMA32: 9673*4kB 6*8kB 23*16kB 2*32kB 2*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 45188kB
12:02:08:243500 total pagecache pages
12:02:08:0 pages in swap cache
12:02:08:Swap cache stats: add 0, delete 0, find 0/0
12:02:08:Free swap = 4128760kB
12:02:08:Total swap = 4128760kB
12:02:08:Pid: 11613, comm: ptlrpcd_0 Not tainted 2.6.32-358.18.1.el6.x86_64 #1
12:02:08:Call Trace:
12:02:08: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
12:02:08: [<ffffffffa0707a78>] ? ksocknal_queue_tx_msg_v2+0x88/0xe0 [ksocklnd]
12:02:08: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
12:02:08: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
12:02:09: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
12:02:09: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
12:02:09: [<ffffffffa069aed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
12:02:09: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
12:02:09: [<ffffffffa069aed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
12:02:09: [<ffffffffa096fac5>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc]
12:02:10: [<ffffffffa0970f22>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc]
12:02:10: [<ffffffff81281734>] ? snprintf+0x34/0x40
12:02:10: [<ffffffffa05c77b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
12:02:10: [<ffffffffa0966654>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc]
12:02:10: [<ffffffffa096a3c8>] ? ptlrpc_check_set+0x888/0x1b40 [ptlrpc]
12:02:10: [<ffffffffa099520b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
12:02:10: [<ffffffffa099572b>] ? ptlrpcd+0x20b/0x370 [ptlrpc]
12:02:10: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
12:02:10: [<ffffffffa0995520>] ? ptlrpcd+0x0/0x370 [ptlrpc]
12:02:10: [<ffffffff81096a36>] ? kthread+0x96/0xa0
12:02:10: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
12:02:10: [<ffffffff810969a0>] ? kthread+0x0/0xa0
12:02:10: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
12:02:11:Mem-Info:
12:02:11:Node 0 DMA per-cpu:
12:02:11:CPU 0: hi: 0, btch: 1 usd: 0
12:02:11:CPU 1: hi: 0, btch: 1 usd: 0
12:02:12:Node 0 DMA32 per-cpu:
12:02:12:CPU 0: hi: 186, btch: 31 usd: 0
12:02:12:CPU 1: hi: 186, btch: 31 usd: 4
12:02:12:active_anon:3193 inactive_anon:6004 isolated_anon:0
12:02:12: active_file:8566 inactive_file:234838 isolated_file:0
12:02:13: unevictable:0 dirty:1 writeback:0 unstable:0
12:02:13: free:13356 slab_reclaimable:3430 slab_unreclaimable:197269
12:02:13: mapped:4138 shmem:39 pagetables:1255 bounce:0
12:02:13:Node 0 DMA free:8236kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:5824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:1620kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:13:lowmem_reserve[]: 0 2003 2003 2003
12:02:13:Node 0 DMA32 free:45188kB min:44720kB low:55900kB high:67080kB active_anon:12772kB inactive_anon:24016kB active_file:34264kB inactive_file:933528kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:4kB writeback:0kB mapped:16552kB shmem:156kB slab_reclaimable:13668kB slab_unreclaimable:787456kB kernel_stack:1904kB pagetables:5020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
12:02:13:lowmem_reserve[]: 0 0 0 0
12:02:13:Node 0 DMA: 3*4kB 2*8kB 3*16kB 3*32kB 4*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8236kB
12:02:14:Node 0 DMA32: 9673*4kB 6*8kB 23*16kB 2*32kB 2*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 45188kB
12:02:14:243500 total pagecache pages
12:02:14:0 pages in swap cache
12:02:14:Swap cache stats: add 0, delete 0, find 0/0
12:02:14:Free swap = 4128760kB
12:02:15:Total swap = 4128760kB
12:02:15:524284 pages RAM
12:02:15:43709 pages reserved
12:02:15:259403 pages shared
12:02:16:221861 pages non-shared
12:02:16:LustreError: 22725:0:(osc_request.c:2161:osc_build_rpc()) prep_req failed: -12
12:02:16:LustreError: 22725:0:(osc_cache.c:2142:osc_check_rpcs()) Read request failed with -12
12:02:16:IOR: page allocation failure. order:1, mode:0x40
12:02:16:Pid: 22725, comm: IOR Not tainted 2.6.32-358.18.1.el6.x86_64 #1
12:02:16:Call Trace:
12:02:16: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
12:02:17: [<ffffffff81282796>] ? __const_udelay+0x46/0x50
12:02:17: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
12:02:17: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
12:02:17:524284 pages RAM
12:02:17:43709 pages reserved
12:02:17:259403 pages shared
12:02:17:221861 pages non-shared
12:02:18:LNetError: 11613:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-b2_5/arch/x86_64/build_type/client/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.5.0/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208) |
| Comments |
| Comment by Oleg Drokin [ 09/Dec/13 ] |
|
So if the servers were to write faster (or cause the clients to write faster), then we'd run out of memory sooner due to uncommitted pages. I wonder if that is the issue here. |
| Comment by Phil Schwan (Inactive) [ 29/Dec/13 ] |
|
We're experiencing this regularly – at least 5 times, I believe – with our new 2.5.0 installation, both during reads and writes (the latter of which of course causes corruption from the application's perspective, since the writes are asynchronous).
Lustre: Lustre: Build Version: 2.5.0-RC1--PRISTINE-2.6.32-279.14.1.el6.x86_64
The similar-looking LR-3851 requests /proc/slabinfo and /proc/meminfo, although by the time we discover this happening it's usually far too late. |
| Comment by Jian Yu [ 05/Jan/14 ] |
|
Lustre client build: http://build.whamcloud.com/job/lustre-b2_5/5/
While vetting the test report of https://maloo.whamcloud.com/test_sets/516dbb40-74fa-11e3-95ae-52540035b04c , I found the following error messages in the client syslog:
Jan 3 05:42:11 wtm-24vm5 kernel: Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf == 05:42:11 (1388756531)
Jan 3 05:42:11 wtm-24vm5 xinetd[1818]: EXIT: mshell status=0 pid=28818 duration=0(sec)
Jan 3 05:42:11 wtm-24vm5 xinetd[1818]: START: shell pid=28842 from=::ffff:10.10.16.245
Jan 3 05:42:11 wtm-24vm5 xinetd[1818]: EXIT: shell status=0 pid=28842 duration=0(sec)
Jan 3 05:42:15 wtm-24vm5 /usr/sbin/gmond[1834]: Error 1 sending the modular data for heartbeat#012
Jan 3 05:42:37 wtm-24vm5 kernel: ptlrpcd_1: page allocation failure. order:1, mode:0x40
Jan 3 05:42:37 wtm-24vm5 kernel: Pid: 2325, comm: ptlrpcd_1 Not tainted 2.6.32-358.18.1.el6.x86_64 #1
Jan 3 05:42:37 wtm-24vm5 kernel: Call Trace:
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff81051439>] ? __wake_up_common+0x59/0x90
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0497ed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0497ed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa068dd85>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa068f1e2>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff81281734>] ? snprintf+0x34/0x40
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa03e77b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0684894>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa0688608>] ? ptlrpc_check_set+0x888/0x1b40 [ptlrpc]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa06b354b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa06b39e5>] ? ptlrpcd+0x185/0x370 [ptlrpc]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffffa06b3860>] ? ptlrpcd+0x0/0x370 [ptlrpc]
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff81096a36>] ? kthread+0x96/0xa0
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff810969a0>] ? kthread+0x0/0xa0
Jan 3 05:42:37 wtm-24vm5 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 |
| Comment by Jian Yu [ 12/Jan/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/9/ parallel-scale test iorssf hit the same issue: |
| Comment by Jian Yu [ 17/Jan/14 ] |
|
Lustre client build: http://build.whamcloud.com/job/lustre-b2_5/13/ The same issue occurred: |
| Comment by Jian Yu [ 28/Jan/14 ] |
|
Another instance on the Lustre b2_5 branch: |
| Comment by Cory Spitz [ 11/Feb/14 ] |
|
2.5.0 broke many memory allocations due to the cfs wrapper conversion that dropped __GFP_WAIT. Is there another LU ticket tracking that problem? Ann K. @Cray has authored a patch that we can upload. |
| Comment by Ann Koehler (Inactive) [ 11/Feb/14 ] |
|
Basic Problem: When the cfs wrappers around the memory allocation functions were dropped, each CFS_ALLOC_xxxx flag was replaced directly by the matching __GFP_xxxx kernel flag. Thus CFS_ALLOC_IO was mapped directly to __GFP_IO. But if you look at the 2.4 code, you'll see that cfs_alloc_flags_to_gfp() - the function that used to map CFS_ALLOC flags to __GFP flags - does a bit more than that: it adds __GFP_WAIT to every combination except CFS_ALLOC_ATOMIC.

static unsigned int cfs_alloc_flags_to_gfp(u_int32_t flags)
{
	unsigned int mflags = 0;

	if (flags & CFS_ALLOC_ATOMIC)
		mflags |= __GFP_HIGH;
	else
		mflags |= __GFP_WAIT;
	if (flags & CFS_ALLOC_NOWARN)
		mflags |= __GFP_NOWARN;
	if (flags & CFS_ALLOC_IO)
		mflags |= __GFP_IO;
	if (flags & CFS_ALLOC_FS)
		mflags |= __GFP_FS;
	if (flags & CFS_ALLOC_HIGHMEM)
		mflags |= __GFP_HIGHMEM;
	return mflags;
}
The missing __GFP_WAIT flag causes memory allocations to fail in 2.5 that seldom failed in earlier releases. One such failure is the one described in this ticket. Others we've seen are LBUGs/GPFs because a cl_env could not be allocated.
The fix is straightforward: add the __GFP_WAIT flag to the flag set passed to the kernel allocation functions in all cases except GFP_ATOMIC. In particular, I replaced __GFP_IO with GFP_NOFS, since GFP_NOFS is defined as __GFP_IO | __GFP_WAIT. |
| Comment by Ann Koehler (Inactive) [ 11/Feb/14 ] |
|
Patch submitted: http://review.whamcloud.com/#/c/9223/ |
| Comment by Andreas Dilger [ 13/Feb/14 ] |
|
This was introduced in http://review.whamcloud.com/2831 commit 1b2547843817b4b7adbeb87ea9b070d9cac35c90. Tao, could you please make a version of this patch for the upstream kernel? |
| Comment by Peter Jones [ 18/Feb/14 ] |
|
Ann, Do you expect to be able to refresh the patch in the near future? If you have other commitments, would you mind if one of our engineers do so? Thanks Peter |
| Comment by Ann Koehler (Inactive) [ 18/Feb/14 ] |
|
Peter, I am pretty busy with other bugs at the moment. I have no objections at all to one of your engineers updating the patch. On the contrary, I would really appreciate it. Thanks, |
| Comment by Peter Jones [ 18/Feb/14 ] |
|
ok. Thanks for the quick feedback Ann. Emoly, Could you please revise Ann's patch to reflect the review feedback from Andreas? Thanks Peter |
| Comment by Bob Glossman (Inactive) [ 25/Feb/14 ] |
|
backport to b2_5: |
| Comment by Peter Jones [ 26/Feb/14 ] |
|
Landed for 2.5.1 and 2.6 |