Lustre / LU-4357

page allocation failure. mode:0x40 caused by missing __GFP_WAIT flag

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3
    • Environment: server: lustre-master build # 1791 RHEL6 ldiskfs
      client: 2.5.0
    • Severity: 3
    • 11936

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/5efd6668-5e3c-11e3-ae30-52540035b04c.

      The sub-test test_iorssf failed with the following error:

      ior failed! 1

      Not sure if this is a duplicate of LU-3585, although LU-3585 has already been fixed. Client console:

      12:02:03:Lustre: DEBUG MARKER: == parallel-scale test iorssf: iorssf == 11:54:49 (1386273289)
      12:02:03:ptlrpcd_0: page allocation failure. order:1, mode:0x40
      12:02:03:IOR: page allocation failure. order:1, mode:0x40
      12:02:03:Pid: 22725, comm: IOR Not tainted 2.6.32-358.18.1.el6.x86_64 #1
      12:02:03:Call Trace:
      12:02:03: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
      12:02:03: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
      12:02:03: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
      12:02:03: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
      12:02:03: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
      12:02:03: [<ffffffffa0967c08>] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
      12:02:03: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
      12:02:03: [<ffffffffa0967c08>] ? ptlrpc_new_bulk+0x48/0x280 [ptlrpc]
      12:02:03: [<ffffffffa0967e98>] ? ptlrpc_prep_bulk_imp+0x58/0x190 [ptlrpc]
      12:02:03: [<ffffffffa03ca054>] ? osc_brw_prep_request+0x294/0x11e0 [osc]
      12:02:03: [<ffffffffa03de4ac>] ? osc_req_attr_set+0x16c/0x5b0 [osc]
      12:02:04: [<ffffffffa088dd21>] ? cl_req_attr_set+0xd1/0x230 [obdclass]
      12:02:04: [<ffffffffa03d01d0>] ? osc_build_rpc+0x870/0x1850 [osc]
      12:02:04: [<ffffffffa03ea1a7>] ? osc_io_unplug0+0x1257/0x1f00 [osc]
      12:02:04: [<ffffffffa0881285>] ? cl_page_slice_add+0x55/0x140 [obdclass]
      12:02:04: [<ffffffffa03ecbd1>] ? osc_io_unplug+0x11/0x20 [osc]
      12:02:04: [<ffffffffa03ecdb0>] ? osc_queue_sync_pages+0x1d0/0x360 [osc]
      12:02:04: [<ffffffffa03def80>] ? osc_io_submit+0x340/0x4b0 [osc]
      12:02:04: [<ffffffffa088d68c>] ? cl_io_submit_rw+0x6c/0x160 [obdclass]
      12:02:04: [<ffffffffa04ba3c2>] ? lov_io_submit+0x3a2/0xbb0 [lov]
      12:02:04: [<ffffffffa088d68c>] ? cl_io_submit_rw+0x6c/0x160 [obdclass]
      12:02:04: [<ffffffffa088fc9e>] ? cl_io_read_page+0xae/0x170 [obdclass]
      12:02:04: [<ffffffffa0883a77>] ? cl_page_assume+0xf7/0x220 [obdclass]
      12:02:05: [<ffffffffa0768176>] ? ll_readpage+0x96/0x1a0 [lustre]
      12:02:05: [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
      12:02:05: [<ffffffff8111b7ec>] ? generic_file_aio_read+0x1fc/0x700
      12:02:05: [<ffffffffa0797797>] ? vvp_io_read_start+0x257/0x470 [lustre]
      12:02:05: [<ffffffffa088d7ea>] ? cl_io_start+0x6a/0x140 [obdclass]
      12:02:05: [<ffffffffa0891ef4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
      12:02:05: [<ffffffffa0739f9f>] ? ll_file_io_generic+0x33f/0x610 [lustre]
      12:02:05: [<ffffffffa073a3af>] ? ll_file_aio_read+0x13f/0x2c0 [lustre]
      12:02:05: [<ffffffffa073ac4c>] ? ll_file_read+0x16c/0x2a0 [lustre]
      12:02:05: [<ffffffff81181a95>] ? vfs_read+0xb5/0x1a0
      12:02:06: [<ffffffff81181bd1>] ? sys_read+0x51/0x90
      12:02:06: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
      12:02:06: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      12:02:06:Mem-Info:
      12:02:06:Node 0 DMA per-cpu:
      12:02:06:CPU    0: hi:    0, btch:   1 usd:   0
      12:02:06:CPU    1: hi:    0, btch:   1 usd:   0
      12:02:07:Node 0 DMA32 per-cpu:
      12:02:07:CPU    0: hi:  186, btch:  31 usd:   0
      12:02:07:CPU    1: hi:  186, btch:  31 usd:   4
      12:02:07:active_anon:3193 inactive_anon:6004 isolated_anon:0
      12:02:07: active_file:8566 inactive_file:234838 isolated_file:0
      12:02:07: unevictable:0 dirty:1 writeback:0 unstable:0
      12:02:07: free:13356 slab_reclaimable:3430 slab_unreclaimable:197269
      12:02:07: mapped:4138 shmem:39 pagetables:1255 bounce:0
      12:02:07:Node 0 DMA free:8236kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:5824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:1620kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      12:02:07:lowmem_reserve[]: 0 2003 2003 2003
      12:02:07:Node 0 DMA32 free:45188kB min:44720kB low:55900kB high:67080kB active_anon:12772kB inactive_anon:24016kB active_file:34264kB inactive_file:933528kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:4kB writeback:0kB mapped:16552kB shmem:156kB slab_reclaimable:13668kB slab_unreclaimable:787456kB kernel_stack:1904kB pagetables:5020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      12:02:07:lowmem_reserve[]: 0 0 0 0
      12:02:07:Node 0 DMA: 3*4kB 2*8kB 3*16kB 3*32kB 4*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8236kB
      12:02:08:Node 0 DMA32: 9673*4kB 6*8kB 23*16kB 2*32kB 2*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 45188kB
      12:02:08:243500 total pagecache pages
      12:02:08:0 pages in swap cache
      12:02:08:Swap cache stats: add 0, delete 0, find 0/0
      12:02:08:Free swap  = 4128760kB
      12:02:08:Total swap = 4128760kB
      12:02:08:Pid: 11613, comm: ptlrpcd_0 Not tainted 2.6.32-358.18.1.el6.x86_64 #1
      12:02:08:Call Trace:
      12:02:08: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
      12:02:08: [<ffffffffa0707a78>] ? ksocknal_queue_tx_msg_v2+0x88/0xe0 [ksocklnd]
      12:02:08: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
      12:02:08: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
      12:02:09: [<ffffffff811673ff>] ? cache_grow+0x2cf/0x320
      12:02:09: [<ffffffff81167729>] ? ____cache_alloc_node+0x99/0x160
      12:02:09: [<ffffffffa069aed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
      12:02:09: [<ffffffff811684f9>] ? __kmalloc+0x189/0x220
      12:02:09: [<ffffffffa069aed7>] ? LNetMDAttach+0x157/0x5a0 [lnet]
      12:02:09: [<ffffffffa096fac5>] ? ptlrpc_register_bulk+0x265/0x9d0 [ptlrpc]
      12:02:10: [<ffffffffa0970f22>] ? ptl_send_rpc+0x232/0xc40 [ptlrpc]
      12:02:10: [<ffffffff81281734>] ? snprintf+0x34/0x40
      12:02:10: [<ffffffffa05c77b1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:02:10: [<ffffffffa0966654>] ? ptlrpc_send_new_req+0x454/0x790 [ptlrpc]
      12:02:10: [<ffffffffa096a3c8>] ? ptlrpc_check_set+0x888/0x1b40 [ptlrpc]
      12:02:10: [<ffffffffa099520b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
      12:02:10: [<ffffffffa099572b>] ? ptlrpcd+0x20b/0x370 [ptlrpc]
      12:02:10: [<ffffffff81063410>] ? default_wake_function+0x0/0x20
      12:02:10: [<ffffffffa0995520>] ? ptlrpcd+0x0/0x370 [ptlrpc]
      12:02:10: [<ffffffff81096a36>] ? kthread+0x96/0xa0
      12:02:10: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      12:02:10: [<ffffffff810969a0>] ? kthread+0x0/0xa0
      12:02:10: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      12:02:11:Mem-Info:
      12:02:11:Node 0 DMA per-cpu:
      12:02:11:CPU    0: hi:    0, btch:   1 usd:   0
      12:02:11:CPU    1: hi:    0, btch:   1 usd:   0
      12:02:12:Node 0 DMA32 per-cpu:
      12:02:12:CPU    0: hi:  186, btch:  31 usd:   0
      12:02:12:CPU    1: hi:  186, btch:  31 usd:   4
      12:02:12:active_anon:3193 inactive_anon:6004 isolated_anon:0
      12:02:12: active_file:8566 inactive_file:234838 isolated_file:0
      12:02:13: unevictable:0 dirty:1 writeback:0 unstable:0
      12:02:13: free:13356 slab_reclaimable:3430 slab_unreclaimable:197269
      12:02:13: mapped:4138 shmem:39 pagetables:1255 bounce:0
      12:02:13:Node 0 DMA free:8236kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:5824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:1620kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      12:02:13:lowmem_reserve[]: 0 2003 2003 2003
      12:02:13:Node 0 DMA32 free:45188kB min:44720kB low:55900kB high:67080kB active_anon:12772kB inactive_anon:24016kB active_file:34264kB inactive_file:933528kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:4kB writeback:0kB mapped:16552kB shmem:156kB slab_reclaimable:13668kB slab_unreclaimable:787456kB kernel_stack:1904kB pagetables:5020kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      12:02:13:lowmem_reserve[]: 0 0 0 0
      12:02:13:Node 0 DMA: 3*4kB 2*8kB 3*16kB 3*32kB 4*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8236kB
      12:02:14:Node 0 DMA32: 9673*4kB 6*8kB 23*16kB 2*32kB 2*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 45188kB
      12:02:14:243500 total pagecache pages
      12:02:14:0 pages in swap cache
      12:02:14:Swap cache stats: add 0, delete 0, find 0/0
      12:02:14:Free swap  = 4128760kB
      12:02:15:Total swap = 4128760kB
      12:02:15:524284 pages RAM
      12:02:15:43709 pages reserved
      12:02:15:259403 pages shared
      12:02:16:221861 pages non-shared
      12:02:16:LustreError: 22725:0:(osc_request.c:2161:osc_build_rpc()) prep_req failed: -12
      12:02:16:LustreError: 22725:0:(osc_cache.c:2142:osc_check_rpcs()) Read request failed with -12
      12:02:16:IOR: page allocation failure. order:1, mode:0x40
      12:02:16:Pid: 22725, comm: IOR Not tainted 2.6.32-358.18.1.el6.x86_64 #1
      12:02:16:Call Trace:
      12:02:16: [<ffffffff8112c257>] ? __alloc_pages_nodemask+0x757/0x8d0
      12:02:17: [<ffffffff81282796>] ? __const_udelay+0x46/0x50
      12:02:17: [<ffffffff81166d92>] ? kmem_getpages+0x62/0x170
      12:02:17: [<ffffffff811679aa>] ? fallback_alloc+0x1ba/0x270
      12:02:17:524284 pages RAM
      12:02:17:43709 pages reserved
      12:02:17:259403 pages shared
      12:02:17:221861 pages non-shared
      12:02:18:LNetError: 11613:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-b2_5/arch/x86_64/build_type/client/distro/el6/ib_stack/inkernel/BUILD/BUILD/lustre-2.5.0/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208)
      

      Attachments

        Issue Links

          Activity

            [LU-4357] page allocation failure. mode:0x40 caused by missing __GFP_WAIT flag
            pjones Peter Jones added a comment -

            Landed for 2.5.1 and 2.6

            bogl Bob Glossman (Inactive) added a comment - backport to b2_5: http://review.whamcloud.com/9382
            pjones Peter Jones added a comment -

            ok. Thanks for the quick feedback Ann.

            Emoly,

            Could you please revise Ann's patch to reflect the review feedback from Andreas?

            Thanks

            Peter


            amk Ann Koehler (Inactive) added a comment -

            Peter,

            I am pretty busy with other bugs at the moment. I have no objections at all to one of your engineers updating the patch. On the contrary, I would really appreciate it.

            Thanks,
            Ann
            pjones Peter Jones added a comment -

            Ann,

            Do you expect to be able to refresh the patch in the near future? If you have other commitments, would you mind if one of our engineers do so?

            Thanks

            Peter


            adilger Andreas Dilger added a comment -

            This was introduced in http://review.whamcloud.com/2831 commit 1b2547843817b4b7adbeb87ea9b070d9cac35c90.

            Tao, could you please make a version of this patch for the upstream kernel.
            amk Ann Koehler (Inactive) added a comment - Patch submitted: http://review.whamcloud.com/#/c/9223/

            amk Ann Koehler (Inactive) added a comment -

            Basic Problem: When the cfs wrappers around the memory allocation functions were dropped, each CFS_ALLOC_xxxx flag was replaced directly by the matching __GFP_xxxx kernel flag. Thus CFS_ALLOC_IO was mapped directly to __GFP_IO. But if you look at the 2.4 code, you'll see that cfs_alloc_flags_to_gfp() - the function that used to map from CFS_ALLOC to __GFP - did a bit more than that: it added __GFP_WAIT to all flags except CFS_ALLOC_ATOMIC.

            static unsigned int cfs_alloc_flags_to_gfp(u_int32_t flags)
            {
                    unsigned int mflags = 0;
            
                    if (flags & CFS_ALLOC_ATOMIC)
                            mflags |= __GFP_HIGH;
                    else
                            mflags |= __GFP_WAIT;
                    if (flags & CFS_ALLOC_NOWARN)
                            mflags |= __GFP_NOWARN;
                    if (flags & CFS_ALLOC_IO)
                            mflags |= __GFP_IO;
                    if (flags & CFS_ALLOC_FS)
                            mflags |= __GFP_FS;
                    if (flags & CFS_ALLOC_HIGHMEM)
                            mflags |= __GFP_HIGHMEM;
                    return mflags;
            }
            

            The missing __GFP_WAIT flag causes memory allocations to fail in 2.5 that seldom failed in earlier releases. One such failure is the one described in this ticket. Others we have seen are LBUGs/GPFs because a cl_env could not be allocated:

            • LustreError: 11138:0:(osc_lock.c:599:osc_lock_upcall()) LBUG
            • LustreError: 25614:0:(osc_lock.c:781:osc_ldlm_blocking_ast()) LBUG
            • LustreError: 8855:0:(lov_page.c:100:lov_page_own()) LBUG
            • BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
              IP: [<ffffffffa0549a07>] lu_context_key_get+0x17/0x60 [obdclass]
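            The NULL dereference above is consistent with a caller assuming an allocation cannot fail. A minimal user-space sketch of that failure mode (all names here, such as cl_env_sketch and env_alloc, are hypothetical stand-ins, not the real Lustre structures):

            ```c
            #include <stdio.h>
            #include <stdlib.h>

            /* Hypothetical stand-in for a cl_env-style context object. */
            struct cl_env_sketch {
                    int refcount;
                    void *keys[4];
            };

            /* Mimics an allocation made without __GFP_WAIT: under memory
             * pressure it returns NULL instead of sleeping and retrying. */
            static struct cl_env_sketch *env_alloc(int under_pressure)
            {
                    if (under_pressure)
                            return NULL;    /* kmalloc(..., __GFP_IO) failed */
                    return calloc(1, sizeof(struct cl_env_sketch));
            }

            int main(void)
            {
                    /* A caller that skips the NULL check would oops here,
                     * as in the lu_context_key_get trace above; checking it
                     * surfaces -ENOMEM instead. */
                    struct cl_env_sketch *env = env_alloc(1);
                    if (!env) {
                            puts("cl_env allocation failed: -ENOMEM");
                            return 12;      /* ENOMEM, cf. "prep_req failed: -12" */
                    }
                    env->refcount++;
                    free(env);
                    return 0;
            }
            ```
            
            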

            The fix is straightforward: add the __GFP_WAIT flag to the flag set passed to the kernel allocation functions for all cases except GFP_ATOMIC. In particular, I replaced __GFP_IO with GFP_NOFS. GFP_NOFS is defined as __GFP_IO | __GFP_WAIT.
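            For reference, the flag arithmetic behind the fix can be checked directly. The constants below are the values from the 2.6.32-era include/linux/gfp.h matching the kernel in the traces above (later kernels renamed __GFP_WAIT, so this sketch is tied to that kernel generation):

            ```c
            #include <assert.h>
            #include <stdio.h>

            /* gfp flag values as defined in 2.6.32-era include/linux/gfp.h */
            #define __GFP_WAIT 0x10u        /* allocator may sleep and reclaim */
            #define __GFP_HIGH 0x20u        /* may dip into emergency reserves */
            #define __GFP_IO   0x40u        /* may start physical I/O */
            #define __GFP_FS   0x80u        /* may call into the filesystem */

            #define GFP_NOFS   (__GFP_WAIT | __GFP_IO)

            int main(void)
            {
                    /* The failing allocations used mode:0x40, i.e. __GFP_IO
                     * alone: the allocator may start I/O but may not sleep,
                     * so it gives up quickly under fragmentation. */
                    unsigned int broken = __GFP_IO;
                    assert(broken == 0x40u);
                    assert(!(broken & __GFP_WAIT));

                    /* Replacing __GFP_IO with GFP_NOFS restores __GFP_WAIT,
                     * letting the allocator reclaim before failing. */
                    assert(GFP_NOFS == 0x50u);
                    assert(GFP_NOFS & __GFP_WAIT);

                    printf("mode 0x%x -> 0x%x\n", broken, GFP_NOFS);
                    return 0;
            }
            ```
            
            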

            spitzcor Cory Spitz added a comment - - edited

            2.5.0 has broken many memory allocators due to the cfs wrapper conversion that dropped __GFP_WAIT. Is there another LU ticket tracking that problem? Ann K. @Cray has authored a patch that we can upload.

            yujian Jian Yu added a comment - More instance on Lustre b2_5 branch: https://maloo.whamcloud.com/test_sets/ca3e6bd0-8799-11e3-8928-52540035b04c

            People

              emoly.liu Emoly Liu
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: