LU-2790: Failure to allocate osd keys leads to ofd_intent_policy() ASSERTION( res_lvb != ((void *)0) ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 6756

    Description

      After the recent landing of the LU-1431 patch and the corresponding increase in allocation failures due to LU-2748...

      [25420.342529] ll_ost00_005: page allocation failure. order:5, mode:0x50
      [25420.342845] Pid: 22594, comm: ll_ost00_005 Not tainted 2.6.32-debug #6
      [25420.343134] Call Trace:
      [25420.343356]  [<ffffffff81125bd6>] ? __alloc_pages_nodemask+0x976/0x9e0
      [25420.343652]  [<ffffffff81160a62>] ? kmem_getpages+0x62/0x170
      [25420.344062]  [<ffffffff8116349c>] ? fallback_alloc+0x1bc/0x270
      [25420.344431]  [<ffffffff81162db7>] ? cache_grow+0x4d7/0x520
      [25420.344748]  [<ffffffff81163188>] ? ____cache_alloc_node+0xa8/0x200
      [25420.345035]  [<ffffffff81163838>] ? __kmalloc+0x208/0x2a0
      [25420.345319]  [<ffffffffa09efc00>] ? cfs_alloc+0x30/0x60 [libcfs]
      [25420.345614]  [<ffffffffa09efc00>] ? cfs_alloc+0x30/0x60 [libcfs]
      [25420.345899]  [<ffffffffa048953e>] ? osd_key_init+0x1e/0x5d0 [osd_ldiskfs]
      [25420.346231]  [<ffffffffa0eae3df>] ? keys_fill+0x6f/0x190 [obdclass]
      [25420.346534]  [<ffffffffa0eb1e8b>] ? lu_context_init+0xab/0x260 [obdclass]
      [25420.346842]  [<ffffffffa0eb205e>] ? lu_env_init+0x1e/0x30 [obdclass]
      [25420.347134]  [<ffffffffa05bc90c>] ? ost_blocking_ast+0x5c/0xca0 [ost]
      [25420.347443]  [<ffffffffa10ebded>] ? ldlm_work_bl_ast_lock+0xdd/0x290 [ptlrpc]
      [25420.347770]  [<ffffffffa112c18f>] ? ptlrpc_set_wait+0x6f/0x880 [ptlrpc]
      [25420.348102]  [<ffffffff81090154>] ? __init_waitqueue_head+0x24/0x40
      [25420.348548]  [<ffffffffa09ef8a5>] ? cfs_waitq_init+0x15/0x20 [libcfs]
      [25420.348977]  [<ffffffffa112876e>] ? ptlrpc_prep_set+0x11e/0x300 [ptlrpc]
      [25420.349293]  [<ffffffffa10ebd10>] ? ldlm_work_bl_ast_lock+0x0/0x290 [ptlrpc]
      [25420.349796]  [<ffffffffa10ee19b>] ? ldlm_run_ast_work+0x1db/0x460 [ptlrpc]
      [25420.350126]  [<ffffffffa110580f>] ? ldlm_process_extent_lock+0x1af/0xa90 [ptlrpc]
      [25420.350606]  [<ffffffffa10ee7b4>] ? ldlm_lock_enqueue+0x394/0x870 [ptlrpc]
      [25420.350923]  [<ffffffffa1114e87>] ? ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc]
      [25420.351417]  [<ffffffffa1115a86>] ? ldlm_handle_enqueue+0x66/0x70 [ptlrpc]
      [25420.351749]  [<ffffffffa1115a90>] ? ldlm_server_completion_ast+0x0/0x640 [ptlrpc]
      [25420.352248]  [<ffffffffa05bc8b0>] ? ost_blocking_ast+0x0/0xca0 [ost]
      [25420.352574]  [<ffffffffa11123c0>] ? ldlm_server_glimpse_ast+0x0/0x3b0 [ptlrpc]
      [25420.353124]  [<ffffffffa05c4807>] ? ost_handle+0x1be7/0x4590 [ost]
      [25420.353543]  [<ffffffffa09fb204>] ? libcfs_id2str+0x74/0xb0 [libcfs]
      [25420.353945]  [<ffffffffa1144e03>] ? ptlrpc_server_handle_request+0x453/0xe50 [ptlrpc]
      [25420.354432]  [<ffffffffa09ef65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [25420.354741]  [<ffffffffa113de91>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [25420.355023]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [25420.355299]  [<ffffffffa11478cd>] ? ptlrpc_main+0xafd/0x17f0 [ptlrpc]
      [25420.355606]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.355890]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
      [25420.356188]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.356495]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.356787]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      ....
      [25420.500609] LustreError: 22594:0:(ldlm_resource.c:1161:ldlm_resource_get()) lvbo_init failed for resource 114: rc -12
      [25420.502383] LustreError: 18292:0:(ldlm_lock.c:1542:ldlm_fill_lvb()) ### Replied unexpected ost LVB size 0 ns: lustre-OST0000-osc-ffff88003f9d2bf0 lock: ffff880046658db0/0xd15dff8dc7742d63 lrc: 6/0,2 mode: --/PW res: 114/8589935616 rrc: 1 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 nid: local remote: 0xd15dff8dc77438ca expref: -99 pid: 20430 timeout: 0 lvb_type: 1
      ...
      [25420.604693] LustreError: 22594:0:(ldlm_resource.c:1161:ldlm_resource_get()) lvbo_init failed for resource 116: rc -12
      [25420.604777] LustreError: 18293:0:(ldlm_lock.c:1542:ldlm_fill_lvb()) ### Replied unexpected ost LVB size 0 ns: lustre-OST0000-osc-ffff880054e39bf0 lock: ffff880084978db0/0xd15dff8dc7744183 lrc: 6/0,2 mode: --/PW res: 116/8589935616 rrc: 1 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x0 nid: local remote: 0xd15dff8dc77442a9 expref: -99 pid: 20443 timeout: 0 lvb_type: 1
      [25420.620838]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [25420.621142]  [<ffffffffa11478cd>] ? ptlrpc_main+0xafd/0x17f0 [ptlrpc]
      [25420.621445]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.621760]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
      [25420.622044]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.622339]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.622648]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      ...
      [25420.702106] LustreError: 22594:0:(ofd_dlm.c:177:ofd_intent_policy()) ASSERTION( res_lvb != ((void *)0) ) failed: 
      [25420.702490] LustreError: 22594:0:(ofd_dlm.c:177:ofd_intent_policy()) LBUG
      [25420.702705] Pid: 22594, comm: ll_ost00_005
      [25420.702853] 
      [25420.702853] Call Trace:
      [25420.703112]  [<ffffffffa09ee915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [25420.703314]  [<ffffffffa09eef17>] lbug_with_loc+0x47/0xb0 [libcfs]
      [25420.703492]  [<ffffffffa0697c85>] ofd_intent_policy+0x795/0x7c0 [ofd]
      [25420.703712]  [<ffffffffa10ee70a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc]
      [25420.703906]  [<ffffffffa1114e87>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc]
      [25420.704121]  [<ffffffffa1115a86>] ldlm_handle_enqueue+0x66/0x70 [ptlrpc]
      [25420.704332]  [<ffffffffa1115a90>] ? ldlm_server_completion_ast+0x0/0x640 [ptlrpc]
      [25420.704667]  [<ffffffffa05bc8b0>] ? ost_blocking_ast+0x0/0xca0 [ost]
      [25420.704926]  [<ffffffffa11123c0>] ? ldlm_server_glimpse_ast+0x0/0x3b0 [ptlrpc]
      [25420.705271]  [<ffffffffa05c4807>] ost_handle+0x1be7/0x4590 [ost]
      [25420.705511]  [<ffffffffa09fb204>] ? libcfs_id2str+0x74/0xb0 [libcfs]
      [25420.705715]  [<ffffffffa1144e03>] ptlrpc_server_handle_request+0x453/0xe50 [ptlrpc]
      [25420.706015]  [<ffffffffa09ef65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [25420.706213]  [<ffffffffa113de91>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [25420.706397]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [25420.706585]  [<ffffffffa11478cd>] ptlrpc_main+0xafd/0x17f0 [ptlrpc]
      [25420.706775]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.706956]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [25420.707232]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.707441]  [<ffffffffa1146dd0>] ? ptlrpc_main+0x0/0x17f0 [ptlrpc]
      [25420.707634]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [25420.707805] 
      [25420.708112] Kernel panic - not syncing: LBUG
      

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.4


            jlevi Jodi Levi (Inactive) added a comment -

            Per discussions with Oleg, reducing priority to major.

            yong.fan nasf (Inactive) added a comment -

            This is the patch to handle lvbo_init() failure:

            http://review.whamcloud.com/#change,5699

            green Oleg Drokin added a comment -

            Well, now that the failed caller is exposed, we just need to fix the caller to do something more sensible.

            But this is not a huge priority because it's not really expected to ever be hit.

            yong.fan nasf (Inactive) added a comment -

            The failure occurred in ofd_lvbo_init() as follows:

            ==========================
            OBD_ALLOC_PTR(lvb);
            if (lvb == NULL)
                    GOTO(out, rc = -ENOMEM);
            ==========================

            The LVB needs only 56 bytes, which is very small.
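            For illustration only, here is a minimal user-space sketch of how an allocation failure in an lvbo_init()-style hook could be propagated so that no half-initialized resource is ever handed back. All names here (demo_lvb, demo_resource, demo_alloc, demo_lvbo_init, demo_resource_get, demo_fail_alloc) are hypothetical stand-ins and not the Lustre structures or the patch under review.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real Lustre structures. */
struct demo_lvb {
	unsigned long long size;
	long long mtime;
};

struct demo_resource {
	struct demo_lvb *lvb_data;	/* stays NULL until init succeeds */
};

/* Set to non-zero to simulate an allocation failure. */
static int demo_fail_alloc;

static void *demo_alloc(size_t len)
{
	if (demo_fail_alloc)
		return NULL;
	return calloc(1, len);
}

/* Same shape as the quoted snippet: allocate, return -ENOMEM on failure. */
static int demo_lvbo_init(struct demo_resource *res)
{
	struct demo_lvb *lvb = demo_alloc(sizeof(*lvb));

	if (lvb == NULL)
		return -ENOMEM;
	res->lvb_data = lvb;
	return 0;
}

/*
 * Resource lookup that refuses to return a resource whose LVB could not
 * be set up: on failure the resource is freed and the error is reported
 * through *rc, so later code never sees lvb_data == NULL.
 */
static struct demo_resource *demo_resource_get(int *rc)
{
	struct demo_resource *res = calloc(1, sizeof(*res));

	if (res == NULL) {
		*rc = -ENOMEM;
		return NULL;
	}
	*rc = demo_lvbo_init(res);
	if (*rc != 0) {
		free(res);
		return NULL;
	}
	return res;
}
```

            Setting demo_fail_alloc to 1 before calling demo_resource_get() exercises the error path without needing real memory pressure.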

            bzzz Alex Zhuravlev added a comment -

            This code was taken directly from obdfilter (which has the same assert) and it was never a problem. That said, I don't mean the code is absolutely correct, but I don't think this will be a problem with ofd.

            green Oleg Drokin added a comment -

            I disagree with Alex's assessment.

            LU-2748 only masked the symptoms here by making the original allocation more robust.
            But should it fail for other reasons, this bug will still occur.
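            To make the point above concrete, here is a hedged sketch of the consumer side: a policy-style function that returns an error when the resource has no LVB, where the current code asserts. The names sketch_lvb, sketch_resource and sketch_fill_reply_lvb are hypothetical, and the choice of -ENOMEM as the returned error is an assumption; this is not the actual ofd_intent_policy() code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the real Lustre structures. */
struct sketch_lvb {
	unsigned long long size;
};

struct sketch_resource {
	struct sketch_lvb *lvb_data;	/* may be NULL if init failed earlier */
};

/*
 * Copy the resource LVB into the reply buffer.
 * Returns 0 on success or a negative errno. A missing LVB fails only
 * this request; it does not take down the whole server.
 */
static int sketch_fill_reply_lvb(const struct sketch_resource *res,
				 struct sketch_lvb *reply)
{
	if (res == NULL || reply == NULL)
		return -EINVAL;
	if (res->lvb_data == NULL)
		return -ENOMEM;	/* assumed error code for a failed LVB init */
	memcpy(reply, res->lvb_data, sizeof(*reply));
	return 0;
}
```

            With a check like this, whatever caused the LVB allocation to fail no longer matters to the caller, which is the property LU-2748 alone does not provide.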

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: green Oleg Drokin