Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2895

recovery-small 24a: osp_precreate_get_fid(): ASSERTION( lu_fid_diff(&d->opd_pre_used_fid, &d->opd_pre_last_created_fid) < 0 ) failed: next fid [0x0:0x1:0x0] last created fid [0x0:0x1:0x0]

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 6979

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0c78ac7a-81ad-11e2-9f6b-52540035b04c.

      The sub-test test_24a failed with the following error:

      test failed to respond and timed out

      Info required for matching: recovery-small 24a

      First, the test failed with return code 1. Second, the MDS hit an assertion failure afterward:

      12:39:11:Lustre: DEBUG MARKER: == recovery-small test 24a: fsync error (should return error) == 12:39:10 (1361997550)
      12:39:36:Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0
      12:39:36:LustreError: 18701:0:(osp_precreate.c:1164:osp_precreate_get_fid()) ASSERTION( lu_fid_diff(&d->opd_pre_used_fid, &d->opd_pre_last_created_fid) < 0 ) failed: next fid [0x0:0x1:0x0] last created fid [0x0:0x1:0x0]
      12:39:36:LustreError: 18701:0:(osp_precreate.c:1164:osp_precreate_get_fid()) LBUG
      12:39:36:Pid: 18701, comm: mdt00_000
      12:39:36:
      12:39:37:Call Trace:
      12:39:37: [<ffffffffa06f1895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      12:39:37: [<ffffffffa06f1e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      12:39:37: [<ffffffffa0ba4f63>] osp_precreate_get_fid+0x253/0x310 [osp]
      12:39:37: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:39:37: [<ffffffffa0ba28cb>] osp_object_assign_fid+0x7b/0x160 [osp]
      12:39:37: [<ffffffffa0ba30b6>] osp_object_create+0x406/0x890 [osp]
      12:39:38: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:39:38: [<ffffffffa0b5e683>] lod_striping_create+0x193/0x320 [lod]
      12:39:38: [<ffffffffa0b5e983>] lod_object_create+0x173/0x260 [lod]
      12:39:38: [<ffffffffa0a1a580>] mdd_object_create_internal+0xe0/0x220 [mdd]
      12:39:38: [<ffffffffa0a2d22c>] mdd_create+0x77c/0x11f0 [mdd]
      12:39:38: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:39:38: [<ffffffffa0ac0ef9>] mdt_reint_open+0x12b9/0x1ab0 [mdt]
      12:39:39: [<ffffffffa07022e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      12:39:39: [<ffffffffa0aab7f1>] mdt_reint_rec+0x41/0xe0 [mdt]
      12:39:39: [<ffffffffa0aa4e53>] mdt_reint_internal+0x4e3/0x7d0 [mdt]
      12:39:39: [<ffffffffa0aa540d>] mdt_intent_reint+0x1ed/0x4f0 [mdt]
      12:39:39: [<ffffffffa0aa0fde>] mdt_intent_policy+0x3ae/0x750 [mdt]
      12:39:39: [<ffffffffa0fb0351>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
      12:39:39: [<ffffffffa0fd6447>] ldlm_handle_enqueue0+0x4f7/0x1080 [ptlrpc]
      12:39:39: [<ffffffffa0aa14b6>] mdt_enqueue+0x46/0x110 [mdt]
      12:39:39: [<ffffffffa0a96078>] mdt_handle_common+0x628/0x1620 [mdt]
      12:39:39: [<ffffffffa0acd845>] mds_regular_handle+0x15/0x20 [mdt]
      12:39:39: [<ffffffffa100802c>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
      12:39:39: [<ffffffffa06f25de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      12:39:39: [<ffffffffa0fff759>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      12:39:39: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
      12:39:39: [<ffffffffa1009576>] ptlrpc_main+0xb76/0x1870 [ptlrpc]
      12:39:39: [<ffffffffa1008a00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      12:39:39: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      12:39:39: [<ffffffffa1008a00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      12:39:40: [<ffffffffa1008a00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
      12:39:40: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      12:39:40:
      12:39:40:Kernel panic - not syncing: LBUG
      

      Attachments

        Activity

          [LU-2895] recovery-small 24a: osp_precreate_get_fid(): ASSERTION( lu_fid_diff(&d->opd_pre_used_fid, &d->opd_pre_last_created_fid) < 0 ) failed: next fid [0x0:0x1:0x0] last created fid [0x0:0x1:0x0]
          pjones Peter Jones added a comment -

          Landed for 2.4

          pjones Peter Jones added a comment - Landed for 2.4

          inability to setup FLD client or cleanup orphans should block any activity on this OSP, I'd think.

          bzzz Alex Zhuravlev added a comment - inability to setup FLD client or cleanup orphans should block any activity on this OSP, I'd think.

          Alex, the opd_pre_status was intialized as -EAGAIN, and it could be cleared to 0 in osp_precreate_thread() -> osp_statfs_update(), and the following osp_precreate_cleanup_orphans() wasn't success somehow, then we'll end up in above state: uninitialized opd_pre_last_created_fid & opd_pre_used_fid, opd_pre_status == 0, and osp_precreate_reserve() is possbile timedout with rc == 0.

          niu Niu Yawei (Inactive) added a comment - Alex, the opd_pre_status was intialized as -EAGAIN, and it could be cleared to 0 in osp_precreate_thread() -> osp_statfs_update(), and the following osp_precreate_cleanup_orphans() wasn't success somehow, then we'll end up in above state: uninitialized opd_pre_last_created_fid & opd_pre_used_fid, opd_pre_status == 0, and osp_precreate_reserve() is possbile timedout with rc == 0.

          why rc taken from is not enough ?

          bzzz Alex Zhuravlev added a comment - why rc taken from is not enough ?
          niu Niu Yawei (Inactive) added a comment - fix the LASSERT problem: http://review.whamcloud.com/5683

          Looks the osp_precreate_reserve() should timedout:

          12:39:36:Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0
          

          but the osp_precreate_reserve() didn't return proper error for such case:

                          if (cfs_time_aftereq(cfs_time_current(), expire))
                                  break;
          

          The mds debug log is missed, so I'm not sure the exact reason of precreation isn't done (even the seq hasn't been initialized), but at least, we should return proper error in such case.

          niu Niu Yawei (Inactive) added a comment - Looks the osp_precreate_reserve() should timedout: 12:39:36:Lustre: lustre-OST0000-osc-MDT0000: slow creates, last=[0x0:0x1:0x0], next=[0x0:0x1:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0 but the osp_precreate_reserve() didn't return proper error for such case: if (cfs_time_aftereq(cfs_time_current(), expire)) break ; The mds debug log is missed, so I'm not sure the exact reason of precreation isn't done (even the seq hasn't been initialized), but at least, we should return proper error in such case.
          pjones Peter Jones added a comment -

          Niu

          Could you please look into this one?

          Thanks

          Peter

          pjones Peter Jones added a comment - Niu Could you please look into this one? Thanks Peter

          People

            niu Niu Yawei (Inactive)
            maloo Maloo
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: