Lustre / LU-5289

mdc_enqueue() may leave an invalid lock handle in intent

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0
    • Severity: 3
    • Rank (Obsolete): 14756

    Description

      I see this when running vanilla single-node racer with memory allocation fault injection.

      [  169.793670] LustreError: 8024:0:(ldlm_lock.c:852:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_readers > 0 ) failed: 
      [  169.793681] LustreError: 8024:0:(ldlm_lock.c:852:ldlm_lock_decref_internal_nolock()) LBUG
      [  169.793687] Pid: 8024, comm: setfattr
      [  169.793690] 
      [  169.793691] Call Trace:
      [  169.793731]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [  169.793757]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [  169.793848]  [<ffffffffa0643842>] ldlm_lock_decref_internal_nolock+0xd2/0x180 [ptlrpc]
      [  169.793923]  [<ffffffffa0646d40>] ldlm_lock_decref_internal+0x50/0xae0 [ptlrpc]
      [  169.793993]  [<ffffffffa0438b7e>] ? class_handle2object+0x3e/0x1d0 [obdclass]
      [  169.794052]  [<ffffffffa06481b9>] ldlm_lock_decref+0x39/0x90 [ptlrpc]
      [  169.794088]  [<ffffffffa0e31b6f>] ll_intent_drop_lock+0xaf/0x150 [lustre]
      [  169.794113]  [<ffffffffa0e31c51>] ll_intent_release+0x41/0x1d0 [lustre]
      [  169.794150]  [<ffffffffa0e7e9c8>] ll_lookup_nd+0x108/0x4a0 [lustre]
      [  169.794158]  [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
      [  169.794163]  [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
      [  169.794168]  [<ffffffff811b398a>] path_walk+0x6a/0xe0
      [  169.794172]  [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
      [  169.794177]  [<ffffffff811b4cc7>] user_path_at+0x57/0xa0
      [  169.794182]  [<ffffffff8119f6c3>] ? sys_close+0x43/0x120
      [  169.794187]  [<ffffffff8119f6c3>] ? sys_close+0x43/0x120
      [  169.794192]  [<ffffffff811cb418>] sys_setxattr+0x48/0xe0
      [  169.794200]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [  169.794203] 
      

      This was triggered by an allocation failure in mdc_finish_enqueue(), but the issue is at the bottom of mdc_enqueue():

              rc = mdc_finish_enqueue(exp, req, einfo, it, lockh, rc);
              if (rc < 0) {
                      if (lustre_handle_is_used(lockh)) {
                              ldlm_lock_decref(lockh, einfo->ei_mode);
                              memset(lockh, 0, sizeof(*lockh));
                      }
                      ptlrpc_req_finished(req);
              }
              RETURN(rc);
      }
      

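      We should also clear it_lock_handle and it_lock_mode in the intent. Otherwise the intent still refers to the lock whose reference was just dropped here, and ll_intent_drop_lock() drops that reference a second time, which trips the l_readers > 0 assertion above.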
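      The double release can be illustrated with a small user-space model. This is a sketch under stated assumptions, not Lustre code: every name (toy_lock, toy_intent, toy_decref() and so on) is hypothetical, the lock is reduced to a single read-reference counter, and the failing assertion is modeled as a -1 return instead of an LBUG. With clear_intent set to 0 the error path behaves like the excerpt above; with 1 it also clears the intent, as proposed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of the LU-5289 double decref; not the Lustre API. */
struct toy_lock {
    int l_readers;              /* read-mode references */
};

struct toy_handle {
    uint64_t cookie;            /* 0 means "unused" */
};

struct toy_intent {
    uint64_t it_lock_handle;    /* cookie copied from lockh */
    int      it_lock_mode;      /* 0 means "no lock held" */
};

static struct toy_lock the_lock;
#define TOY_COOKIE  0x1234ULL
#define TOY_MODE_RD 4           /* arbitrary nonzero read mode (assumption) */

/* Returns -1 where ASSERTION(lock->l_readers > 0) would fire, else 0. */
static int toy_decref(uint64_t cookie)
{
    if (cookie != TOY_COOKIE || the_lock.l_readers <= 0)
        return -1;
    the_lock.l_readers--;
    return 0;
}

/* Error path at the bottom of the enqueue; clear_intent selects the fix. */
static void toy_enqueue_error_path(struct toy_handle *lockh,
                                   struct toy_intent *it, int clear_intent)
{
    if (lockh->cookie != 0) {
        (void)toy_decref(lockh->cookie);
        memset(lockh, 0, sizeof(*lockh));
        if (clear_intent) {
            /* Proposed fix: forget the lock in the intent as well. */
            it->it_lock_handle = 0;
            it->it_lock_mode = 0;
        }
    }
}

/* Caller-side release, loosely modeled on ll_intent_drop_lock(). */
static int toy_intent_drop_lock(struct toy_intent *it)
{
    int rc = 0;

    if (it->it_lock_mode != 0) {
        rc = toy_decref(it->it_lock_handle);
        it->it_lock_mode = 0;
    }
    return rc;
}

/* Take one reference, fail after filling lockh and the intent, then release. */
static int toy_run(int clear_intent)
{
    struct toy_handle lockh = { TOY_COOKIE };
    struct toy_intent it = { TOY_COOKIE, TOY_MODE_RD };

    the_lock.l_readers = 1;
    toy_enqueue_error_path(&lockh, &it, clear_intent);
    return toy_intent_drop_lock(&it);
}
```

      In this model toy_run(0) returns -1 because the second decref finds l_readers already at zero, while toy_run(1) returns 0 and the single reference is dropped exactly once.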

      More generally, mdc_enqueue() should not have a *lockh parameter at all, but fixing that probably requires splitting md_enqueue() into md_enqueue() and md_flock().
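      One way to read this suggestion is that each entry point should report a granted lock in exactly one place, so an error path has exactly one thing to undo. The sketch below models that idea only; toy_md_enqueue(), toy_md_flock() and the structures are hypothetical stand-ins, not the real md_enqueue()/md_flock() signatures, and a failure after the grant is simulated with a fail_late flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the Lustre structures. */
struct toy_intent {
    uint64_t it_lock_handle;
    int      it_lock_mode;      /* 0 means "no lock held" */
};

struct toy_handle {
    uint64_t cookie;            /* 0 means "unused" */
};

static int toy_readers;         /* references held on the single toy lock */

static uint64_t toy_grant(void)
{
    toy_readers++;
    return 0x1234ULL;
}

static void toy_put(void)
{
    toy_readers--;
}

/* Intent-based enqueue: no lockh parameter, so the intent is the only owner. */
static int toy_md_enqueue(struct toy_intent *it, int mode, int fail_late)
{
    it->it_lock_handle = toy_grant();
    it->it_lock_mode = mode;
    if (fail_late) {            /* simulated failure after the lock was granted */
        toy_put();
        it->it_lock_handle = 0;
        it->it_lock_mode = 0;
        return -ENOMEM;
    }
    return 0;
}

/* Flock enqueue: no intent, so the handle is the only owner. */
static int toy_md_flock(struct toy_handle *lockh, int fail_late)
{
    lockh->cookie = toy_grant();
    if (fail_late) {
        toy_put();
        lockh->cookie = 0;
        return -ENOMEM;
    }
    return 0;
}
```

      Because neither function records the lock in a second place, a caller that releases based on it_lock_mode or on the cookie cannot see a stale reference after a failure.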

          People

            Assignee: John Hammond (jhammond)
            Reporter: John Hammond (jhammond)
            Votes: 0
            Watchers: 1
