Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7875

Suspect locking cleanup in mdt_reint_unlink()

Details

    • Bug
    • Resolution: Not a Bug
    • Minor
    • Lustre 2.11.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      In mdt_reint_unlink we have suspect locking cleanup on error:

              rc = mdt_reint_object_lock(info, mc, child_lh, lock_ibits,
                                         cos_incompat);
              if (rc != 0)
                      GOTO(unlock_child, rc);
      ...
      unlock_child:
              mdt_unlock_slaves(info, mc, MDS_INODELOCK_UPDATE, s0_lh, s0_obj, einfo,
                                rc);
              mdt_object_unlock(info, mc, child_lh, rc);
      unlock_parent:
              mdt_object_unlock(info, mp, parent_lh, rc);
      put_child:
              mdt_object_put(info->mti_env, mc);
      put_parent:
              mdt_object_put(info->mti_env, mp);
      

      So it looks like we failed a child lock and then jus jump to unlock it anyway?
      need to make that goto to unlock_parent?
      Also need to check if the below lock_slaves call to unlock slaves would be fine too. Not even sure why the unlock order is reversed.

      Attachments

        Activity

          [LU-7875] Suspect locking cleanup in mdt_reint_unlink()
          laisiyao Lai Siyao added a comment - - edited

          This is not a bug IMO, because unlink locks in below order:
          1. lock parent UPDATE bit.
          2. if parent is remote, lock child LOOKUP bit on remote mdt, then lock child UPDATE bit on current mdt,
          else lock child LOOKUP | UPDATE bits on current mdt.
          3. if child is striped dir, lock all its slave stripes.

          So we can see if parent is remote, child inodebits are locked on different MDTs, so if the second lock fails, it needs to unlock the first lock.

          And slave locks should be unlocked first, mdt_unlock_slaves() is safe to call at any time, because if no slave locks taken, nothing will be done.

          It's a bit confusing that in step 2, if parent is remote, mdt_remote_object_lock() is called on parent object, but actually it's used to find the parent MDT, the real locked resource is child.

          laisiyao Lai Siyao added a comment - - edited This is not a bug IMO, because unlink locks in below order: 1. lock parent UPDATE bit. 2. if parent is remote, lock child LOOKUP bit on remote mdt, then lock child UPDATE bit on current mdt, else lock child LOOKUP | UPDATE bits on current mdt. 3. if child is striped dir, lock all its slave stripes. So we can see if parent is remote, child inodebits are locked on different MDTs, so if the second lock fails, it needs to unlock the first lock. And slave locks should be unlocked first, mdt_unlock_slaves() is safe to call at any time, because if no slave locks taken, nothing will be done. It's a bit confusing that in step 2, if parent is remote, mdt_remote_object_lock() is called on parent object, but actually it's used to find the parent MDT, the real locked resource is child.
          pjones Peter Jones added a comment -

          Lai

          Can you please look into this?

          Peter

          pjones Peter Jones added a comment - Lai Can you please look into this? Peter

          People

            laisiyao Lai Siyao
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: