[LU-7875] Suspect locking cleanup in mdt_reint_unlink() Created: 14/Mar/16  Updated: 14/Jan/18  Resolved: 14/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Lai Siyao
Resolution: Not a Bug Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In mdt_reint_unlink we have suspect locking cleanup on error:

        rc = mdt_reint_object_lock(info, mc, child_lh, lock_ibits,
                                   cos_incompat);
        if (rc != 0)
                GOTO(unlock_child, rc);
...
unlock_child:
        mdt_unlock_slaves(info, mc, MDS_INODELOCK_UPDATE, s0_lh, s0_obj, einfo,
                          rc);
        mdt_object_unlock(info, mc, child_lh, rc);
unlock_parent:
        mdt_object_unlock(info, mp, parent_lh, rc);
put_child:
        mdt_object_put(info->mti_env, mc);
put_parent:
        mdt_object_put(info->mti_env, mp);

So it looks like we failed a child lock and then jus jump to unlock it anyway?
need to make that goto to unlock_parent?
Also need to check if the below lock_slaves call to unlock slaves would be fine too. Not even sure why the unlock order is reversed.



 Comments   
Comment by Peter Jones [ 14/Dec/17 ]

Lai

Can you please look into this?

Peter

Comment by Lai Siyao [ 03/Jan/18 ]

This is not a bug IMO, because unlink locks in below order:
1. lock parent UPDATE bit.
2. if parent is remote, lock child LOOKUP bit on remote mdt, then lock child UPDATE bit on current mdt,
else lock child LOOKUP | UPDATE bits on current mdt.
3. if child is striped dir, lock all its slave stripes.

So we can see if parent is remote, child inodebits are locked on different MDTs, so if the second lock fails, it needs to unlock the first lock.

And slave locks should be unlocked first, mdt_unlock_slaves() is safe to call at any time, because if no slave locks taken, nothing will be done.

It's a bit confusing that in step 2, if parent is remote, mdt_remote_object_lock() is called on parent object, but actually it's used to find the parent MDT, the real locked resource is child.

Generated at Sat Feb 10 02:12:41 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.