Data-on-MDT phase II (LU-10176)

[LU-10175] DoM: Full support for the LDLM lock convert Created: 04/Sep/16  Updated: 25/Aug/18  Resolved: 04/Jul/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Technical task Priority: Minor
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: DoM2

Issue Links:
Blocker
is blocked by LU-5216 HSM: restore doesn't restart for new ... Reopened
Related
is related to LU-9293 ocd_ibits_known is not checked Closed
is related to LU-3285 Data on MDT Resolved
is related to LU-11284 Full lock convert conflicts with HSM Open
is related to LU-11002 Interop 2.11 <-> 2.12 sanity-dom test... Resolved
is related to LU-9184 early patches for Data-on-MDT support Resolved

 Description   

Data-on-MDT locking needs lock convert for IBITS locks so that conflicting bits can be dropped instead of cancelling the whole lock. The main functionality was implemented in the context of DoM, and only for locks used for Data-on-MDT files.

Meanwhile, it can't be used for all other IBITS locks because of conflicts with ELC. ELC requires changes to work with lock convert.



 Comments   
Comment by Andreas Dilger [ 25/Jan/17 ]

Is it enough to send the conflicting lock bits/extent to the client with the AST when the lock is contended, and to have the client send the cancelled bits/extent to the server with the cancel? That way, the client can decide whether to cancel the whole lock (e.g. if idle for a long time), or just the conflicting bits (e.g. if actively in use). When the client sends an LDLM_CANCEL to the server, if it cancels all the bits/extent then the server drops the whole lock, otherwise it just drops the contending bits/extent, and leaves the same lock on the client.
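
The exchange proposed in this comment can be modelled with plain bitmasks. The following is only a sketch of the idea, not Lustre code: the bit names and values, the `idle` flag, and both function names are assumptions made for illustration.

```python
# Illustrative inode-bit values; names and numbers are assumptions for this sketch.
LOOKUP = 0x01
UPDATE = 0x02
DOM = 0x40


def client_on_blocking_ast(held, conflicting, idle):
    """Return the bits the client chooses to cancel after a blocking AST.

    An idle lock is given up entirely; an actively used lock only
    gives up the bits that are actually in conflict.
    """
    if idle:
        return held
    return held & conflicting


def server_on_cancel(granted, cancelled):
    """Return the bits still granted after the cancel; 0 means the lock is dropped."""
    return granted & ~cancelled


held = LOOKUP | UPDATE | DOM

# Lock in active use: only the contended UPDATE bit is returned.
to_cancel = client_on_blocking_ast(held, UPDATE, idle=False)
remaining_active = server_on_cancel(held, to_cancel)

# Idle lock: everything is cancelled and the server drops the whole lock.
remaining_idle = server_on_cancel(held, client_on_blocking_ast(held, UPDATE, idle=True))
```

In this model the same lock stays on the client with a reduced bit set whenever `server_on_cancel` returns a non-zero value.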

When a conflicting extent lock gets an AST, the client would need to decide which "end" of the lock should be cancelled so that it keeps a lock with a single contiguous extent, and not two locks with a hole.
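
The extent case can be sketched the same way. This is a hypothetical model: extents are inclusive `(start, end)` tuples, and keeping the larger remaining side is an assumed policy chosen for the example; a real client might prefer the side that is actively in use.

```python
def trim_extent(held, conflict):
    """Return the single contiguous extent to keep, or None to cancel the whole lock.

    Both arguments are inclusive (start, end) tuples.
    """
    hs, he = held
    cs, ce = conflict

    # No overlap: nothing has to be given up.
    if ce < hs or cs > he:
        return held

    # Pieces of the held extent on either side of the conflict, if any.
    left = (hs, cs - 1) if cs > hs else None
    right = (ce + 1, he) if ce < he else None

    if left is None and right is None:
        return None  # the conflict covers the whole lock
    if left is None:
        return right
    if right is None:
        return left

    # Conflict sits in the middle: keep only one side so there is no hole.
    left_len = left[1] - left[0] + 1
    right_len = right[1] - right[0] + 1
    return left if left_len >= right_len else right
```

For a held extent `(0, 99)` and a conflict on `(40, 49)`, this sketch keeps `(50, 99)` and gives up `(0, 49)`, so the client never ends up with two locks around a hole.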

Comment by Mikhail Pershin [ 27/Jan/17 ]

Yes, that is how lock convert works, but ELC cancels locks locally on the client and then sends the cancels to the server. So if we have a lock with several bits plus the UPDATE bit, and ELC wants to cancel the UPDATE bit, the whole lock is cancelled despite the other bits. Without ELC, this lock would remain on the client until a conflict happens on the server; it would then get a blocking AST and a lock_convert that removes only the UPDATE bit and keeps the others. So ELC makes lock_convert useless in many scenarios, especially when the UPDATE bit is set among several others.

I think ELC could use lock convert instead of local cancel, which would allow us to use lock convert for all IBITS locks, not just for DoM.
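
The difference between the two ELC behaviours described above can be sketched as follows. This is a simplified model, not the Lustre implementation: the client's cached locks are represented as a list of bitmasks, and the bit values and function names are assumptions for illustration.

```python
# Illustrative inode-bit values; names and numbers are assumptions for this sketch.
LOOKUP = 0x01
UPDATE = 0x02
LAYOUT = 0x08
PERM = 0x10


def elc_local_cancel(cached, wanted):
    """ELC as described above: any lock holding a wanted bit is cancelled whole."""
    return [bits for bits in cached if not (bits & wanted)]


def elc_with_convert(cached, wanted):
    """Proposed variant: strip only the wanted bits and keep the rest of each lock."""
    kept = []
    for bits in cached:
        remaining = bits & ~wanted
        if remaining:  # a lock with no bits left is cancelled entirely
            kept.append(remaining)
    return kept


cached = [LOOKUP | UPDATE | PERM, LAYOUT]

after_cancel = elc_local_cancel(cached, UPDATE)   # only the LAYOUT lock survives
after_convert = elc_with_convert(cached, UPDATE)  # LOOKUP | PERM survives as well
```

With local cancel the client loses the LOOKUP and PERM bits along with UPDATE, whereas the convert variant keeps them cached.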

Comment by Gerrit Updater [ 05/Feb/17 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/25262
Subject: LDEV-459 ldlm: selective IBITS lock trying
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 17e7d8556f71d7850cb0940ed98fa8a90ab450b3

Comment by Mikhail Pershin [ 05/Feb/17 ]

I've just pushed a patch for selective lock try. It is needed as part of the DoM work, but is also useful on its own for better IBITS combining. I'd add it before the lock convert code to reduce conflicts and make better use of lock convert.

Comment by Gerrit Updater [ 08/Feb/17 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/25322
Subject: LDEV-459 ldlm: IBITS lock convert instead of cancel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4625cd0b5dbd4f41280dc7fa8ac6c407d540bd91

Comment by Mikhail Pershin [ 08/Feb/17 ]

Committed the IBITS lock convert patch from the Data-on-MDT series to the master branch for testing purposes. It has no ELC workaround yet; it is just adapted to be used without DoM, but with the previous patch for selective lock trying.

Comment by Gerrit Updater [ 21/Nov/17 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/30202
Subject: LU-10175 ldlm: IBITS lock convert instead of cancel
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 522b25a09e57bec0453660898e374ca33e496fd5

Comment by Gerrit Updater [ 12/Dec/17 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/30491
Subject: LU-10175 ldlm: remove obsoleted lock convert code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 19c844d546cbb227356f50bdb2444f1cfbd8fbaa

Comment by Gerrit Updater [ 20/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30491/
Subject: LU-10175 ldlm: remove obsoleted lock convert code
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ebba68f378f72107fa51a8002369d1acef7dbedd

Comment by Mikhail Pershin [ 21/Jan/18 ]

Full lock convert caused HSM deadlock issues, and this seems to be the common HSM problem described in LU-5216.

Comment by Mikhail Pershin [ 22/Jan/18 ]

As far as I can see, full lock convert is blocked by LU-5216. Tests 33-36 in sanity-hsm.sh fail and test 201 deadlocks. Lock convert doesn't change anything in HSM itself, only in LDLM, and the failure looks similar to what LU-5216 says about HSM locking problems.

Comment by Gerrit Updater [ 06/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30202/
Subject: LU-10175 ldlm: IBITS lock convert instead of cancel
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 37932c4beb9885593abe63c9e2dc1936648a0b49

Comment by Peter Jones [ 06/May/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 07/May/18 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32314
Subject: LU-10175 ldlm: handle lock converts in cancel handler
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 00baa72387638913cef71c3037f7894bf0811c6f

Comment by Joseph Gmitter (Inactive) [ 07/May/18 ]

Reopening as there are more patches to land under this umbrella

Comment by Gerrit Updater [ 31/May/18 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/32593
Subject: LU-10175 ptlrpc: add LOCK_CONVERT connection flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: efae969761c5dc27bd481f3f91119fd26a86a31f

Comment by Gerrit Updater [ 14/Jun/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32593/
Subject: LU-10175 ptlrpc: add LOCK_CONVERT connection flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 44a2092f08ca5f349659680b2c19d55d2365c842

Comment by Gerrit Updater [ 03/Jul/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32314/
Subject: LU-10175 ldlm: handle lock converts in cancel handler
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 541902a3f934d0e68368d9698cef38d44c473527

Comment by Joseph Gmitter (Inactive) [ 04/Jul/18 ]

Patches have landed for 2.12 for DoM locking.  Additional follow-on work will be created under a separate ticket.
