Details

    • Technical task
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0

    Description

      Data-on-MDT (DoM) locking needs lock convert for IBITS locks, so that only the conflicting bits are dropped instead of cancelling the whole lock. The main functionality was implemented in the context of DoM, and only for locks used with Data-on-MDT files.

      Meanwhile, it can't be used for all other IBITS locks because it conflicts with early lock cancel (ELC). ELC requires changes to work with lock convert.

          Activity

            [LU-10175] DoM:Full support for the LDLM lock convert

            tappro Mikhail Pershin added a comment - As far as I can see, full lock convert is blocked by LU-5216. Tests 33-36 in sanity-hsm.sh fail and test 201 deadlocks. Lock convert doesn't change anything related to HSM, only LDLM, and the failures look similar to the HSM locking problems described in LU-5216.

            tappro Mikhail Pershin added a comment - Full lock convert caused HSM deadlock issues, which seem to be the common HSM problem described in LU-5216.

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30491/
            Subject: LU-10175 ldlm: remove obsoleted lock convert code
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ebba68f378f72107fa51a8002369d1acef7dbedd

            gerrit Gerrit Updater added a comment - Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/30491
            Subject: LU-10175 ldlm: remove obsoleted lock convert code
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 19c844d546cbb227356f50bdb2444f1cfbd8fbaa

            gerrit Gerrit Updater added a comment - Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/30202
            Subject: LU-10175 ldlm: IBITS lock convert instead of cancel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 522b25a09e57bec0453660898e374ca33e496fd5

            tappro Mikhail Pershin added a comment (edited) - Committed the IBITS lock convert patch from the Data-on-MDT series to the master branch for testing purposes. It has no ELC workaround yet; it is only adapted to be used without DoM, but with the previous patch for selective lock trying.

            gerrit Gerrit Updater added a comment - Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/25322
            Subject: LDEV-459 ldlm: IBITS lock convert instead of cancel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4625cd0b5dbd4f41280dc7fa8ac6c407d540bd91

            tappro Mikhail Pershin added a comment - I've just pushed a patch for selective lock try. It is needed as part of the DoM work but is also useful on its own for better IBITS combining. I'd land it before the lock convert code, for fewer conflicts and better lock convert utilization.

            gerrit Gerrit Updater added a comment - Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/25262
            Subject: LDEV-459 ldlm: selective IBITS lock trying
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 17e7d8556f71d7850cb0940ed98fa8a90ab450b3

            tappro Mikhail Pershin added a comment - Yes, that is how lock convert works, but ELC cancels locks locally on the client and then sends the cancels to the server. So if we have a lock with several bits plus the UPDATE bit, and ELC wants to cancel the UPDATE bit, the whole lock will be cancelled despite the other bits. Without ELC, this lock would remain on the client until a conflict happens on the server; then it would get a blocking AST and lock_convert, which removes only the UPDATE bit and keeps the others. So ELC makes lock_convert useless in many scenarios, especially when the UPDATE bit is set among several others.

            I think ELC could use lock convert instead of local cancel, and this would allow us to use lock convert for all IBITS locks, not just for DoM.
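The difference between cancelling a whole lock and converting it can be shown with plain bit masks. The following is only a minimal sketch: the bit names and both function names are hypothetical stand-ins, not the definitions or entry points used in the Lustre tree.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative inode bits only; stand-ins, not the Lustre header values. */
enum {
	IBIT_LOOKUP = 1u << 0,
	IBIT_UPDATE = 1u << 1,
	IBIT_PERM   = 1u << 2,
};

/*
 * Whole-lock cancel, the ELC behaviour described in the comment above:
 * if the lock holds any unwanted bit, every bit is lost.
 * Returns the bits that remain on the client.
 */
uint64_t cancel_whole_lock(uint64_t held, uint64_t unwanted)
{
	return (held & unwanted) ? 0 : held;
}

/*
 * Lock convert: drop only the unwanted bits and keep the rest.
 * A result of 0 means no bits are left, which amounts to a full cancel.
 */
uint64_t convert_drop_bits(uint64_t held, uint64_t unwanted)
{
	assert(held != 0); /* a granted IBITS lock covers at least one bit */
	return held & ~unwanted;
}
```

For a lock holding LOOKUP, UPDATE and PERM, cancelling because of UPDATE leaves nothing on the client, while converting leaves LOOKUP and PERM in place.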

            adilger Andreas Dilger added a comment - Is it enough to send the lock bits/extent in conflict with the AST to the client when the lock is contended, and to send the cancelled bits/extent to the server with the cancel? That way, the client can decide whether to cancel the whole lock (e.g. if idle for a long time), or just the conflicting bits (e.g. if actively in use). When the client sends an LDLM_CANCEL to the server, if it cancels all the bits/extent then the server drops the whole lock; otherwise it drops only the contending bits/extent and leaves the same lock on the client.

            When a conflicting extent lock gets an AST, the client would need to decide which "end" of the lock should be cancelled, so that it keeps a lock with a single contiguous extent and not two locks with a hole.
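The choice of which "end" to give up can be sketched as follows. This is an assumption-laden illustration, not Lustre code: `struct extent` and `trim_extent` are hypothetical names, extents are inclusive byte ranges, and the "keep the larger side" policy is just one possible choice (a real client might instead keep the side it accessed most recently).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive byte range [start, end]; hypothetical type for illustration. */
struct extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Trim a held extent so that it no longer overlaps the conflicting one,
 * keeping a single contiguous piece (never two pieces with a hole).
 * Assumed policy: keep the larger of the two non-conflicting sides.
 * Returns false if nothing survives, i.e. the whole lock must be cancelled.
 */
bool trim_extent(struct extent held, struct extent conflict,
		 struct extent *kept)
{
	assert(held.start <= held.end);
	assert(conflict.start <= conflict.end);

	/* No overlap: the held extent is unaffected. */
	if (conflict.end < held.start || conflict.start > held.end) {
		*kept = held;
		return true;
	}

	/* Bytes left of and right of the conflict that the lock still covers. */
	uint64_t left_len = conflict.start > held.start ?
			    conflict.start - held.start : 0;
	uint64_t right_len = conflict.end < held.end ?
			     held.end - conflict.end : 0;

	if (left_len == 0 && right_len == 0)
		return false; /* the conflict covers the whole held extent */

	if (left_len >= right_len) {
		kept->start = held.start;
		kept->end = conflict.start - 1;
	} else {
		kept->start = conflict.end + 1;
		kept->end = held.end;
	}
	return true;
}
```

With a held extent of [0, 99] and a conflict on [60, 69], this policy keeps [0, 59] and gives up [60, 99], so the client ends up with one lock rather than two separated by a hole.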

            People

              tappro Mikhail Pershin
              tappro Mikhail Pershin
              Votes: 0
              Watchers: 8
