
transfer layout version to OST objects in layout change

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.16.0
    • None
    • None

    Description

      In some cases the layout version is not transferred to the OST objects after a mirror extend/split/resync, which causes a subsequent sync to hang.

      The OFD compares the layout version sent by the client with the on-disk object's version in ofd_verify_layout_version(). Because the client's version increases with each mirror extend/split/resync, the sync IO keeps looping on EINPROGRESS replies from the OFD.
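The hang described above can be illustrated with a toy model. This is only a sketch, not the Lustre implementation: `verify_layout_version_strict` and `sync_with_retries` are hypothetical names, and the retry cap exists only so that the example terminates. It assumes the check replies EINPROGRESS whenever the client's version is ahead of the on-disk version, as the description states.

```python
import errno


def verify_layout_version_strict(client_version, ondisk_version):
    """Toy stand-in for the check described above (not the real OFD code).

    Replies -EINPROGRESS while the client's layout version is ahead of the
    version stored with the on-disk object. The stale-client case is not
    modeled here.
    """
    if client_version > ondisk_version:
        return -errno.EINPROGRESS
    return 0


def sync_with_retries(client_version, ondisk_version, max_retries=5):
    """Resend the sync while the reply is -EINPROGRESS.

    Returns (last_reply, attempts). Nothing in this loop changes the on-disk
    version, so a version mismatch never clears by itself.
    """
    for attempt in range(1, max_retries + 1):
        rc = verify_layout_version_strict(client_version, ondisk_version)
        if rc != -errno.EINPROGRESS:
            return rc, attempt
    return -errno.EINPROGRESS, max_retries
```

With a client at version 3 and an object still at version 2, every attempt gets the same reply; without the artificial cap the loop would never end, which is the hang reported in this ticket.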

       


          Activity

            [LU-14642] transfer layout version to OST objects in layout change
            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45443/
            Subject: LU-14642 flr: allow layout version update from client/MDS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: fa6574150b6f745a668fe69b2d6d970068a4cff1

            cfaber Colin Faber added a comment -

            bobijam 

            what's going on with this fix?

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47567/
            Subject: LU-14642 tests: skip sanity-flr/100 for old servers
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9e25e70d78f3f5fdb7489f4f9841b0931927f10c

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47567
            Subject: LU-14642 tests: skip sanity-flr/100 for old servers
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8c8cea505e046853dcebb6d414f54ce260fb9a88

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43473/
            Subject: LU-14642 test: add fsx mirror file test mode
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 90ba8b4ac360b1987178445bd2ccd64f7958d912

            gerrit Gerrit Updater added a comment -

            "Bobi Jam <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/45443
            Subject: LU-14642 flr: abolish MDS transfer layout version to OST
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c9b87ca7ecb87ebfa22d5d3af736b2a42d603680

            chunteraa Chris Hunter (Inactive) added a comment -

            Thanks for the update, bobijam.

            bobijam Zhenyu Xu added a comment -

            Sounds reasonable: the OST only allows writes with a newer layout version from the client, and that does not require the MDS to notify the OST about the layout version change.

            adilger Andreas Dilger added a comment -

            Bobijam, is there a reason why the MDS must update the layout version on the OST object before the client RPC can complete? If the client has a newer layout version than the OST, it would make sense that the client knows the new layout still includes the OST object, and the OST could just update the layout version directly.

            This is very different from the case where the client sends an RPC with old layout version and may need to update the layout in order to generate the correct RPC. That may mean the client is flushing old data without the layout lock being revoked, and it needs to contact the MDS to get the new file layout.

            Not only would this avoid the requirement for the MDS to send an urgent RPC to the OSS to avoid the client IO being blocked, but it seems like the MDS may not need to send any RPC at all, or at least not a distributed transaction RPC, if the client is still actively writing to the object. That would also avoid the race condition between the file layout being changed on the MDT and it being updated on all of the OST objects (which may be up to 2000 objects for a very wide striped layout). If a client sends an RPC with a new layout version to the OST first, then it can immediately update the object version and continue, unlike the current case of the client being blocked and waiting for a long time for the OST object version to be updated.
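The policy proposed in the comment above can be sketched as a small model. It is an illustration under stated assumptions, not the code that landed: `OstObject`, `handle_client_write`, and the `ACCEPT`/`STALE` labels are hypothetical, and the real reply for a stale client is not given in this ticket.

```python
from dataclasses import dataclass

# Hypothetical outcome labels for illustration; not Lustre return codes.
ACCEPT = "accept"
STALE = "stale"


@dataclass
class OstObject:
    """Toy OST object that only tracks its stored layout version."""
    layout_version: int


def handle_client_write(obj, client_version):
    """Apply the proposed policy to one client RPC.

    An older client version is rejected, so the client has to fetch the new
    layout from the MDS. A newer client version is adopted by the object
    directly, with no MDS-to-OST notification involved.
    """
    if client_version < obj.layout_version:
        return STALE
    if client_version > obj.layout_version:
        obj.layout_version = client_version
    return ACCEPT
```

In this model a client holding a newer layout is never blocked waiting for the MDS to reach each OST object, which is the race the comment aims to avoid.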

            People

              bobijam Zhenyu Xu
              bobijam Zhenyu Xu
              Votes: 0
              Watchers: 10
