Details

    • Technical task
    • Resolution: Won't Fix
    • Major
    • None
    • Lustre 2.5.0
    • None
    • 17,895
    • 8423

    Description

      Before the raid5-mmp-unplug-dev patch can be submitted upstream, it needs to be updated for the latest kernel. Unfortunately, the affected code seems to have changed since the patch was written, so I'm not sure whether a simple "best guess update" of the patch will be correct.

      I'll attach my "best guess" patch, but it needs to be verified by someone who actually understands the MD RAID code better.

          Activity

            [LU-3406] Submit raid5-mmp-unplug-dev patch upstream

            simmonsja James A Simmons added a comment - Patch http://review.whamcloud.com/6652 has been updated to the latest master. If it proves stable, we should push it upstream to get feedback and see what the final result looks like.

            bfaccini Bruno Faccini (Inactive) added a comment - Oops, I forgot to mention that the patch is at http://review.whamcloud.com/6652. Also, auto-tests never started for patch-set #1 for unexplained reasons, so I submitted patch-set #2 with less restrictive test parameters just in case. Meanwhile, I am working on a local test platform and use cases to ensure MMP works correctly over software RAID.

            bfaccini Bruno Faccini (Inactive) added a comment - I spent some time digging in the latest (3.9.4) kernel sources and I can confirm there is still no way to bypass the MD/RAID5 stripe cache on a read request. I am first testing a patch (which introduces a new flag) against the kernel version currently supported for Lustre servers, to be exercised under HA/MMP tests.

            bfaccini Bruno Faccini (Inactive) added a comment - BTW, with the currently supported kernels and the current patch version, my earlier comment applies to md_raid5_unplug_device() instead of md_wakeup_thread(). That call was already in the original patch submitted for BZ #17895 by Jinshan; maybe he can help me confirm whether it is necessary or not.

            bfaccini Bruno Faccini (Inactive) added a comment - OK, so let's go for a new flag. But looking at the MD/RAID5 source code, I am now concerned about whether the md_wakeup_thread() call at the end of make_request() is really needed when the flag is set. It seems to me that it should already have been called within release_stripe[_plug]() and the underlying routines. Thus, even if it has no effect, the extra call could be considered useless and costly.

            adilger Andreas Dilger added a comment - I don't think _META is less used than _SYNC. It is often used for metadata IO requests to increase their priority in the scheduler. I think it is probably best to try a new REQ flag and see what the upstream MD maintainer thinks. This is a generic bug with ext4 MMP, not a Lustre-specific one, so there should be some kind of solution possible.

            bfaccini Bruno Faccini (Inactive) added a comment - Why not use the _META flag for this purpose?

            adilger Andreas Dilger added a comment - James, a problem I've just noticed, now that I've looked at this patch in detail for upstream submission, is that overloading READ_SYNC/REQ_SYNC to mean "bypass MD RAID cache" can negatively impact other filesystems such as Btrfs, OCFS, and others that now use READ_SYNC to distinguish reads that processes are waiting on from readahead requests.

            That is fine for Lustre server kernels, but isn't necessarily fine for general usage. One option is to make a separate REQ_NOCACHE flag or similar, but that needs a much bigger patch that is only useful for ext4 MMP.

            Hopefully someone who understands the RAID and block layer details better can come up with a solution.
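The trade-off between overloading the sync flag and adding a dedicated one can be sketched in a few lines of userspace C. This is only an illustration of the two policies: the `DEMO_` flag names and bit values are hypothetical, do not match the kernel's real REQ_* definitions, and `DEMO_REQ_NOCACHE` stands in for the "REQ_NOCACHE flag or similar" suggested above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
enum demo_req_flags {
    DEMO_REQ_SYNC    = 1u << 0, /* a process is waiting on this read */
    DEMO_REQ_NOCACHE = 1u << 1, /* hypothetical: must be read from disk */
};

/* Policy A: overload the sync flag to also mean "bypass the cache". */
static bool bypass_cache_overloaded(unsigned int flags)
{
    /* Only the two demo bits are defined in this sketch. */
    assert((flags & ~(unsigned int)(DEMO_REQ_SYNC | DEMO_REQ_NOCACHE)) == 0);
    return (flags & DEMO_REQ_SYNC) != 0;
}

/* Policy B: bypass the cache only when the dedicated flag is set. */
static bool bypass_cache_dedicated(unsigned int flags)
{
    assert((flags & ~(unsigned int)(DEMO_REQ_SYNC | DEMO_REQ_NOCACHE)) == 0);
    return (flags & DEMO_REQ_NOCACHE) != 0;
}
```

Under policy A, every ordinary synchronous read from any filesystem takes the cache-bypass path. Under policy B, only a caller that sets the dedicated flag explicitly (here, an MMP-style read) does, at the cost of introducing and plumbing through a new flag.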

            simmonsja James A Simmons added a comment - Since this patch is more of a bug fix, it might even make it into 3.10. Time to send it to dm-devel@redhat.com.

            adilger Andreas Dilger added a comment - Prototype patch against current Linux Git HEAD.

            People

              bfaccini Bruno Faccini (Inactive)
              adilger Andreas Dilger