    Description

      This feature complements LU-5319, which implements support for multiple modify RPCs in flight on the MDC-MDT connection.

      It will improve the performance of cross-MDT metadata modification operations while preserving correct recovery handling (request resend and MDT recovery).
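
      As a rough illustration of what "multiple modify RPCs in flight" means, the sketch below caps the number of outstanding modify requests with a counting semaphore. It is not Lustre code: the names ModifyRpcLimiter, send, and run_batch are hypothetical, the "requests" are plain Python callables, and the recovery side (request resend, MDT recovery) is not modeled at all. The limit of 8 used in the usage note is borrowed from the max_mod_rpcs_in_flight default mentioned in the comments on this ticket.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ModifyRpcLimiter:
    """Caps how many modify requests may be outstanding at once (illustrative only)."""

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        # One semaphore slot per request allowed in flight.
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.in_flight = 0  # requests currently holding a slot
        self.peak = 0       # highest concurrency observed so far

    def send(self, request):
        """Run one hypothetical modify request while holding an in-flight slot."""
        with self._slots:  # blocks while every slot is taken
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return request()
            finally:
                with self._lock:
                    self.in_flight -= 1


def run_batch(limit: int, count: int):
    """Issue `count` dummy requests from many threads under the given limit."""
    limiter = ModifyRpcLimiter(limit)
    # More worker threads than slots, so the limiter (not the pool) is the cap.
    with ThreadPoolExecutor(max_workers=4 * limit) as pool:
        results = list(pool.map(lambda i: limiter.send(lambda: i), range(count)))
    return limiter, results
```

      Calling run_batch(8, 100) pushes 100 dummy requests through 32 threads while never letting more than 8 run at once. With a limit of 1 the same code serializes every request, which is the behavior this ticket aims to move away from for MDT-MDT connections.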

      Attachments

        1. mdtest-32kb-dom.png (9 kB)
        2. mdtest-32kb-ost.png (9 kB)
        3. mdtest-dir.png (8 kB)
        4. mdtest-zero-size.png (8 kB)

          Activity

            [LU-6864] DNE3: Support multiple modify RPCs in flight for MDT-MDT connection

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47656/
            Subject: LU-6864 tests: properly skip sanity/245b in interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c4ebdc96061ae9c24ac471b2866f2087bc3e98d4

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47656
            Subject: LU-6864 tests: properly skip sanity/245b in interop
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 91fa883754380b935b92d6e01fe5b1063483cc0c

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/14375/
            Subject: LU-6864 osp: manage number of modify RPCs in flight
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 23028efcae01bf1274a68fd2dd379fbb33300e82

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46735
            Subject: LU-6864 osp: manage number of modify RPCs in flight
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: dce4b2800030f981b91249227daee40966ae4afd

            James, thanks for testing this out. The graphs would be much easier to compare if they had the before/after results for the same tests on a single graph.

            As for max_mod_rpcs_in_flight on the MDS, I think it is entirely reasonable to raise it above 8, since we don't want clients to bottleneck on the MDSes while they are busy handling lots of client requests. The MDS has max_rpcs_in_flight=512, so it wouldn't be unreasonable to have the MDS tune this to match mds.MDS.mdt_out.threads_max.

            Comment added by Andreas Dilger (adilger)

            https://www.opensfs.org/wp-content/uploads/Evaluation-of-DoM-SNE-scaling_Simmons_revised051821.pdf

            Comparing this data with my LUG talk, which used the same hardware, we see improvements in file removals and stats. I do see a regression in reads of small files. At this point I don't know whether that is due to recent changes or to the patch itself. This run used the new default setting of max_mod_rpcs_in_flight = 8. I need to bump it up to see if I can get more out of it.

            Comment added by James A Simmons (simmonsja)

            The 14375 patch has been revived. Currently I'm collecting data to see if it helps with DNE2 scaling issues.

            Comment added by James A Simmons (simmonsja)

            People

              hongchao.zhang Hongchao Zhang
              pichong Gregoire Pichon
              Votes: 0
              Watchers: 15
