
[LU-6580] Poor read performance with many ptlrpcd threads on the client

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3

    Description

      Hi,

      With Lustre 2.7.0, we have noticed that the number of ptlrpcd threads on the client has a significant negative impact on data read performance.
      If the number of ptlrpcd threads is left at its default value, which is the number of cores, read performance drops by almost 20% compared to the case where we reduce the number of ptlrpcd threads to one third of the number of cores.
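      For anyone reproducing this, the thread count is capped through a ptlrpc module option; the following is a minimal sketch, assuming the standard max_ptlrpcds parameter (the value 8 corresponds to one third of our 24 cores):

          # /etc/modprobe.d/lustre.conf on the client
          # Cap the number of ptlrpcd threads instead of the
          # default of one thread per core.
          options ptlrpc max_ptlrpcds=8

      The Lustre modules must be reloaded for the new value to take effect.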

      For instance, see the results attached to this ticket. We use IOR with 24 tasks on a Lustre client node with 24 cores and one InfiniBand FDR interface.
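      The command line has the following shape; a minimal sketch only, with hypothetical transfer and block sizes rather than the exact test parameters:

          # 24 MPI tasks, file-per-process, write then read back
          mpirun -np 24 ior -w -r -F -t 1m -b 4g -o /mnt/lustre/ior_testfile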

      What can explain this phenomenon in Lustre 2.7.0? We do not see it with Lustre 2.5 or 2.6.
      Do you consider this a bug, or a behavior change due to modifications in the Lustre 2.7 code (in which case we would like to learn more about them)?

      Thanks in advance,
      Sebastien.


          Activity


            sebastien.buisson Sebastien Buisson (Inactive) added a comment:

            Hi,

            I gave the patch from LU-6325 (http://review.whamcloud.com/13972) a try.

            I can see two interesting phenomena:

            • When not restricting ptlrpcd threads to specific CPTs, write performance is good and stable, regardless of the number of ptlrpcd threads. But read performance is very low, lower than without the patch. See graph ior_24tasks_lu6325_allcpt.png.
            • When restricting ptlrpcd threads to the CPT local to the network adapter, write performance decreases slightly, but read performance is much better and does not drop when I increase the number of ptlrpcd threads. See ior_24tasks_lu6325_cpt0.png.

            These results were obtained with ptlrpcd_partner_group_size=1, because the default value (2) gives slightly lower performance (the corresponding module options are sketched after this comment).

            So the ability to restrict ptlrpcd threads to a specific CPT (thanks to patch http://review.whamcloud.com/13972) is very helpful. But this raises a new question: why is read performance so poor when ptlrpcd threads are placed on all CPTs?

            Thanks,
            Sebastien.
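            For reference, a minimal sketch of the two client configurations compared in this comment, assuming the ptlrpcd_cpts and ptlrpcd_partner_group_size parameters added by the LU-6325 patch (CPT 0 is only an example; it should be whichever CPT is local to the network adapter):

                # /etc/modprobe.d/lustre.conf
                # Default placement: ptlrpcd threads spread across all CPTs.
                #options ptlrpc ptlrpcd_partner_group_size=1
                # Restricted placement: bind ptlrpcd threads to CPT 0,
                # the partition local to the InfiniBand adapter.
                options ptlrpc ptlrpcd_cpts=0 ptlrpcd_partner_group_size=1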


            adilger Andreas Dilger added a comment:

            Can you please take a look at the patch in LU-6325, which improves the NUMA affinity of the ptlrpcd threads?

            That doesn't explain why there is a regression in 2.7.0, which still needs to be looked into, but it may help fix the problem.
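            As background on the NUMA side, CPTs are defined by the libcfs module; the following is a minimal sketch of aligning them with the NUMA nodes of a 24-core, two-socket client, assuming the standard cpu_pattern parameter (the core ranges are hypothetical):

                # /etc/modprobe.d/lustre.conf
                # One CPU partition per NUMA node: CPT 0 gets cores 0-11,
                # CPT 1 gets cores 12-23.
                options libcfs cpu_pattern="0[0-11] 1[12-23]"
                # The resulting layout is reported by the lnet
                # cpu_partition_table file (its location varies by version).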


            People

              Assignee: dmiter Dmitry Eremin (Inactive)
              Reporter: sebastien.buisson Sebastien Buisson (Inactive)
              Votes: 0
              Watchers: 9
