Details

    • Technical task
    • Resolution: Fixed
    • Major
    • Lustre 2.8.0
    • Lustre 2.7.0
    • None
    • 17278

    Description

      Currently, for a routine namespace LFSCK check in which no inconsistencies are repaired, the best aggregate performance is achieved with a 4-MDT configuration. As more MDTs are added, performance decreases. This is entirely contrary to our expectations and should be resolved.

        Activity

          [LU-6177] LFSCK 4: namespace LFSCK scalability

          yong.fan nasf (Inactive) added a comment:

          As the number of MDTs increased, the waiting time (as described in the earlier comment) also increased, so the aggregate performance does not scale as expected.

          bzzz Alex Zhuravlev added a comment:

          Even so, that should give us performance multiplied by (#MDTs - 1). Shouldn't it keep scaling?

          yong.fan nasf (Inactive) added a comment (edited):

          It should, but unfortunately, because of a test script issue, the master MDT-object of each striped directory is always created on MDT0, so the object counts on the MDTs are unexpectedly unbalanced.

          On the other hand, we should not assume that every MDT has the same processing capability. We still need to adjust the method used to calculate performance.

          adilger Andreas Dilger added a comment:

          Shouldn't the number of files per MDT be about the same? Shouldn't the test config balance file creation across the MDTs? I thought the top-level directories are spread across all MDTs and then all the files are created in those directories?

          yong.fan nasf (Inactive) added a comment:

          The main reason for the poor aggregate namespace LFSCK performance is that the method used to calculate performance is not suitable. After studying the test data, I found that MDT0 always scanned more objects than the other MDTs. As a result, the other MDTs had to wait for MDT0 to finish its first-stage scanning, and their measured performance appeared very slow because of the long time spent waiting for MDT0.

          In fact, for each MDT, the real performance should be calculated as the number of scanned objects divided by the scanning time, not including the waiting time after the first-stage scanning. With this new calculation method, the real performance of each MDT is approximately equal. I will make a patch for that and re-test the performance.
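The two calculation methods described above can be compared with a short sketch. The `MdtScan` record, its field names, and the sample numbers are all assumptions made for illustration. They do not reflect the LFSCK status output or the real measurements.

```python
from dataclasses import dataclass


@dataclass
class MdtScan:
    """Hypothetical per-MDT record for the first-stage scan (illustrative only)."""
    name: str
    objects_scanned: int
    scan_seconds: float  # time spent actively scanning
    wait_seconds: float  # idle time waiting for the slowest MDT to finish


def rate_including_wait(scan: MdtScan) -> float:
    """Old method: waiting time is counted, which penalizes MDTs that finish early."""
    return scan.objects_scanned / (scan.scan_seconds + scan.wait_seconds)


def rate_excluding_wait(scan: MdtScan) -> float:
    """Proposed method: scanned objects divided by active scanning time only."""
    return scan.objects_scanned / scan.scan_seconds


# Made-up numbers: MDT0 holds most of the objects, so the others wait for it.
scans = [
    MdtScan("MDT0", 4_000_000, 400.0, 0.0),
    MdtScan("MDT1", 1_000_000, 100.0, 300.0),
    MdtScan("MDT2", 1_000_000, 100.0, 300.0),
]

for s in scans:
    print(s.name, rate_including_wait(s), rate_excluding_wait(s))
```

In this invented data set the old method reports MDT1 and MDT2 at a quarter of MDT0's rate, while the proposed method reports the same rate for all three, which matches the observation that the real per-MDT performance is approximately equal.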

          adilger Andreas Dilger added a comment:

          I don't think it is only a matter of performance going down after 4 MDTs. The biggest issue is that aggregate performance isn't scaling at all when new MDTs are added. With only a small percentage of cross-MDT and hard-linked objects, most of the MDT namespace scanning should be local to the MDT, and the aggregate scanning performance should scale almost linearly with the addition of each MDT.

          Since the performance was flat for 2-6 MDTs, then either:

          • the performance results are actually per-MDT and not aggregate, or
          • there is some kind of bottleneck or too much communication between MDTs that is preventing scaling.
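One way to quantify "not scaling" is to compare each measured aggregate rate against ideal linear scaling from the smallest configuration. The sketch below assumes a hypothetical helper, `scaling_efficiency`, and uses invented flat numbers for 2, 4, and 6 MDTs. It is not the measurement tooling used for this ticket.

```python
def scaling_efficiency(aggregate_rates):
    """Map MDT count -> fraction of ideal linear scaling.

    aggregate_rates: dict of {mdt_count: measured aggregate objects/sec}.
    The smallest MDT count is taken as the baseline (efficiency 1.0).
    """
    base_count = min(aggregate_rates)
    base_per_mdt = aggregate_rates[base_count] / base_count
    return {
        count: rate / (count * base_per_mdt)
        for count, rate in sorted(aggregate_rates.items())
    }


# Invented example: the reported rate stays flat as MDTs are added.
flat = {2: 20_000.0, 4: 20_000.0, 6: 20_000.0}
efficiency = scaling_efficiency(flat)

for count, eff in efficiency.items():
    print(f"{count} MDTs: {eff:.2f} of linear scaling")
```

An efficiency that falls roughly as 1/N while the reported rate stays constant fits both explanations listed above, so the per-MDT rates would still need to be inspected to tell a reporting problem apart from a real bottleneck.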

          People

            yong.fan nasf (Inactive)
            yong.fan nasf (Inactive)