
Directory Operation Performance Degradation on DNE

Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • Lustre 2.8.0
    • 3

    Description

      We have been performing ongoing testing of DNE in Lustre 2.8. During testing, we noticed severe performance degradation when directory operations involve 2 or more MDSs. We are using mdtest to assess the performance of our setup. To vary the MDS count, we use directories striped across a varying number of MDSs (all with a starting index of 0). Our setup contains 4 MDSs, 4 OSSs, and an independent MGS. Each server is backed by ZFS and has a pool built from 4 solid state drives. Below is a summary of the mdtest results. I've also attached the raw output (Catalyst-mdtest-3-25-2016.txt).

      Directory creation (mdtest rates, ops/sec):

      MDS Count         Max         Min        Mean     Std Dev
      1            5430.605    3665.080    4601.181     647.529
      2             140.306     108.522     126.983      11.541
      3             105.811      96.243     102.489       3.741
      4             109.556     103.074     106.963       2.482

      Directory removal (mdtest rates, ops/sec):

      MDS Count         Max         Min        Mean     Std Dev
      1            2667.779    2126.894    2326.902     206.254
      2             163.166      86.190     107.369      32.263
      3              61.288      57.917      59.675       1.438
      4              81.529      62.300      69.171       7.785

      Directory stat (mdtest rates, ops/sec):

      MDS Count         Max         Min        Mean     Std Dev
      1           53112.492   52023.002   52549.901     483.261
      2           20169.405   20068.291   20118.116      44.012
      3           18580.815   17621.591   18308.662     397.660
      4           16904.242   16836.548   16863.872      24.812
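
      To put the degradation in relative terms, the mean rates above can be divided by the single-MDS mean. The following is only a minimal Python sketch of that arithmetic; the names MEAN_RATES and slowdown_vs_single_mds are illustrative and not part of mdtest.

```python
# Mean rates copied from the mdtest summary tables above, keyed by MDS count.
MEAN_RATES = {
    "creation": {1: 4601.181, 2: 126.983, 3: 102.489, 4: 106.963},
    "removal": {1: 2326.902, 2: 107.369, 3: 59.675, 4: 69.171},
    "stat": {1: 52549.901, 2: 20118.116, 3: 18308.662, 4: 16863.872},
}


def slowdown_vs_single_mds(rates):
    """Return {mds_count: baseline / rate}, using the 1-MDS mean as the baseline."""
    baseline = rates[1]
    return {count: baseline / rate for count, rate in sorted(rates.items())}


if __name__ == "__main__":
    for op, rates in MEAN_RATES.items():
        factors = slowdown_vs_single_mds(rates)
        line = ", ".join(f"{n} MDS: {f:.1f}x" for n, f in factors.items())
        print(f"{op:9s} {line}")
```

      On these numbers, directory creation is roughly 36 to 45 times slower with 2 to 4 MDSs than with 1, removal roughly 22 to 39 times slower, and stat roughly 3 times slower.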

      We also checked some output from testing back in October 2015. A similar trend is present, though the degradation did not appear to be as severe. For that test, we were running 4 MDSs each with 1 MDT and 2 OSSs each with 1 OST, and the clients were running the same Lustre version as the servers. The backend was ZFS. I attached the logs from that test as well (Zwicky-mdtest-10-7-2015.txt).

      Let us know if you need any other information or more testing.

      Attachments

        Activity

          [LU-7958] Directory Operation Performance Degradation on DNE

          jfc John Fuchs-Chesney (Inactive) added a comment - edited

          Hello Giuseppe,

          We are marking this as resolved/done, having provided information back to you.

          If you feel this ticket needs more work from us, please let us know.

          Thanks,
          ~ jfc.

          jfc John Fuchs-Chesney (Inactive) added a comment -

          Hello Giuseppe,

          Do you have what you need from this ticket? Or is there more work you would like us to do?

          Thanks,
          ~ jfc.
          di.wang Di Wang added a comment -

          It looks like you set a default striped EA on the top directory, so all of its children are created as striped directories. You probably need to "fix" mdtest to do something like what Andreas describes in his comment.

          adilger Andreas Dilger added a comment -

          The intended usage for DNE is not to create many thousands of remote or striped directories at one time, but rather to create a striped directory and then create large numbers of files or subdirectories inside it.

          Creating remote or striped directories is itself relatively heavyweight because multiple MDTs and distributed recovery are involved, but regular create/unlink and mkdir/rmdir operations inside the remote or striped directory are independent of each other.
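
          As a rough illustration of that usage pattern, the sketch below builds (without running) the commands for one striped parent directory followed by ordinary mkdir calls inside it. It assumes that lfs mkdir accepts -c (stripe count) and -i (starting MDT index) and that no default directory stripe is set on the parent; the function name intended_dne_layout, the dir.N naming, and the path in the usage note are hypothetical.

```python
def intended_dne_layout(parent, subdir_count, mdt_count=4, start_index=0):
    """Build, but do not run, commands for one striped parent plus plain subdirs."""
    commands = [
        # The single multi-MDT operation: create the striped parent directory.
        # Assumes `lfs mkdir -c <count> -i <index> <dir>` is available.
        ["lfs", "mkdir", "-c", str(mdt_count), "-i", str(start_index), parent],
    ]
    # Regular mkdir calls inside the striped parent. With no default directory
    # stripe set on the parent, these are assumed to be plain subdirectories.
    for i in range(subdir_count):
        commands.append(["mkdir", f"{parent}/dir.{i}"])
    return commands


if __name__ == "__main__":
    for argv in intended_dne_layout("/mnt/lustre/striped", 3):
        print(" ".join(argv))
```

          Only the first command involves multiple MDTs; the remaining commands are the regular mkdir operations described above.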

          bzzz Alex Zhuravlev added a comment -

          It looks like many of the mkdirs were distributed? This isn't a target workload for DNE2. Such mkdirs aren't supposed to be very frequent, especially with ZFS.

          People

            jfc John Fuchs-Chesney (Inactive)
            dinatale2 Giuseppe Di Natale (Inactive)
            Votes: 0
            Watchers: 10
