[LU-10463] Poor write performance periodically on repeated test runs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Labels: None
    • Environment: CentOS 7.4, various Lustre and ZFS versions tested. Lustre clients are 2.10.2_RC2.
    • Severity: 3

    Description

      I'm running an IOR test (IOR-2.10.3) that writes 1GB files to one dataset/directory, then writes 3GB files to another dataset/directory, then reads back the first dataset. This test sequence is run 25 times. My filesystem can sustain 14-16GB/sec writes, and most iterations of the test reach that bandwidth. The problem is that a few of the 25 iterations turn in significantly lower write results, often in the 5-10GB/sec range.
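
One way to pick out the slow iterations described above is to compare each run's write bandwidth against the median of all runs. The following is a minimal sketch, not the reporter's tooling: the function name, the 0.75 cutoff, and the sample values are all assumptions chosen for illustration, and the bandwidths are not measured data.

```python
from statistics import median


def flag_slow_iterations(write_gbps, threshold=0.75):
    """Return (iteration, bandwidth) pairs that fall below threshold * median.

    write_gbps: per-iteration write bandwidth in GB/s, in run order.
    threshold:  fraction of the median below which a run counts as slow
                (0.75 is an arbitrary illustrative choice).
    """
    baseline = median(write_gbps)
    return [
        (i, bw)
        for i, bw in enumerate(write_gbps, start=1)
        if bw < threshold * baseline
    ]


# Hypothetical bandwidths in GB/s, for illustration only (not from the report).
sample = [15.1, 14.8, 7.2, 15.6, 14.3, 9.5, 15.0]
slow = flag_slow_iterations(sample)
print(slow)
```

Using the median rather than the mean as the baseline keeps a handful of slow runs from dragging the reference value down, which matters when only a few of 25 iterations misbehave.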

      I initially suspected hardware issues, but testing of the components, including each individual disk drive, showed everything working properly, and the logs report no problems while the test above runs. So I started building and testing various combinations of Lustre and ZFS. The hardware, clients and server OS have been constant for each of the tests. Only SPL/ZFS and Lustre on the server have changed from test to test.

      The problem appears to have been introduced in the Lustre 2.10.x branch. I have not seen it occur in the Lustre 2.9 builds I've done. I've built Lustre 2.9 with ZFS 0.7.3 and seen no issue. I've built Lustre 2.10.x with ZFS 0.6.5.7 and do observe the issue. Every build I've done with Lustre 2.10.x (several) showed the issue.

          Activity

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-18635 [ LU-18635 ]
            mdiep Minh Diep made changes -
            Labels Original: LTS

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30969/
            Subject: LU-10463 osd-zfs: use 1MB RPC size by default
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: f119ec3196eb3e7773eeb4dcb3d825d7f8725a9c

            mdiep Minh Diep made changes -
            Fix Version/s New: Lustre 2.10.4 [ 13691 ]

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30969
            Subject: LU-10463 osd-zfs: use 1MB RPC size by default
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 79f3e1a4fa0ed94ee3958c955471d3ba67050a60

            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.11.0 [ 13091 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.11

            pjones Peter Jones made changes -
            Labels New: LTS

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30757/
            Subject: LU-10463 osd-zfs: use 1MB RPC size by default
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: af34a876d2ebde2b4717c920683c7fc8b5eae1cf

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-10465 [ LU-10465 ]

            People

              pjones Peter Jones
              adilger Andreas Dilger
              Votes: 0
              Watchers: 8
