LU-10463: Poor write performance periodically on repeated test runs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Labels: None
    • Environment: CentOS 7.4; various Lustre and ZFS versions tested. Lustre clients are 2.10.2_RC2.
    • Severity: 3

Description

      I'm running an IOR test (IOR-2.10.3) that writes 1GB files to one dataset/directory, then writes 3GB files to another dataset/directory, then reads back the first dataset. This sequence is run 25 times. My filesystem can sustain 14-16 GB/sec writes, and most iterations of the test produce that bandwidth. The problem is that a few of the 25 iterations turn in significantly lower results, often in the 5-10 GB/sec range.
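      For reference, a rough sketch of the invocation pattern is below (the process count, transfer size, and paths are placeholders; the exact IOR options used are not recorded in this ticket):

        # step 1: write 1GB files, one per process, into the first dataset/directory
        mpirun -np 16 IOR -w -F -b 1g -t 1m -o /mnt/lustre/dataset1/testfile
        # step 2: write 3GB files into the second dataset/directory
        mpirun -np 16 IOR -w -F -b 3g -t 1m -o /mnt/lustre/dataset2/testfile
        # step 3: read back the first dataset
        mpirun -np 16 IOR -r -F -b 1g -t 1m -o /mnt/lustre/dataset1/testfile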

      I initially suspected hardware issues, but testing of the components, including each individual disk drive, showed everything working properly, and the logs report no problems while the test above is running. So I started building and testing various combinations of Lustre and ZFS. The hardware, clients, and server OS have been constant for every test; only SPL/ZFS and Lustre on the server have changed from test to test.

      The problem appears to have been introduced in the Lustre 2.10.x branch. I have not seen it occur in the Lustre 2.9 builds I've done: I built Lustre 2.9 with ZFS 0.7.3 and saw no issue, but I built Lustre 2.10.x with ZFS 0.6.5.7 and do observe the issue. Every build I've done with Lustre 2.10.x (several) showed the issue.

Attachments

Issue Links

Activity


Gerrit Updater added a comment:
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30969/
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: f119ec3196eb3e7773eeb4dcb3d825d7f8725a9c

Gerrit Updater added a comment:
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30969
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 79f3e1a4fa0ed94ee3958c955471d3ba67050a60
Peter Jones added a comment:
Landed for 2.11

Gerrit Updater added a comment:
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30757/
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: af34a876d2ebde2b4717c920683c7fc8b5eae1cf

Gerrit Updater added a comment:
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/30757
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6848d1ad26d00ade658e85e608c4a83a9a7747cd

Rick Gunlock (Inactive) added a comment:
Looks like setting max_pages_per_rpc=1M has done the trick. I've tested both 2.9.59 and 2.10.2 using ZFS 0.7.3 with consistent write results, and I didn't see any significant performance degradation with this setting. Using 2.10.2, writes averaged 16,053 MiB/s with a spread of 15,070-16,729 MiB/s across 80 test runs, which seems pretty typical for my hardware.

Short of figuring out how to get ZFS OSDs to take advantage of the larger default max_pages_per_rpc, I do think a patch defaulting to 1M for ZFS OSDs would be a good idea.

Thanks for the prompt attention to this issue; I'm happy there is a simple solution. I've attached a spreadsheet that shows my write results with default and 1M max_pages_per_rpc.
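For reference, one way to apply the 1M RPC size (a hedged sketch; the persistent form assumes a Lustre release with "lctl set_param -P" support and is run on the MGS node):

# temporary, on each client; reverts at the next mount
lctl set_param osc.*.max_pages_per_rpc=1M

# persistent, run once on the MGS and propagated to all clients
lctl set_param -P osc.*.max_pages_per_rpc=1M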

Andreas Dilger added a comment:
After narrowing it down to the range between 2.9.58 and 2.9.59, the possible candidates among the 87 patches in that range that affect the server (excluding ldiskfs) are:

42bf19a573a5 LU-8703 libcfs: make tolerant to offline CPUs and empty NUMA nodes
e711370e13dc LU-9448 lnet: handle empty CPTs
8c9c1f59d99c LU-9090 ofd: increase default OST BRW size to 4MB
03f24e6f7864 LU-2049 grant: Fix grant interop with pre-GRANT_PARAM clients

Of those, 8c9c1f59d99c is very likely the culprit, since it is the only patch that directly affects the IO path. It would be possible to verify this by setting "lctl set_param osc.*.max_pages_per_rpc=1M" on a 2.9.59/2.10.0 client.
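If useful, the RPC size a client is actually issuing can be confirmed from the osc rpc_stats histogram before and after changing the setting (a sketch; the exact output format varies between releases):

# clear the per-OSC RPC statistics on the client
lctl set_param osc.*.rpc_stats=0
# ... run the IOR workload ...
# the "pages per rpc" histogram shows whether 1024-page (4MB) or 256-page (1MB) RPCs dominate
lctl get_param osc.*.rpc_stats
# and the setting itself
lctl get_param osc.*.max_pages_per_rpc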

People

  Peter Jones
  Andreas Dilger
  Votes: 0
  Watchers: 8
