LU-10463: Poor write performance periodically on repeated test runs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Labels: None
    • Environment: CentOS 7.4; various Lustre and ZFS versions tested. Lustre clients are 2.10.2_RC2.
    • Severity: 3

Description

      I'm running an IOR test (IOR-2.10.3) that writes 1GB files to one dataset/directory, then writes 3GB files to another dataset/directory, then reads back the first dataset. This sequence is run 25 times. My filesystem can sustain 14-16 GB/sec writes, and most iterations of the test produce that bandwidth. The problem is that a few of the 25 iterations turn in significantly lower results, often in the 5-10 GB/sec range.
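      For reference, a rough sketch of the invocation pattern is below (the process count, transfer size, and paths are placeholders; the exact IOR options used are not recorded in this ticket):

        # step 1: write 1GB files, one per process, into the first dataset/directory
        mpirun -np 16 IOR -w -F -b 1g -t 1m -o /mnt/lustre/dataset1/testfile
        # step 2: write 3GB files into the second dataset/directory
        mpirun -np 16 IOR -w -F -b 3g -t 1m -o /mnt/lustre/dataset2/testfile
        # step 3: read back the first dataset
        mpirun -np 16 IOR -r -F -b 1g -t 1m -o /mnt/lustre/dataset1/testfile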

      I initially suspected hardware issues, but testing of the components, including each individual disk drive, showed everything working properly, and the logs report no problems while the test above is running. So I started building and testing various combinations of Lustre and ZFS. The hardware, clients, and server OS have been constant for every test; only SPL/ZFS and Lustre on the server have changed from test to test.

      The problem appears to have been introduced in the Lustre 2.10.x branch. I have not seen it occur in the Lustre 2.9 builds I've done: I built Lustre 2.9 with ZFS 0.7.3 and saw no issue, but I built Lustre 2.10.x with ZFS 0.6.5.7 and do observe the issue. Every build I've done with Lustre 2.10.x (several) showed the issue.

Attachments

Issue Links

Activity


Gerrit Updater added a comment:
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30969/
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: f119ec3196eb3e7773eeb4dcb3d825d7f8725a9c

Gerrit Updater added a comment:
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30969
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 79f3e1a4fa0ed94ee3958c955471d3ba67050a60
Peter Jones added a comment:
Landed for 2.11

Gerrit Updater added a comment:
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30757/
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: af34a876d2ebde2b4717c920683c7fc8b5eae1cf

Gerrit Updater added a comment:
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/30757
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6848d1ad26d00ade658e85e608c4a83a9a7747cd

Rick Gunlock (Inactive) added a comment:
Looks like setting max_pages_per_rpc=1M has done the trick. I've tested both 2.9.59 and 2.10.2 using ZFS 0.7.3 with consistent write results, and I didn't see any significant performance degradation with this setting. Using 2.10.2, writes averaged 16,053 MiB/s with a spread of 15,070-16,729 MiB/s across 80 test runs, which seems pretty typical for my hardware.

Short of figuring out how to get ZFS OSDs to take advantage of the larger default max_pages_per_rpc, I do think a patch defaulting to 1M for ZFS OSDs would be a good idea.

Thanks for the prompt attention to this issue; I'm happy there is a simple solution. I've attached a spreadsheet that shows my write results with default and 1M max_pages_per_rpc.
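For reference, one way to apply the 1M RPC size (a hedged sketch; the persistent form assumes a Lustre release with "lctl set_param -P" support and is run on the MGS node):

# temporary, on each client; reverts at the next mount
lctl set_param osc.*.max_pages_per_rpc=1M

# persistent, run once on the MGS and propagated to all clients
lctl set_param -P osc.*.max_pages_per_rpc=1M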

Andreas Dilger added a comment:
After narrowing it down to the range between 2.9.58 and 2.9.59, the possible candidates among the 87 patches in that range that affect the server (excluding ldiskfs) are:

42bf19a573a5 LU-8703 libcfs: make tolerant to offline CPUs and empty NUMA nodes
e711370e13dc LU-9448 lnet: handle empty CPTs
8c9c1f59d99c LU-9090 ofd: increase default OST BRW size to 4MB
03f24e6f7864 LU-2049 grant: Fix grant interop with pre-GRANT_PARAM clients

Of those, 8c9c1f59d99c is very likely the culprit, since it is the only patch that directly affects the IO path. It would be possible to verify this by setting "lctl set_param osc.*.max_pages_per_rpc=1M" on a 2.9.59/2.10.0 client.
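If useful, the RPC size a client is actually issuing can be confirmed from the osc rpc_stats histogram before and after changing the setting (a sketch; the exact output format varies between releases):

# clear the per-OSC RPC statistics on the client
lctl set_param osc.*.rpc_stats=0
# ... run the IOR workload ...
# the "pages per rpc" histogram shows whether 1024-page (4MB) or 256-page (1MB) RPCs dominate
lctl get_param osc.*.rpc_stats
# and the setting itself
lctl get_param osc.*.max_pages_per_rpc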

People

  Peter Jones
  Andreas Dilger
  Votes: 0
  Watchers: 8
