[LU-10463] Poor write performance periodically on repeated test runs Created: 05/Jan/18  Updated: 09/Feb/18  Resolved: 20/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Major
Reporter: Andreas Dilger Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: None
Environment:

Centos 7.4, various Lustre and ZFS versions tested. Lustre clients are 2.10.2_RC2.


Attachments: ior-results-1Mvs4M.xlsx
Issue Links:
Cloners
Related
is related to LU-9090 increase default RPC size to 4MB Resolved
is related to LU-10465 increase default stripe size to 4MB Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I'm running an IOR test (IOR-2.10.3) that writes 1GB files to one dataset/directory, then writes 3GB files to another dataset/directory, then reads back the first dataset. This test sequence is run 25 times. My filesystem is able to do 14-16GB/sec writes, and most iterations of this test produce that bandwidth. The problem is that a few of the 25 iterations turn in significantly lower results, often in the 5-10GB/sec range.
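
One way to pick out the slow iterations from a series of runs like this is to compare each write bandwidth against the median of the series. The following is a minimal sketch only: the function name, the 0.75 threshold, and the sample values are assumptions made up for illustration, not measurements or tooling from this ticket.

```python
from statistics import median


def flag_slow_iterations(bandwidths, threshold=0.75):
    """Return (index, bandwidth) pairs that fall below threshold * median.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    cutoff = threshold * median(bandwidths)
    return [(i, bw) for i, bw in enumerate(bandwidths) if bw < cutoff]


# Made-up write bandwidths in GB/sec, NOT data from this ticket.
sample = [15.1, 14.8, 15.6, 7.2, 15.3, 14.9, 9.4, 15.0]

for index, bw in flag_slow_iterations(sample):
    print(f"iteration {index}: {bw} GB/sec is well below the median")
```

Using the median rather than the mean keeps the baseline from being dragged down by the slow iterations themselves.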

I initially suspected hardware issues, but testing of components including each individual disk drive showed everything working properly, and I've seen nothing in the logs when running the test above reporting any problem. So, I started building and testing various combinations of Lustre and ZFS. The hardware, clients and server OS have been constant for each of the tests. Only SPL/ZFS and Lustre on the server have changed from test to test.

The problem appears to have been introduced in the Lustre 2.10.x branch. I have not seen it occur in the Lustre 2.9 builds I've done. I've built Lustre 2.9 with ZFS 0.7.3 and seen no issue. I've built Lustre 2.10.x with ZFS 0.6.5.7 and do observe the issue. Every build I've done with Lustre 2.10.x (several) showed the issue.



 Comments   
Comment by Andreas Dilger [ 05/Jan/18 ]

After narrowing it down to between 2.9.58 and 2.9.59, of the 87 patches in that range, the possible candidates that affect the server (excluding ldiskfs) are:

42bf19a573a5 LU-8703 libcfs: make tolerant to offline CPUs and empty NUMA nodes
e711370e13dc LU-9448 lnet: handle empty CPTs
8c9c1f59d99c LU-9090 ofd: increase default OST BRW size to 4MB
03f24e6f7864 LU-2049 grant: Fix grant interop with pre-GRANT_PARAM clients

Of those patches, it seems that 8c9c1f59d99c is very likely the culprit for this, since it is the only patch that directly affects the IO path. It would be possible to verify this by setting "lctl set_param osc.*.max_pages_per_rpc=1M" on the clients for a 2.9.59/2.10.0 client.
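
For reference, max_pages_per_rpc is counted in pages, so a byte size such as 1M corresponds to a page count that depends on the client page size. The sketch below shows the arithmetic, assuming a 4 KiB page size (typical for x86_64, but an assumption here); the helper name is hypothetical and not part of Lustre.

```python
MIB = 1024 * 1024


def rpc_size_to_pages(size_bytes, page_size=4096):
    """Convert an RPC size in bytes to a page count.

    page_size defaults to 4 KiB, which is an assumption; other
    architectures may use a different page size.
    """
    if size_bytes % page_size != 0:
        raise ValueError("RPC size must be a multiple of the page size")
    return size_bytes // page_size


print(rpc_size_to_pages(1 * MIB))  # 1 MiB RPC -> 256 pages of 4 KiB
print(rpc_size_to_pages(4 * MIB))  # 4 MiB RPC -> 1024 pages of 4 KiB
```

Under that assumption, the 4MB default introduced by LU-9090 is 1024 pages per RPC, and the 1M setting suggested above is 256.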

Comment by Rick Gunlock [ 05/Jan/18 ]

Looks like setting max_pages_per_rpc=1M has done the trick. I've tested both 2.9.59 and 2.10.2 using ZFS 0.7.3 with consistent write results. I didn't see any significant performance degradation using this setting. Using 2.10.2, for writes I averaged 16,053 MiB/s with a spread of 15,070-16,729 MiB/s across 80 test runs, which seems pretty typical for my hardware.

Short of figuring out how to get ZFS OSDs to take advantage of the larger default max_pages_per_rpc, I do think a patch to default to 1M for ZFS OSDs would be a good idea.

Thanks for the prompt attention to this issue, and I'm happy that there is a simple solution. I've attached a spreadsheet that shows my write results with default and 1M max_pages_per_rpc.

Comment by Gerrit Updater [ 06/Jan/18 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/30757
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6848d1ad26d00ade658e85e608c4a83a9a7747cd

Comment by Gerrit Updater [ 20/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30757/
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: af34a876d2ebde2b4717c920683c7fc8b5eae1cf

Comment by Peter Jones [ 20/Jan/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 22/Jan/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30969
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 79f3e1a4fa0ed94ee3958c955471d3ba67050a60

Comment by Gerrit Updater [ 09/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30969/
Subject: LU-10463 osd-zfs: use 1MB RPC size by default
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: f119ec3196eb3e7773eeb4dcb3d825d7f8725a9c

Generated at Sat Feb 10 02:35:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.