[LU-983] tar is slow for small files using 1.8.6 vs 1.8.5: on a 1.8.6 client/host, tar is performing 10K reads, and the RPCs are typically one page in size for small files Created: 11/Jan/12 Updated: 01/Jun/12 Resolved: 01/Jun/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.1.2, Lustre 1.8.8 |
| Type: | Bug | Priority: | Major |
| Reporter: | James Karellas | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | performance | ||
| Environment: |
CentOS 5.5/5.6 on Lustre servers; SLES11 SP1 clients (both OFED 1.5.2 and 1.5.3.1 have been tested on the SLES11 SP1 clients) |
| Attachments: | wcdebug.output.gz |
| Severity: | 3 |
| Epic: | client, performance |
| Rank (Obsolete): | 4698 |
| Description |
|
This is all with respect to small files; large files seem to be OK. We have been focusing on tar, but suspect the problem is with small files in general.

On a 1.8.5 host, tar is performing 10K reads, and the RPCs [...]

On a 1.8.6 host, tar is performing 10K reads, and the RPCs are typically one page in size.

We are having some problems with "collectl" on 1.8.6 filesystems or we'd [...]

As far as real-world issues, we have several complaints from users that are [...]

Our users reported a serious performance issue in 1.8.6. The time [...]

Is there a known issue with tar'ing a large number of small files in 1.8.6? |
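A minimal sketch of how this symptom can be observed from a client, assuming the standard per-OSC rpc_stats interface; the mount point and test tree below are hypothetical:

    # Reset the per-OSC RPC statistics (writing to rpc_stats clears the counters).
    lctl set_param osc.*.rpc_stats=0

    # Read a tree of small files the way tar does (hypothetical path).
    tar -cf /dev/null /mnt/lustre/smallfiles

    # Inspect the pages-per-RPC histogram: on an affected 1.8.6 client most
    # read RPCs fall in the 1-page bin rather than the 128-page bin.
    lctl get_param osc.*.rpc_stats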
| Comments |
| Comment by Peter Jones [ 12/Jan/12 ] |
|
Lai, could you please look into this one? Thanks, Peter |
| Comment by Andreas Dilger [ 12/Jan/12 ] |
|
At a first guess, it may be that max_read_ahead_whole_mb is being ignored for some reason. This should cause the whole file (for file size < max_read_ahead_whole_mb, default 2MB) to be read at the client.

The other issue is why tar only does 10kB reads. Lustre reports a 2MB blocksize to stat(2), so tar should be using this as the preferred IO size. I agree that this is not the core issue, since even a 10kB read should trigger at least a 12kB RPC initially, and then readahead due to sequential access.

We also need to investigate what patches exist in the LLNL 1.8.5 above stock Oracle 1.8.5, since I recall Brian Behlendorf working on the readahead code, and possibly this was not landed to 1.8.6. I think there is a bugzilla bug with their patch list. |
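As a quick sanity check of the two values this comment hinges on, something like the following could be run on a client (a sketch; the file path is hypothetical):

    # GNU stat's %o prints the optimal I/O transfer size hint (st_blksize);
    # Lustre should report 2MB here, and tar should use it as its read size.
    stat -c '%o' /mnt/lustre/smallfiles/somefile

    # Whole-file readahead cutoff: files smaller than this (default 2MB)
    # should be read ahead in their entirety.
    lctl get_param llite.*.max_read_ahead_whole_mb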
| Comment by James Karellas [ 12/Jan/12 ] |
|
As a data point, we tried an unpatched client version of 1.8.6 and 1.8.7 with no change in behavior. We are working on getting you the set of our own patches that we've added to both 1.8.6 and 1.8.5; that will be uploaded in a bit. |
| Comment by Jay Lan (Inactive) [ 12/Jan/12 ] |
|
Attached are short-form git logs of two branches: oracle-1.8.5 and nas-1.8.5-5. The commits can be seen at [...] The nas-1.8.5-5 branch was created from the "1.8.5.0-5nas" tag. The oracle-1.8.5 branch was created from the "1.8.5" tag of Oracle's b1_8 branch. |
| Comment by Jay Lan (Inactive) [ 12/Jan/12 ] |
|
Since the vanilla 1.8.6-wc1 build displayed the same tar RPC problem as our NAS version of 1.8.6, I assume you do not need the diffs for 1.8.6. Let me know if you still want them. |
| Comment by Jay Lan (Inactive) [ 12/Jan/12 ] |
|
I just built a vanilla 1.8.5 Lustre client (using the "1.8.5" tag off the b1_8 branch of the Oracle tree). The tar run took 8min 27sec. Not as good as the ~6min of our 1.8.5-5nas, but great compared with the ~120min in the original report. The rpc_stats showed 77% of read RPCs in the 2-pages-per-RPC bin. [NOTE] I reran the test on Jan 13 with the same binary; the time was back to under 6min. |
| Comment by Jay Lan (Inactive) [ 14/Jan/12 ] |
|
The offending commit is b7eb1d7. After reverting that commit, the small-file tar performance is back: most read RPCs were using 128 pages per RPC with our 1.8.6.81 client (rpc_stats snapshot_time: 1326582457.990047 secs.usecs). |
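A sketch of how a revert test like this might be run; only the commit hash comes from this ticket, while the branch name and build steps are assumptions based on the usual Lustre autotools flow:

    # Revert the suspect commit on a client build tree.
    git checkout b1_8                    # assumed branch
    git revert --no-edit b7eb1d7         # hash from the comment above

    # Typical Lustre client build steps (assumed; kernel path is hypothetical).
    sh autogen.sh
    ./configure --with-linux=/usr/src/linux-source
    make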
| Comment by Peter Jones [ 17/Jan/12 ] |
|
Wangdi, could you please look at this report and see if you can understand why your patch from LU-15 could be responsible for this behaviour? Thanks, Peter |
| Comment by Di Wang [ 17/Jan/12 ] |
|
Are you running tar with multiple threads? Could you please post the tar command line here? Also, please enable debugging with

    lctl set_param debug="+reada +vfstrace"

and post the debug log here or somewhere I can get it? |
| Comment by Di Wang [ 17/Jan/12 ] |
|
Could you please post these three readahead settings of your filesystem:

    lctl get_param llite.*.max_read_ahead_mb
    lctl get_param llite.*.max_read_ahead_per_file_mb
    lctl get_param llite.*.max_read_ahead_whole_mb
|
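Taken together with the previous comment, the requested capture amounts to something like the following sketch, assuming the standard lctl debug interface (the output path and test tree are hypothetical):

    # Add readahead and VFS tracing to the debug mask.
    lctl set_param debug="+reada +vfstrace"

    # Clear the kernel debug buffer, run the workload, then dump the log.
    lctl clear
    tar -cf /dev/null /mnt/lustre/smallfiles    # hypothetical test tree
    lctl dk > /tmp/lustre-debug.out             # "dk" dumps the kernel debug log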
| Comment by Jay Lan (Inactive) [ 18/Jan/12 ] |
|
service33 /nobackupp6/jlan # lctl set_param debug="+reada +vfstrace"

The test command I ran was: [...]

I let it run for 1min 20sec before terminating it. The debug output file is uploaded as "wcdebug.output.gz". |
| Comment by Jay Lan (Inactive) [ 18/Jan/12 ] |
|
The gzipped debug output file from running the "tar -cf" command. |
| Comment by Di Wang [ 18/Jan/12 ] |
|
Thanks! I posted a new patch here: http://review.whamcloud.com/#change,1983 |
| Comment by Jay Lan (Inactive) [ 18/Jan/12 ] |
|
Di, your patch worked for me.

real 5m18.394s

The read RPCs fell mostly into the 128-page bin, as expected. Thanks! |
| Comment by Kent Engström (Inactive) [ 07/Feb/12 ] |
|
An added data point. I just tried this patch while building Lustre to test a patch for an unrelated [...]

My "unpatched client" is 116f41f (the [...]). My "patched client" is the same, but with 91776c2 cherry-picked on top of the other commits.

I tested against a directory containing 10001 files, each 128 kilobytes in size. On the test client I ran tar, writing the tar file to local disk: [...]

With the unpatched client, this took 3m 35s (best of three attempts, worst was 3m 38s). |
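A hedged reconstruction of this benchmark; the file count and size come from the comment above, while the paths and the exact tar invocation are assumptions:

    # Create 10001 files of 128 kB each on the Lustre mount (hypothetical path).
    mkdir -p /mnt/lustre/tartest
    for i in $(seq 0 10000); do
        dd if=/dev/zero of=/mnt/lustre/tartest/file.$i bs=128k count=1 2>/dev/null
    done

    # Drop the client page cache so reads go over the wire, then time tar
    # writing its output to local disk.
    echo 3 > /proc/sys/vm/drop_caches
    time tar -cf /tmp/test.tar -C /mnt/lustre tartest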
| Comment by Kent Engström (Inactive) [ 16/Feb/12 ] |
|
Will this patch (http://review.whamcloud.com/#change,1983) be landed on the b1_8 branch? |
| Comment by Peter Jones [ 16/Feb/12 ] |
|
Yes, it should land on both b1_8 and master |
| Comment by Build Master (Inactive) [ 23/Feb/12 to 02/May/12 ] |
|
Repeated automated build notifications (23/Feb/12, 05/Mar/12, 02/May/12), each reporting: "Integrated in [...] Result = SUCCESS" |
| Comment by Peter Jones [ 01/Jun/12 ] |
|
Landed for 1.8.8, 2.1.2 and 2.2 |