[LU-10400] Reduced stat performance with Lustre 2.10 Created: 15/Dec/17 Updated: 22/Mar/18
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Tim McMullan | Assignee: | Saurabh Tandan (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Description |
We have been noticing decreased performance in any stat-intensive operation on Lustre 2.10.[0,2] compared to 2.7. The difference is larger when testing on HDDs than on SSDs, but we see it on both. Between runs I drop caches on the client, MDS, and OSS with "echo 3 > /proc/sys/vm/drop_caches". For example, in a single directory containing 100000 files:
We are running the 2.10 servers on CentOS 7 and 2.7 on RHEL 6.6.
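To make the workload concrete, here is a minimal Python sketch of a stat-heavy run similar to the one described: create many empty files in one directory, then `lstat()` every entry as `ls -l` does, and time that pass. The helper names `populate` and `stat_all` are hypothetical. The sketch does not drop caches itself (that needs root on the client, MDS, and OSS, as described above), and a real `ls -l` also sorts entries and may read extended attributes, so absolute times will differ.

```python
import os
import tempfile
import time
from typing import Tuple


def populate(directory: str, count: int) -> None:
    """Create `count` empty files in `directory` (roughly a `touch` loop)."""
    for i in range(count):
        # Opening in "w" mode and closing immediately leaves an empty file.
        with open(os.path.join(directory, f"file{i:06d}"), "w"):
            pass


def stat_all(directory: str) -> Tuple[int, float]:
    """lstat() every entry in `directory`, as `ls -l` does per file.

    Returns (number of entries stat'ed, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # One explicit metadata lookup per entry; ls -l does not follow
            # symlinks, hence lstat() rather than stat().
            os.lstat(entry.path)
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: a scratch temp dir stands in for the Lustre test directory.
    # On a real run, point this at the Lustre directory and drop caches on
    # the client, MDS, and OSS between runs.
    with tempfile.TemporaryDirectory() as scratch:
        populate(scratch, 1000)
        n, secs = stat_all(scratch)
        print(f"stat'ed {n} entries in {secs:.3f} s")
```

Timing only the stat pass separates metadata lookup cost from file creation cost, which is how the comparisons in this issue report them.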
| Comments |
| Comment by Peter Jones [ 19/Dec/17 ] |
Saurabh, can you please see whether you can reproduce these results? Thanks, Peter
| Comment by Andreas Dilger [ 19/Dec/17 ] |
Hi Tim, are there any tunables, formatting options, or default file striping in use at your site? We'd like to reproduce this locally to debug the problem, but want to make sure that what we test matches what you have.
| Comment by Tim McMullan [ 20/Dec/17 ] |
Hey Andreas,
| Comment by Saurabh Tandan (Inactive) [ 05/Jan/18 ] |
Hi Tim,
| Comment by Allen Todd [ 05/Jan/18 ] |
The Lustre 2.10.x system is running CentOS Linux release 7.4.1708 (Core). Both file systems are new builds in a lab with no pre-existing data.
| Comment by Saurabh Tandan (Inactive) [ 07/Mar/18 ] |
I tried to verify the performance drop between Lustre 2.7.19.6 and 2.10.0 using the same kernel, but was not able to identify any large delta between them for creating 100000 files and then stat'ing them with 'time ls -l'. The numbers below are averages of 3 runs each. We will continue investigating this issue to see whether we can identify anything.
File creation using touch (seconds):
| Build | Version | real | user | sys | Kernel |
| b_ieel3_0 build 159 | 2.7.19.6 | 85.389 | 0.33 | 14.625 | kernel-3.10.0-514.el7 |
| b2_10 build 5 | 2.10.0 | 99.75 | 0.325 | 18.363 | kernel-3.10.0-514.el7 |
time ls -l after touch (seconds):
| Build | Version | real | user | sys | Kernel |
| b_ieel3_0 build 159 | 2.7.19.6 | 4.444 | 0.835 | 2.098 | kernel-3.10.0-514.el7 |
| b2_10 build 5 | 2.10.0 | 3.848 | 0.824 | 2.338 | kernel-3.10.0-514.el7 |
File creation using mcreate (seconds):
| Build | Version | real | user | sys | Kernel |
| b_ieel3_0 build 159 | 2.7.19.6 | 183.02 | 38.133 | 137.687 | kernel-3.10.0-514.el7 |
| b2_10 build 5 | 2.10.0 | 196.111 | 38.003 | 152.28 | kernel-3.10.0-514.el7 |
time ls -l after mcreate (seconds):
| Build | Version | real | user | sys | Kernel |
| b_ieel3_0 build 159 | 2.7.19.6 | 3.266 | 0.76 | 1.464 | kernel-3.10.0-514.el7 |
| b2_10 build 5 | 2.10.0 | 3.27 | 0.738 | 1.782 | kernel-3.10.0-514.el7 |
| Comment by Tim McMullan [ 22/Mar/18 ] |
Thanks for checking it out! After your test I decided to run the same Lustre version on the el6 and el7 kernels. I used Lustre 2.8 on RHEL 6 and RHEL 7, since that is easy with the released packages. The results are below; the times appear to be significantly different between the two.
time ls -l (seconds):
| Kernel | real | user | sys |
| 2.6.32-573.12.1.el6_lustre.x86_64 | 2.848 | 0.824 | 1.808 |
| 3.10.0-693.11.6.el7_lustre.x86_64 | 4.322 | 0.832 | 2.188 |
time du -s (seconds):
| Kernel | real | user | sys |
| 2.6.32-573.12.1.el6_lustre.x86_64 | 20.450 | 0.188 | 5.280 |
| 3.10.0-693.11.6.el7_lustre.x86_64 | 34.830 | 0.192 | 5.448 |
I'll keep looking and see what more I can come up with. Thanks!
| Comment by Patrick Farrell (Inactive) [ 22/Mar/18 ] |
Tim, that version of CentOS 7 includes the KPTI (Meltdown) fix, and that version of CentOS 6 does not. That's a huge difference and should account for what you're seeing, unless you've specifically disabled KPTI.
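Before comparing numbers across hosts, one way to check whether page-table isolation is active is to read the kernel's Meltdown status file. The sketch below assumes the upstream sysfs path `/sys/devices/system/cpu/vulnerabilities/meltdown`, which reports a string such as `Mitigation: PTI` when KPTI is on. Kernels that predate the Meltdown fixes (such as the 2.6.32 el6 kernel in the results above) do not provide this file, and vendor kernels may lack it even when they carry the fix; in those cases the sketch reports "unknown". The function names are hypothetical.

```python
from pathlib import Path

# Upstream sysfs location for the Meltdown mitigation status (assumption:
# the kernel under test provides it; many older kernels do not).
MELTDOWN_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def meltdown_status(path: Path = MELTDOWN_STATUS) -> str:
    """Return the kernel's Meltdown status line, or "unknown" if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        # Missing file (older kernel) or no permission to read it.
        return "unknown"


def kpti_active(status: str) -> bool:
    """True when the status reports page-table isolation ("Mitigation: PTI")."""
    return status.startswith("Mitigation") and "PTI" in status


if __name__ == "__main__":
    status = meltdown_status()
    print(f"meltdown: {status}; KPTI active: {kpti_active(status)}")
```

Running this on each client before a benchmark makes it easier to tell whether a slowdown could be explained by the KPTI fix rather than by the Lustre version.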
| Comment by Tim McMullan [ 22/Mar/18 ] |
I'm sorry, Patrick, my mistake. I grabbed output from the wrong host. This is the run on 3.10.0-327.3.1.el7_lustre.x86_64 (the kernel packaged for 2.8):
time ls -l (seconds):
| Kernel | real | user | sys |
| 2.6.32-573.12.1.el6_lustre.x86_64 | 2.848 | 0.824 | 1.808 |
| 3.10.0-327.3.1.el7_lustre.x86_64 | 3.391 | 0.820 | 1.876 |
time du -s (seconds):
| Kernel | real | user | sys |
| 2.6.32-573.12.1.el6_lustre.x86_64 | 20.450 | 0.188 | 5.280 |
| 3.10.0-327.3.1.el7_lustre.x86_64 | 32.417 | 0.252 | 5.272 |
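For a quick comparison of the corrected el6 and el7 runs, the sketch below computes the el7/el6 ratio of real time and the time not accounted for by CPU (real - user - sys) for each command, using the numbers from the comment above. Reading that remainder as time spent waiting (for example on RPCs or disk) rather than computing is an interpretation; the measurements do not establish it directly. The names `real_ratio` and `off_cpu` are hypothetical.

```python
from typing import Dict, Tuple

# (real, user, sys) seconds from the Lustre 2.8 runs on the el6 and el7 kernels.
Timing = Tuple[float, float, float]
RUNS: Dict[str, Dict[str, Timing]] = {
    "ls -l": {
        "2.6.32-573.12.1.el6": (2.848, 0.824, 1.808),
        "3.10.0-327.3.1.el7": (3.391, 0.820, 1.876),
    },
    "du -s": {
        "2.6.32-573.12.1.el6": (20.450, 0.188, 5.280),
        "3.10.0-327.3.1.el7": (32.417, 0.252, 5.272),
    },
}


def real_ratio(base: Timing, other: Timing) -> float:
    """How many times longer `other` took than `base` in wall-clock time."""
    return other[0] / base[0]


def off_cpu(t: Timing) -> float:
    """Wall-clock time not spent on CPU in user or kernel mode."""
    real, user, sys_ = t
    return real - user - sys_


if __name__ == "__main__":
    for command, runs in RUNS.items():
        el6, el7 = runs["2.6.32-573.12.1.el6"], runs["3.10.0-327.3.1.el7"]
        print(f"{command}: el7/el6 real = {real_ratio(el6, el7):.2f}, "
              f"off-CPU el6 = {off_cpu(el6):.3f} s, el7 = {off_cpu(el7):.3f} s")
```

With these figures, `du -s` takes about 1.59 times as long on el7 while sys time is nearly identical (5.280 s vs 5.272 s), so almost all of the extra ~12 s falls outside user and sys CPU time.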