[LU-5608] Performance regression of removal operation with mdtest stride option Created: 11/Sep/14 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Wang Shilong (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15687 |
| Description |
|
While comparing clients running the Lustre 1.8 series against clients running the latest master release (the servers run the same 2.5 series in both cases), we found a big file removal regression. The test command is:
The file removal performance of 1.8 compared with master is: Big regression, isn't it? Note that the '-N' option of mdtest is needed here; the problem seems to be reproducible only with multiple clients. The attachment is a modified mdtest source that helps reproduce this problem. |
| Comments |
| Comment by Shuichi Ihara (Inactive) [ 12/Sep/14 ] |
|
Here are benchmark results with the master branch and 1.8.9 on the clients. The server is running lustre-2.5.
32 clients, 64 processes
No stride
# mdtest -n 16384 -i 3 -p 10 -d /lustre_0/mdtest.out -F -u
Stride=2, because two mdtest threads are running on the same client
# mdtest -n 16384 -i 3 -p 10 -d /lustre_0/mdtest.out -F -u -N 2
No Stride
Stride=2
|
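To illustrate why Stride=2 was chosen for two mdtest processes per client, here is a minimal sketch. It assumes that ranks are placed on clients in consecutive blocks and that a strided task works on the files created by the task `stride` ranks away; the exact offset used by mdtest (and by the modified source attached to this ticket) may differ per phase. The helper names `target_rank`, `client_of` and `remote_ranks` are hypothetical and do not come from mdtest.

```python
# Sketch only: models which client created the files a given rank operates on.
# Assumptions (not taken from the mdtest source): block placement of ranks on
# clients, and a target of (rank + stride) % nprocs for strided phases.

def target_rank(rank, stride, nprocs):
    """Rank whose files this rank operates on when a stride is applied."""
    return (rank + stride) % nprocs

def client_of(rank, procs_per_client):
    """Client index hosting a rank, assuming consecutive ranks share a client."""
    return rank // procs_per_client

def remote_ranks(stride, nprocs, procs_per_client):
    """Count ranks whose target files were created on a different client."""
    return sum(
        client_of(r, procs_per_client)
        != client_of(target_rank(r, stride, nprocs), procs_per_client)
        for r in range(nprocs)
    )

if __name__ == "__main__":
    nprocs, procs_per_client = 64, 2  # 32 clients, 64 processes, as in this test
    for stride in (0, 2):
        n = remote_ranks(stride, nprocs, procs_per_client)
        print(f"stride={stride}: {n}/{nprocs} ranks touch files from another client")
```

Under these assumptions, stride 0 keeps every rank on its own files, while stride 2 sends every rank to files created on a different client, so the client that created a file is never the one that stats or removes it.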
| Comment by Cory Spitz [ 12/Sep/14 ] |
|
This seems to be related to (or a duplicate of) https://jira.hpdd.intel.com/browse/LU-1167 and https://jira.hpdd.intel.com/browse/LU-3308. |
| Comment by Shuichi Ihara (Inactive) [ 13/Sep/14 ] |
|
I don't know what type of metadata workload was used in LU-1167 and LU-3308, but I think this is a different issue. |
| Comment by Shuichi Ihara (Inactive) [ 13/Sep/14 ] |
|
lustre-1.8.9 doesn't have the layout lock. So, just in case, to check whether the layout lock might be related, I applied the following patch to force-disable the layout lock on a 2.6.52 client.
Index: lustre-release.git/lustre/llite/llite_lib.c
===================================================================
--- lustre-release.git.orig/lustre/llite/llite_lib.c
+++ lustre-release.git/lustre/llite/llite_lib.c
@@ -211,7 +211,7 @@ static int client_common_fill_super(stru
 OBD_CONNECT_FULL20 | OBD_CONNECT_64BITHASH|
 OBD_CONNECT_EINPROGRESS |
 OBD_CONNECT_JOBSTATS | OBD_CONNECT_LVB_TYPE |
- OBD_CONNECT_LAYOUTLOCK | OBD_CONNECT_PINGLESS |
+ OBD_CONNECT_PINGLESS |
 OBD_CONNECT_MAX_EASIZE |
 OBD_CONNECT_FLOCK_DEAD |
 OBD_CONNECT_DISP_STRIPE | OBD_CONNECT_LFSCK |
@@ -416,7 +416,6 @@ static int client_common_fill_super(stru
 OBD_CONNECT_MAXBYTES |
 OBD_CONNECT_EINPROGRESS |
 OBD_CONNECT_JOBSTATS | OBD_CONNECT_LVB_TYPE |
- OBD_CONNECT_LAYOUTLOCK | OBD_CONNECT_PINGLESS |
 OBD_CONNECT_LFSCK;
 if (sbi->ll_flags & LL_SBI_SOM_PREVIEW)
Here are the test results. 32 clients, 64 processes, 1M files for creation/stat/removal.
No Stride
Stride=2
With stride enabled, "File removal" performance improved significantly and is close to lustre-1.8.9's numbers. |
| Comment by Peter Jones [ 13/Sep/14 ] |
|
Lai, could you please comment? Thanks, Peter |
| Comment by Lai Siyao [ 18/Sep/14 ] |
|
This looks to be caused by statahead: with the mdtest stride option, statahead doesn't help and only adds overhead. In the current statahead implementation, each stat attempts to start statahead, which fails because the stat'ed entry is not the first directory entry, and that causes even more overhead. Hopefully Could you disable statahead on the master client and run this test again? In the meantime, I'll do this test against |