[LU-5621] Performance regression in 2.6 branch on file operations to shared directory Created: 15/Sep/14  Updated: 21/Mar/18  Resolved: 21/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Shuichi Ihara (Inactive) Assignee: Peter Jones
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

lustre-2.6.52


Issue Links:
Related
is related to LU-5663 mds-survey performance regress on master Closed
Severity: 2
Epic: metadata, performance
Rank (Obsolete): 15726

 Description   

We found a performance regression in the master branch during our metadata scalability and regression testing.
The Lustre 2.6 server is more than 50% slower than the Lustre 2.5 server for file creation/removal in a shared directory. Here are the test results.

mdtest -n $((1280000/NP)) -i 3 -p 10 -d /lustre_{0-$NP}/mdtest.out -F -C -r
File creation (shared directory), ops/sec:

    Threads    2.5 server    2.6 server
         32         69035         33354
         64         80798         30619
        128         69896         29956
        256         64860         29951
        512         70733         30070
       1024         73064         30540

File removal (shared directory), ops/sec:

    Threads    2.5 server    2.6 server
         32         61087         52609
         64         82981         46370
        128         72802         37697
        256         78388         35324
        512         81033         41058
       1024         79155         40544
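
For reference, a run of the above at NP=32 would have been launched roughly as follows; the mpirun wrapper and rank count are assumptions, while the mdtest flags and the directory placeholder are copied verbatim from the command above:

    # Sketch only: NP and the mpirun launcher are assumptions; the -d argument is
    # kept as the ticket's placeholder for the per-client mount points.
    NP=32
    mpirun -np $NP mdtest -n $((1280000/NP)) -i 3 -p 10 \
        -d /lustre_{0-$NP}/mdtest.out -F -C -r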


 Comments   
Comment by Di Wang [ 15/Sep/14 ]

What is the client version? Is it the same for both tests?

Comment by Shuichi Ihara (Inactive) [ 15/Sep/14 ]

Tested with 2.5 clients in both cases. I also tested a 2.6 client against the 2.6 server (not against the 2.5 server), and the results were the same.

Comment by Jinshan Xiong (Inactive) [ 15/Sep/14 ]

John Hammond once submitted a patch about this, but I don't know its current status. Please refresh my memory, John.

Comment by Di Wang [ 15/Sep/14 ]

Not sure whether this is a known issue or not. If it is not, it is worth running mds-survey to see whether the problem is in LOD/OSD.
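
For reference, a minimal mds-survey run on the MDS might look like the following; the environment-variable names follow the lustre-iokit script, but the values and the MDT target name are illustrative assumptions, not taken from this ticket:

    # Illustrative only: thread range, file count and MDT name are assumptions.
    thrlo=4 thrhi=64 file_count=200000 targets="lustre-MDT0000" sh mds-survey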

Comment by Shuichi Ihara (Inactive) [ 15/Sep/14 ]

OK, but "git bisect" would be an easy way to find exactly which commit caused this regression. I will do it.
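
A bisect between the two server branches could be driven roughly like this; the release tag name is an assumption about the lustre-release repository:

    git bisect start
    git bisect bad master          # 2.6/master shows the regression
    git bisect good v2_5_0         # assumed tag for the known-good 2.5 release
    # rebuild the server, rerun mdtest, then mark each step:
    #   git bisect good    or    git bisect bad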

Comment by Peter Jones [ 16/Sep/14 ]

Once we see the results of the git bisect, I'll work out who to assign this to.

Comment by John Hammond [ 16/Sep/14 ]

Jinshan, do you mean the following?

commit 708d85a652a77f85153790e6cca1b7a2b91947cf
Author: John L. Hammond <john.hammond@intel.com>
Date:   Thu Jan 30 11:07:13 2014 -0600

    LU-4398 mdt: acquire an open lock for write or execute
    
    In mdt_object_open_lock() opens for write or execute will always
    acquire an open lock of the appropriate mode so that any conflicting
    cached open locks on other clients will be canceled. Add a regression
    test to sanityn.sh.
    
    Signed-off-by: John L. Hammond <john.hammond@intel.com>
    Change-Id: I8092bca4c418ec99a25584abdfb635ffec19a26e
    Reviewed-on: http://review.whamcloud.com/9063
    Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
    Tested-by: Jenkins
    Tested-by: Maloo <hpdd-maloo@intel.com>
    Reviewed-by: Mike Pershin <mike.pershin@intel.com>
    Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

That was landed after 2.5.0 and reverted before 2.6.0.

Comment by Jinshan Xiong (Inactive) [ 25/Sep/14 ]

This issue may be related to LU-5663.

Comment by Di Wang [ 26/Sep/14 ]

Ihara, could you please try the patch in LU-5663 to see whether it improves the performance in your test? That patch eliminates the overhead of striped directories (introduced in 2.6) in the object creation path, so we can see whether striped directories cause this performance regression.

Comment by Di Wang [ 27/Sep/14 ]

Ihara, although this patch did not show any difference in Jinshan's test, that does not mean it would not help yours, because his test uses a single (or few) threads per directory. This temporary patch will probably be more helpful for a shared directory. Thank you!

Comment by Di Wang [ 06/Oct/14 ]

http://review.whamcloud.com/#/c/12195 Not sure if this will help much, but it is worth a try.
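
One way to pull that change onto a local lustre-release tree for testing is sketched below; the patch-set number and fetch URL are assumptions, and only change 12195 comes from the link above:

    # Gerrit change refs use the last two digits of the change number (12195 -> 95);
    # patch-set 1 is assumed here.
    git fetch http://review.whamcloud.com/lustre-release refs/changes/95/12195/1 && \
        git cherry-pick FETCH_HEAD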

Comment by Alex Zhuravlev [ 25/Oct/14 ]

Is it possible to get profiling for both versions?
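
One possible way to collect comparable profiles on the MDS for each server version while mdtest is running; the perf usage below is a suggestion, not something requested in the ticket:

    perf record -a -g -o perf-mds.data -- sleep 60   # sample all CPUs for 60 s mid-run
    perf report -i perf-mds.data --sort symbol       # compare hot symbols across versions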

Comment by Peter Jones [ 21/Mar/18 ]

I think that this issue is now out of date.
