[LU-5663] mds-survey performance regress on master Created: 25/Sep/14  Updated: 19/Feb/15  Resolved: 19/Feb/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Jinshan Xiong (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Attachments: File patch.diff    
Issue Links:
Related
is related to LU-5621 Performance regression in 2.6 branch ... Resolved
Severity: 3
Rank (Obsolete): 15874

 Description   

Recently I ran mds-survey performance tests on master and b2_5 and found significant performance regressions.

The performance data are as follows:

b2_5:

test name            create  lookup   getattr  setxattr  destroy
p3700_sc0_32t_32dir  157343  1298292  876289   93022     143316
p3700_sc0_64t_32dir  170724  1378018  914515   99558     138258
p3700_sc1_32t_32dir  49900   1279590  881454   98467     33632
p3700_sc1_64t_32dir  40075   1274074  901117   100437    35839

master (pre-2.7):

test name            create  lookup   getattr  setxattr  destroy
p3700_sc0_32t_32dir  95035   1124743  124594   76011     53653
p3700_sc0_64t_32dir  40043   1069693  49133    56457     51161
p3700_sc1_32t_32dir  29693   1106520  120479   60920     37890
p3700_sc1_64t_32dir  26208   1165051  383138   59853     38974

PS: p3700 is the test name;
sc0 means mds-survey stripe count 0 test
32t means 32 threads
32dir means 32 directories.

Therefore p3700_sc1_32t_32dir refers to the test that runs {create, lookup, getattr, setxattr, destroy} on files with 1 stripe, with 32 threads working against 32 different directories.
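The naming scheme above can be decoded mechanically when comparing runs. The following is a minimal sketch, assuming every name follows the pattern <test>_sc<N>_<N>t_<N>dir. SurveyCase and parse_case are hypothetical helper names used for illustration only; they are not part of mds-survey.

```python
import re
from typing import NamedTuple


class SurveyCase(NamedTuple):
    """Fields encoded in a test name such as p3700_sc1_32t_32dir."""
    test: str          # test name, e.g. "p3700"
    stripe_count: int  # value after "sc"
    threads: int       # value before "t"
    directories: int   # value before "dir"


# Assumed layout: <test>_sc<stripes>_<threads>t_<dirs>dir
_NAME_RE = re.compile(
    r"^(?P<test>[^_]+)_sc(?P<stripes>\d+)_(?P<threads>\d+)t_(?P<dirs>\d+)dir$"
)


def parse_case(name: str) -> SurveyCase:
    """Split a test name into its parts; raise ValueError if it does not match."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized test name: {name!r}")
    return SurveyCase(
        m["test"], int(m["stripes"]), int(m["threads"]), int(m["dirs"])
    )
```

For example, parse_case("p3700_sc1_32t_32dir") yields test "p3700", stripe count 1, 32 threads and 32 directories.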



 Comments   
Comment by Jinshan Xiong (Inactive) [ 25/Sep/14 ]

This issue may be related to LU-5621.

Comment by Di Wang [ 26/Sep/14 ]

This is not a fix; it just removes the redundant stuff in the create object patch. Jinshan, could you please check whether this patch brings master back to 2.5 performance? Thanks!

Comment by Andreas Dilger [ 26/Sep/14 ]

You could try reverting http://review.whamcloud.com/10376, which reduces the max transaction size by half.

Comment by Jinshan Xiong (Inactive) [ 26/Sep/14 ]

Sorry I made a terrible mistake here. I reran the test but I didn't see any performance regression. I must have used different parameters when I was running the test against b2_5.

                     Create     Lookup      Md_getattr  Setxattr   Destroy

b2_5:
p3700_sc0_32t_32dir  141313.71  1265631.61  887451.61   94046.43   131052.37
p3700_sc0_64t_32dir  170025.57  1337869.1   892227.34   103872.16  138735.83
p3700_sc1_32t_32dir  48335.6    1253868.18  876506.64   97518.53   32697.35
p3700_sc1_64t_32dir  39041.31   1257847.46  735008.57   101502.42  33928.42

Master:
p3700_sc0_32t_32dir  138848.11  1263207.89  865148.7    88467.54   129140.86
p3700_sc0_64t_32dir  149196.48  1335493.91  875077.15   95105.77   129005.38
p3700_sc1_32t_32dir  48971.7    1237754.48  839588.74   97515.56   35833.2
p3700_sc1_64t_32dir  39285.28   1297680.39  839741.44   94257.2    32877.89

Here is the latest result. BTW, neither reverting the patch nor applying Di's patch boosted the performance.
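To make the comparison between the two branches easier to read, the rerun figures can be reduced to a per-operation percentage change of master relative to b2_5. This is only an illustrative sketch: the numbers are copied from the rerun results in this comment, while OPS, percent_change and compare are hypothetical names, not part of mds-survey.

```python
# Columns, in the order used by the rerun results in this comment.
OPS = ("create", "lookup", "md_getattr", "setxattr", "destroy")

B2_5 = {
    "p3700_sc0_32t_32dir": (141313.71, 1265631.61, 887451.61, 94046.43, 131052.37),
    "p3700_sc0_64t_32dir": (170025.57, 1337869.1, 892227.34, 103872.16, 138735.83),
    "p3700_sc1_32t_32dir": (48335.6, 1253868.18, 876506.64, 97518.53, 32697.35),
    "p3700_sc1_64t_32dir": (39041.31, 1257847.46, 735008.57, 101502.42, 33928.42),
}

MASTER = {
    "p3700_sc0_32t_32dir": (138848.11, 1263207.89, 865148.7, 88467.54, 129140.86),
    "p3700_sc0_64t_32dir": (149196.48, 1335493.91, 875077.15, 95105.77, 129005.38),
    "p3700_sc1_32t_32dir": (48971.7, 1237754.48, 839588.74, 97515.56, 35833.2),
    "p3700_sc1_64t_32dir": (39285.28, 1297680.39, 839741.44, 94257.2, 32877.89),
}


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def compare(base: dict, cand: dict) -> dict:
    """Return {test name: {operation: percent change}} for every test in base."""
    return {
        case: {
            op: percent_change(b, c)
            for op, b, c in zip(OPS, base_row, cand[case])
        }
        for case, base_row in base.items()
    }


if __name__ == "__main__":
    for case, deltas in compare(B2_5, MASTER).items():
        print(case, "  ".join(f"{op}={d:+.1f}%" for op, d in deltas.items()))
```

A negative value means master was slower than b2_5 for that operation, a positive value means it was faster.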

Comment by Andreas Dilger [ 27/Sep/14 ]

Is it possible you formatted the device differently when you ran the master test for 2.7? Maybe specifying a smaller device size or similar? If the MDT is formatted with a smaller size it could change the filesystem parameters (e.g. inode ratio, journal size, etc).

Another possibility is that there was something else in the filesystem that caused it to run more slowly (e.g. files from some previous testing). Did you also format the OSTs identically for all of the tests?

Comment by Jinshan Xiong (Inactive) [ 29/Sep/14 ]

I reformatted the OST for every test. Most likely I ran the pre-2.7 test with a smaller journal size.
