[LU-5663] mds-survey performance regress on master - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Blocker
Fix Version/s: None
Affects Version/s: Lustre 2.7.0
Labels:
None

Severity:
3
Rank (Obsolete):
15874

Description

I recently ran mds-survey performance tests on master and b2_5 and found significant performance regressions on master.

The performance data are as follows:

b2_5:

test name	create	lookup	getattr	setxattr	destroy
p3700_sc0_32t_32dir	157343	1298292	876289	93022	143316
p3700_sc0_64t_32dir	170724	1378018	914515	99558	138258
p3700_sc1_32t_32dir	49900	1279590	881454	98467	33632
p3700_sc1_64t_32dir	40075	1274074	901117	100437	35839

master (pre-2.7):

p3700_sc0_32t_32dir	95035	1124743	124594	76011	53653
p3700_sc0_64t_32dir	40043	1069693	49133	56457	51161
p3700_sc1_32t_32dir	29693	1106520	120479	60920	37890
p3700_sc1_64t_32dir	26208	1165051	383138	59853	38974

PS: p3700 is the test name;
sc0 means the mds-survey test was run with stripe count 0;
32t means 32 threads;
32dir means 32 directories.

Therefore p3700_sc1_32t_32dir refers to a test that runs {create, lookup, getattr, setxattr, destroy} on files with 1 stripe, with 32 threads working against 32 different directories.
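The naming scheme described above can be decoded mechanically. The following is a minimal sketch, not part of mds-survey itself; the `SurveyCase` type, the `parse_case` helper, and the regular expression are hypothetical names introduced here purely for illustration, and they assume every test name follows the `<label>_sc<N>_<N>t_<N>dir` layout shown in the tables.

```python
import re
from typing import NamedTuple


class SurveyCase(NamedTuple):
    """Fields encoded in a test name such as p3700_sc1_32t_32dir (hypothetical type)."""
    label: str         # test name, e.g. "p3700"
    stripe_count: int  # value after "sc"
    threads: int       # value before "t"
    directories: int   # value before "dir"


# Assumed layout: <label>_sc<stripes>_<threads>t_<dirs>dir
_CASE_RE = re.compile(
    r"^(?P<label>[^_]+)_sc(?P<sc>\d+)_(?P<threads>\d+)t_(?P<dirs>\d+)dir$"
)


def parse_case(name: str) -> SurveyCase:
    """Split a test name into its label, stripe count, thread count and directory count."""
    match = _CASE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised test name: {name!r}")
    return SurveyCase(
        label=match["label"],
        stripe_count=int(match["sc"]),
        threads=int(match["threads"]),
        directories=int(match["dirs"]),
    )


if __name__ == "__main__":
    print(parse_case("p3700_sc1_32t_32dir"))
```

A name that does not match the assumed layout raises `ValueError` rather than being silently mis-parsed.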

Attachments

patch.diff
6 kB
26/Sep/14 6:24 AM

Issue Links

is related to

LU-5621 Performance regression in 2.6 branch on file operations to shared directory

Resolved

Activity


Jinshan Xiong (Inactive) added a comment - 29/Sep/14 5:10 AM

I reformatted the OSTs before every test. Most likely I ran the pre-2.7 test with a smaller journal size.


Andreas Dilger added a comment - 27/Sep/14 6:03 AM

Is it possible you formatted the device differently when you ran the master test for 2.7? Maybe specifying a smaller device size or similar? If the MDT is formatted with a smaller size it could change the filesystem parameters (e.g. inode ratio, journal size, etc).

Another possibility is that there was something else in the filesystem that caused it to run more slowly (e.g. files from some previous testing). Did you also format the OSTs identically for all of the tests?


Jinshan Xiong (Inactive) added a comment - 26/Sep/14 11:46 PM - edited

Sorry I made a terrible mistake here. I reran the test but I didn't see any performance regression. I must have used different parameters when I was running the test against b2_5.

b2_5:

test name	Create	Lookup	Md_getattr	Setxattr	Destroy
p3700_sc0_32t_32dir	141313.71	1265631.61	887451.61	94046.43	131052.37
p3700_sc0_64t_32dir	170025.57	1337869.1	892227.34	103872.16	138735.83
p3700_sc1_32t_32dir	48335.6	1253868.18	876506.64	97518.53	32697.35
p3700_sc1_64t_32dir	39041.31	1257847.46	735008.57	101502.42	33928.42

Master:

test name	Create	Lookup	Md_getattr	Setxattr	Destroy
p3700_sc0_32t_32dir	138848.11	1263207.89	865148.7	88467.54	129140.86
p3700_sc0_64t_32dir	149196.48	1335493.91	875077.15	95105.77	129005.38
p3700_sc1_32t_32dir	48971.7	1237754.48	839588.74	97515.56	35833.2
p3700_sc1_64t_32dir	39285.28	1297680.39	839741.44	94257.2	32877.89

Here is the latest result. BTW, neither reverting the patch nor applying Di's patch improved the performance.
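To make the comparison between the two branches easier to read, the rerun figures can be turned into per-operation percentage differences. This is only a sketch: the numbers are copied from the rerun tables in this comment, while the `percent_change` and `compare` helpers and the dictionary layout are assumptions made for illustration and are not part of mds-survey.

```python
# Column order used in the rerun tables.
OPS = ("create", "lookup", "md_getattr", "setxattr", "destroy")

# Rerun results copied from the comment above (operations per second).
B2_5 = {
    "p3700_sc0_32t_32dir": (141313.71, 1265631.61, 887451.61, 94046.43, 131052.37),
    "p3700_sc0_64t_32dir": (170025.57, 1337869.1, 892227.34, 103872.16, 138735.83),
    "p3700_sc1_32t_32dir": (48335.6, 1253868.18, 876506.64, 97518.53, 32697.35),
    "p3700_sc1_64t_32dir": (39041.31, 1257847.46, 735008.57, 101502.42, 33928.42),
}
MASTER = {
    "p3700_sc0_32t_32dir": (138848.11, 1263207.89, 865148.7, 88467.54, 129140.86),
    "p3700_sc0_64t_32dir": (149196.48, 1335493.91, 875077.15, 95105.77, 129005.38),
    "p3700_sc1_32t_32dir": (48971.7, 1237754.48, 839588.74, 97515.56, 35833.2),
    "p3700_sc1_64t_32dir": (39285.28, 1297680.39, 839741.44, 94257.2, 32877.89),
}


def percent_change(baseline: float, candidate: float) -> float:
    """Relative difference of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def compare(baseline: dict, candidate: dict) -> dict:
    """Return {test name: {operation: percent change}} for tests present in baseline."""
    return {
        case: {
            op: percent_change(base, cand)
            for op, base, cand in zip(OPS, base_vals, candidate[case])
        }
        for case, base_vals in baseline.items()
    }


if __name__ == "__main__":
    for case, changes in compare(B2_5, MASTER).items():
        cells = "  ".join(f"{op}={pct:+.1f}%" for op, pct in changes.items())
        print(f"{case}: {cells}")
```

A negative value means master was slower than b2_5 for that cell. These are single-run figures, so small differences in either direction may simply be run-to-run noise.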


Andreas Dilger added a comment - 26/Sep/14 5:47 PM

You could try reverting http://review.whamcloud.com/10376, which reduces the max transaction size by half.


Di Wang added a comment - 26/Sep/14 6:24 AM

This is not a fix; it just removes the redundant parts of the create object patch. Jinshan, could you please check whether this patch brings master back to 2.5 performance? Thanks!


Jinshan Xiong (Inactive) added a comment - 25/Sep/14 1:48 AM

This issue may be related to LU-5621.

