[LU-2442] metadata performance degradation on current master Created: 06/Dec/12  Updated: 17/Sep/13  Resolved: 17/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0, Lustre 2.5.0

Type: Bug Priority: Major
Reporter: Di Wang Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File mdtest_create.png     PNG File mdtest_unlink.png    
Issue Links:
Related
is related to LU-3966 Submit quota lock improvement patches... Resolved
is related to LU-20 patchless server kernel Resolved
Severity: 3
Rank (Obsolete): 5771

 Description   

During Minh's performance testing on the OpenSFS cluster, we found a significant performance degradation.

b2_3 test result

[root@c24 bin]# ./run_mdsrate_create
0: c01 starting at Thu Dec  6 08:48:39 2012
Rate: 26902.88 eff 26902.57 aggr 140.12 avg client creates/sec (total: 192 threads 3840000 creates 192 dirs 1 threads/dir 142.74 secs)
0: c01 finished at Thu Dec  6 08:51:02 2012
[root@c24 bin]# ./run_mdsrate_stat
0: c01 starting at Thu Dec  6 08:51:50 2012
Rate: 169702.53 eff 169703.11 aggr 883.87 avg client stats/sec (total: 192 threads 3840000 stats 192 dirs 1 threads/dir 22.63 secs)
0: c01 finished at Thu Dec  6 08:52:13 2012
[root@c24 bin]# ./run_mdsrate_unlink
0: c01 starting at Thu Dec  6 08:52:28 2012
Rate: 33486.06 eff 33486.74 aggr 174.41 avg client unlinks/sec (total: 192 threads 3840000 unlinks 192 dirs 1 threads/dir 114.67 secs)
Warning: only unlinked 3840000 files instead of 20000
0: c01 finished at Thu Dec  6 08:54:23 2012
[root@c24 bin]# ./run_mdsrate_mknod
0: c01 starting at Thu Dec  6 08:54:32 2012
Rate: 52746.29 eff 52745.00 aggr 274.71 avg client mknods/sec (total: 192 threads 3840000 mknods 192 dirs 1 threads/dir 72.80 secs)
0: c01 finished at Thu Dec  6 08:55:45 2012
[root@c24 bin]#

Master test result

[root@c24 bin]# ./run_mdsrate_create
0: c01 starting at Tue Dec  4 21:15:40 2012
Rate: 6031.09 eff 6031.11 aggr 31.41 avg client creates/sec (total: 192 threads 3840000 creates 192 dirs 1 threads/dir 636.70 secs)
0: c01 finished at Tue Dec  4 21:26:17 2012
[root@c24 bin]# ./run_mdsrate_stat
0: c01 starting at Tue Dec  4 21:27:04 2012
Rate: 177962.00 eff 177964.59 aggr 926.90 avg client stats/sec (total: 192 threads 3840000 stats 192 dirs 1 threads/dir 21.58 secs)
0: c01 finished at Tue Dec  4 21:27:26 2012
[root@c24 bin]# ./run_mdsrate_unlink
0: c01 starting at Tue Dec  4 21:29:47 2012
Rate: 8076.06 eff 8076.08 aggr 42.06 avg client unlinks/sec (total: 192 threads 3840000 unlinks 192 dirs 1 threads/dir 475.48 secs)
Warning: only unlinked 3840000 files instead of 20000
0: c01 finished at Tue Dec  4 21:37:43 2012
[root@c24 bin]# ./run_mdsrate_mknod
0: c01 starting at Tue Dec  4 21:48:41 2012
Rate: 10430.50 eff 10430.61 aggr 54.33 avg client mknods/sec (total: 192 threads 3840000 mknods 192 dirs 1 threads/dir 368.15 secs)
0: c01 finished at Tue Dec  4 21:54:49 2012


 Comments   
Comment by Cliff White (Inactive) [ 06/Dec/12 ]

Comparing mdtest runs on Hyperion, I believe I can confirm this:
2.1.3 - mdtestssf

0000: SUMMARY: (of 5 iterations)
0000:    Operation                  Max        Min       Mean    Std Dev
0000:    ---------                  ---        ---       ----    -------
0000:    Directory creation:   9158.250   6805.626   7521.172    851.288
0000:    Directory stat    :  40859.559  40283.053  40503.987    203.103
0000:    Directory removal :   5586.564   4990.031   5274.173    192.167
0000:    File creation     :  12461.089   6539.534   9676.354   1981.131
0000:    File stat         :  40967.623  39510.833  40196.762    550.350
0000:    File removal      :   7316.976   5786.912   6623.610    617.150
0000:    Tree creation     :     19.604     10.008     12.621      3.562
0000:    Tree removal      :     15.235     10.994     12.947      1.597

2.3.54

0000: SUMMARY: (of 5 iterations)
0000:    Operation                  Max        Min       Mean    Std Dev
0000:    ---------                  ---        ---       ----    -------
0000:    Directory creation:   4853.701   4014.951   4332.333    295.957
0000:    Directory stat    :  39866.927  39646.375  39773.126     84.119
0000:    Directory removal :   3499.158   3326.216   3411.107     65.731
0000:    File creation     :   2809.412   2529.216   2661.200    109.115
0000:    File stat         :  40010.001  39748.993  39875.258     83.955
0000:    File removal      :   2246.567   1899.579   2091.405    129.200
0000:    Tree creation     :     14.167      9.096     11.248      1.712
0000:    Tree removal      :     12.821     10.101     11.436      1.100

2.1.3 mdtestfpp

0000: SUMMARY: (of 5 iterations)
0000:    Operation                  Max        Min       Mean    Std Dev
0000:    ---------                  ---        ---       ----    -------
0000:    Directory creation:   9290.675   7180.471   8048.526    723.684
0000:    Directory stat    :  47632.129  43813.325  46277.895   1581.071
0000:    Directory removal :  13615.528  10414.726  12001.418   1187.520
0000:    File creation     :  12655.076   9627.218  11585.362   1067.008
0000:    File stat         :  45895.125  44328.965  44772.921    584.435
0000:    File removal      :  29147.930  18837.845  23098.132   4096.757
0000:    Tree creation     :     11.175      8.051      9.766      1.080
0000:    Tree removal      :      7.407      5.315      6.225      0.675

2.3.54

0000: SUMMARY: (of 5 iterations)
0000:    Operation                  Max        Min       Mean    Std Dev
0000:    ---------                  ---        ---       ----    -------
0000:    Directory creation:   4324.853   3586.223   3915.558    236.782
0000:    Directory stat    :  64787.251  64169.168  64501.553    207.900
0000:    Directory removal :   3712.552   3447.687   3537.772    104.057
0000:    File creation     :   2571.134   2251.887   2362.125    112.613
0000:    File stat         :  64940.933  64515.646  64764.334    156.090
0000:    File removal      :   3127.609   2585.040   2866.863    182.853
0000:    Tree creation     :      3.516      2.733      3.036      0.266
0000:    Tree removal      :      2.123      1.973      2.068      0.056
Comment by Peter Jones [ 10/Dec/12 ]

Lai will work on this one

Comment by Nathan Rutman [ 12/Dec/12 ]

Cliff, how many clients do you use for these tests?

Comment by Cliff White (Inactive) [ 12/Dec/12 ]

I believe these tests used 64 clients.

Comment by Nathan Rutman [ 12/Dec/12 ]

Single mount point each or multiple mounts?

Comment by Cliff White (Inactive) [ 12/Dec/12 ]

Single mount point

Comment by Alex Zhuravlev [ 13/Dec/12 ]

First of all, it makes sense to benchmark 2.3 as well. I'd rather expect the regression to have been introduced in Orion, but we still need to make sure.

Comment by Andreas Dilger [ 13/Dec/12 ]

Alex, please see first comment - it has 2.3 vs master results.

Comment by Alex Zhuravlev [ 13/Dec/12 ]

oops, sorry..

Comment by Alex Zhuravlev [ 13/Dec/12 ]

This seems to be related to quota accounting being active all the time:

2.3: mdt 1 file 100000 dir 2 thr 4 create 20750.33 [16998.93,22998.30]
2.4(default): mdt 1 file 100000 dir 2 thr 4 create 5990.44 [1999.70,6999.62]
2.4(-quota): mdt 1 file 100000 dir 2 thr 4 create 19567.30 [19995.64,21998.22]

Comment by Minh Diep [ 13/Dec/12 ]

Alex, I checked that quota was disabled (i.e. quota_save/enabled is none). How did you get the "-quota" case?

Comment by Alex Zhuravlev [ 13/Dec/12 ]

I rebuilt mkfs with the following patch:

--- a/lustre/utils/mount_utils_ldiskfs.c
+++ b/lustre/utils/mount_utils_ldiskfs.c
@@ -438,6 +438,7 @@ static int enable_default_ext4_features(struct mkfs_opts *mop, char *anchor,
 	if (mop->mo_ldd.ldd_mount_type == LDD_MT_EXT3)
 		return 0;
 
+#if 0
 	/* Enable quota by default */
 	if (is_e2fsprogs_feature_supp("-O quota") == 0) {
 		append_unique(anchor, ",", "quota", NULL, maxbuflen);
@@ -447,6 +448,7 @@ static int enable_default_ext4_features(struct mkfs_opts *mop, char *anchor,
 			"e2fsprogs, please upgrade your e2fsprogs.\n");
 		return EINVAL;
 	}
+#endif
 
 	/* Allow files larger than 2TB. Also needs LU-16, but not harmful. */
 	if (is_e2fsprogs_feature_supp("-O huge_file") == 0)

But I guess you can turn quota accounting off with tune2fs -O ^quota <mds device>. Note this is not a solution:
in the current design quota accounting is always enabled, and I guess we have to fix the accounting somehow.
I only disabled quota to find the cause.

Comment by Lai Siyao [ 17/Dec/12 ]

Below is the result of mds-survey test with quota on and 0 stripe:

mdt 1 file  162470 dir    4 thr    4 create 73485.77 [73997.93,73997.93] lookup 628892.16 [628892.16,628892.16] md_getattr 441141.49 [441141.49,441141.49] setxattr 10206.61 [   0.00,21996.74] destroy 37075.41 [   0.00,31527.94]
mdt 1 file  162470 dir    4 thr    8 create 20421.68 [7999.22,27998.18] lookup 649627.33 [649627.33,649627.33] md_getattr 425714.11 [425714.11,425714.11] setxattr 8965.92 [   0.00,23993.11] destroy 18337.15 [   0.00,31997.98]

quota off and 0 stripe:

mdt 1 file   75846 dir    4 thr    4 create 96651.65 [96651.65,96651.65] lookup 612128.94 [612128.94,612128.94] md_getattr 333600.18 [333600.18,333600.18] setxattr 10669.48 [   0.00,20996.43] destroy 95098.86 [95098.86,95098.86]
mdt 1 file   75846 dir    4 thr    8 create 107707.84 [107707.84,107707.84] lookup 610706.69 [610706.69,610706.69] md_getattr 435877.10 [435877.10,435877.10] setxattr 9347.64 [   0.00,19997.84] destroy 83942.93 [83942.93,83942.93]

This shows that with quota off, metadata operations are much faster than with quota on, especially with more threads, e.g. create 107707.84 vs 20421.68 with 8 threads.

Comment by Alex Zhuravlev [ 17/Dec/12 ]

The quota framework in the VFS is totally serialized, i.e. it does not scale with the number of threads/cores;
e.g. see dquot_commit() and dquot_alloc_inode().
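
For reference, a condensed sketch of why those paths serialize, based on the RHEL6-era (2.6.32) fs/quota/dquot.c; the bodies are abbreviated, so this is an illustration of the locking pattern rather than the literal kernel source:

/* Abbreviated from 2.6.32 fs/quota/dquot.c: the lock names are real,
 * the bodies are reduced to the locking skeleton. */
int dquot_alloc_inode(const struct inode *inode, qsize_t number)
{
	/* per-superblock rw-semaphore; dquot_initialize() takes the same
	 * semaphore in write mode for every new inode, so concurrent
	 * creates keep blocking each other here */
	down_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
	/* single global spinlock protecting all in-memory quota usage */
	spin_lock(&dq_data_lock);
	/* ... check limits and charge the inode to its user/group dquots ... */
	spin_unlock(&dq_data_lock);
	up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
	return 0;
}

int dquot_commit(struct dquot *dquot)
{
	struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);

	/* per-superblock mutex: every dquot write-back is serialized */
	mutex_lock(&dqopt->dqio_mutex);
	/* ... write the dirty dquot through the filesystem's quota file ... */
	mutex_unlock(&dqopt->dqio_mutex);
	return 0;
}

With a single MDT, every create therefore funnels through dqptr_sem, dq_data_lock and dqio_mutex, no matter how many service threads are running.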

Comment by Andreas Dilger [ 17/Dec/12 ]

If we can make a patch to the VFS for this, I suspect it would be possible to get it accepted upstream? Alternatively, is it possible to bypass some of these functions in our own code without having to re-implement the whole quota code?

Comment by Alex Zhuravlev [ 17/Dec/12 ]

Right, something to think about.

There is a set of wrappers called by ext3, like vfs_dq_alloc_inode() and vfs_dq_init(), which in turn call the methods exported via ext3_quota_operations.
So, in theory, we can replace the generic methods (like dquot_initialize() and dquot_alloc_inode()) with our own, probably even one by one to avoid huge
patches. But in any case this is not a 10-line patch. I'm going to have a closer look soon.
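
As a rough illustration of that idea (the struct layout follows the 2.6.32-era ext3 code; the osd_dquot_* names are hypothetical placeholders, not existing Lustre functions), replacing the generic methods one at a time could look something like:

/* Hypothetical sketch: a quota-operations table that swaps selected
 * generic dquot_* callbacks for Lustre-provided ones, one at a time.
 * Layout follows the 2.6.32-era struct dquot_operations used by ext3. */
static const struct dquot_operations ldiskfs_quota_operations = {
	.initialize	= osd_dquot_initialize,		/* hypothetical replacement */
	.alloc_inode	= osd_dquot_alloc_inode,	/* hypothetical replacement */
	/* the remaining methods keep the stock implementations */
	.drop		= dquot_drop,
	.alloc_space	= dquot_alloc_space,
	.free_space	= dquot_free_space,
	.free_inode	= dquot_free_inode,
	.transfer	= dquot_transfer,
	.write_dquot	= ext3_write_dquot,
	.acquire_dquot	= ext3_acquire_dquot,
	.release_dquot	= ext3_release_dquot,
	.mark_dirty	= ext3_mark_dquot_dirty,
	.write_info	= ext3_write_info,
};

/* installed at mount time via sb->dq_op = &ldiskfs_quota_operations; */

Because the table is installed through sb->dq_op at mount time, methods could be replaced incrementally while the rest keep the generic behaviour.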

Comment by Lai Siyao [ 17/Dec/12 ]

http://thread.gmane.org/gmane.linux.file-systems/47509 lists some quota SMP improvements, and some of them look helpful here. I'll pick some to patch the MDS kernel and test again.

Comment by Andreas Dilger [ 18/Dec/12 ]

Lai, first check whether these patches were merged upstream, and whether there were any fixes. If not, please contact the author to ask if newer patches are available, in case there have been improvements and/or bug fixes since then (the referenced patches are 2 years old).

Comment by Lai Siyao [ 18/Dec/12 ]

Okay, I'll do that. BTW http://download.openvz.org/~dmonakhov/quota.html is the web page for quota improvements.

Comment by Minh Diep [ 18/Dec/12 ]

I tried using tune2fs to disable quota on master, but create performance only improved by about 10%.

Comment by Alex Zhuravlev [ 18/Dec/12 ]

Please tell us exactly what you did?

Comment by Minh Diep [ 18/Dec/12 ]

I ran master on the OpenSFS cluster with 1 MDT and 24 clients; each client mounted 8 mount points.
Create with the default quota setup resulted in 6.1k/s.
Then I ran tune2fs -O ^quota /dev/sdc1.
Create with quota disabled resulted in 7k/s.

Comment by Alex Zhuravlev [ 18/Dec/12 ]

Minh, let's start with local testing using mds-survey?

Comment by Alex Zhuravlev [ 18/Dec/12 ]

Also, did you remount the MDS after tune2fs?

Comment by Minh Diep [ 18/Dec/12 ]

No, I did not remount. I will try and let you know

Comment by Lai Siyao [ 27/Dec/12 ]

I mailed the author of the quota scalability patches, but he hasn't replied. Those patches were not merged into the upstream kernel and no longer seem to be maintained. The latest version is for 2.6.36-rc5: http://www.spinics.net/lists/linux-fsdevel/msg39408.html.

I've picked up some patches from the patch set, and the test result of mds-survey looks positive:

2.3

mdt 1 file  500000 dir    4 thr    4 create 40576.65 [   0.00,90994.27] destroy 40211.44 [   0.00,127981.83]
mdt 1 file  500000 dir    4 thr    8 create 38368.90 [   0.00,75992.70] destroy 40001.98 [   0.00,106996.26]

master

mdt 1 file  500000 dir    4 thr    4 create 37817.06 [   0.00,73995.19] destroy 37767.73 [   0.00,99993.80]
mdt 1 file  500000 dir    4 thr    8 create 18734.04 [   0.00,26998.95] destroy 19531.29 [   0.00,31997.89]

master with kernel quota disabled

mdt 1 file  500000 dir    4 thr    4 create 43897.17 [   0.00,96993.02] destroy 44340.73 [   0.00,118992.15]
mdt 1 file  500000 dir    4 thr    8 create 45157.98 [   0.00,104990.45] destroy 33863.89 [   0.00,99987.50]

master with quota scalability patches and enabled

mdt 1 file  500000 dir    4 thr    4 create 38428.90 [   0.00,86994.43] destroy 33402.49 [   0.00,114992.30]
mdt 1 file  500000 dir    4 thr    8 create 40242.71 [   0.00,92993.77] destroy 31620.07 [   0.00,89987.76]

I will review the quota scalability patches, tidy them up, and submit them for review later. At the same time I'll do some mdsrate tests.

Comment by Alex Zhuravlev [ 27/Dec/12 ]

The numbers look nice, though 37.8K creates/sec with clean master looks unexpected. In previous testing, weren't creates a few times slower compared to 2.3?

Comment by Lai Siyao [ 27/Dec/12 ]

Last time I tested with /usr/lib64/lustre/tests/mds-survey.sh, which has a small limit on file_count, so the test result might not have been quite accurate. This time I tested with /usr/bin/mds-survey directly, using 500k files. With 8 threads, create/unlink on master is still half the speed of 2.3.

Comment by Alex Zhuravlev [ 27/Dec/12 ]

Could you clarify whether the test was unlinking 0-stripe files?

Comment by Lai Siyao [ 27/Dec/12 ]

Yes, the test script I used is `tests_str="create destroy" file_count=500000 thrlo=4 thrhi=8 mds-survey`; by default it tests with 0-stripe files. The system is RHEL6 with a 4-core CPU.

Comment by Alex Zhuravlev [ 27/Dec/12 ]

Hmm, I see ~40K destroys with 2.3 and 19-37K with clean master (i.e. 50-90% of 2.3)? Or are you referencing different data?

Comment by Lai Siyao [ 27/Dec/12 ]

Yes, you're correct. When I previously said 1/3, I had looked at the wrong column. Both create and unlink run at almost 50% of the 2.3 speed.

Comment by Alex Zhuravlev [ 27/Dec/12 ]

And with quota disabled we do creates better and unlinks a bit worse, though the results seem to vary quite a lot. I'd suggest making more runs.

Comment by Lai Siyao [ 28/Dec/12 ]

mds-survey results with 4 threads:

2.3

mdt 1 file  500000 dir    4 thr    4 create 30301.67 [   0.00,54993.07] destroy 45344.53 [   0.00,118982.39]
mdt 1 file  500000 dir    4 thr    4 create 33631.21 [   0.00,58998.17] destroy 41886.71 [   0.00,123991.69]
mdt 1 file  500000 dir    4 thr    4 create 33204.29 [   0.00,59996.46] destroy 41117.96 [   0.00,125990.17]
mdt 1 file  500000 dir    4 thr    4 create 27398.33 [   0.00,57995.88] destroy 50452.95 [   0.00,119982.12]
mdt 1 file  500000 dir    4 thr    4 create 31605.05 [   0.00,58996.17] destroy 40747.33 [   0.00,118973.83]

master

mdt 1 file  500000 dir    4 thr    4 create 28684.53 [   0.00,44998.52] destroy 36285.94 [   0.00,96994.37]
mdt 1 file  500000 dir    4 thr    4 create 33779.78 [   0.00,55997.87] destroy 38156.31 [   0.00,99993.90]
mdt 1 file  500000 dir    4 thr    4 create 33766.30 [   0.00,56998.12] destroy 39992.00 [   0.00,86993.56]
mdt 1 file  500000 dir    4 thr    4 create 33192.04 [   0.00,55994.18] destroy 37546.43 [   0.00,97994.22]
mdt 1 file  500000 dir    4 thr    4 create 30217.59 [   0.00,49998.45] destroy 33306.32 [   0.00,81994.83]

master w/o quota

mdt 1 file  500000 dir    4 thr    4 create 33550.14 [   0.00,58992.45] destroy 43972.85 [   0.00,101993.06]
mdt 1 file  500000 dir    4 thr    4 create 35128.38 [   0.00,62997.61] destroy 41103.29 [   0.00,124982.00]
mdt 1 file  500000 dir    4 thr    4 create 32508.59 [   0.00,59994.42] destroy 43666.99 [   0.00,98993.47]
mdt 1 file  500000 dir    4 thr    4 create 32928.35 [   0.00,54993.79] destroy 40680.26 [   0.00,116992.04]
mdt 1 file  500000 dir    4 thr    4 create 33850.76 [   0.00,58996.11] destroy 41746.76 [   0.00,87986.80]

master with quota scalability

mdt 1 file  500000 dir    4 thr    4 create 30076.80 [   0.00,51998.23] destroy 43427.17 [   0.00,116992.04]
mdt 1 file  500000 dir    4 thr    4 create 30709.46 [   0.00,53997.79] destroy 45144.05 [   0.00,112988.48]
mdt 1 file  500000 dir    4 thr    4 create 31619.42 [   0.00,52992.48] destroy 44672.52 [   0.00,107985.53]
mdt 1 file  500000 dir    4 thr    4 create 32046.06 [   0.00,51996.31] destroy 40546.26 [   0.00,115992.00]
mdt 1 file  500000 dir    4 thr    4 create 31657.57 [   0.00,57997.91] destroy 43638.64 [   0.00,114991.72]
Comment by Lai Siyao [ 28/Dec/12 ]

mds-survey results with 8 threads:

2.3

mdt 1 file  500000 dir    4 thr    8 create 36109.49 [   0.00,92994.33] destroy 37088.26 [   0.00,86986.52]
mdt 1 file  500000 dir    4 thr    8 create 36823.39 [   0.00,92993.58] destroy 30732.48 [   0.00,87984.95]
mdt 1 file  500000 dir    4 thr    8 create 36634.11 [   0.00,91994.20] destroy 35597.85 [   0.00,96996.99]
mdt 1 file  500000 dir    4 thr    8 create 36689.13 [   0.00,94994.11] destroy 28568.98 [   0.00,69987.82]
mdt 1 file  500000 dir    4 thr    8 create 36749.42 [   0.00,93990.41] destroy 28724.14 [   0.00,66984.93]

master

mdt 1 file  500000 dir    4 thr    8 create 18413.35 [   0.00,30997.21] destroy 22972.08 [   0.00,38997.47]
mdt 1 file  500000 dir    4 thr    8 create 19550.71 [   0.00,27999.08] destroy 19823.87 [   0.00,60995.18]
mdt 1 file  500000 dir    4 thr    8 create 18735.31 [   0.00,27998.18] destroy 19917.03 [   0.00,38995.87]
mdt 1 file  500000 dir    4 thr    8 create 18914.35 [   0.00,26998.27] destroy 21628.42 [   0.00,32997.89]
mdt 1 file  500000 dir    4 thr    8 create 19823.51 [   0.00,25998.34] destroy 19660.51 [   0.00,41988.24]

master w/o quota

mdt 1 file  500000 dir    4 thr    8 create 34226.00 [   0.00,98993.27] destroy 25597.07 [   0.00,89981.28]
mdt 1 file  500000 dir    4 thr    8 create 37657.63 [   0.00,91993.84] destroy 32871.15 [   0.00,64997.92]
mdt 1 file  500000 dir    4 thr    8 create 36370.87 [   0.00,64986.35] destroy 28918.50 [   0.00,74993.85]
mdt 1 file  500000 dir    4 thr    8 create 37004.56 [   0.00,89994.24] destroy 32085.39 [   0.00,67995.04]
mdt 1 file  500000 dir    4 thr    8 create 34277.48 [   0.00,82992.45] destroy 28381.06 [   0.00,62990.17]

master with quota scalability

mdt 1 file  500000 dir    4 thr    8 create 33076.15 [   0.00,92994.23] destroy 34047.86 [   0.00,74991.90]
mdt 1 file  500000 dir    4 thr    8 create 34758.91 [   0.00,83994.96] destroy 29910.64 [   0.00,96989.91]
mdt 1 file  500000 dir    4 thr    8 create 33244.73 [   0.00,91993.93] destroy 33343.84 [   0.00,99990.20]
mdt 1 file  500000 dir    4 thr    8 create 36468.85 [   0.00,86994.35] destroy 32234.66 [   0.00,91988.50]
mdt 1 file  500000 dir    4 thr    8 create 35831.91 [   0.00,91993.84] destroy 34748.24 [   0.00,85989.25]
Comment by Keith Mannthey (Inactive) [ 04/Jan/13 ]

From the "mds-survey results with 8 threads"

Is there a good explanation for why "master with quota scalability" is generally faster than "master w/o quota" when it comes to destroy?

Is there any information for other thread counts 1,2,16,32?

Comment by Lai Siyao [ 05/Jan/13 ]

The test results vary a bit, and mostly I saw "master w/o quota" a bit faster than "master with quota scalability" in destroy, though the runs listed here look slower.

I couldn't find a powerful test node, so I didn't test larger thread counts, because they show a drop in create/destroy rate. And /usr/lib64/lustre/tests/mds-survey.sh sets "thrlo" to the CPU count, which looks reasonable, so I skipped thread counts 1 and 2 as well.

To sum up, it makes sense to collect accurate metadata performance data on a fat node, but I don't have the resources and time to do that right now. The current test results can roughly reflect the performance differences between the different code.

Comment by Johann Lombardi (Inactive) [ 07/Jan/13 ]

All patches combined represent a significant amount of changes. I tend to think that the most important patch is the one removing the dqptr_sem.

Comment by Lai Siyao [ 08/Jan/13 ]

Hmm, http://www.spinics.net/lists/linux-fsdevel/msg39427.html addresses this issue, and it was included in the previous test. I'll try testing with it alone.

Comment by Lai Siyao [ 13/Jan/13 ]

The patch mentioned above is ready; I'll run tests later.

Comment by Andreas Dilger [ 14/Jan/13 ]

http://review.whamcloud.com/5010

Comment by Cory Spitz [ 21/Jan/13 ]

If the patch wasn't accepted upstream previously, is there any hope of getting it landed now?

Comment by Andreas Dilger [ 08/Feb/13 ]

Cory, we can always make another attempt to push it upstream...

Comment by Lai Siyao [ 19/Feb/13 ]

Below is the test result for http://review.whamcloud.com/5010:

RHEL6, MDS with 24 cores, 24G mem.

master:
[root@fat-intel-2 tests]# for i in `seq 1 5`; do sh llmount.sh >/dev/null 2>&1; tests_str="create destroy" file_count=500000 thrlo=4 mds-survey; sh llmountcleanup.sh > /dev/null 2>&1; done
Fri Feb 15 21:41:30 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 26209.85 [ 0.00,37997.04] destroy 28515.96 [ 0.00,60988.72]
mdt 1 file 500000 dir 4 thr 8 create 18912.57 [ 0.00,27998.18] destroy 21772.84 [ 999.93,33996.46]
mdt 1 file 500000 dir 4 thr 16 create 18270.86 [ 0.00,27998.82] destroy 16040.28 [ 0.00,31997.92]
mdt 1 file 500000 dir 4 thr 32 create 16305.92 [ 0.00,31997.44] destroy 19228.34 [ 0.00,31998.98]
done!
Fri Feb 15 21:47:04 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 26205.63 [ 0.00,38996.06] destroy 29596.33 [ 0.00,72992.48]
mdt 1 file 500000 dir 4 thr 8 create 19632.23 [ 0.00,27997.84] destroy 20835.51 [ 0.00,39996.84]
mdt 1 file 500000 dir 4 thr 16 create 17744.74 [ 0.00,27997.90] destroy 21308.89 [ 0.00,33997.38]
mdt 1 file 500000 dir 4 thr 32 create 15610.01 [ 0.00,28998.09] destroy 11542.69 [ 0.00,31998.40]
done!
Fri Feb 15 21:52:45 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 24109.43 [ 0.00,37996.85] destroy 33815.68 [ 0.00,75990.35]
mdt 1 file 500000 dir 4 thr 8 create 20305.14 [ 0.00,35994.82] destroy 20964.45 [ 0.00,34997.10]
mdt 1 file 500000 dir 4 thr 16 create 17943.26 [ 0.00,27997.40] destroy 14417.99 [ 0.00,31998.88]
mdt 1 file 500000 dir 4 thr 32 create 13365.02 [ 0.00,31997.60] destroy 17411.86 [ 0.00,31998.43]
done!
Fri Feb 15 21:58:44 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 26433.77 [ 0.00,37996.12] destroy 29013.88 [ 0.00,66989.01]
mdt 1 file 500000 dir 4 thr 8 create 20786.59 [ 0.00,29999.25] destroy 15879.62 [ 0.00,33995.75]
mdt 1 file 500000 dir 4 thr 16 create 17596.20 [ 0.00,27997.90] destroy 19394.02 [ 0.00,31998.88]
mdt 1 file 500000 dir 4 thr 32 create 17086.35 [ 0.00,31998.88] destroy 14710.48 [ 0.00,31998.02]
done!
Fri Feb 15 22:04:26 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 23813.65 [ 0.00,36997.34] destroy 34439.20 [ 0.00,67991.09]
mdt 1 file 500000 dir 4 thr 8 create 18209.21 [ 0.00,27997.87] destroy 22394.02 [ 0.00,36996.67]
mdt 1 file 500000 dir 4 thr 16 create 18059.66 [ 0.00,26998.25] destroy 20458.29 [ 0.00,31998.82]
mdt 1 file 500000 dir 4 thr 32 create 13597.78 [ 0.00,31998.50] destroy 17167.29 [ 0.00,31999.01]

master w/o quota:

[root@fat-intel-2 tests]# for i in `seq 1 5`; do NOSETUP=y sh llmount.sh >/dev/null 2>&1; tune2fs -O ^quota /dev/sdb1 > /dev/null 2>&1; NOFORMAT=y sh llmount.sh > /dev/null 2>&1; tests_str="create destroy" file_count=500000 thrlo=4 mds-survey; sh llmountcleanup.sh > /dev/null 2>&1; done
Fri Feb 15 22:20:09 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 30366.94 [ 0.00,47995.01] destroy 43404.01 [ 0.00,124985.75]
mdt 1 file 500000 dir 4 thr 8 create 41967.03 [ 0.00,127982.59] destroy 37145.02 [ 0.00,78991.23]
mdt 1 file 500000 dir 4 thr 16 create 40425.22 [ 0.00,119987.76] destroy 16860.10 [ 0.00,56991.45]
mdt 1 file 500000 dir 4 thr 32 create 31995.01 [ 0.00,103991.26] destroy 11203.98 [ 0.00,50994.14]
done!
Fri Feb 15 22:25:02 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 30934.22 [ 0.00,48997.84] destroy 43168.82 [ 0.00,126982.73]
mdt 1 file 500000 dir 4 thr 8 create 48911.42 [ 0.00,129987.78] destroy 19988.51 [ 0.00,49996.20]
mdt 1 file 500000 dir 4 thr 16 create 45156.92 [ 0.00,118988.46] destroy 18769.52 [ 0.00,74989.50]
mdt 1 file 500000 dir 4 thr 32 create 28270.60 [ 0.00,95992.80] destroy 18232.65 [ 0.00,94987.27]
done!
Fri Feb 15 22:29:41 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 30139.03 [ 0.00,48995.93] destroy 50610.31 [ 0.00,118986.08]
mdt 1 file 500000 dir 4 thr 8 create 50995.45 [ 0.00,131979.15] destroy 37575.37 [ 0.00,85987.79]
mdt 1 file 500000 dir 4 thr 16 create 52478.59 [ 0.00,98981.09] destroy 13508.81 [ 0.00,52991.36]
mdt 1 file 500000 dir 4 thr 32 create 26142.72 [ 0.00,95995.01] destroy 21777.04 [ 0.00,100996.57]
done!
Fri Feb 15 22:34:12 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 30747.88 [ 0.00,46996.01] destroy 44731.56 [ 0.00,115981.44]
mdt 1 file 500000 dir 4 thr 8 create 51109.70 [ 0.00,130978.52] destroy 34416.83 [ 0.00,84982.24]
mdt 1 file 500000 dir 4 thr 16 create 49890.86 [ 0.00,117985.96] destroy 14209.59 [ 0.00,56995.04]
mdt 1 file 500000 dir 4 thr 32 create 27091.59 [ 0.00,114989.65] destroy 43223.11 [ 0.00,104982.15]
done!
Fri Feb 15 22:38:37 PST 2013 /usr/bin/mds-survey from fat-intel-2
mdt 1 file 500000 dir 4 thr 4 create 31068.91 [ 0.00,49997.85] destroy 42854.34 [ 0.00,128981.30]
mdt 1 file 500000 dir 4 thr 8 create 46053.50 [ 0.00,131982.05] destroy 41475.42 [ 0.00,89990.91]
mdt 1 file 500000 dir 4 thr 16 create 45316.94 [ 0.00,118992.03] destroy 12968.13 [ 0.00,75987.61]
mdt 1 file 500000 dir 4 thr 32 create 30377.61 [ 0.00,122984.14] destroy 13852.39 [ 0.00,79989.60]

master w/ patched quota:

[root@fat-intel-2 tests]# for i in `seq 1 5`; do sh llmount.sh >/dev/null 2>&1; tests_str="create destroy" file_count=500000 thrlo=4 mds-survey; sh llmountcleanup.sh > /dev/null 2>&1; done
Sun Feb 17 07:11:38 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 26283.65 [ 0.00,41997.40] destroy 45303.57 [ 0.00,92987.45]
mdt 1 file 500000 dir 4 thr 8 create 33285.82 [ 0.00,65991.95] destroy 35571.95 [ 0.00,77989.39]
mdt 1 file 500000 dir 4 thr 16 create 30549.32 [ 0.00,56995.78] destroy 27374.45 [ 0.00,68995.03]
mdt 1 file 500000 dir 4 thr 32 create 16176.84 [ 0.00,51996.62] destroy 14717.10 [ 0.00,57994.66]
done!
Sun Feb 17 07:16:26 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 26496.66 [ 0.00,40996.64] destroy 44832.99 [ 0.00,97986.09]
mdt 1 file 500000 dir 4 thr 8 create 33672.99 [ 0.00,66996.11] destroy 35580.20 [ 0.00,65993.99]
mdt 1 file 500000 dir 4 thr 16 create 31749.41 [ 0.00,55994.96] destroy 29304.10 [ 0.00,69994.82]
mdt 1 file 500000 dir 4 thr 32 create 22033.01 [ 0.00,56994.98] destroy 9419.67 [ 0.00,43995.07]
done!
Sun Feb 17 07:21:21 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 26988.79 [ 0.00,39998.72] destroy 43051.27 [ 0.00,100982.03]
mdt 1 file 500000 dir 4 thr 8 create 33265.27 [ 0.00,65992.15] destroy 34237.19 [ 0.00,78991.63]
mdt 1 file 500000 dir 4 thr 16 create 31354.47 [ 0.00,57996.23] destroy 26779.46 [ 0.00,66996.11]
mdt 1 file 500000 dir 4 thr 32 create 17949.45 [ 0.00,59996.04] destroy 12178.20 [ 0.00,62994.96]
done!
Sun Feb 17 07:26:15 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 27104.78 [ 0.00,39998.76] destroy 42821.07 [ 0.00,96990.49]
mdt 1 file 500000 dir 4 thr 8 create 33137.66 [ 0.00,64996.04] destroy 33041.39 [ 0.00,76992.76]
mdt 1 file 500000 dir 4 thr 16 create 30752.19 [ 0.00,56996.30] destroy 27265.47 [ 0.00,67994.63]
mdt 1 file 500000 dir 4 thr 32 create 15830.85 [ 0.00,60996.71] destroy 10042.26 [ 0.00,47993.38]
done!
Sun Feb 17 07:31:22 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 27513.04 [ 0.00,40998.57] destroy 39405.62 [ 0.00,97989.32]
mdt 1 file 500000 dir 4 thr 8 create 34868.78 [ 0.00,78996.13] destroy 40266.84 [ 0.00,73996.52]
mdt 1 file 500000 dir 4 thr 16 create 28035.39 [ 0.00,58996.05] destroy 18330.08 [ 0.00,68995.86]
mdt 1 file 500000 dir 4 thr 32 create 22727.89 [ 0.00,56996.58] destroy 10760.47 [ 0.00,41993.53]

2.3:

[root@fat-intel-2 tests]# for i in `seq 1 5`; do sh llmount.sh >/dev/null 2>&1; tests_str="create destroy" file_count=500000 thrlo=4 mds-survey; sh llmountcleanup.sh >/dev/null 2>&1; done
Mon Feb 18 18:37:29 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 26145.12 [ 0.00,47997.65] destroy 53906.35 [ 0.00,108985.07]
mdt 1 file 500000 dir 4 thr 8 create 51723.17 [ 0.00,125977.07] destroy 30210.63 [ 0.00,86986.95]
mdt 1 file 500000 dir 4 thr 16 create 47965.56 [ 0.00,117987.02] destroy 15171.14 [ 0.00,70989.64]
mdt 1 file 500000 dir 4 thr 32 create 27712.65 [ 0.00,95991.46] destroy 25103.89 [ 0.00,94985.37]
done!
Mon Feb 18 18:42:00 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 31616.35 [ 0.00,47994.00] destroy 46495.53 [ 0.00,114988.27]
mdt 1 file 500000 dir 4 thr 8 create 49424.94 [ 0.00,128977.94] destroy 28846.65 [ 0.00,69988.31]
mdt 1 file 500000 dir 4 thr 16 create 44127.39 [ 0.00,103993.14] destroy 13484.81 [ 0.00,69987.82]
mdt 1 file 500000 dir 4 thr 32 create 27848.28 [ 0.00,97995.39] destroy 27959.58 [ 0.00,96993.40]
done!
Mon Feb 18 18:46:33 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 31294.53 [ 0.00,47996.83] destroy 46002.83 [ 0.00,124981.75]
mdt 1 file 500000 dir 4 thr 8 create 49600.28 [ 0.00,127980.80] destroy 39120.42 [ 0.00,83984.29]
mdt 1 file 500000 dir 4 thr 16 create 41215.63 [ 0.00,115980.86] destroy 14951.03 [ 0.00,55992.05]
mdt 1 file 500000 dir 4 thr 32 create 29834.72 [ 0.00,95986.66] destroy 47968.23 [ 0.00,105982.51]
done!
Mon Feb 18 18:50:56 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 30903.29 [ 0.00,47995.97] destroy 44842.92 [ 0.00,120987.18]
mdt 1 file 500000 dir 4 thr 8 create 49633.18 [ 0.00,127962.12] destroy 44138.19 [ 0.00,87990.15]
mdt 1 file 500000 dir 4 thr 16 create 48650.44 [ 0.00,116986.08] destroy 21194.31 [ 0.00,83987.49]
mdt 1 file 500000 dir 4 thr 32 create 27149.35 [ 0.00,90995.91] destroy 22780.16 [ 0.00,102984.66]
done!
Mon Feb 18 18:55:13 PST 2013 /usr/bin/mds-survey from fat-intel-2.lab.whamcloud.com
mdt 1 file 500000 dir 4 thr 4 create 30643.97 [ 0.00,46996.29] destroy 44520.51 [ 0.00,115985.04]
mdt 1 file 500000 dir 4 thr 8 create 47911.39 [ 0.00,125981.98] destroy 32561.59 [ 0.00,79991.20]
mdt 1 file 500000 dir 4 thr 16 create 43045.28 [ 0.00,83986.48] destroy 17939.69 [ 0.00,56991.62]
mdt 1 file 500000 dir 4 thr 32 create 26503.95 [ 0.00,95985.41] destroy 31324.31 [ 0.00,97992.16]
done!

Comment by James A Simmons [ 22/Feb/13 ]

Does this patch need to be applied to the clients' kernels? If so, the patch needs to be updated for the other platforms as well.

Comment by Johann Lombardi (Inactive) [ 22/Feb/13 ]

No, it is a server-side patch.

Comment by Lai Siyao [ 22/Mar/13 ]

I've finished the performance tests against ext4; the test machine has 24 cores, 24G mem.

Mount option "usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0" is used when testing with quota.

Below is test script:

total=800000
thrno=1
THRMAX=32

while [ $thrno -le $THRMAX ]; do
	count=$(($total/$thrno))
	mpirun -np $thrno -machinefile /tmp/machinefile mdtest -d /mnt/temp/d1 -i 1 -n $count -u -F >> /tmp/md.txt 2>&1
	thrno=$((2*$thrno))
done

Mdtest create test result (the first column is the parallel thread count):

Threads   quota        patched quota   w/o quota
1         36614.457    36661.038       41179.899
2         35046.13     34979.013       47455.064
4         35046.13     50748.669       56157.671
8         28781.597    39255.844       46426.061
16        11251.192    28218.439       50534.734
32        7880.249     31173.627       46366.125

Mdtest unlink test result (the first column is the parallel thread count):

Threads   quota        patched quota   w/o quota
1         29762.146    29307.613       30245.654
2         21131.769    26865.454       27094.563
4         13891.783    16076.384       17079.314
8         14099.972    12393.909       11943.64
16        6662.111     4812.819        6265.979
32        12735.206    4297.173        5210.164

The create test result is as expected, but the unlink result is not: with more test threads, the quota case achieves the best result, even better than w/o quota.

The result of oprofile of mdtest unlink is as below:
quota:

samples  %        image name               app name                 symbol name
73245     5.0417  vmlinux                  vmlinux                  intel_idle
68195     4.6941  vmlinux                  vmlinux                  __hrtimer_start_range_ns
44222     3.0439  vmlinux                  vmlinux                  schedule
26741     1.8407  vmlinux                  vmlinux                  mutex_spin_on_owner
26280     1.8089  vmlinux                  vmlinux                  __find_get_block
23680     1.6300  vmlinux                  vmlinux                  rwsem_down_failed_common
22958     1.5803  ext4.ko                  ext4.ko                  ext4_mark_iloc_dirty
20814     1.4327  vmlinux                  vmlinux                  update_curr
19519     1.3436  vmlinux                  vmlinux                  rb_erase
18140     1.2486  vmlinux                  vmlinux                  thread_return

patched quota:

samples  %        image name               app name                 symbol name
3235409  50.1659  vmlinux                  vmlinux                  dqput
1140972  17.6911  vmlinux                  vmlinux                  dqget
347286    5.3848  vmlinux                  vmlinux                  mutex_spin_on_owner
278271    4.3147  vmlinux                  vmlinux                  dquot_mark_dquot_dirty
277685    4.3056  vmlinux                  vmlinux                  dquot_commit
51187     0.7937  vmlinux                  vmlinux                  intel_idle
38886     0.6029  vmlinux                  vmlinux                  __find_get_block
32483     0.5037  vmlinux                  vmlinux                  schedule
30017     0.4654  ext4.ko                  ext4.ko                  ext4_mark_iloc_dirty
29618     0.4592  jbd2.ko                  jbd2.ko                  jbd2_journal_add_journal_head
29483     0.4571  vmlinux                  vmlinux                  mutex_lock

w/o quota:

samples  %        image name               app name                 symbol name
173301    6.3691  vmlinux                  vmlinux                  schedule
150041    5.5142  vmlinux                  vmlinux                  __audit_syscall_exit
148352    5.4522  vmlinux                  vmlinux                  update_curr
110868    4.0746  libmpi.so.1.0.2          libmpi.so.1.0.2          /usr/lib64/openmpi/lib/libmpi.so.1.0.2
105145    3.8642  libc-2.12.so             libc-2.12.so             sched_yield
104872    3.8542  vmlinux                  vmlinux                  sys_sched_yield
99494     3.6566  mca_btl_sm.so            mca_btl_sm.so            /usr/lib64/openmpi/lib/openmpi/mca_btl_sm.so
92536     3.4008  vmlinux                  vmlinux                  mutex_spin_on_owner
85868     3.1558  vmlinux                  vmlinux                  system_call

I don't quite understand the cause yet; maybe removing dqptr_sem causes more process scheduling?

Comment by Lai Siyao [ 27/Mar/13 ]

For patched quota, `oprofile -d ...` shows that dqput and dqget are contending on dq_list_lock; however, improving this would require further code changes.
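
For context, a condensed sketch of where that contention comes from (again abbreviated from the 2.6.32-era fs/quota/dquot.c, with the hash lookup reduced to a comment): once dqptr_sem is out of the way, every dquot reference get/put still funnels through the single global dq_list_lock.

/* Abbreviated reference-counting paths; dq_list_lock is one global
 * spinlock shared by every mounted filesystem. */
struct dquot *dqget(struct super_block *sb, unsigned int id, int type)
{
	struct dquot *dquot = NULL;

	spin_lock(&dq_list_lock);
	/* ... look the dquot up in the global hash and bump dq_count ... */
	spin_unlock(&dq_list_lock);
	/* ... read it from disk if it was not cached ... */
	return dquot;
}

void dqput(struct dquot *dquot)
{
	spin_lock(&dq_list_lock);	/* same global lock on every put */
	if (atomic_read(&dquot->dq_count) > 1) {
		atomic_dec(&dquot->dq_count);
		spin_unlock(&dq_list_lock);
		return;
	}
	/* ... otherwise move the dquot to the free list, still under the lock ... */
	spin_unlock(&dq_list_lock);
}

Since unlink of every 0-stripe file does a dqget/dqput pair for both the user and group dquots, all MDS service threads end up hammering this one spinlock.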

Comment by Peter Jones [ 01/Apr/13 ]

Landed for 2.4

Comment by Andreas Dilger [ 17/Apr/13 ]

This patch was only included into the RHEL6.3 patch series, not RHEL6.4 or SLES11 SP2.

Comment by James A Simmons [ 23/Apr/13 ]

The question I have is: do I include this fix in my LU-1812 patch for SLES11 SP2 kernel support, or as a separate patch? It depends on whether there are plans to land the LU-1812 patch.

Comment by Lai Siyao [ 23/Apr/13 ]

Andreas, this patch is against VFS code, and the same code supports both RHEL6.3 and RHEL6.4.

James, it would be great if you could help port it to the SLES11 SP2 and FC18 kernels. IMO you don't need to include this in the LU-1812 patch, because it is a performance improvement patch which doesn't affect functionality.

Comment by Andreas Dilger [ 23/Apr/13 ]

My mistake. I thought there were different kernel series for RHEL 6.3 and 6.4, but this is only true for ldiskfs.

Comment by James A Simmons [ 24/Apr/13 ]

I've started to look at this patch for both FC18 and SLES11 SP2. For FC18 server support we need several patches to make it build, so I doubt it will make it into the 2.4 release. The SLES11 SP2 support works with master. What I was thinking is to break up the LU-1812 patch: keep the FC18 part there and move the SLES11 SP2 code into this patch. I could add in this fix as well. Would you be okay with that?

Comment by James A Simmons [ 24/Apr/13 ]

Oh, while we are fixing the kernel quota issues, I'd like to add this to the patch as well.

http://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/commit/?id=c3ad83d9efdfe6a86efd44945a781f00c879b7b4

Comment by James A Simmons [ 25/Apr/13 ]

I have a patch for SLES11 SP2 which I'm testing right now. Once it passes I will push it to gerrit.

Comment by James A Simmons [ 25/Apr/13 ]

My patch is at http://review.whamcloud.com/6168.

Comment by James A Simmons [ 03/May/13 ]

The SLES11 SP[1,2] platforms have been fixed. We should keep this ticket open until FC18 is addressed for 2.5. Also, we really should push this patch upstream and have the distributions pick it up. I recommend linking this ticket to LU-20.

Comment by Peter Jones [ 03/May/13 ]

Good idea!

Comment by Andreas Dilger [ 17/Sep/13 ]

The performance regression was resolved for 2.5.0 by applying the quota-replace-dqptr-sem and quota-avoid-dqget-calls patches to the kernel. I'm closing this bug, and opened LU-3966 to track landing those patches into the upstream kernel, which is not strictly related to the 2.5.x release stream.
