[LU-2442] metadata performance degradation on current master Created: 06/Dec/12 Updated: 17/Sep/13 Resolved: 17/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.5.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Di Wang | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 5771 |
| Description |
|
During Minh's performance test on the OpenSFS cluster, we found quite a performance degradation.

b2_3 test results:

[root@c24 bin]# ./run_mdsrate_create
0: c01 starting at Thu Dec 6 08:48:39 2012
Rate: 26902.88 eff 26902.57 aggr 140.12 avg client creates/sec (total: 192 threads 3840000 creates 192 dirs 1 threads/dir 142.74 secs)
0: c01 finished at Thu Dec 6 08:51:02 2012
[root@c24 bin]# ./run_mdsrate_stat
0: c01 starting at Thu Dec 6 08:51:50 2012
Rate: 169702.53 eff 169703.11 aggr 883.87 avg client stats/sec (total: 192 threads 3840000 stats 192 dirs 1 threads/dir 22.63 secs)
0: c01 finished at Thu Dec 6 08:52:13 2012
[root@c24 bin]# ./run_mdsrate_unlink
0: c01 starting at Thu Dec 6 08:52:28 2012
Rate: 33486.06 eff 33486.74 aggr 174.41 avg client unlinks/sec (total: 192 threads 3840000 unlinks 192 dirs 1 threads/dir 114.67 secs)
Warning: only unlinked 3840000 files instead of 20000
0: c01 finished at Thu Dec 6 08:54:23 2012
[root@c24 bin]# ./run_mdsrate_mknod
0: c01 starting at Thu Dec 6 08:54:32 2012
Rate: 52746.29 eff 52745.00 aggr 274.71 avg client mknods/sec (total: 192 threads 3840000 mknods 192 dirs 1 threads/dir 72.80 secs)
0: c01 finished at Thu Dec 6 08:55:45 2012

Master test results:

[root@c24 bin]# ./run_mdsrate_create
0: c01 starting at Tue Dec 4 21:15:40 2012
Rate: 6031.09 eff 6031.11 aggr 31.41 avg client creates/sec (total: 192 threads 3840000 creates 192 dirs 1 threads/dir 636.70 secs)
0: c01 finished at Tue Dec 4 21:26:17 2012
[root@c24 bin]# ./run_mdsrate_stat
0: c01 starting at Tue Dec 4 21:27:04 2012
Rate: 177962.00 eff 177964.59 aggr 926.90 avg client stats/sec (total: 192 threads 3840000 stats 192 dirs 1 threads/dir 21.58 secs)
0: c01 finished at Tue Dec 4 21:27:26 2012
[root@c24 bin]# ./run_mdsrate_unlink
0: c01 starting at Tue Dec 4 21:29:47 2012
Rate: 8076.06 eff 8076.08 aggr 42.06 avg client unlinks/sec (total: 192 threads 3840000 unlinks 192 dirs 1 threads/dir 475.48 secs)
Warning: only unlinked 3840000 files instead of 20000
0: c01 finished at Tue Dec 4 21:37:43 2012
[root@c24 bin]# ./run_mdsrate_mknod
0: c01 starting at Tue Dec 4 21:48:41 2012
Rate: 10430.50 eff 10430.61 aggr 54.33 avg client mknods/sec (total: 192 threads 3840000 mknods 192 dirs 1 threads/dir 368.15 secs)
0: c01 finished at Tue Dec 4 21:54:49 2012
| Comments |
| Comment by Cliff White (Inactive) [ 06/Dec/12 ] |
|
Comparing mdtest runs on Hyperion, believe i can confirm this:

SUMMARY: (of 5 iterations)
Operation            Max        Min        Mean       Std Dev
---------            ---        ---        ----       -------
Directory creation:  9158.250   6805.626   7521.172   851.288
Directory stat:      40859.559  40283.053  40503.987  203.103
Directory removal:   5586.564   4990.031   5274.173   192.167
File creation:       12461.089  6539.534   9676.354   1981.131
File stat:           40967.623  39510.833  40196.762  550.350
File removal:        7316.976   5786.912   6623.610   617.150
Tree creation:       19.604     10.008     12.621     3.562
Tree removal:        15.235     10.994     12.947     1.597

2.3.54
SUMMARY: (of 5 iterations)
Operation            Max        Min        Mean       Std Dev
---------            ---        ---        ----       -------
Directory creation:  4853.701   4014.951   4332.333   295.957
Directory stat:      39866.927  39646.375  39773.126  84.119
Directory removal:   3499.158   3326.216   3411.107   65.731
File creation:       2809.412   2529.216   2661.200   109.115
File stat:           40010.001  39748.993  39875.258  83.955
File removal:        2246.567   1899.579   2091.405   129.200
Tree creation:       14.167     9.096      11.248     1.712
Tree removal:        12.821     10.101     11.436     1.100

2.1.3 mdtestfpp
SUMMARY: (of 5 iterations)
Operation            Max        Min        Mean       Std Dev
---------            ---        ---        ----       -------
Directory creation:  9290.675   7180.471   8048.526   723.684
Directory stat:      47632.129  43813.325  46277.895  1581.071
Directory removal:   13615.528  10414.726  12001.418  1187.520
File creation:       12655.076  9627.218   11585.362  1067.008
File stat:           45895.125  44328.965  44772.921  584.435
File removal:        29147.930  18837.845  23098.132  4096.757
Tree creation:       11.175     8.051      9.766      1.080
Tree removal:        7.407      5.315      6.225      0.675

2.3.54
SUMMARY: (of 5 iterations)
Operation            Max        Min        Mean       Std Dev
---------            ---        ---        ----       -------
Directory creation:  4324.853   3586.223   3915.558   236.782
Directory stat:      64787.251  64169.168  64501.553  207.900
Directory removal:   3712.552   3447.687   3537.772   104.057
File creation:       2571.134   2251.887   2362.125   112.613
File stat:           64940.933  64515.646  64764.334  156.090
File removal:        3127.609   2585.040   2866.863   182.853
Tree creation:       3.516      2.733      3.036      0.266
Tree removal:        2.123      1.973      2.068      0.056
| Comment by Peter Jones [ 10/Dec/12 ] |
|
Lai will work on this one |
| Comment by Nathan Rutman [ 12/Dec/12 ] |
|
Cliff, how many clients do you use for these tests? |
| Comment by Cliff White (Inactive) [ 12/Dec/12 ] |
|
I believe these tests used 64 clients. |
| Comment by Nathan Rutman [ 12/Dec/12 ] |
|
Single mount point each or multiple mounts? |
| Comment by Cliff White (Inactive) [ 12/Dec/12 ] |
|
Single mount point |
| Comment by Alex Zhuravlev [ 13/Dec/12 ] |
|
First of all, it makes sense to benchmark 2.3 as well. I'd rather expect the regression to have been introduced in Orion, but we still need to make sure. |
| Comment by Andreas Dilger [ 13/Dec/12 ] |
|
Alex, please see first comment - it has 2.3 vs master results. |
| Comment by Alex Zhuravlev [ 13/Dec/12 ] |
|
oops, sorry.. |
| Comment by Alex Zhuravlev [ 13/Dec/12 ] |
|
This seems to be related to quota being counted all the time:

2.3: mdt 1 file 100000 dir 2 thr 4 create 20750.33 [16998.93,22998.30]
| Comment by Minh Diep [ 13/Dec/12 ] |
|
Alex, I checked that quota was disabled (i.e. quota_save/enabled is none). How did you get the "-quota" case? |
| Comment by Alex Zhuravlev [ 13/Dec/12 ] |
|
I rebuilt mkfs with the following patch:

--- a/lustre/utils/mount_utils_ldiskfs.c
+#if 0
+#endif
 /* Allow files larger than 2TB. Also needs

But I guess you can turn quota accounting off with tune2fs -O ^quota <mds device>; notice, though, that this is not a solution:
| Comment by Lai Siyao [ 17/Dec/12 ] |
|
Below is the result of an mds-survey test.

With quota on and 0 stripes:

mdt 1 file 162470 dir 4 thr 4 create 73485.77 [73997.93,73997.93] lookup 628892.16 [628892.16,628892.16] md_getattr 441141.49 [441141.49,441141.49] setxattr 10206.61 [ 0.00,21996.74] destroy 37075.41 [ 0.00,31527.94]
mdt 1 file 162470 dir 4 thr 8 create 20421.68 [7999.22,27998.18] lookup 649627.33 [649627.33,649627.33] md_getattr 425714.11 [425714.11,425714.11] setxattr 8965.92 [ 0.00,23993.11] destroy 18337.15 [ 0.00,31997.98]

With quota off and 0 stripes:

mdt 1 file 75846 dir 4 thr 4 create 96651.65 [96651.65,96651.65] lookup 612128.94 [612128.94,612128.94] md_getattr 333600.18 [333600.18,333600.18] setxattr 10669.48 [ 0.00,20996.43] destroy 95098.86 [95098.86,95098.86]
mdt 1 file 75846 dir 4 thr 8 create 107707.84 [107707.84,107707.84] lookup 610706.69 [610706.69,610706.69] md_getattr 435877.10 [435877.10,435877.10] setxattr 9347.64 [ 0.00,19997.84] destroy 83942.93 [83942.93,83942.93]

This shows that when quota is off, metadata operations are much faster than with quota on, especially with more threads, e.g. create 107707.84 vs 20421.68 with 8 threads.
| Comment by Alex Zhuravlev [ 17/Dec/12 ] |
|
The quota framework in the VFS is totally serialized; in other words, it does not scale with the number of threads/cores. |
| Comment by Andreas Dilger [ 17/Dec/12 ] |
|
If we can make a patch to the VFS for this, I suspect it would be possible to get it accepted upstream. Alternately, is it possible to bypass some of these functions in our own code without having to re-implement the whole quota code? |
| Comment by Alex Zhuravlev [ 17/Dec/12 ] |
|
Right... something to think about. There is a set of wrappers called by ext3, like vfs_dq_alloc_inode() and vfs_dq_init(), which in turn call the methods exported via ext3_quota_operations() |
| Comment by Lai Siyao [ 17/Dec/12 ] |
|
http://thread.gmane.org/gmane.linux.file-systems/47509 lists some quota SMP improvements, and some of them look helpful here. I'll pick some to patch the MDS kernel and test again. |
| Comment by Andreas Dilger [ 18/Dec/12 ] |
|
Lai, first check if these patches were merged upstream, and if there were any fixes. If not, please contact the author if there are newer patches available, in case there have been improvements and/or bug fixes since then (the referenced patches are 2 years old). |
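For reference, a quick way to check the upstream status is to search the mainline history for the author and the quota code (a sketch; the clone path is illustrative, and the author is assumed to be Dmitry Monakhov per the openvz page linked in the next comment):

# Sketch only: check whether the quota scalability patches ever landed upstream.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git log --oneline --author="Monakhov" -- fs/quota/
git log --oneline --grep="dqptr_sem" -- fs/quota/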
| Comment by Lai Siyao [ 18/Dec/12 ] |
|
Okay, I'll do that. BTW http://download.openvz.org/~dmonakhov/quota.html is the web page for quota improvements. |
| Comment by Minh Diep [ 18/Dec/12 ] |
|
I tried using tune2fs to disable quota on master, but create performance improved by only about 10%. |
| Comment by Alex Zhuravlev [ 18/Dec/12 ] |
|
Please tell us exactly what you did. |
| Comment by Minh Diep [ 18/Dec/12 ] |
|
I ran master on the OpenSFS cluster with 1 MDT and 24 clients; each client mounts 8 mount points. |
| Comment by Alex Zhuravlev [ 18/Dec/12 ] |
|
Minh, let's start from local testing using mds-survey ? |
| Comment by Alex Zhuravlev [ 18/Dec/12 ] |
|
also, did you remount mds after tune2fs ? |
| Comment by Minh Diep [ 18/Dec/12 ] |
|
No, I did not remount. I will try and let you know |
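For reference, a minimal sketch of the tune2fs approach including the remount Alex asks about (the device path and mount point are illustrative):

# Sketch only -- /dev/sdb1 stands in for the actual MDT device.
umount /mnt/mds1                      # stop the MDS so the feature can be changed
tune2fs -O ^quota /dev/sdb1           # drop the quota feature so accounting is not maintained
mount -t lustre /dev/sdb1 /mnt/mds1   # remount the MDT so the change takes effect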
| Comment by Lai Siyao [ 27/Dec/12 ] |
|
I mailed the author of the quota scalability patches, but he hasn't replied. Those patches were not merged into the upstream kernel and don't seem to be maintained any more. The latest version is for 2.6.36-rc5: http://www.spinics.net/lists/linux-fsdevel/msg39408.html.

I've picked up some patches from the patch set, and the mds-survey test results look positive:

2.3
mdt 1 file 500000 dir 4 thr 4 create 40576.65 [ 0.00,90994.27] destroy 40211.44 [ 0.00,127981.83]
mdt 1 file 500000 dir 4 thr 8 create 38368.90 [ 0.00,75992.70] destroy 40001.98 [ 0.00,106996.26]

master
mdt 1 file 500000 dir 4 thr 4 create 37817.06 [ 0.00,73995.19] destroy 37767.73 [ 0.00,99993.80]
mdt 1 file 500000 dir 4 thr 8 create 18734.04 [ 0.00,26998.95] destroy 19531.29 [ 0.00,31997.89]

master with kernel quota disabled
mdt 1 file 500000 dir 4 thr 4 create 43897.17 [ 0.00,96993.02] destroy 44340.73 [ 0.00,118992.15]
mdt 1 file 500000 dir 4 thr 8 create 45157.98 [ 0.00,104990.45] destroy 33863.89 [ 0.00,99987.50]

master with quota scalability patches and quota enabled
mdt 1 file 500000 dir 4 thr 4 create 38428.90 [ 0.00,86994.43] destroy 33402.49 [ 0.00,114992.30]
mdt 1 file 500000 dir 4 thr 8 create 40242.71 [ 0.00,92993.77] destroy 31620.07 [ 0.00,89987.76]

I will review the quota scalability patches, tidy them up, and commit them for review later. At the same time I'll do some mdsrate tests.
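For reproducibility, a rough sketch of how the selected patches can be applied to the MDS kernel source before rebuilding (the paths are illustrative; the two patch names follow the patches eventually landed, see the closing comment):

# Sketch only: apply the picked quota-scalability patches to the MDS kernel tree.
cd /usr/src/kernels/linux-2.6.32-lustre        # hypothetical kernel source location
for p in quota-replace-dqptr-sem.patch quota-avoid-dqget-calls.patch; do
    patch -p1 < ~/quota-patches/"$p"
done
make -j"$(nproc)" bzImage modules && make modules_install install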
| Comment by Alex Zhuravlev [ 27/Dec/12 ] |
|
The numbers look nice... though 37.8K creates/sec with clean master looks unexpected. In previous testing, weren't creates a few times slower compared to 2.3? |
| Comment by Lai Siyao [ 27/Dec/12 ] |
|
Last time I tested with /usr/lib64/lustre/tests/mds-survey.sh, which has a small limit on file_count, so the result might not have been quite accurate. This time I tested with /usr/bin/mds-survey directly, using 500k files. With 8 threads, create/unlink on master is still half the speed of 2.3. |
| Comment by Alex Zhuravlev [ 27/Dec/12 ] |
|
Could you clarify whether the test was unlinking 0-stripe files? |
| Comment by Lai Siyao [ 27/Dec/12 ] |
|
Yes, the command I used is `tests_str="create destroy" file_count=500000 thrlo=4 thrhi=8 mds-survey`; by default it tests with 0 stripes. The system is RHEL6 with a 4-core CPU. |
| Comment by Alex Zhuravlev [ 27/Dec/12 ] |
|
Hmm, I see ~40K destroys with 2.3 and 19-37K with clean master (or 50-90% of 2.3)? Or are you referencing different data? |
| Comment by Lai Siyao [ 27/Dec/12 ] |
|
Yes, you're correct. Previously I said 1/3 because I looked at the wrong column. Both create and unlink are at almost 50% of the 2.3 speed. |
| Comment by Alex Zhuravlev [ 27/Dec/12 ] |
|
And with quota disabled we do creates better and unlinks a bit worse, though the results seem to vary quite a bit. I'd suggest making more runs. |
| Comment by Lai Siyao [ 28/Dec/12 ] |
|
mds-survey results with 4 threads:

2.3
mdt 1 file 500000 dir 4 thr 4 create 30301.67 [ 0.00,54993.07] destroy 45344.53 [ 0.00,118982.39]
mdt 1 file 500000 dir 4 thr 4 create 33631.21 [ 0.00,58998.17] destroy 41886.71 [ 0.00,123991.69]
mdt 1 file 500000 dir 4 thr 4 create 33204.29 [ 0.00,59996.46] destroy 41117.96 [ 0.00,125990.17]
mdt 1 file 500000 dir 4 thr 4 create 27398.33 [ 0.00,57995.88] destroy 50452.95 [ 0.00,119982.12]
mdt 1 file 500000 dir 4 thr 4 create 31605.05 [ 0.00,58996.17] destroy 40747.33 [ 0.00,118973.83]

master
mdt 1 file 500000 dir 4 thr 4 create 28684.53 [ 0.00,44998.52] destroy 36285.94 [ 0.00,96994.37]
mdt 1 file 500000 dir 4 thr 4 create 33779.78 [ 0.00,55997.87] destroy 38156.31 [ 0.00,99993.90]
mdt 1 file 500000 dir 4 thr 4 create 33766.30 [ 0.00,56998.12] destroy 39992.00 [ 0.00,86993.56]
mdt 1 file 500000 dir 4 thr 4 create 33192.04 [ 0.00,55994.18] destroy 37546.43 [ 0.00,97994.22]
mdt 1 file 500000 dir 4 thr 4 create 30217.59 [ 0.00,49998.45] destroy 33306.32 [ 0.00,81994.83]

master w/o quota
mdt 1 file 500000 dir 4 thr 4 create 33550.14 [ 0.00,58992.45] destroy 43972.85 [ 0.00,101993.06]
mdt 1 file 500000 dir 4 thr 4 create 35128.38 [ 0.00,62997.61] destroy 41103.29 [ 0.00,124982.00]
mdt 1 file 500000 dir 4 thr 4 create 32508.59 [ 0.00,59994.42] destroy 43666.99 [ 0.00,98993.47]
mdt 1 file 500000 dir 4 thr 4 create 32928.35 [ 0.00,54993.79] destroy 40680.26 [ 0.00,116992.04]
mdt 1 file 500000 dir 4 thr 4 create 33850.76 [ 0.00,58996.11] destroy 41746.76 [ 0.00,87986.80]

master with quota scalability
mdt 1 file 500000 dir 4 thr 4 create 30076.80 [ 0.00,51998.23] destroy 43427.17 [ 0.00,116992.04]
mdt 1 file 500000 dir 4 thr 4 create 30709.46 [ 0.00,53997.79] destroy 45144.05 [ 0.00,112988.48]
mdt 1 file 500000 dir 4 thr 4 create 31619.42 [ 0.00,52992.48] destroy 44672.52 [ 0.00,107985.53]
mdt 1 file 500000 dir 4 thr 4 create 32046.06 [ 0.00,51996.31] destroy 40546.26 [ 0.00,115992.00]
mdt 1 file 500000 dir 4 thr 4 create 31657.57 [ 0.00,57997.91] destroy 43638.64 [ 0.00,114991.72]
| Comment by Lai Siyao [ 28/Dec/12 ] |
|
mds-survey results with 8 threads:

2.3
mdt 1 file 500000 dir 4 thr 8 create 36109.49 [ 0.00,92994.33] destroy 37088.26 [ 0.00,86986.52]
mdt 1 file 500000 dir 4 thr 8 create 36823.39 [ 0.00,92993.58] destroy 30732.48 [ 0.00,87984.95]
mdt 1 file 500000 dir 4 thr 8 create 36634.11 [ 0.00,91994.20] destroy 35597.85 [ 0.00,96996.99]
mdt 1 file 500000 dir 4 thr 8 create 36689.13 [ 0.00,94994.11] destroy 28568.98 [ 0.00,69987.82]
mdt 1 file 500000 dir 4 thr 8 create 36749.42 [ 0.00,93990.41] destroy 28724.14 [ 0.00,66984.93]

master
mdt 1 file 500000 dir 4 thr 8 create 18413.35 [ 0.00,30997.21] destroy 22972.08 [ 0.00,38997.47]
mdt 1 file 500000 dir 4 thr 8 create 19550.71 [ 0.00,27999.08] destroy 19823.87 [ 0.00,60995.18]
mdt 1 file 500000 dir 4 thr 8 create 18735.31 [ 0.00,27998.18] destroy 19917.03 [ 0.00,38995.87]
mdt 1 file 500000 dir 4 thr 8 create 18914.35 [ 0.00,26998.27] destroy 21628.42 [ 0.00,32997.89]
mdt 1 file 500000 dir 4 thr 8 create 19823.51 [ 0.00,25998.34] destroy 19660.51 [ 0.00,41988.24]

master w/o quota
mdt 1 file 500000 dir 4 thr 8 create 34226.00 [ 0.00,98993.27] destroy 25597.07 [ 0.00,89981.28]
mdt 1 file 500000 dir 4 thr 8 create 37657.63 [ 0.00,91993.84] destroy 32871.15 [ 0.00,64997.92]
mdt 1 file 500000 dir 4 thr 8 create 36370.87 [ 0.00,64986.35] destroy 28918.50 [ 0.00,74993.85]
mdt 1 file 500000 dir 4 thr 8 create 37004.56 [ 0.00,89994.24] destroy 32085.39 [ 0.00,67995.04]
mdt 1 file 500000 dir 4 thr 8 create 34277.48 [ 0.00,82992.45] destroy 28381.06 [ 0.00,62990.17]

master with quota scalability
mdt 1 file 500000 dir 4 thr 8 create 33076.15 [ 0.00,92994.23] destroy 34047.86 [ 0.00,74991.90]
mdt 1 file 500000 dir 4 thr 8 create 34758.91 [ 0.00,83994.96] destroy 29910.64 [ 0.00,96989.91]
mdt 1 file 500000 dir 4 thr 8 create 33244.73 [ 0.00,91993.93] destroy 33343.84 [ 0.00,99990.20]
mdt 1 file 500000 dir 4 thr 8 create 36468.85 [ 0.00,86994.35] destroy 32234.66 [ 0.00,91988.50]
mdt 1 file 500000 dir 4 thr 8 create 35831.91 [ 0.00,91993.84] destroy 34748.24 [ 0.00,85989.25]
| Comment by Keith Mannthey (Inactive) [ 04/Jan/13 ] |
|
From the "mds-survey results with 8 threads" Is there a good explanation for why "master with quota scalability" is generally faster than "master w/o quota" when it comes to destroy? Is there any information for other thread counts 1,2,16,32? |
| Comment by Lai Siyao [ 05/Jan/13 ] |
|
The test results vary a bit, and mostly I saw "master w/o quota" a bit faster than "master with quota scalability" in destroy, though the listed runs look slower. I couldn't find a powerful test node, so I didn't test larger thread counts, because they show a drop in the create/destroy rate. Also, /usr/lib64/lustre/tests/mds-survey.sh sets "thrlo" to the CPU count, which looks reasonable, so I skipped thread counts 1 and 2 as well. To sum up, it makes sense to collect accurate metadata performance data on a fat node, but I don't have the resources and time to do that right now; the current results can roughly reflect the performance differences between the code versions. |
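If a wider sweep is wanted later, the same environment-variable interface can cover the other thread counts Keith asks about (a sketch; file_count is kept at the value used above):

# Sketch only: let mds-survey step through thread counts from 1 up to 32.
tests_str="create destroy" file_count=500000 thrlo=1 thrhi=32 mds-survey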
| Comment by Johann Lombardi (Inactive) [ 07/Jan/13 ] |
|
All patches combined represent a significant amount of changes. I tend to think that the most important patch is the one removing the dqptr_sem. |
| Comment by Lai Siyao [ 08/Jan/13 ] |
|
Hmm, http://www.spinics.net/lists/linux-fsdevel/msg39427.html addresses this issue, and it was included in the previous test. I'll try testing with it alone. |
| Comment by Lai Siyao [ 13/Jan/13 ] |
|
The patch mentioned above is ready; I'll run the tests later. |
| Comment by Andreas Dilger [ 14/Jan/13 ] |
| Comment by Cory Spitz [ 21/Jan/13 ] |
|
If the patch wasn't accepted upstream previously, is there any hope of getting it landed now? |
| Comment by Andreas Dilger [ 08/Feb/13 ] |
|
Cory, we can always make another attempt to push it upstream... |
| Comment by Lai Siyao [ 19/Feb/13 ] |
|
Below are the test results for http://review.whamcloud.com/5010 on RHEL6, MDS with 24 cores and 24GB of memory.

master:
master w/o quota:
master w/o quota:
master w/ patched quota:

[root@fat-intel-2 tests]# for i in `seq 1 5`; do sh llmount.sh >/dev/null 2>&1; tests_str="create destroy" file_count=500000 thrlo=4 mds-survey; sh llmountcleanup.sh > /dev/null 2>&1; done

2.3:

[root@fat-intel-2 tests]# for i in `seq 1 5`; do sh llmount.sh >/dev/null 2>&1; tests_str="create destroy" file_count=500000 thrlo=4 mds-survey; sh llmountcleanup.sh >/dev/null 2>&1; done
| Comment by James A Simmons [ 22/Feb/13 ] |
|
Does this patch need to be applied to the clients' kernels? If so, the patch needs to be updated for other platforms as well. |
| Comment by Johann Lombardi (Inactive) [ 22/Feb/13 ] |
|
No, it is a server-side patch. |
| Comment by Lai Siyao [ 22/Mar/13 ] |
|
I've finished the performance test against ext4; the test machine has 24 cores and 24GB of memory. The mount option "usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0" is used when testing with quota. Below is the test script:

total=800000
thrno=1
THRMAX=32
while [ $thrno -le $THRMAX ]; do
    count=$(($total/$thrno))
    mpirun -np $thrno -machinefile /tmp/machinefile mdtest -d /mnt/temp/d1 -i 1 -n $count -u -F >> /tmp/md.txt 2>&1
    thrno=$((2*$thrno))
done

Mdtest create test result (the first column is the parallel thread count):

threads  quota      patched quota  w/o quota
1        36614.457  36661.038      41179.899
2        35046.13   34979.013      47455.064
4        35046.13   50748.669      56157.671
8        28781.597  39255.844      46426.061
16       11251.192  28218.439      50534.734
32       7880.249   31173.627      46366.125
Mdtest unlink test result:

threads  quota      patched quota  w/o quota
1        29762.146  29307.613      30245.654
2        21131.769  26865.454      27094.563
4        13891.783  16076.384      17079.314
8        14099.972  12393.909      11943.64
16       6662.111   4812.819       6265.979
32       12735.206  4297.173       5210.164
The create test result is as expected, but the unlink result is not. We can see that with more test threads, the test with quota achieves the best result, even better than w/o quota. The oprofile output for mdtest unlink is as below:

quota:
samples  %        image name       app name         symbol name
73245    5.0417   vmlinux          vmlinux          intel_idle
68195    4.6941   vmlinux          vmlinux          __hrtimer_start_range_ns
44222    3.0439   vmlinux          vmlinux          schedule
26741    1.8407   vmlinux          vmlinux          mutex_spin_on_owner
26280    1.8089   vmlinux          vmlinux          __find_get_block
23680    1.6300   vmlinux          vmlinux          rwsem_down_failed_common
22958    1.5803   ext4.ko          ext4.ko          ext4_mark_iloc_dirty
20814    1.4327   vmlinux          vmlinux          update_curr
19519    1.3436   vmlinux          vmlinux          rb_erase
18140    1.2486   vmlinux          vmlinux          thread_return

patched quota:
samples  %        image name       app name         symbol name
3235409  50.1659  vmlinux          vmlinux          dqput
1140972  17.6911  vmlinux          vmlinux          dqget
347286   5.3848   vmlinux          vmlinux          mutex_spin_on_owner
278271   4.3147   vmlinux          vmlinux          dquot_mark_dquot_dirty
277685   4.3056   vmlinux          vmlinux          dquot_commit
51187    0.7937   vmlinux          vmlinux          intel_idle
38886    0.6029   vmlinux          vmlinux          __find_get_block
32483    0.5037   vmlinux          vmlinux          schedule
30017    0.4654   ext4.ko          ext4.ko          ext4_mark_iloc_dirty
29618    0.4592   jbd2.ko          jbd2.ko          jbd2_journal_add_journal_head
29483    0.4571   vmlinux          vmlinux          mutex_lock

w/o quota:
samples  %        image name       app name         symbol name
173301   6.3691   vmlinux          vmlinux          schedule
150041   5.5142   vmlinux          vmlinux          __audit_syscall_exit
148352   5.4522   vmlinux          vmlinux          update_curr
110868   4.0746   libmpi.so.1.0.2  libmpi.so.1.0.2  /usr/lib64/openmpi/lib/libmpi.so.1.0.2
105145   3.8642   libc-2.12.so     libc-2.12.so     sched_yield
104872   3.8542   vmlinux          vmlinux          sys_sched_yield
99494    3.6566   mca_btl_sm.so    mca_btl_sm.so    /usr/lib64/openmpi/lib/openmpi/mca_btl_sm.so
92536    3.4008   vmlinux          vmlinux          mutex_spin_on_owner
85868    3.1558   vmlinux          vmlinux          system_call

I don't quite understand the cause yet; maybe removing dqptr_sem causes more process scheduling?
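For reference, the profiles above can be collected with the legacy oprofile tooling roughly as follows (a sketch; the vmlinux path is an assumption for a RHEL6 debuginfo install):

# Sketch only: profile the MDS while the mdtest workload runs on the clients.
opcontrol --init
opcontrol --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
opcontrol --start
# ... run the mdtest unlink phase from the clients ...
opcontrol --dump
opcontrol --stop
opreport -l | head -20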
| Comment by Lai Siyao [ 27/Mar/13 ] |
|
For patched quota, `oprofile -d ...` shows that dqput and dqget are contending on dq_list_lock; however, to improve this, more code changes are needed. |
| Comment by Peter Jones [ 01/Apr/13 ] |
|
Landed for 2.4 |
| Comment by Andreas Dilger [ 17/Apr/13 ] |
|
This patch was only included in the RHEL6.3 patch series, not RHEL6.4 or SLES11 SP2. |
| Comment by James A Simmons [ 23/Apr/13 ] |
|
The question I have is: do I include this fix with my |
| Comment by Lai Siyao [ 23/Apr/13 ] |
|
Andreas, this patch is against VFS code and supports RHEL6.3/6.4 with the same code. James, it would be great if you could help port it to the SLES11 SP2 and FC18 kernels; IMO you don't need to include this in |
| Comment by Andreas Dilger [ 23/Apr/13 ] |
|
My mistake. I thought there were different kernel series for RHEL 6.3 and 6.4, but this is only true for ldiskfs. |
| Comment by James A Simmons [ 24/Apr/13 ] |
|
I started to look at this patch for both FC18 and SLES11 SP2. For FC18 server support we need several patches to make it build; because of this, I doubt it will make it into the 2.4 release. The SLES11 SP2 support works with master. What I was thinking is to break up the |
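For context, carrying such a fix for another distro kernel normally means adding it to the matching kernel_patches series in the Lustre tree, roughly as below (a sketch; the series file name for SLES11 SP2 should be checked against the tree, and the patch name follows the one carried for RHEL6):

# Sketch only: carry the quota patch in the SLES11 SP2 kernel patch series.
cd lustre-release
cp ~/quota-patches/quota-replace-dqptr-sem.patch lustre/kernel_patches/patches/
echo "quota-replace-dqptr-sem.patch" >> lustre/kernel_patches/series/3.0-sles11.series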
| Comment by James A Simmons [ 24/Apr/13 ] |
|
Oh, while we are fixing the kernel quota issues I'd like to add this to the patch as well. |
| Comment by James A Simmons [ 25/Apr/13 ] |
|
I have a patch for SLES11 SP2 which I'm testing right now. Once it passes I will push it to gerrit. |
| Comment by James A Simmons [ 25/Apr/13 ] |
|
My patch is at http://review.whamcloud.com/6168. |
| Comment by James A Simmons [ 03/May/13 ] |
|
The SLES11 SP[1,2] platforms have been fixed. We should keep this ticket open until FC18 is addressed for 2.5. Also, we really should push this patch upstream and have the distributions pick it up. I recommend linking this ticket to |
| Comment by Peter Jones [ 03/May/13 ] |
|
Good idea! |
| Comment by Andreas Dilger [ 17/Sep/13 ] |
|
The performance regression was resolved for 2.5.0 by applying the quota-replace-dqptr-sem and quota-avoid-dqget-calls patches to the kernel. I'm closing this bug, and opened |