[LU-12151] metadata performance difference on root and non-root user Created: 03/Apr/19 Updated: 11/Feb/20 Resolved: 13/Apr/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.5 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara | Assignee: | Wang Shilong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
lustre-2.10.5-RC2/ldiskfs |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We found a huge performance difference in file creation rates between the root user and a non-root user.
root user:

[root@c01 ~]# salloc -N 32 --ntasks-per-node=20 mpirun --allow-run-as-root /work/tools/bin/mdtest -n 1000 -F -v -u -d /scratch0/bmuser/ -C

SUMMARY: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation  : 151328.449  151328.449  151328.449    0.000
   File stat      :      0.000       0.000       0.000    0.000
   File read      :      0.000       0.000       0.000    0.000
   File removal   :      0.000       0.000       0.000    0.000
   Tree creation  :     42.057      42.057      42.057    0.000
   Tree removal   :      0.000       0.000       0.000    0.000
V-1: Entering timestamp...

Non-root user:

[bmuser@c01 ~]$ salloc -N 32 --ntasks-per-node=20 mpirun --allow-run-as-root /work/tools/bin/mdtest -n 1000 -F -v -u -d /scratch0/bmuser/ -C

SUMMARY: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation  : 102825.662  102825.662  102825.662    0.000
   File stat      :      0.000       0.000       0.000    0.000
   File read      :      0.000       0.000       0.000    0.000
   File removal   :      0.000       0.000       0.000    0.000
   Tree creation  :     30.589      30.589      30.589    0.000
   Tree removal   :      0.000       0.000       0.000    0.000
V-1: Entering timestamp...

150K (root) vs 100K (non-root) |
| Comments |
| Comment by Shuichi Ihara [ 03/Apr/19 ] |
|
It seems related to quota accounting. With just a quick hack to disable the quota accounting for the non-root user, performance is back:

diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.c
index 060cbb8..8f68d91 100644
--- a/lustre/osd-ldiskfs/osd_handler.c
+++ b/lustre/osd-ldiskfs/osd_handler.c
@@ -2631,6 +2631,8 @@ static int osd_quota_transfer(struct inode *inode, const struct lu_attr *attr)
 {
 	int rc;
 
+	return 0;
+
 	if ((attr->la_valid & LA_UID && attr->la_uid != i_uid_read(inode)) ||
 	    (attr->la_valid & LA_GID && attr->la_gid != i_gid_read(inode))) {
 		struct iattr iattr;

[bmuser@c01 ~]$ salloc -N 32 --ntasks-per-node=20 mpirun --allow-run-as-root /work/tools/bin/mdtest -n 1000 -F -v -u -d /scratch0/bmuser/ -C

SUMMARY: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation  : 151046.528  151046.528  151046.528    0.000
   File stat      :      0.000       0.000       0.000    0.000
   File read      :      0.000       0.000       0.000    0.000
   File removal   :      0.000       0.000       0.000    0.000
   Tree creation  :     17.299      17.299      17.299    0.000
   Tree removal   :      0.000       0.000       0.000    0.000
V-1: Entering timestamp... |
| Comment by Alex Zhuravlev [ 03/Apr/19 ] |
|
was quota enforcement enabled? |
| Comment by Wang Shilong (Inactive) [ 03/Apr/19 ] |
|
Alex, I guess quota enforcement was not enabled in Ihara's test. The problem is that for a non-root user the inode's uid/gid is 0 at precreation, so we have to transfer the space accounting to the real owner afterwards, and we hit a lock bottleneck there. I have a patch locally but have not pushed it or confirmed it with testing yet. |
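For reference, a minimal sketch of the check involved, paraphrasing the condition from the diff above; the body is simplified and the generic kernel dquot_transfer() is used here as a stand-in for whatever transfer helper ldiskfs actually calls.

/* Simplified sketch only - paraphrased from the diff above; locking and
 * error handling are omitted and dquot_transfer() stands in for the real
 * ldiskfs transfer helper. */
static int osd_quota_transfer_sketch(struct inode *inode,
                                     const struct lu_attr *attr)
{
        /*
         * Precreated inodes start out owned by uid/gid 0.  For root the new
         * owner already matches, so neither condition fires and the create
         * path never pays for a quota transfer.  For a non-root owner the
         * uid (and usually the gid) differs, so every create performs a
         * dquot transfer, which is where the lock bottleneck shows up.
         */
        if ((attr->la_valid & LA_UID && attr->la_uid != i_uid_read(inode)) ||
            (attr->la_valid & LA_GID && attr->la_gid != i_gid_read(inode))) {
                struct iattr iattr = { 0 };

                if (attr->la_valid & LA_UID) {
                        iattr.ia_valid |= ATTR_UID;
                        iattr.ia_uid = make_kuid(&init_user_ns, attr->la_uid);
                }
                if (attr->la_valid & LA_GID) {
                        iattr.ia_valid |= ATTR_GID;
                        iattr.ia_gid = make_kgid(&init_user_ns, attr->la_gid);
                }
                return dquot_transfer(inode, &iattr);
        }
        return 0;
}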
| Comment by Shuichi Ihara [ 03/Apr/19 ] |
|
Correct. The quota slave ("-O quota") was enabled (the default), but quota enforcement was not enabled. |
| Comment by Alex Zhuravlev [ 03/Apr/19 ] |
|
Sorry, I don't understand - we do not change uid/gid for precreated objects in the create path? |
| Comment by Wang Shilong (Inactive) [ 03/Apr/19 ] |
|
Alex, oops - the OST object uid/gid is only changed at the first write. I think the problem exists on the MDS side:

|->osd_create
  |->osd_create_type_f
    |->osd_mkreg
      |->ldiskfs_create_inode
        |->ext4_new_inode()  <- owner is passed as NULL, so the inode is created with uid/gid 0
|->osd_attr_init
  |->osd_quota_transfer  <- which then changes the uid/gid again for the inode above

I think a more efficient way would be to pass the owner down to ldiskfs_create_inode(), so the inode is created with the correct uid/gid in the first place. |
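A rough sketch of what passing the owner down might look like; the helper name and the ldiskfs_create_inode() signature below are assumptions for illustration only, not the actual change (see https://review.whamcloud.com/34581 for the real patch):

/* Hypothetical sketch: create the inode with its final uid/gid so the later
 * osd_quota_transfer() sees matching owners and returns early.  Function and
 * parameter names are assumptions; mainline ext4's __ext4_new_inode() accepts
 * an owner array (owner[0] = uid, owner[1] = gid) that it applies instead of
 * the default ownership when non-NULL. */
static struct inode *osd_mkreg_with_owner(struct osd_thread_info *info,
                                          struct inode *dir, umode_t mode,
                                          const struct lu_attr *attr)
{
        uid_t owner[2] = { 0, 0 };

        if (attr->la_valid & LA_UID)
                owner[0] = attr->la_uid;
        if (attr->la_valid & LA_GID)
                owner[1] = attr->la_gid;

        /* Assumed signature: hand the owner array down so ext4_new_inode()
         * initializes i_uid/i_gid inside the create transaction itself. */
        return ldiskfs_create_inode(info, dir, mode, owner);
}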
| Comment by Gerrit Updater [ 03/Apr/19 ] |
|
Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/34581 |
| Comment by Shuichi Ihara [ 03/Apr/19 ] |
|
Here is the current file creation speed on the master branch with the root and a non-root user.

[root@c01 ~]# id
uid=0(root) gid=0(root) groups=0(root)
[root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
salloc: Granted job allocation 6045
-- started at 04/03/2019 18:55:10 --

mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
Path: /cache1
FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%

768 tasks, 1536000 files

SUMMARY rate: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation  : 175749.654  175738.010  175741.938    1.705
   File stat      : 495658.996  495619.768  495634.739    6.634
   File read      : 257464.620  257412.150  257446.462   12.140
   File removal   : 197592.306  197444.295  197539.519   51.355
   Tree creation  :     51.695      51.695      51.695    0.000
   Tree removal   :     14.876      14.876      14.876    0.000

[sihara@c01 ~]$ id
uid=10000(sihara) gid=100(users) groups=100(users)
[sihara@c01 ~]$ salloc -N 32 --ntasks-per-node=24 mpirun -np 768 /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
salloc: Granted job allocation 6043
-- started at 04/03/2019 18:44:27 --

mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
Path: /cache1
FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%

768 tasks, 1536000 files

SUMMARY rate: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation  : 108634.397  108630.106  108631.673    0.614
   File stat      : 468761.147  468723.486  468736.693    6.927
   File read      : 261685.099  261646.894  261671.608    8.099
   File removal   : 180895.760  180851.349  180876.868    9.373
   Tree creation  :     61.624      61.624      61.624    0.000
   Tree remova
After applying patch https://review.whamcloud.com/34581, the non-root user gets the same file creation rate as the root user.

[sihara@c01 ~]$ id
uid=10000(sihara) gid=100(users) groups=100(users)
[sihara@c01 ~]$ salloc -N 32 --ntasks-per-node=24 mpirun -np 768 /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
salloc: Granted job allocation 6048
-- started at 04/03/2019 19:11:49 --

mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
Path: /cache1
FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%

768 tasks, 1536000 files

SUMMARY rate: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation  : 185227.246  185213.609  185218.466    2.187
   File stat      : 472370.658  472306.853  472329.189   13.733
   File read      : 262557.843  262528.916  262540.418    8.698
   File removal   : 177183.588  176814.351  177165.183   25.934
   Tree creation  :     43.364      43.364      43.364    0.000
   Tree removal   :     13.871      13.871      13.871    0.000

And no regression is seen with the root user either (just in case):

[root@c01 ~]# id
uid=0(root) gid=0(root) groups=0(root)
[root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
salloc: Granted job allocation 6050
-- started at 04/03/2019 19:14:49 --

mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
Path: /cache1
FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%

768 tasks, 1536000 files

SUMMARY rate: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation  : 184781.517  184747.142  184775.286    4.355
   File stat      : 471423.053  471288.526  471350.414   16.800
   File read      : 259265.668  259197.540  259250.629   12.143
   File removal   : 180106.410  180034.379  180086.385   10.014
   Tree creation  :     45.413      45.413      45.413    0.000
   Tree removal   :     13.507      13.507      13.507    0.000 |
| Comment by Wang Shilong (Inactive) [ 03/Apr/19 ] |
|
Thanks Ihara for testing the patch; I will include the results in the patch commit message. |
| Comment by Gerrit Updater [ 13/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34581/ |
| Comment by Peter Jones [ 13/Apr/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 16/Apr/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34685 |
| Comment by Gerrit Updater [ 21/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34685/ |
| Comment by Andrew Perepechko [ 29/May/19 ] |
|
Passing the xtimes down (even with resolution as low as 1 s) can sometimes be beneficial as well: https://github.com/Xyratex/lustre-stable/commit/7ab00b00eb057f6963c0b5641686240ef95e1388#diff-89ce3dab611fea06ce62efa5bed4ae63 |
| Comment by Wang Shilong (Inactive) [ 29/May/19 ] |
|
Hi Andrew Perepechko, Yup, you guys had a similar optimization three years ago. Passing the xtimes down could save an extra ext4 inode dirty operation (which reduces jbd2 memory operations); it is not as big an improvement as the uid/gid change, but still worth doing. Do you agree that this deserves a separate ticket? Thank you, |
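For illustration, a rough sketch of the xtime idea, assuming the creation path can take the attributes directly; the helper name is hypothetical, and the Xyratex commit linked above is the reference implementation:

/* Hypothetical sketch: apply atime/mtime/ctime while the inode is being
 * initialized inside the create transaction, instead of dirtying it again
 * later from a separate attribute update, saving one inode-dirty (and the
 * jbd2 bookkeeping it triggers) per create. */
static void osd_apply_xtimes_at_create(struct inode *inode,
                                       const struct lu_attr *attr)
{
        if (attr->la_valid & LA_ATIME)
                inode->i_atime.tv_sec = attr->la_atime;
        if (attr->la_valid & LA_MTIME)
                inode->i_mtime.tv_sec = attr->la_mtime;
        if (attr->la_valid & LA_CTIME)
                inode->i_ctime.tv_sec = attr->la_ctime;
}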
| Comment by Andrew Perepechko [ 29/May/19 ] |
|
Hi Wang Shilong, unfortunately the patch was dropped from the porting list and forgotten for a while. I'll measure how much the xtime optimization improves performance on top of the owner optimization and open a new ticket. Are you ok with that? Thank you |
| Comment by Wang Shilong (Inactive) [ 29/May/19 ] |
|
Hi, Yup, that would be nice. Thank you, |