Lustre

LU-12151: metadata performance difference between root and non-root users

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Affects Version/s: Lustre 2.10.5
    • Labels: None
    • Environment: lustre-2.10.5-RC2/ldiskfs
    • Severity: 3

    Description

      We found a huge performance difference in file creation between the root user and a non-root user.

      • 1 x MDS (1 x Platinum 8160, 96 GB memory, EDR)
      • 32 x clients (2 x E5-2650 v4, 128 GB memory, EDR)
      • 1 x ES14K (40 x SSD)

      Root user

      [root@c01 ~]# salloc -N 32 --ntasks-per-node=20 mpirun --allow-run-as-root /work/tools/bin/mdtest -n 1000 -F -v -u -d /scratch0/bmuser/ -C
      SUMMARY: (of 1 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         File creation     :     151328.449     151328.449     151328.449          0.000
         File stat         :          0.000          0.000          0.000          0.000
         File read         :          0.000          0.000          0.000          0.000
         File removal      :          0.000          0.000          0.000          0.000
         Tree creation     :         42.057         42.057         42.057          0.000
         Tree removal      :          0.000          0.000          0.000          0.000
      V-1: Entering timestamp...
      

      Non-root user

      [bmuser@c01 ~]$ salloc -N 32 --ntasks-per-node=20 mpirun --allow-run-as-root /work/tools/bin/mdtest -n 1000 -F -v -u -d /scratch0/bmuser/ -C
      SUMMARY: (of 1 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         File creation     :     102825.662     102825.662     102825.662          0.000
         File stat         :          0.000          0.000          0.000          0.000
         File read         :          0.000          0.000          0.000          0.000
         File removal      :          0.000          0.000          0.000          0.000
         Tree creation     :         30.589         30.589         30.589          0.000
         Tree removal      :          0.000          0.000          0.000          0.000
      V-1: Entering timestamp...
      

      ~150K creates/sec (root) vs ~100K creates/sec (non-root)
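      For reference, the mdtest flags above request a create-only run: -n 1000 files per task, -F file tests only, -v verbose output, -u a unique working directory per task, and -C the creation phase only, which is why only the creation rows are non-zero.

      The fix that later landed ("osd-ldiskfs: pass owner down rather than transfer it", see the Gerrit comments below) points at the likely mechanism: for a non-root user the server created the object and then transferred ownership to the requesting uid/gid, dirtying the inode twice, whereas a root create needs only one update. Below is a minimal userspace analogy of the two-step pattern; this is a sketch only, the function name is illustrative, and the real change is inside osd-ldiskfs, not this syscall sequence.

      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/stat.h>

      /* "Create then transfer owner" costs two inode updates, much like
       * creating a file and chowning it afterwards; the patch folds the
       * owner into the initial allocation so only one update is needed. */
      int create_then_transfer(const char *path, uid_t uid, gid_t gid)
      {
              int fd = open(path, O_CREAT | O_WRONLY, 0644); /* update 1: create */
              if (fd < 0)
                      return -1;
              int rc = fchown(fd, uid, gid);                 /* update 2: chown */
              close(fd);
              return rc;
      }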

      Activity

            Wang Shilong (Inactive) added a comment:

            Hi,

            Yup, that would be nice.

            Thank you,
            Shilong

            Andrew Perepechko added a comment:

            Hi Wang Shilong,

            unfortunately the patch was dropped from the porting list and forgotten for a while.

            I'll measure how the xtime optimization improves performance in addition to the owner optimization and open a new ticket.

            Are you OK with that?

            Thank you

            Wang Shilong (Inactive) added a comment:

            Hi Andrew Perepechko,

            Yup, you guys had similar optimizations three years ago. It is a pity that Lustre upstream did not have anything similar for a long time.

            Passing xtime down could avoid an extra ext4 inode dirty operation (which reduces jbd2 memory operations); it is not as big an improvement as the uid/gid change, but it is worth doing.

            Do you agree with a separate ticket for that?

            Thank you,
            Shilong

            Andrew Perepechko added a comment:

            Passing xtimes (even with as low as 1 s resolution) can sometimes be beneficial as well: https://github.com/Xyratex/lustre-stable/commit/7ab00b00eb057f6963c0b5641686240ef95e1388#diff-89ce3dab611fea06ce62efa5bed4ae63
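            The same one-update-instead-of-two reasoning applies to timestamps: if the client-supplied atime/mtime/ctime are passed down when the inode is allocated, the server avoids a second inode dirty (and the jbd2 memory operations it implies, as noted above). A userspace analogy of the extra step, again as a sketch only with an illustrative function name:

            #include <fcntl.h>
            #include <time.h>
            #include <unistd.h>
            #include <sys/stat.h>

            /* Setting the times after the create dirties the inode a second
             * time; passing the xtimes down at creation would fold them into
             * the initial inode update. */
            int create_then_set_times(const char *path, const struct timespec ts[2])
            {
                    int fd = open(path, O_CREAT | O_WRONLY, 0644); /* update 1: create */
                    if (fd < 0)
                            return -1;
                    int rc = futimens(fd, ts);                     /* update 2: set times */
                    close(fd);
                    return rc;
            }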

            Gerrit Updater added a comment:

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34685/
            Subject: LU-12151 osd-ldiskfs: pass owner down rather than transfer it
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: f3d83215acd79ad062d3c605ca7dc8ba373be65d

            Gerrit Updater added a comment:

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34685
            Subject: LU-12151 osd-ldiskfs: pass owner down rather than transfer it
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: a6996b311a4c852a8eeb68b684c046d00fbac127
            Peter Jones added a comment:

            Landed for 2.13

            Gerrit Updater added a comment:

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34581/
            Subject: LU-12151 osd-ldiskfs: pass owner down rather than transfer it
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 697f2d95bfdca13565ccc5d50e106114604c1724

            Wang Shilong (Inactive) added a comment:

            Thanks Ihara for testing the patch; I will include the results in the patch commit.

            Shuichi Ihara added a comment:

            Here is the current file creation speed on the master branch with root and non-root users:
            170K ops/sec (root user) vs 100K ops/sec (non-root user) for file creation.

            [root@c01 ~]# id
            uid=0(root) gid=0(root) groups=0(root)
            [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
            salloc: Granted job allocation 6045
            -- started at 04/03/2019 18:55:10 --
            
            mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
            Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
            Path: /cache1
            FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%
            
            768 tasks, 1536000 files
            
            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :     175749.654     175738.010     175741.938          1.705
               File stat         :     495658.996     495619.768     495634.739          6.634
               File read         :     257464.620     257412.150     257446.462         12.140
               File removal      :     197592.306     197444.295     197539.519         51.355
               Tree creation     :         51.695         51.695         51.695          0.000
               Tree removal      :         14.876         14.876         14.876          0.000
            
            [sihara@c01 ~]$ id
            uid=10000(sihara) gid=100(users) groups=100(users)
            [sihara@c01 ~]$ salloc  -N 32 --ntasks-per-node=24 mpirun -np 768 /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
            salloc: Granted job allocation 6043
            -- started at 04/03/2019 18:44:27 --
            
            mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
            Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
            Path: /cache1
            FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%
            
            768 tasks, 1536000 files
            
            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :     108634.397     108630.106     108631.673          0.614
               File stat         :     468761.147     468723.486     468736.693          6.927
               File read         :     261685.099     261646.894     261671.608          8.099
               File removal      :     180895.760     180851.349     180876.868          9.373
               Tree creation     :         61.624         61.624         61.624          0.000
               Tree remova
            

             

            After applying patch https://review.whamcloud.com/34581, the non-root user achieves the same file creation rate as the root user (about 185K ops/sec, up roughly 1.7x from 108K ops/sec).

            [sihara@c01 ~]$ id
            uid=10000(sihara) gid=100(users) groups=100(users)
            [sihara@c01 ~]$ salloc  -N 32 --ntasks-per-node=24 mpirun -np 768 /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
            salloc: Granted job allocation 6048
            -- started at 04/03/2019 19:11:49 --
            
            mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
            Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
            Path: /cache1
            FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%
            
            768 tasks, 1536000 files
            
            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :     185227.246     185213.609     185218.466          2.187
               File stat         :     472370.658     472306.853     472329.189         13.733
               File read         :     262557.843     262528.916     262540.418          8.698
               File removal      :     177183.588     176814.351     177165.183         25.934
               Tree creation     :         43.364         43.364         43.364          0.000
               Tree removal      :         13.871         13.871         13.871          0.000
             

            And no regression was found for the root user either (just in case).

            [root@c01 ~]# id
            uid=0(root) gid=0(root) groups=0(root)
            [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 2000 -F -u -d /cache1/mdt0
            salloc: Granted job allocation 6050
            -- started at 04/03/2019 19:14:49 --
            
            mdtest-1.9.3 was launched with 768 total task(s) on 32 node(s)
            Command line used: /work/tools/bin/mdtest "-n" "2000" "-F" "-u" "-d" "/cache1/mdt0"
            Path: /cache1
            FS: 3.9 TiB   Used FS: 0.0%   Inodes: 160.0 Mi   Used Inodes: 0.0%
            
            768 tasks, 1536000 files
            
            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :     184781.517     184747.142     184775.286          4.355
               File stat         :     471423.053     471288.526     471350.414         16.800
               File read         :     259265.668     259197.540     259250.629         12.143
               File removal      :     180106.410     180034.379     180086.385         10.014
               Tree creation     :         45.413         45.413         45.413          0.000
               Tree removal      :         13.507         13.507         13.507          0.000
            

            People

              Assignee: Wang Shilong (Inactive)
              Reporter: Shuichi Ihara
              Votes: 0
              Watchers: 11
