Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • None
    • Environment: Hyperion/LLNL
    • 3
    • 8187

    Description

      We performed a comparison between 2.3.0, 2.1.5, and current Lustre. We saw a regression in metadata performance compared to 2.3.0. Spreadsheet attached.

          Activity

            [LU-3305] Quotas affect Metadata performance

            There are two core kernel patches in the RHEL series on master that improve quota performance - both start with "quota".

            adilger Andreas Dilger added a comment

            It's hard to gather from the prior discussion, but is this the only patch that came out of this issue:

            commit 58d2a322589ec13ee3c585c13b1c83f429d946ce
            Author: Niu Yawei <yawei.niu@intel.com>
            Date:   Thu May 23 23:49:03 2013 -0400
            
                LU-3305 quota: avoid unnecessary dqget/dqput calls
            

            ? Thanks.

            prakash Prakash Surya (Inactive) added a comment

            Landed for 2.4.1 and 2.5

            pjones Peter Jones added a comment

            I did a few tests on the Rosso cluster; the results are similar to what we got in LU-2442, except that the performance drop (with 32 threads) seen in LU-2442 (with the LU-2442 patch applied) is resolved:

            patched:

            mdtest-1.8.3 was launched with 32 total task(s) on 1 nodes
            Command line used: mdtest -d /mnt/ldiskfs -i 10 -n 25000 -u -F -r
            Path: /mnt
            FS: 19.7 GiB   Used FS: 17.5%   Inodes: 1.2 Mi   Used Inodes: 4.6%
            
            32 tasks, 800000 files
            
            SUMMARY: (of 10 iterations)
               Operation                  Max        Min       Mean    Std Dev
               ---------                  ---        ---       ----    -------
               File creation     :      0.000      0.000      0.000      0.000
               File stat         :      0.000      0.000      0.000      0.000
               File removal      :   4042.032   1713.613   2713.827    698.243
               Tree creation     :      0.000      0.000      0.000      0.000
               Tree removal      :      2.164      1.861      2.020      0.088
            
            -- finished at 07/07/2013 20:37:36 --
            CPU: Intel Sandy Bridge microarchitecture, speed 2.601e+06 MHz (estimated)
            Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
            samples  %        app name                 symbol name
            10826148 11.6911  vmlinux                  schedule
            7089656   7.6561  vmlinux                  update_curr
            6432166   6.9461  vmlinux                  sys_sched_yield
            4384494   4.7348  vmlinux                  __audit_syscall_exit
            4088507   4.4152  libc-2.12.so             sched_yield
            3441346   3.7163  vmlinux                  system_call_after_swapgs
            3337224   3.6038  vmlinux                  put_prev_task_fair
            3244213   3.5034  vmlinux                  audit_syscall_entry
            2844216   3.0715  vmlinux                  thread_return
            2702323   2.9182  vmlinux                  rb_insert_color
            2636798   2.8475  vmlinux                  native_read_tsc
            2234644   2.4132  vmlinux                  sched_clock_cpu
            2182744   2.3571  vmlinux                  native_sched_clock
            2175482   2.3493  vmlinux                  hrtick_start_fair
            2152807   2.3248  vmlinux                  pick_next_task_fair
            2130024   2.3002  vmlinux                  set_next_entity
            2099576   2.2673  vmlinux                  rb_erase
            1790101   1.9331  vmlinux                  update_stats_wait_end
            1777328   1.9193  vmlinux                  mutex_spin_on_owner
            1701672   1.8376  vmlinux                  sysret_check
            

            unpatched:

            mdtest-1.8.3 was launched with 32 total task(s) on 1 nodes
            Command line used: mdtest -d /mnt/ldiskfs -i 10 -n 25000 -u -F -r
            Path: /mnt
            FS: 19.7 GiB   Used FS: 17.8%   Inodes: 1.2 Mi   Used Inodes: 4.6%
            
            32 tasks, 800000 files
            
            SUMMARY: (of 10 iterations)
               Operation                  Max        Min       Mean    Std Dev
               ---------                  ---        ---       ----    -------
               File creation     :      0.000      0.000      0.000      0.000
               File stat         :      0.000      0.000      0.000      0.000
               File removal      :   2816.345   1673.085   2122.347    342.119
               Tree creation     :      0.000      0.000      0.000      0.000
               Tree removal      :      2.296      0.111      1.361      0.866
            
            -- finished at 07/07/2013 21:11:03 --
            CPU: Intel Sandy Bridge microarchitecture, speed 2.601e+06 MHz (estimated)
            Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
            samples  %        image name               app name                 symbol name
            23218790 18.3914  vmlinux                  vmlinux                  dqput
            9549739   7.5643  vmlinux                  vmlinux                  __audit_syscall_exit
            9116086   7.2208  vmlinux                  vmlinux                  schedule
            8290558   6.5669  vmlinux                  vmlinux                  dqget
            5576620   4.4172  vmlinux                  vmlinux                  update_curr
            5343755   4.2327  vmlinux                  vmlinux                  sys_sched_yield
            3251018   2.5751  libc-2.12.so             libc-2.12.so             sched_yield
            2907579   2.3031  vmlinux                  vmlinux                  system_call_after_swapgs
            2854863   2.2613  vmlinux                  vmlinux                  put_prev_task_fair
            2793392   2.2126  vmlinux                  vmlinux                  audit_syscall_entry
            2723949   2.1576  vmlinux                  vmlinux                  kfree
            2551007   2.0206  vmlinux                  vmlinux                  mutex_spin_on_owner
            2406364   1.9061  vmlinux                  vmlinux                  rb_insert_color
            2321179   1.8386  vmlinux                  vmlinux                  thread_return
            2184031   1.7299  vmlinux                  vmlinux                  native_read_tsc
            2002277   1.5860  vmlinux                  vmlinux                  dquot_mark_dquot_dirty
            1990135   1.5764  vmlinux                  vmlinux                  native_sched_clock
            1970544   1.5608  vmlinux                  vmlinux                  set_next_entity
            1967852   1.5587  vmlinux                  vmlinux                  pick_next_task_fair
            1966282   1.5575  vmlinux                  vmlinux                  dquot_commit
            1966271   1.5575  vmlinux                  vmlinux                  sysret_check
            1919524   1.5204  vmlinux                  vmlinux                  unroll_tree_refs
            1811281   1.4347  vmlinux                  vmlinux                  sched_clock_cpu
            1810278   1.4339  vmlinux                  vmlinux                  rb_erase
            

            The unlink rate increased by ~28% with 32 threads, and the oprofile data shows that contention on dq_list_lock in dqput() is alleviated considerably.

            I think we should take this patch as a supplement to the LU-2442 fix.

            niu Niu Yawei (Inactive) added a comment
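
            To make the contention pattern above concrete, below is a small user-space toy model. It is not kernel code and not the LU-3305 patch itself; the names (toy_dqget(), worker_naive(), worker_cached(), and so on) are invented for illustration. A single global mutex stands in for dq_list_lock: the "naive" workers take and drop a reference around every operation, so each operation serializes on the global lock twice, while the "cached" workers take one reference up front and reuse it, which is the general idea behind avoiding unnecessary dqget()/dqput() calls.

            /*
             * toy_dq_contention.c - toy model of dqget()/dqput() reference churn.
             * NOT kernel code and NOT the LU-3305 patch; it only models the
             * locking pattern.  Build with: gcc -O2 -pthread toy_dq_contention.c
             */
            #include <pthread.h>
            #include <stdio.h>
            #include <time.h>

            #define NTHREADS 32
            #define NOPS     200000

            /* Stand-in for the kernel's global dq_list_lock. */
            static pthread_mutex_t dq_list_lock = PTHREAD_MUTEX_INITIALIZER;
            static long dq_refcount;   /* stand-in for a dquot's reference count */
            static long usage;         /* stand-in for the quota usage being accounted */

            static void toy_dqget(void)   /* models dqget(): global lock + ref++ */
            {
                pthread_mutex_lock(&dq_list_lock);
                dq_refcount++;
                pthread_mutex_unlock(&dq_list_lock);
            }

            static void toy_dqput(void)   /* models dqput(): global lock + ref-- */
            {
                pthread_mutex_lock(&dq_list_lock);
                dq_refcount--;
                pthread_mutex_unlock(&dq_list_lock);
            }

            /* Unpatched pattern: a get/put pair around every operation, so every
             * operation hits the global lock twice. */
            static void *worker_naive(void *arg)
            {
                (void)arg;
                for (int i = 0; i < NOPS; i++) {
                    toy_dqget();
                    __sync_fetch_and_add(&usage, 1);   /* the actual accounting */
                    toy_dqput();
                }
                return NULL;
            }

            /* "Avoid unnecessary dqget/dqput" pattern: take one reference up front,
             * do all the work, release it once; the hot path never takes the lock. */
            static void *worker_cached(void *arg)
            {
                (void)arg;
                toy_dqget();
                for (int i = 0; i < NOPS; i++)
                    __sync_fetch_and_add(&usage, 1);
                toy_dqput();
                return NULL;
            }

            static void run(const char *name, void *(*fn)(void *))
            {
                pthread_t tid[NTHREADS];
                struct timespec t0, t1;

                clock_gettime(CLOCK_MONOTONIC, &t0);
                for (int i = 0; i < NTHREADS; i++)
                    pthread_create(&tid[i], NULL, fn, NULL);
                for (int i = 0; i < NTHREADS; i++)
                    pthread_join(tid[i], NULL);
                clock_gettime(CLOCK_MONOTONIC, &t1);

                double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
                printf("%-24s %.3f s (usage=%ld, refcount=%ld)\n",
                       name, secs, usage, dq_refcount);
            }

            int main(void)
            {
                run("get/put per operation:", worker_naive);
                usage = 0;
                run("cached reference:", worker_cached);
                return 0;
            }

            On a multi-core machine the first run should be noticeably slower than the second, mirroring the dqput()/dqget() and dq_list_lock samples that dominate the unpatched oprofile output above.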

            Thanks a lot, Minh.

            It looks like the patch improves create/rm performance overall, but there are a few strange things in the figure that I can't explain:

            • The stat performance gets worse with the patch (or with quota disabled); I don't see how the quota code could affect read-only operations. Maybe we need to collect some oprofile data to investigate this further (for both standard 2.4 and patched 2.4);
            • create/rm performance drops a lot at 3 threads;
            • For the per-process create/rm with 4 threads, patched 2.4 (and with quota disabled) is even worse than standard 2.4. This looks like the same thing Siyao discovered in LU-2442 (unlink getting worse with 32 threads when quota was disabled): is contention on one global semaphore better than contention on several spin locks when the contention is heavy enough?

            BTW: why do we put the create & rm data into the same figure? They are two distinct tests, aren't they?

            niu Niu Yawei (Inactive) added a comment

            performance data for the patch

            mdiep Minh Diep added a comment

            Fine with me.

            simmonsja James A Simmons added a comment

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 20
