Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • Environment: Hyperion/LLNL
    • Severity: 3
    • 8187

    Description

      We performed a comparison between 2.3.0, 2.1.5, and current Lustre. We saw a regression in metadata performance compared to 2.3.0. Spreadsheet attached.

      Attachments

        Issue Links

          Activity

            [LU-3305] Quotas affect Metadata performance

            There are two core kernel patches in the RHEL series on master that improve quota performance - both start with "quota".

            adilger Andreas Dilger added a comment

            It's hard to gather from the prior discussion, but is this the only patch that came out of this issue:

            commit 58d2a322589ec13ee3c585c13b1c83f429d946ce
            Author: Niu Yawei <yawei.niu@intel.com>
            Date:   Thu May 23 23:49:03 2013 -0400
            
                LU-3305 quota: avoid unnecessary dqget/dqput calls
            

            ? Thanks.
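
            For context on what "avoid unnecessary dqget/dqput calls" means in practice: dqget()/dqput() take and drop a reference on an in-memory dquot, and (as the oprofile data later in this ticket shows) both paths funnel through the global dq_list_lock. Below is only a minimal user-space sketch of the general batching idea; quota_get()/quota_put()/quota_account() are hypothetical stand-ins, not the kernel API, and this is not the actual patch.

            /*
             * Minimal sketch, NOT the actual LU-3305 patch and NOT the kernel
             * dqget()/dqput() API: quota_get()/quota_put()/quota_account() are
             * hypothetical stand-ins for a refcounted quota object whose get/put
             * paths serialize on one global lock (the role dq_list_lock plays in
             * the profiles below).  Taking the reference once per batch avoids
             * N-1 get/put round trips on that lock.
             */
            #include <pthread.h>
            #include <stdio.h>

            static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

            struct quota {
                int  refcount;
                long used;
            };

            static struct quota account_obj;

            static struct quota *quota_get(void)
            {
                pthread_mutex_lock(&global_lock);   /* stands in for dq_list_lock */
                account_obj.refcount++;
                pthread_mutex_unlock(&global_lock);
                return &account_obj;
            }

            static void quota_put(struct quota *q)
            {
                pthread_mutex_lock(&global_lock);   /* contended on every put */
                q->refcount--;
                pthread_mutex_unlock(&global_lock);
            }

            static void quota_account(struct quota *q, long bytes)
            {
                q->used += bytes;                   /* the "real work" */
            }

            /* Naive: one get/put per operation, heavy traffic on the global lock. */
            static void account_per_op(long delta, int n)
            {
                for (int i = 0; i < n; i++) {
                    struct quota *q = quota_get();
                    quota_account(q, delta);
                    quota_put(q);
                }
            }

            /* Batched: hold a single reference across the whole run of operations. */
            static void account_batched(long delta, int n)
            {
                struct quota *q = quota_get();

                for (int i = 0; i < n; i++)
                    quota_account(q, delta);
                quota_put(q);
            }

            int main(void)
            {
                account_per_op(4096, 100000);   /* 100000 lock/unlock pairs each way */
                account_batched(4096, 100000);  /* one get and one put for the batch */
                printf("used=%ld refcount=%d\n", account_obj.used, account_obj.refcount);
                return 0;
            }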

            prakash Prakash Surya (Inactive) added a comment
            pjones Peter Jones added a comment -

            Landed for 2.4.1 and 2.5


            I did a few tests on the Rosso cluster; the results are similar to what we got in LU-2442, except that the performance drop problem (with 32 threads) seen in LU-2442 (with the LU-2442 patch) is resolved:

            patched:

            mdtest-1.8.3 was launched with 32 total task(s) on 1 nodes
            Command line used: mdtest -d /mnt/ldiskfs -i 10 -n 25000 -u -F -r
            Path: /mnt
            FS: 19.7 GiB   Used FS: 17.5%   Inodes: 1.2 Mi   Used Inodes: 4.6%
            
            32 tasks, 800000 files
            
            SUMMARY: (of 10 iterations)
               Operation                  Max        Min       Mean    Std Dev
               ---------                  ---        ---       ----    -------
               File creation     :      0.000      0.000      0.000      0.000
               File stat         :      0.000      0.000      0.000      0.000
               File removal      :   4042.032   1713.613   2713.827    698.243
               Tree creation     :      0.000      0.000      0.000      0.000
               Tree removal      :      2.164      1.861      2.020      0.088
            
            -- finished at 07/07/2013 20:37:36 --
            CPU: Intel Sandy Bridge microarchitecture, speed 2.601e+06 MHz (estimated)
            Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
            samples  %        app name                 symbol name
            10826148 11.6911  vmlinux                  schedule
            7089656   7.6561  vmlinux                  update_curr
            6432166   6.9461  vmlinux                  sys_sched_yield
            4384494   4.7348  vmlinux                  __audit_syscall_exit
            4088507   4.4152  libc-2.12.so             sched_yield
            3441346   3.7163  vmlinux                  system_call_after_swapgs
            3337224   3.6038  vmlinux                  put_prev_task_fair
            3244213   3.5034  vmlinux                  audit_syscall_entry
            2844216   3.0715  vmlinux                  thread_return
            2702323   2.9182  vmlinux                  rb_insert_color
            2636798   2.8475  vmlinux                  native_read_tsc
            2234644   2.4132  vmlinux                  sched_clock_cpu
            2182744   2.3571  vmlinux                  native_sched_clock
            2175482   2.3493  vmlinux                  hrtick_start_fair
            2152807   2.3248  vmlinux                  pick_next_task_fair
            2130024   2.3002  vmlinux                  set_next_entity
            2099576   2.2673  vmlinux                  rb_erase
            1790101   1.9331  vmlinux                  update_stats_wait_end
            1777328   1.9193  vmlinux                  mutex_spin_on_owner
            1701672   1.8376  vmlinux                  sysret_check
            

            unpatched:

            mdtest-1.8.3 was launched with 32 total task(s) on 1 nodes
            Command line used: mdtest -d /mnt/ldiskfs -i 10 -n 25000 -u -F -r
            Path: /mnt
            FS: 19.7 GiB   Used FS: 17.8%   Inodes: 1.2 Mi   Used Inodes: 4.6%
            
            32 tasks, 800000 files
            
            SUMMARY: (of 10 iterations)
               Operation                  Max        Min       Mean    Std Dev
               ---------                  ---        ---       ----    -------
               File creation     :      0.000      0.000      0.000      0.000
               File stat         :      0.000      0.000      0.000      0.000
               File removal      :   2816.345   1673.085   2122.347    342.119
               Tree creation     :      0.000      0.000      0.000      0.000
               Tree removal      :      2.296      0.111      1.361      0.866
            
            -- finished at 07/07/2013 21:11:03 --
            CPU: Intel Sandy Bridge microarchitecture, speed 2.601e+06 MHz (estimated)
            Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
            samples  %        image name               app name                 symbol name
            23218790 18.3914  vmlinux                  vmlinux                  dqput
            9549739   7.5643  vmlinux                  vmlinux                  __audit_syscall_exit
            9116086   7.2208  vmlinux                  vmlinux                  schedule
            8290558   6.5669  vmlinux                  vmlinux                  dqget
            5576620   4.4172  vmlinux                  vmlinux                  update_curr
            5343755   4.2327  vmlinux                  vmlinux                  sys_sched_yield
            3251018   2.5751  libc-2.12.so             libc-2.12.so             sched_yield
            2907579   2.3031  vmlinux                  vmlinux                  system_call_after_swapgs
            2854863   2.2613  vmlinux                  vmlinux                  put_prev_task_fair
            2793392   2.2126  vmlinux                  vmlinux                  audit_syscall_entry
            2723949   2.1576  vmlinux                  vmlinux                  kfree
            2551007   2.0206  vmlinux                  vmlinux                  mutex_spin_on_owner
            2406364   1.9061  vmlinux                  vmlinux                  rb_insert_color
            2321179   1.8386  vmlinux                  vmlinux                  thread_return
            2184031   1.7299  vmlinux                  vmlinux                  native_read_tsc
            2002277   1.5860  vmlinux                  vmlinux                  dquot_mark_dquot_dirty
            1990135   1.5764  vmlinux                  vmlinux                  native_sched_clock
            1970544   1.5608  vmlinux                  vmlinux                  set_next_entity
            1967852   1.5587  vmlinux                  vmlinux                  pick_next_task_fair
            1966282   1.5575  vmlinux                  vmlinux                  dquot_commit
            1966271   1.5575  vmlinux                  vmlinux                  sysret_check
            1919524   1.5204  vmlinux                  vmlinux                  unroll_tree_refs
            1811281   1.4347  vmlinux                  vmlinux                  sched_clock_cpu
            1810278   1.4339  vmlinux                  vmlinux                  rb_erase
            

            The unlink rate increased by ~28% with 32 threads, and the oprofile data shows that contention on dq_list_lock in dqput() is alleviated a lot.

            I think we should take this patch as a supplement to the LU-2442 fix.
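
            For reference, the ~28% figure follows from the mean File removal rates in the two mdtest summaries above: (2713.827 - 2122.347) / 2122.347 ≈ 0.279, i.e. roughly a 28% higher unlink rate with the patch applied.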

            niu Niu Yawei (Inactive) added a comment

            Thanks a lot, Minh.

            It looks like the patch improves create/rm performance overall, but there are some things in the figure that look strange, and I don't know why:

            • The stat performance gets worse with the patch (or with quota disabled); I don't see how the quota code can affect read-only operations. Maybe we need to collect some oprofile data (for both stock 2.4 and patched 2.4) to investigate this further;
            • create/rm performance drops a lot at 3 threads;
            • For the per-proc create/rm with 4 threads, patched 2.4 (and quota disabled) is even worse than standard 2.4. This looks the same as what Siyao discovered in LU-2442 (unlink getting worse with 32 threads when quota is disabled): is contention on a global semaphore better than contention on several spin locks when the contention is heavy enough? (See the toy sketch after this comment.)

            BTW: why do we put the create & rm data into the same figure? They are two distinct tests, aren't they?
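
            As a purely illustrative aside on that last question, here is a minimal pthreads sketch that contrasts the two contention patterns: every thread hammering a single sleeping lock (standing in for one global semaphore) versus threads spread across a handful of spin locks. The thread count, lock count, and work loop are arbitrary assumptions for illustration; nothing below is taken from the Lustre or kernel quota code.

            /*
             * Toy contention experiment (illustration only): NTHREADS threads
             * increment counters under either a single mutex (sleeps under
             * contention, like one global semaphore) or one of NLOCKS spin
             * locks chosen per operation.  Comparing wall-clock and CPU time
             * of the two modes under heavy contention is the question raised
             * in the bullet above.
             */
            #include <pthread.h>
            #include <stdio.h>
            #include <stdlib.h>

            #define NTHREADS 32
            #define NLOCKS    4
            #define NITERS    200000

            static pthread_mutex_t one_big_lock = PTHREAD_MUTEX_INITIALIZER;
            static pthread_spinlock_t locks[NLOCKS];
            static long counters[NLOCKS];
            static int use_spinlocks;

            static void *worker(void *arg)
            {
                long id = (long)arg;

                for (int i = 0; i < NITERS; i++) {
                    int slot = (int)((id + i) % NLOCKS);   /* crude hash onto a lock */

                    if (use_spinlocks) {
                        pthread_spin_lock(&locks[slot]);
                        counters[slot]++;
                        pthread_spin_unlock(&locks[slot]);
                    } else {
                        pthread_mutex_lock(&one_big_lock);
                        counters[slot]++;
                        pthread_mutex_unlock(&one_big_lock);
                    }
                }
                return NULL;
            }

            int main(int argc, char **argv)
            {
                pthread_t tid[NTHREADS];
                long total = 0;

                use_spinlocks = (argc > 1 && atoi(argv[1]) != 0);
                for (int i = 0; i < NLOCKS; i++)
                    pthread_spin_init(&locks[i], PTHREAD_PROCESS_PRIVATE);

                for (long i = 0; i < NTHREADS; i++)
                    pthread_create(&tid[i], NULL, worker, (void *)i);
                for (int i = 0; i < NTHREADS; i++)
                    pthread_join(tid[i], NULL);

                for (int i = 0; i < NLOCKS; i++)
                    total += counters[i];
                printf("%s: total=%ld\n", use_spinlocks ? "spinlocks" : "one mutex", total);
                return 0;
            }

            Build with something like "gcc -O2 -pthread toy.c" and time both modes ("./a.out 0" vs "./a.out 1"); whether spinning beats sleeping depends heavily on thread count vs. core count, which is why the behaviour can flip between 4 and 32 threads.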

            niu Niu Yawei (Inactive) added a comment
            mdiep Minh Diep added a comment -

            performance data for the patch


            Fine with me.

            simmonsja James A Simmons added a comment

            James, please submit the SLES changes as a separate patch. Since this doesn't affect the API, the two changes do not need to be in the same commit. If the other patch needs to be refreshed for some other reason they can be merged.

            adilger Andreas Dilger added a comment

            This patch will need to be ported to SLES11 SP[1/2] as well. Later in the week I can include it in the patch.

            simmonsja James A Simmons added a comment
            mdiep Minh Diep added a comment -

            Yes, will do when the cluster's IB network is back online next week.


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 20

              Dates

                Created:
                Updated:
                Resolved: