
[LU-4841] Performance regression in master for threads more than 2

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0
    • Severity: 3
    • 13338

    Description

      After commit 586e95a5b3f7b9525d78e7efc9f2949387fc9d54 we see a significant performance degradation in master. The attached graphs show the regression, measured on Xeon Phi:

      Attachments

        1. lu-4841.png (42 kB)
        2. lu-4841-1.png (46 kB)
        3. lu-4841-5.png (24 kB)
        4. perf_2.5.png (15 kB)
        5. perf_master.png (13 kB)

        Activity


          jay Jinshan Xiong (Inactive) added a comment -

          I didn't remove that code, I just made it optional: there is an lprocfs entry to control NFS unstable pages tracking, and it is off by default.
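
          (For illustration: a minimal sketch, with hypothetical names, of how such an off-by-default lprocfs-controlled flag would gate the accounting; these are not the identifiers from the actual patch.)

          #include <linux/mm.h>
          #include <linux/vmstat.h>

          /* Hypothetical flag, exposed through an lprocfs entry and
           * off by default, as described in the comment above. */
          static int unstable_check_enabled;

          /* Hypothetical hook: a written page was acknowledged by the
           * server but not yet committed to disk. */
          static void client_page_acked(struct page *pg)
          {
                  if (!unstable_check_enabled)
                          return;  /* tracking disabled: skip accounting */
                  /* NFS-style "unstable" accounting (real kernel API) */
                  inc_zone_page_state(pg, NR_UNSTABLE_NFS);
          }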

          morrone Christopher Morrone (Inactive) added a comment -

          Did you, or did you not, do the following?

          Remove kernel NFS unstable pages tracking

          That was a critical part of the design. Can you explain how you fixed the memory management issues without that part of the design?

          jay Jinshan Xiong (Inactive) added a comment -

          > And why would it be correct to make unstable pages part of the page LRU? Those pages cannot be freed. They are still in the limbo state where the server has acknowledged them, but not necessarily yet committed them to disk. When memory needs to be freed, we don't send them again, so there is no reason to walk them as part of the LRU.

          Yes, I agree that it would be more reasonable to take UNSTABLE pages out of the LRU. However, if we don't do that, we need to find the corresponding cl_page in the brw_commit() callback, and that requires an atomic operation. Actually, the current mechanism is not that bad, because usually only a small portion of pages are in the UNSTABLE state, and those pages should stay at the tail of the LRU list; therefore, they shouldn't be scanned at all.

          One of my concerns with using UNSTABLE pages is that it makes the OSTs commit transactions frequently. A problematic client would therefore send lots of SOFT_SYNC RPCs to the OSTs it talks to, which would degrade system performance significantly.

          Anyway, this patch didn't remove the UNSTABLE pages tracking feature. Users can still enable it if performance is not a problem for them.
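
          (For illustration: a minimal sketch, with hypothetical types and names rather than actual Lustre symbols, of the LRU behavior described above. UNSTABLE pages cannot be freed until the server commits them, so the walk skips them; because they are recently written they sit near the hot end of the list, and a bounded scan from the cold end rarely reaches them at all.)

          #include <linux/list.h>
          #include <linux/types.h>

          struct client_page {
                  struct list_head cp_lru;      /* position on client LRU */
                  bool             cp_unstable; /* acked, not yet committed */
          };

          /* Walk at most @nr pages from the cold end of the LRU,
           * skipping pages still in the UNSTABLE state. */
          static long client_lru_shrink(struct list_head *lru, long nr)
          {
                  struct client_page *cp, *next;
                  long freed = 0;

                  list_for_each_entry_safe(cp, next, lru, cp_lru) {
                          if (nr-- <= 0)
                                  break;
                          if (cp->cp_unstable)
                                  continue;     /* cannot be freed yet */
                          list_del_init(&cp->cp_lru);
                          /* free_client_page(cp);  hypothetical free */
                          freed++;
                  }
                  return freed;
          }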

          morrone Christopher Morrone (Inactive) added a comment -

          Folks, can someone explain to me the rationale behind these changes in change 10003?

          1. Remove kernel NFS unstable pages tracking because it killed performance
          2. Track unstable pages as part of LRU cache. Otherwise Lustre can use much more memory than max_cached_mb
          3. Remove obd_unstable_pages tracking to avoid using global atomic counter

          Was not the entire point of the unstable page tracking patches to use the kernel's NFS unstable pages tracking? The idea was that it gives the Linux kernel a back-pressure mechanism when memory is low. max_cached_mb has absolutely no regard for the actual memory usage of a running system; it only works in its own little Lustre world.

          And why would it be correct to make unstable pages part of the page LRU? Those pages cannot be freed. They are still in the limbo state where the server has acknowledged them, but not necessarily yet committed them to disk. When memory needs to be freed, we don't send them again, so there is no reason to walk them as part of the LRU.

          I am rather concerned that we have thrown away correctness in pursuit of performance.
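
          (For context, the back-pressure mechanism referred to above is real kernel machinery: network filesystems account acknowledged-but-uncommitted pages under NR_UNSTABLE_NFS, and balance_dirty_pages() counts that state toward the dirty-memory thresholds, throttling writers system-wide when too much memory is in limbo. A minimal sketch of the lifecycle, with hypothetical wrapper names; inc_zone_page_state(), dec_zone_page_state(), and NR_UNSTABLE_NFS are the actual kernel symbols of this era.)

          #include <linux/mm.h>
          #include <linux/vmstat.h>

          /* The server acknowledged the write but has not committed it:
           * the page enters "limbo". Accounting it as NR_UNSTABLE_NFS
           * lets balance_dirty_pages() treat it like dirty memory, which
           * is the system-wide back-pressure discussed above. */
          static void page_enters_limbo(struct page *pg)
          {
                  inc_zone_page_state(pg, NR_UNSTABLE_NFS);
          }

          /* The server committed the transaction: the page leaves limbo
           * and stops counting against the dirty thresholds. */
          static void page_leaves_limbo(struct page *pg)
          {
                  dec_zone_page_state(pg, NR_UNSTABLE_NFS);
          }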
          yujian Jian Yu added a comment -

          Here is the back-ported patch for the Lustre b2_5 branch: http://review.whamcloud.com/12615
          The patch depends on the patches for LU-3274 and LU-2139 on the Lustre b2_5 branch.

          jlevi Jodi Levi (Inactive) added a comment -

          Patch landed to master. Please reopen the ticket if more work is needed.
          jlevi Jodi Levi (Inactive) added a comment -

          http://review.whamcloud.com/#/c/10003/

          jay Jinshan Xiong (Inactive) added a comment -

          Then let's use patch set 5 for 2.6.


          dmiter Dmitry Eremin (Inactive) added a comment -

          This is the performance test of patch set 5.

          Writers stack:

          [<ffffffffa09ba014>] osc_enter_cache+0x48d/0x999 [osc]
          [<ffffffffa09b32f8>] osc_queue_async_io+0xa58/0x1780 [osc]
          [<ffffffffa099e1d7>] osc_page_cache_add+0x47/0x120 [osc]
          [<ffffffffa09a54fa>] osc_io_commit_async+0x10a/0x3b0 [osc]
          [<ffffffffa03242f7>] cl_io_commit_async+0x77/0x130 [obdclass]
          [<ffffffffa081d581>] lov_io_commit_async+0x271/0x4c0 [lov]
          [<ffffffffa03242f7>] cl_io_commit_async+0x77/0x130 [obdclass]
          [<ffffffffa08f38d6>] vvp_io_write_commit+0xf6/0x7e0 [lustre]
          [<ffffffffa08f40d4>] vvp_io_write_start+0x114/0x390 [lustre]
          [<ffffffffa03255c5>] cl_io_start+0x65/0x130 [obdclass]
          [<ffffffffa0328a74>] cl_io_loop+0xb4/0x1b0 [obdclass]
          [<ffffffffa0893be9>] ll_file_io_generic+0x3b9/0x860 [lustre]
          [<ffffffffa08942f6>] ll_file_aio_write+0xd6/0x1e0 [lustre]
          [<ffffffffa089451c>] ll_file_write+0x11c/0x230 [lustre]
          [<ffffffff810dfb9d>] vfs_write+0xac/0xf3
          [<ffffffff810dfd8c>] sys_write+0x4a/0x6e
          [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
          
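          (The top frames above show writers sleeping in osc_enter_cache(), i.e. throttled by the osc layer's own cache accounting rather than by the kernel's balance_dirty_pages(), where the unpatched client blocks in the next comment. A minimal sketch of that kind of wait, with hypothetical names; the real code uses Lustre's grant accounting and wait primitives.)

          #include <linux/wait.h>
          #include <linux/atomic.h>

          struct write_cache {
                  wait_queue_head_t wc_waitq;     /* writers sleep here */
                  atomic_t          wc_pages;     /* pages held in cache */
                  int               wc_max_pages; /* cap, cf. max_cached_mb */
          };

          /* Block until the cache has room for @npages, then claim them.
           * (Racy as written; real code rechecks under a lock.) */
          static void enter_cache(struct write_cache *wc, int npages)
          {
                  wait_event(wc->wc_waitq,
                             atomic_read(&wc->wc_pages) + npages <=
                             wc->wc_max_pages);
                  atomic_add(npages, &wc->wc_pages);
          }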
          dmiter Dmitry Eremin (Inactive) added a comment - edited

          For a pure master client and a client with patch set 3, the stacks are as follows:

          Writers stack:

          [<ffffffff810a86dd>] balance_dirty_pages_ratelimited_nr+0x329/0x3a4
          [<ffffffff810a2dc1>] generic_file_buffered_write+0x975/0xa9e
          [<ffffffff810a3129>] __generic_file_aio_write+0x23f/0x270
          [<ffffffff810a31b2>] generic_file_aio_write+0x58/0xaa
          [<ffffffffa08f710d>] vvp_io_write_start+0xfd/0x390 [lustre]
          [<ffffffffa03255c5>] cl_io_start+0x65/0x130 [obdclass]
          [<ffffffffa0328a74>] cl_io_loop+0xb4/0x1b0 [obdclass]
          [<ffffffffa0896be9>] ll_file_io_generic+0x3b9/0x860 [lustre]
          [<ffffffffa08972f6>] ll_file_aio_write+0xd6/0x1e0 [lustre]
          [<ffffffffa089751c>] ll_file_write+0x11c/0x230 [lustre]
          [<ffffffff810dfb9d>] vfs_write+0xac/0xf3
          [<ffffffff810dfd8c>] sys_write+0x4a/0x6e
          [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
          

          Readers stack:

          [<ffffffff8109f851>] sync_page+0x4d/0x51
          [<ffffffff810a012f>] sync_page_killable+0xe/0x3b
          [<ffffffff8109f788>] __lock_page_killable+0x66/0x68
          [<ffffffff810a1c2f>] generic_file_aio_read+0x657/0x864
          [<ffffffffa08f5216>] vvp_io_read_start+0x2f6/0x460 [lustre]
          [<ffffffffa03255c5>] cl_io_start+0x65/0x130 [obdclass]
          [<ffffffffa0328a74>] cl_io_loop+0xb4/0x1b0 [obdclass]
          [<ffffffffa0896be9>] ll_file_io_generic+0x3b9/0x860 [lustre]
          [<ffffffffa0897703>] ll_file_aio_read+0xd3/0x1e0 [lustre]
          [<ffffffffa089792c>] ll_file_read+0x11c/0x230 [lustre]
          [<ffffffff810dfc8d>] vfs_read+0xa9/0xf0
          [<ffffffff810dfd1e>] sys_read+0x4a/0x6e
          [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
          


          dmiter Dmitry Eremin (Inactive) added a comment -

          This is a comparison of patch set 2 and patch set 3, also catching the usual sleep stacks during the test of patch set 2.

          Writers stack:

          [<ffffffffa09cce44>] osc_enter_cache+0x48d/0x999 [osc]
          [<ffffffffa09c6138>] osc_queue_async_io+0xa58/0x17b0 [osc]
          [<ffffffffa09b10b7>] osc_page_cache_add+0x47/0x120 [osc]
          [<ffffffffa09b82ea>] osc_io_commit_async+0x10a/0x3b0 [osc]
          [<ffffffffa0450b17>] cl_io_commit_async+0x77/0x130 [obdclass]
          [<ffffffffa082f581>] lov_io_commit_async+0x271/0x4c0 [lov]
          [<ffffffffa0450b17>] cl_io_commit_async+0x77/0x130 [obdclass]
          [<ffffffffa0905796>] vvp_io_write_commit+0xf6/0x7e0 [lustre]
          [<ffffffffa0905fb6>] vvp_io_write_start+0x136/0x370 [lustre]
          [<ffffffffa0451de5>] cl_io_start+0x65/0x130 [obdclass]
          [<ffffffffa0455294>] cl_io_loop+0xb4/0x1b0 [obdclass]
          [<ffffffffa08ab1d8>] ll_file_io_generic+0x238/0x6d0 [lustre]
          [<ffffffffa08ab8f5>] ll_file_aio_write+0xe5/0x1f0 [lustre]
          [<ffffffffa08abb30>] ll_file_write+0x130/0x240 [lustre]
          [<ffffffff810dfb9d>] vfs_write+0xac/0xf3
          [<ffffffff810dfd8c>] sys_write+0x4a/0x6e
          [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
          

          Readers stack:

          [<ffffffff8109f851>] sync_page+0x4d/0x51
          [<ffffffff810a012f>] sync_page_killable+0xe/0x3b
          [<ffffffff8109f788>] __lock_page_killable+0x66/0x68
          [<ffffffff810a1c2f>] generic_file_aio_read+0x657/0x864
          [<ffffffffa0904066>] vvp_io_read_start+0x2f6/0x460 [lustre]
          [<ffffffffa0451de5>] cl_io_start+0x65/0x130 [obdclass]
          [<ffffffffa0455294>] cl_io_loop+0xb4/0x1b0 [obdclass]
          [<ffffffffa08ab1d8>] ll_file_io_generic+0x238/0x6d0 [lustre]
          [<ffffffffa08abd22>] ll_file_aio_read+0xe2/0x1f0 [lustre]
          [<ffffffffa08abf60>] ll_file_read+0x130/0x240 [lustre]
          [<ffffffff810dfc8d>] vfs_read+0xa9/0xf0
          [<ffffffff810dfd1e>] sys_read+0x4a/0x6e
          [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
          

          People

            Assignee: jay Jinshan Xiong (Inactive)
            Reporter: dmiter Dmitry Eremin (Inactive)
            Votes: 0
            Watchers: 15
