Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 13790

    Description

      I occasionally see this issue on machines with less memory. The deadlock has the following call stack:

      dd            D 0000000000000000     0  2158      1 0x00000004
       ffff88010ecc10f8 0000000000000086 ffff8801ffffffff 0000000042a8c635
       ffff88010ecc1078 ffff88009ccb68a0 0000000000047e6a ffffffffaca103a3
       ffff8800d7bd5058 ffff88010ecc1fd8 000000000000fb88 ffff8800d7bd5058
      Call Trace:
       [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
       [<ffffffff81119e10>] ? sync_page+0x0/0x50
       [<ffffffff8150ed93>] io_schedule+0x73/0xc0
       [<ffffffff81119e4d>] sync_page+0x3d/0x50
       [<ffffffff8150f5fa>] __wait_on_bit_lock+0x5a/0xc0
       [<ffffffff81119de7>] __lock_page+0x67/0x70
       [<ffffffff81096de0>] ? wake_bit_function+0x0/0x50
       [<ffffffffa0f60101>] vvp_page_make_ready+0x271/0x280 [lustre]
       [<ffffffffa0542999>] cl_page_make_ready+0x89/0x370 [obdclass]
       [<ffffffffa03b45a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa0a323b7>] osc_extent_make_ready+0x3b7/0xe50 [osc]
       [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
       [<ffffffffa0a36af6>] osc_io_unplug0+0x1736/0x2130 [osc]
       [<ffffffff8103c7d8>] ? pvclock_clocksource_read+0x58/0xd0
       [<ffffffffa03b45a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa0a39681>] osc_io_unplug+0x11/0x20 [osc]
       [<ffffffffa0a3bc86>] osc_cache_writeback_range+0xdb6/0x1290 [osc]
       [<ffffffffa03b9d47>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
       [<ffffffffa03b9d47>] ? cfs_hash_bd_lookup_intent+0x37/0x130 [libcfs]
       [<ffffffffa03b9362>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs]
       [<ffffffffa054a45d>] ? cl_io_sub_init+0x5d/0xc0 [obdclass]
       [<ffffffffa0a29fd0>] osc_io_fsync_start+0x90/0x360 [osc]
       [<ffffffffa0547640>] ? cl_io_start+0x0/0x140 [obdclass]
       [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
       [<ffffffffa0a8f18e>] lov_io_call+0x8e/0x130 [lov]
       [<ffffffffa0a9324c>] lov_io_start+0x10c/0x180 [lov]
       [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
       [<ffffffffa054aea4>] cl_io_loop+0xb4/0x1b0 [obdclass]
       [<ffffffffa0f02acb>] cl_sync_file_range+0x31b/0x500 [lustre]
       [<ffffffffa0f2fe7c>] ll_writepages+0x9c/0x220 [lustre]
       [<ffffffff8112e1b1>] do_writepages+0x21/0x40
       [<ffffffff811aca9d>] writeback_single_inode+0xdd/0x290
       [<ffffffff811aceae>] writeback_sb_inodes+0xce/0x180
       [<ffffffff811ad00b>] writeback_inodes_wb+0xab/0x1b0
       [<ffffffff8112d60d>] balance_dirty_pages+0x23d/0x4d0
       [<ffffffffa0541768>] ? cl_page_invoid+0x68/0x160 [obdclass]
       [<ffffffff8112d904>] balance_dirty_pages_ratelimited_nr+0x64/0x70
       [<ffffffff8111a86a>] generic_file_buffered_write+0x1da/0x2e0
       [<ffffffff81075887>] ? current_fs_time+0x27/0x30
       [<ffffffff8111c210>] __generic_file_aio_write+0x260/0x490
       [<ffffffffa0a93d9c>] ? lov_lock_enqueue+0xbc/0x170 [lov]
       [<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100
       [<ffffffffa0f634a2>] vvp_io_write_start+0x102/0x3f0 [lustre]
       [<ffffffffa05476aa>] cl_io_start+0x6a/0x140 [obdclass]
       [<ffffffffa054aea4>] cl_io_loop+0xb4/0x1b0 [obdclass]
       [<ffffffffa0f00297>] ll_file_io_generic+0x407/0x8d0 [lustre]
       [<ffffffffa05406c9>] ? cl_env_get+0x29/0x350 [obdclass]
       [<ffffffffa0f00fa3>] ll_file_aio_write+0x133/0x2b0 [lustre]
       [<ffffffffa0f01279>] ll_file_write+0x159/0x290 [lustre]
       [<ffffffff81181398>] vfs_write+0xb8/0x1a0
       [<ffffffff81181c91>] sys_write+0x51/0x90
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      balance_dirty_pages(), which runs right after write_end(), tries to write back some dirty pages. However, ll_write_end() can keep holding the page in order to add it to the commit queue, which causes the deadlock.
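      For context, the buffered write loop in the 2.6.32-era kernel behaves roughly like the sketch below (a simplified paraphrase of generic_perform_write(), not the exact source; error handling and retries are omitted). ->write_begin() returns the page locked, ->write_end() is expected to release it, and balance_dirty_pages_ratelimited() runs immediately afterwards, so any page that ->write_end() keeps locked can block the very writeback it triggers:

      #include <linux/fs.h>
      #include <linux/pagemap.h>
      #include <linux/uaccess.h>
      #include <linux/writeback.h>

      /* Simplified paraphrase of the generic buffered write loop, shown only
       * to illustrate the locking expectations around ->write_end(). */
      static ssize_t buffered_write_sketch(struct file *file, struct iov_iter *i,
                                           loff_t pos)
      {
              struct address_space *mapping = file->f_mapping;
              const struct address_space_operations *a_ops = mapping->a_ops;
              ssize_t written = 0;
              long status = 0;

              do {
                      struct page *page;
                      void *fsdata;
                      unsigned long offset = pos & (PAGE_CACHE_SIZE - 1);
                      unsigned long bytes = min_t(unsigned long,
                                                  PAGE_CACHE_SIZE - offset,
                                                  iov_iter_count(i));
                      size_t copied;

                      /* returns with the page locked */
                      status = a_ops->write_begin(file, mapping, pos, bytes, 0,
                                                  &page, &fsdata);
                      if (status)
                              break;

                      pagefault_disable();
                      copied = iov_iter_copy_from_user_atomic(page, i, offset,
                                                              bytes);
                      pagefault_enable();

                      /* the filesystem is expected to unlock the page here;
                       * ll_write_end() instead keeps it for the commit queue */
                      status = a_ops->write_end(file, mapping, pos, bytes,
                                                copied, page, fsdata);
                      if (status < 0)
                              break;

                      iov_iter_advance(i, copied);
                      pos += copied;
                      written += copied;

                      /* may push writeback back into the same inode; that path
                       * (ll_writepages -> ... -> vvp_page_make_ready) then
                       * sleeps in __lock_page() on the page that was never
                       * unlocked, which is the deadlock in the trace above */
                      balance_dirty_pages_ratelimited(mapping);
              } while (iov_iter_count(i));

              return written ? written : status;
      }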

      We can fix the problem by releasing the page in ll_write_end() if the page is already dirty.

      Patch is coming.
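      A minimal sketch of the idea, standing in for the real ll_write_end() (the helper name and reference handling below are placeholders for the actual CLIO commit-queue code, not the patch itself): if the page is already dirty it has already been queued for commit, so we can drop the page lock instead of holding it again, which lets the writeback started from balance_dirty_pages() lock the page in vvp_page_make_ready():

      #include <linux/fs.h>
      #include <linux/mm.h>
      #include <linux/pagemap.h>

      /* Placeholder for the real commit-queue logic: the actual ll_write_end()
       * adds the still-locked page to a commit queue and flushes it to the OSC
       * layer later; assumed to take its own page reference and to unlock the
       * page once the queue is flushed. */
      static void queue_page_for_commit(struct page *vmpage)
      {
              /* elided in this sketch */
      }

      static int ll_write_end_sketch(struct file *file,
                                     struct address_space *mapping,
                                     loff_t pos, unsigned len, unsigned copied,
                                     struct page *vmpage, void *fsdata)
      {
              if (PageDirty(vmpage)) {
                      /* Already dirty, i.e. already queued for commit by an
                       * earlier write: release the page lock so the writeback
                       * triggered from balance_dirty_pages() can take it in
                       * vvp_page_make_ready() instead of deadlocking on it. */
                      unlock_page(vmpage);
              } else {
                      /* First write to this page: dirty it and hand it, still
                       * locked, to the commit queue. */
                      set_page_dirty(vmpage);
                      queue_page_for_commit(vmpage);
              }

              page_cache_release(vmpage);     /* ref taken in ->write_begin() */
              return copied;
      }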

    People

      Assignee: jay Jinshan Xiong (Inactive)
      Reporter: jay Jinshan Xiong (Inactive)
      Votes: 0
      Watchers: 5
