Lustre / LU-2208

deadlock in add_lsmref

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.4.0
    • None
    • 3
    • 5253

    Description

      Hit this while running sanity test 118k:

      [19970.051046] Lustre: DEBUG MARKER: == sanity test 118k: bio alloc -ENOMEM and IO TERM handling =========== 22:02:17 (1350525737)
      [20160.612156] INFO: task flush-lustre-5:18802 blocked for more than 120 seconds.
      [20160.612995] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [20160.613824] flush-lustre- D 0000000000000000  3504 18802      2 0x00000000
      [20160.614395]  ffff880047a39958 0000000000000046 0000000000000000 ffff880047a39900
      [20160.615163]  ffff880047a399a0 0000000100000000 0000000000000020 ffff88005f620c20
      [20160.615529]  ffff880020730738 ffff880047a39fd8 000000000000fba8 ffff880020730738
      [20160.615893] Call Trace:
      [20160.616057]  [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0
      [20160.616272]  [<ffffffff814fad56>] rwsem_down_read_failed+0x26/0x30
      [20160.616483]  [<ffffffff8127c104>] call_rwsem_down_read_failed+0x14/0x30
      [20160.616718]  [<ffffffff814f9ec7>] ? down_read+0x37/0x40
      [20160.616954]  [<ffffffffa1527b24>] lov_lsm_addref+0x34/0x150 [lov]
      [20160.617186]  [<ffffffffa1528043>] lov_io_init+0x73/0x160 [lov]
      [20160.617417]  [<ffffffffa10eb4e8>] cl_io_init0+0x98/0x160 [obdclass]
      [20160.617647]  [<ffffffffa10ee2a4>] cl_io_init+0x64/0x100 [obdclass]
      [20160.617869]  [<ffffffffa095e9ce>] cl_sync_file_range+0x11e/0x560 [lustre]
      [20160.618793]  [<ffffffffa0984342>] ll_writepages+0x72/0x1b0 [lustre]
      [20160.619027]  [<ffffffff81127f44>] do_writepages+0x24/0x40
      [20160.619228]  [<ffffffff811a5594>] writeback_single_inode+0xe4/0x2d0
      [20160.619439]  [<ffffffff811a5a13>] writeback_sb_inodes+0xd3/0x190
      [20160.619646]  [<ffffffff811a5b4b>] writeback_inodes_wb+0x7b/0x1a0
      [20160.619854]  [<ffffffff811a5efb>] wb_writeback+0x28b/0x3d0
      [20160.620060]  [<ffffffff8107c6e2>] ? del_timer_sync+0x22/0x30
      [20160.620266]  [<ffffffff811a61e5>] wb_do_writeback+0x1a5/0x250
      [20160.620474]  [<ffffffff811a62f3>] bdi_writeback_task+0x63/0x1b0
      [20160.620714]  [<ffffffff8108fc27>] ? bit_waitqueue+0x17/0xd0
      [20160.620928]  [<ffffffff81136dc0>] ? bdi_start_fn+0x0/0x100
      [20160.621172]  [<ffffffff81136e46>] bdi_start_fn+0x86/0x100
      [20160.621396]  [<ffffffff81136dc0>] ? bdi_start_fn+0x0/0x100
      [20160.621598]  [<ffffffff8108fa16>] kthread+0x96/0xa0
      [20160.621790]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [20160.621985]  [<ffffffff8108f980>] ? kthread+0x0/0xa0
      [20160.622191]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [20160.622411] INFO: task dd:26845 blocked for more than 120 seconds.
      [20160.622631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [20160.622987] dd            D 0000000000000007  2608 26845      1 0x00000000
      [20160.623220]  ffff88005bea3470 0000000000000086 0000000000000000 0000000000000286
      [20160.623585]  0000000000000092 ffff88003b6a8548 000000015bea3458 ffffffff81f61808
      [20160.623949]  ffff880046f00b78 ffff88005bea3fd8 000000000000fba8 ffff880046f00b78
      [20160.624318] Call Trace:
      [20160.624476]  [<ffffffff81044f4e>] ? kernel_map_pages+0xfe/0x110
      [20160.624700]  [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0
      [20160.624978]  [<ffffffffa08d1afe>] ? cfs_mem_cache_free+0xe/0x10 [libcfs]
      [20160.625319]  [<ffffffff814fad56>] rwsem_down_read_failed+0x26/0x30
      [20160.625557]  [<ffffffff8127c104>] call_rwsem_down_read_failed+0x14/0x30
      [20160.625774]  [<ffffffff814f9ec7>] ? down_read+0x37/0x40
      [20160.625992]  [<ffffffffa1527b24>] lov_lsm_addref+0x34/0x150 [lov]
      [20160.626238]  [<ffffffffa1528043>] lov_io_init+0x73/0x160 [lov]
      [20160.626467]  [<ffffffffa10eb4e8>] cl_io_init0+0x98/0x160 [obdclass]
      [20160.626690]  [<ffffffffa10ee2a4>] cl_io_init+0x64/0x100 [obdclass]
      [20160.626922]  [<ffffffffa149b6d3>] osc_lru_shrink+0x4a3/0x8c0 [osc]
      [20160.627143]  [<ffffffff8116133a>] ? cache_alloc_debugcheck_after+0x14a/0x210
      [20160.627378]  [<ffffffffa149bfa8>] osc_page_init+0x4b8/0xb40 [osc]
      [20160.627610]  [<ffffffffa10de7b5>] ? cl_page_slice_add+0x55/0x140 [obdclass]
      [20160.627856]  [<ffffffffa10e2e7b>] cl_page_find0+0x2db/0x900 [obdclass]
      [20160.628081]  [<ffffffff8116133a>] ? cache_alloc_debugcheck_after+0x14a/0x210
      [20160.628327]  [<ffffffffa10e34b8>] cl_page_find_sub+0x18/0x20 [obdclass]
      [20160.628575]  [<ffffffffa152a503>] lov_page_init_raid0+0x1a3/0x780 [lov]
      [20160.628806]  [<ffffffffa1527f58>] lov_page_init+0x68/0xe0 [lov]
      [20160.629038]  [<ffffffffa10e2e7b>] cl_page_find0+0x2db/0x900 [obdclass]
      [20160.629268]  [<ffffffffa08e83e2>] ? cfs_hash_lookup+0x82/0xa0 [libcfs]
      [20160.629492]  [<ffffffff811704f5>] ? mem_cgroup_charge_common+0xa5/0xd0
      [20160.629728]  [<ffffffffa10e34d1>] cl_page_find+0x11/0x20 [obdclass]
      [20160.629962]  [<ffffffffa0985ab4>] ll_cl_init+0x154/0x5b0 [lustre]
      [20160.630190]  [<ffffffff814faeee>] ? _spin_unlock_irq+0xe/0x20
      [20160.630430]  [<ffffffffa0986163>] ll_prepare_write+0x53/0x1a0 [lustre]
      [20160.630672]  [<ffffffffa099dfbe>] ll_write_begin+0x7e/0x1a0 [lustre]
      [20160.630898]  [<ffffffff81112d23>] generic_file_buffered_write+0x123/0x300
      [20160.631130]  [<ffffffff8106fea7>] ? current_fs_time+0x27/0x30
      [20160.631341]  [<ffffffff811147e0>] __generic_file_aio_write+0x250/0x480
      [20160.631563]  [<ffffffff81114a7f>] generic_file_aio_write+0x6f/0xe0
      [20160.631790]  [<ffffffffa09b171c>] vvp_io_write_start+0x9c/0x240 [lustre]
      [20160.632034]  [<ffffffffa10eb26a>] cl_io_start+0x6a/0x140 [obdclass]
      [20160.632285]  [<ffffffffa10efa54>] cl_io_loop+0xb4/0x1b0 [obdclass]
      [20160.632529]  [<ffffffffa095e05b>] ll_file_io_generic+0x42b/0x550 [lustre]
      [20160.632763]  [<ffffffffa095ef4c>] ll_file_aio_write+0x13c/0x2c0 [lustre]
      [20160.632994]  [<ffffffffa095f239>] ll_file_write+0x169/0x2a0 [lustre]
      [20160.633222]  [<ffffffff8117b2e8>] vfs_write+0xb8/0x1a0
      [20160.633423]  [<ffffffff8117bbb1>] sys_write+0x51/0x90
      [20160.633624]  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      [20160.633837] INFO: task dd:26851 blocked for more than 120 seconds.
      

      A number of other dd processes are hung in the same way.
      Jinshan, I am leaving the centos6-1 node in this state for you to look at when you have time. Feel free to crash it with echo c >/proc/sysrq-trigger if using crash to investigate further would help.
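
      For triage, the hung-task reports in a console log like the one above can be summarized with a short script. The following is a minimal sketch, not part of the original report: the function names (parse_hung_tasks, first_module_frame) are hypothetical, and it assumes the log uses the "INFO: task NAME:PID blocked for more than N seconds" and "[<addr>] func+0xOFF/0xLEN [module]" formats seen in this trace. Frames marked with "?" are skipped because the unwinder flags them as unreliable.

```python
import re

# Hung-task header, e.g.
# "[20160.622411] INFO: task dd:26845 blocked for more than 120 seconds."
TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Stack frame, e.g. "[<ffffffffa1527b24>] lov_lsm_addref+0x34/0x150 [lov]".
# Group 1 is set when the frame carries the "?" (unreliable) marker,
# group 2 is the function name, group 3 the module name (if any).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?")


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report, with its reliable frames."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = {
                "comm": header.group(1),
                "pid": int(header.group(2)),
                "timeout": int(header.group(3)),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            if frame.group(1):
                continue  # skip frames marked "?"
            current["frames"].append((frame.group(2), frame.group(3)))
    return tasks


def first_module_frame(task):
    """Return the innermost (function, module) frame that is in a module."""
    for func, module in task["frames"]:
        if module is not None:
            return func, module
    return None


# A few lines copied from the trace in this report.
SAMPLE = """\
[20160.622411] INFO: task dd:26845 blocked for more than 120 seconds.
[20160.624318] Call Trace:
[20160.624476]  [<ffffffff81044f4e>] ? kernel_map_pages+0xfe/0x110
[20160.624700]  [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0
[20160.625992]  [<ffffffffa1527b24>] lov_lsm_addref+0x34/0x150 [lov]
[20160.626238]  [<ffffffffa1528043>] lov_io_init+0x73/0x160 [lov]
"""

if __name__ == "__main__":
    for task in parse_hung_tasks(SAMPLE):
        print(task["comm"], task["pid"], first_module_frame(task))
```

      Run against the full console log, this prints one line per blocked task, which makes it easy to check whether every hung dd is waiting in lov_lsm_addref like the two tasks shown here.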

            People

              jay Jinshan Xiong (Inactive)
              green Oleg Drokin