Details
-
Bug
-
Resolution: Duplicate
-
Blocker
-
None
-
Lustre 2.4.0
-
None
-
3
-
5253
Description
Hit this in sanity 118k
[19970.051046] Lustre: DEBUG MARKER: == sanity test 118k: bio alloc -ENOMEM and IO TERM handling =========== 22:02:17 (1350525737) [20160.612156] INFO: task flush-lustre-5:18802 blocked for more than 120 seconds. [20160.612995] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [20160.613824] flush-lustre- D 0000000000000000 3504 18802 2 0x00000000 [20160.614395] ffff880047a39958 0000000000000046 0000000000000000 ffff880047a39900 [20160.615163] ffff880047a399a0 0000000100000000 0000000000000020 ffff88005f620c20 [20160.615529] ffff880020730738 ffff880047a39fd8 000000000000fba8 ffff880020730738 [20160.615893] Call Trace: [20160.616057] [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0 [20160.616272] [<ffffffff814fad56>] rwsem_down_read_failed+0x26/0x30 [20160.616483] [<ffffffff8127c104>] call_rwsem_down_read_failed+0x14/0x30 [20160.616718] [<ffffffff814f9ec7>] ? down_read+0x37/0x40 [20160.616954] [<ffffffffa1527b24>] lov_lsm_addref+0x34/0x150 [lov] [20160.617186] [<ffffffffa1528043>] lov_io_init+0x73/0x160 [lov] [20160.617417] [<ffffffffa10eb4e8>] cl_io_init0+0x98/0x160 [obdclass] [20160.617647] [<ffffffffa10ee2a4>] cl_io_init+0x64/0x100 [obdclass] [20160.617869] [<ffffffffa095e9ce>] cl_sync_file_range+0x11e/0x560 [lustre] [20160.618793] [<ffffffffa0984342>] ll_writepages+0x72/0x1b0 [lustre] [20160.619027] [<ffffffff81127f44>] do_writepages+0x24/0x40 [20160.619228] [<ffffffff811a5594>] writeback_single_inode+0xe4/0x2d0 [20160.619439] [<ffffffff811a5a13>] writeback_sb_inodes+0xd3/0x190 [20160.619646] [<ffffffff811a5b4b>] writeback_inodes_wb+0x7b/0x1a0 [20160.619854] [<ffffffff811a5efb>] wb_writeback+0x28b/0x3d0 [20160.620060] [<ffffffff8107c6e2>] ? del_timer_sync+0x22/0x30 [20160.620266] [<ffffffff811a61e5>] wb_do_writeback+0x1a5/0x250 [20160.620474] [<ffffffff811a62f3>] bdi_writeback_task+0x63/0x1b0 [20160.620714] [<ffffffff8108fc27>] ? bit_waitqueue+0x17/0xd0 [20160.620928] [<ffffffff81136dc0>] ? bdi_start_fn+0x0/0x100 [20160.621172] [<ffffffff81136e46>] bdi_start_fn+0x86/0x100 [20160.621396] [<ffffffff81136dc0>] ? bdi_start_fn+0x0/0x100 [20160.621598] [<ffffffff8108fa16>] kthread+0x96/0xa0 [20160.621790] [<ffffffff8100c14a>] child_rip+0xa/0x20 [20160.621985] [<ffffffff8108f980>] ? kthread+0x0/0xa0 [20160.622191] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [20160.622411] INFO: task dd:26845 blocked for more than 120 seconds. [20160.622631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [20160.622987] dd D 0000000000000007 2608 26845 1 0x00000000 [20160.623220] ffff88005bea3470 0000000000000086 0000000000000000 0000000000000286 [20160.623585] 0000000000000092 ffff88003b6a8548 000000015bea3458 ffffffff81f61808 [20160.623949] ffff880046f00b78 ffff88005bea3fd8 000000000000fba8 ffff880046f00b78 [20160.624318] Call Trace: [20160.624476] [<ffffffff81044f4e>] ? kernel_map_pages+0xfe/0x110 [20160.624700] [<ffffffff814fabbd>] rwsem_down_failed_common+0x8d/0x1d0 [20160.624978] [<ffffffffa08d1afe>] ? cfs_mem_cache_free+0xe/0x10 [libcfs] [20160.625319] [<ffffffff814fad56>] rwsem_down_read_failed+0x26/0x30 [20160.625557] [<ffffffff8127c104>] call_rwsem_down_read_failed+0x14/0x30 [20160.625774] [<ffffffff814f9ec7>] ? down_read+0x37/0x40 [20160.625992] [<ffffffffa1527b24>] lov_lsm_addref+0x34/0x150 [lov] [20160.626238] [<ffffffffa1528043>] lov_io_init+0x73/0x160 [lov] [20160.626467] [<ffffffffa10eb4e8>] cl_io_init0+0x98/0x160 [obdclass] [20160.626690] [<ffffffffa10ee2a4>] cl_io_init+0x64/0x100 [obdclass] [20160.626922] [<ffffffffa149b6d3>] osc_lru_shrink+0x4a3/0x8c0 [osc] [20160.627143] [<ffffffff8116133a>] ? cache_alloc_debugcheck_after+0x14a/0x210 [20160.627378] [<ffffffffa149bfa8>] osc_page_init+0x4b8/0xb40 [osc] [20160.627610] [<ffffffffa10de7b5>] ? cl_page_slice_add+0x55/0x140 [obdclass] [20160.627856] [<ffffffffa10e2e7b>] cl_page_find0+0x2db/0x900 [obdclass] [20160.628081] [<ffffffff8116133a>] ? cache_alloc_debugcheck_after+0x14a/0x210 [20160.628327] [<ffffffffa10e34b8>] cl_page_find_sub+0x18/0x20 [obdclass] [20160.628575] [<ffffffffa152a503>] lov_page_init_raid0+0x1a3/0x780 [lov] [20160.628806] [<ffffffffa1527f58>] lov_page_init+0x68/0xe0 [lov] [20160.629038] [<ffffffffa10e2e7b>] cl_page_find0+0x2db/0x900 [obdclass] [20160.629268] [<ffffffffa08e83e2>] ? cfs_hash_lookup+0x82/0xa0 [libcfs] [20160.629492] [<ffffffff811704f5>] ? mem_cgroup_charge_common+0xa5/0xd0 [20160.629728] [<ffffffffa10e34d1>] cl_page_find+0x11/0x20 [obdclass] [20160.629962] [<ffffffffa0985ab4>] ll_cl_init+0x154/0x5b0 [lustre] [20160.630190] [<ffffffff814faeee>] ? _spin_unlock_irq+0xe/0x20 [20160.630430] [<ffffffffa0986163>] ll_prepare_write+0x53/0x1a0 [lustre] [20160.630672] [<ffffffffa099dfbe>] ll_write_begin+0x7e/0x1a0 [lustre] [20160.630898] [<ffffffff81112d23>] generic_file_buffered_write+0x123/0x300 [20160.631130] [<ffffffff8106fea7>] ? current_fs_time+0x27/0x30 [20160.631341] [<ffffffff811147e0>] __generic_file_aio_write+0x250/0x480 [20160.631563] [<ffffffff81114a7f>] generic_file_aio_write+0x6f/0xe0 [20160.631790] [<ffffffffa09b171c>] vvp_io_write_start+0x9c/0x240 [lustre] [20160.632034] [<ffffffffa10eb26a>] cl_io_start+0x6a/0x140 [obdclass] [20160.632285] [<ffffffffa10efa54>] cl_io_loop+0xb4/0x1b0 [obdclass] [20160.632529] [<ffffffffa095e05b>] ll_file_io_generic+0x42b/0x550 [lustre] [20160.632763] [<ffffffffa095ef4c>] ll_file_aio_write+0x13c/0x2c0 [lustre] [20160.632994] [<ffffffffa095f239>] ll_file_write+0x169/0x2a0 [lustre] [20160.633222] [<ffffffff8117b2e8>] vfs_write+0xb8/0x1a0 [20160.633423] [<ffffffff8117bbb1>] sys_write+0x51/0x90 [20160.633624] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b [20160.633837] INFO: task dd:26851 blocked for more than 120 seconds.
There's a bunch of other dd processes hung in there.
Jinshan, I am leaving centos6-1 node in this state for you to look when you have time. Feel free to crash it with echo c >/proc/sysrq-trigger if you feel you would benefit from using crash to further investigate it that way.
Attachments
Issue Links
- is related to
-
LU-1612 Test failure on test suite parallel-scale-nfsv3, subtest test_iorssf
-
- Resolved
-