Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.4, Lustre 2.10.5, Lustre 2.10.6
    • Labels: None
    • Environment: CentOS 7, 3.10.0-862.2.3.el7_lustre.x86_64, 1 MDS (+1 HA pair), 4 OSS
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      After upgrading the Lustre servers from CentOS 6 and the IEEL/DDN build of Lustre 2.5 to CentOS 7 and Lustre 2.10.5, we are experiencing stability issues related to an MDS/OSS deadlock, most likely caused by a hung OST thread. The issue is visible on only one of our two filesystems, the smaller one; both run on an identical hardware and software stack, so the issue is most likely tied to a specific workload on that filesystem. It is mostly a mid-term storage filesystem, so more metadata operations happen on it than on the scratch one.

      We initially thought the issue might be caused by our attempt to enable project quotas (tune2fs -O project on all targets), so we turned the feature off again (tune2fs -O ^project), which didn't change anything. All combinations of MDT/OST quota enforcement (initially 'g', then 'gp', now 'none') made no difference either, so quotas are probably unrelated to the problem. For completeness: we run with sync_journal=1 as a workaround for a memory accounting problem on the clients, suggested by DDN some time ago.

      The crash happens at least once every few hours, sometimes several times in a row, on different OSS nodes. I also tagged 2.10.6 as affected, as we have tried everything from 2.10.4 up to the current b2_10.
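
      For reference, the settings mentioned above map roughly to the following commands (a sketch only; the device path and filesystem name are placeholders, and the lctl parameter names assume the standard Lustre 2.10 quota and obdfilter tunables):

      # Project quota feature on the ldiskfs targets (device path is an example)
      tune2fs -O project /dev/mapper/ost0      # what we enabled first
      tune2fs -O ^project /dev/mapper/ost0     # disabled again, no change in behaviour

      # Quota enforcement combinations tried ("fsname" is a placeholder)
      lctl conf_param fsname.quota.mdt=g       # initially
      lctl conf_param fsname.quota.ost=g
      lctl conf_param fsname.quota.mdt=gp      # then
      lctl conf_param fsname.quota.ost=gp
      lctl conf_param fsname.quota.mdt=none    # now
      lctl conf_param fsname.quota.ost=none

      # Workaround suggested by DDN: synchronous journal commits on the OSTs
      lctl set_param obdfilter.*.sync_journal=1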

      On OSS we get:

       

      [Tue Oct  2 11:56:44 2018] LNet: Service thread pid 11401 was inactive for 200.27s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Oct  2 11:56:44 2018] Pid: 11401, comm: ll_ost_out00_00 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:44 2018] Call Trace:
      [Tue Oct  2 11:56:44 2018]  [<ffffffff8395aa77>] call_rwsem_down_write_failed+0x17/0x30
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0eeaa8c>] osd_write_lock+0x5c/0xe0 [osd_ldiskfs]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b5b119>] out_tx_attr_set_exec+0x69/0x3f0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b55591>] out_tx_end+0xe1/0x5c0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b596d3>] out_handle+0x1453/0x1bc0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffff836bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:44 2018]  [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:44 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:44 2018] LustreError: dumping log to /tmp/lustre-log.1538474204.11401
      [Tue Oct  2 11:56:44 2018] Pid: 35664, comm: ll_ost_io00_098 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:44 2018] Call Trace:
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0e7e495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0e76533>] jbd2_journal_stop+0x343/0x3d0 [jbd2]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc112ab4c>] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0efb783>] osd_trans_stop+0x183/0x850 [osd_ldiskfs]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc107d352>] ofd_trans_stop+0x22/0x60 [ofd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc10835f4>] ofd_commitrw_write+0x7e4/0x1c90 [ofd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc10877a9>] ofd_commitrw+0x4c9/0xae0 [ofd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b80864>] obd_commitrw+0x2f3/0x336 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b5338d>] tgt_brw_write+0xffd/0x17d0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffff836bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:45 2018]  [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:45 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:45 2018] Pid: 35678, comm: ll_ost_io00_112 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:45 2018] Call Trace:
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0e7e495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0e76533>] jbd2_journal_stop+0x343/0x3d0 [jbd2]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc112ab4c>] __ldiskfs_journal_stop+0x3c/0xb0 [ldiskfs]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0efb783>] osd_trans_stop+0x183/0x850 [osd_ldiskfs]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc107d352>] ofd_trans_stop+0x22/0x60 [ofd]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc10835f4>] ofd_commitrw_write+0x7e4/0x1c90 [ofd]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc10877a9>] ofd_commitrw+0x4c9/0xae0 [ofd]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0b80864>] obd_commitrw+0x2f3/0x336 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0b5338d>] tgt_brw_write+0xffd/0x17d0 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0b4f38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0af7e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffffc0afb592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:45 2018]  [<ffffffff836bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:45 2018]  [<ffffffff83d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:45 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      

       

       

      On MDS:

      [Tue Oct  2 11:56:44 2018] LNet: Service thread pid 69588 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Oct  2 11:56:44 2018] Pid: 69588, comm: mdt00_095 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:44 2018] Call Trace:
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f7e140>] ptlrpc_set_wait+0x4c0/0x920 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f7e61d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc14de463>] osp_remote_sync+0xd3/0x200 [osp]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc14c6dd0>] osp_attr_set+0x4c0/0x5d0 [osp]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1421c6b>] lod_sub_attr_set+0x1cb/0x460 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1403be6>] lod_obj_stripe_attr_set_cb+0x16/0x30 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc140fa96>] lod_obj_for_each_stripe+0xb6/0x230 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1411043>] lod_attr_set+0x2f3/0x9a0 [lod]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc14810a0>] mdd_attr_set_internal+0x120/0x2a0 [mdd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1481e8d>] mdd_attr_set+0x8ad/0xce0 [mdd]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc13645f5>] mdt_reint_setattr+0xba5/0x1060 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc1351f07>] mdt_reint+0x67/0x140 [mdt]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:44 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:44 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:44 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:44 2018] LustreError: dumping log to /tmp/lustre-log.1538474204.69588
      [Tue Oct  2 11:56:49 2018] LNet: Service thread pid 7634 was inactive for 200.51s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [Tue Oct  2 11:56:49 2018] Pid: 7634, comm: mdt_rdpg00_000 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1002512>] top_trans_start+0x702/0x940 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc13ec3f1>] lod_trans_start+0x31/0x40 [lod]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc148c1ba>] mdd_trans_start+0x1a/0x20 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1481b69>] mdd_attr_set+0x589/0xce0 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136c1d6>] mdt_mfd_close+0x1a6/0x610 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1371951>] mdt_close_internal+0x121/0x220 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1371c70>] mdt_close+0x220/0x780 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:49 2018] LustreError: dumping log to /tmp/lustre-log.1538474210.7634
      [Tue Oct  2 11:56:49 2018] Pid: 69508, comm: mdt00_067 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1369de7>] mdt_empty_transno+0xf7/0x840 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136cf1e>] mdt_mfd_open+0x8de/0xe70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136da2b>] mdt_finish_open+0x57b/0x690 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136f308>] mdt_reint_open+0x17c8/0x3190 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:49 2018] Pid: 69607, comm: mdt00_105 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1369de7>] mdt_empty_transno+0xf7/0x840 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136cf1e>] mdt_mfd_open+0x8de/0xe70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136da2b>] mdt_finish_open+0x57b/0x690 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136f308>] mdt_reint_open+0x17c8/0x3190 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      [Tue Oct  2 11:56:49 2018] Pid: 69594, comm: mdt00_098 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
      [Tue Oct  2 11:56:49 2018] Call Trace:
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef085>] wait_transaction_locked+0x85/0xd0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef368>] add_transaction_credits+0x268/0x2f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06ef5e1>] start_this_handle+0x1a1/0x430 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc06efa93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc118aa99>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc12cac8e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1002512>] top_trans_start+0x702/0x940 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc13ec3f1>] lod_trans_start+0x31/0x40 [lod]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc148c1ba>] mdd_trans_start+0x1a/0x20 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc14763c0>] mdd_create+0xbe0/0x1400 [mdd]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc136fcb5>] mdt_reint_open+0x2175/0x3190 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1364b33>] mdt_reint_rec+0x83/0x210 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc134636b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1346892>] mdt_intent_reint+0x162/0x430 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc1351671>] mdt_intent_policy+0x441/0xc70 [mdt]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f3b2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f64b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fea452>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0fee38a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f96e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffffc0f9a592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [Tue Oct  2 11:56:49 2018]  [<ffffffff946bae31>] kthread+0xd1/0xe0
      [Tue Oct  2 11:56:49 2018]  [<ffffffff94d1f5f7>] ret_from_fork_nospec_end+0x0/0x39
      [Tue Oct  2 11:56:49 2018]  [<ffffffffffffffff>] 0xffffffffffffffff
      

      Please have a look at the above; the full dmesg and Lustre log files are attached as an archive.

       

      Attachments

        1. logs.zip
          15.44 MB
        2. cyf_dec10_server_stacks.log
          117 kB
        3. cyf_dec10_client_stacks.log
          28 kB

        Issue Links

          Activity

            [LU-11465] OSS/MDS deadlock in 2.10.5

            bzzz Alex Zhuravlev added a comment -

            luchuan_sugon do you still need this ticket open?
            lflis Lukasz Flis added a comment -

            @Andrew Perepechko
            The patch you mentioned does not apply cleanly on b2_10. Are you considering backporting it to the 2.10 line?
            In the past month we have had 10+ lockups on OSTs related to this issue.

            The patch you mentioned is a subset of LU-11613; is there a plan to backport the whole set?
            Can the change from https://review.whamcloud.com/#/c/31293/60/lustre/ofd/ofd_objects.c be safely used without the other changes from LU-11613?
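
            A quick way to check whether that change applies to b2_10 is to fetch it from Gerrit and try a cherry-pick; this is only a sketch, and it assumes the standard Gerrit change-ref layout and the fs/lustre-release project on review.whamcloud.com:

            git clone git://git.whamcloud.com/fs/lustre-release.git
            cd lustre-release
            git checkout b2_10
            # change 31293, patch set 60 (refs/changes/<last two digits>/<change>/<patchset>)
            git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/93/31293/60
            git cherry-pick FETCH_HEAD    # conflicts here would confirm it does not apply cleanly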


            panda Andrew Perepechko added a comment -

            looks like a LU-10048 duplicate

            Ah, sorry, I missed this link while first reading this ticket. It really seems to be fixing this issue:

            https://review.whamcloud.com/#/c/31293/60/lustre/ofd/ofd_objects.c

            panda Andrew Perepechko added a comment - - edited

            Has this bug really been fixed in master?

            LU-5152 hasn't been reverted in master; instead, the MDS transaction became async.

            However, this bug is not a deadlock between the MDS and OSS as the commit message claims. It is an OSS deadlock; the MDS just waits forever for a reply, with an open transaction freezing other MDS journal users. The OSS deadlocks because it takes oo_sem and starts a transaction handle in a different order for common operations and OUT operations, e.g.:

            crash> bt ffff88081329cf10
            PID: 50457  TASK: ffff88081329cf10  CPU: 14  COMMAND: "ll_ost_out01_00"
             #0 [ffff88026aca3988] __schedule at ffffffff816b3de4
             #1 [ffff88026aca3a10] schedule at ffffffff816b4409
             #2 [ffff88026aca3a20] rwsem_down_write_failed at ffffffff816b5cf5
             #3 [ffff88026aca3ab8] call_rwsem_down_write_failed at ffffffff81338247
             #4 [ffff88026aca3b00] down_write at ffffffff816b356d
             #5 [ffff88026aca3b18] osd_write_lock at ffffffffc151eb0c [osd_ldiskfs]
             #6 [ffff88026aca3b40] out_tx_attr_set_exec at ffffffffc0ea3399 [ptlrpc]
             #7 [ffff88026aca3b78] out_tx_end at ffffffffc0e9d771 [ptlrpc]
             #8 [ffff88026aca3bb8] out_handle at ffffffffc0ea1952 [ptlrpc]
             #9 [ffff88026aca3cf8] tgt_request_handle at ffffffffc0e988ba [ptlrpc]
            #10 [ffff88026aca3d40] ptlrpc_server_handle_request at ffffffffc0e3df13 [ptlrpc]
            #11 [ffff88026aca3de0] ptlrpc_main at ffffffffc0e41862 [ptlrpc]
            #12 [ffff88026aca3ec8] kthread at ffffffff810b4031
            #13 [ffff88026aca3f50] ret_from_fork at ffffffff816c155d
            
            crash> bt 0xffff88038d7eaf70
            PID: 114059  TASK: ffff88038d7eaf70  CPU: 11  COMMAND: "ll_ost00_040"
             #0 [ffff8804b073b968] __schedule at ffffffff816b3de4
             #1 [ffff8804b073b9f8] schedule at ffffffff816b4409
             #2 [ffff8804b073ba08] wait_transaction_locked at ffffffffc0739085 [jbd2]
             #3 [ffff8804b073ba60] add_transaction_credits at ffffffffc0739368 [jbd2]
             #4 [ffff8804b073bac0] start_this_handle at ffffffffc07395e1 [jbd2]
             #5 [ffff8804b073bb58] jbd2__journal_start at ffffffffc0739a93 [jbd2]
             #6 [ffff8804b073bba0] __ldiskfs_journal_start_sb at ffffffffc1469e59 [ldiskfs]
             #7 [ffff8804b073bbe0] osd_trans_start at ffffffffc152a2ce [osd_ldiskfs]
             #8 [ffff8804b073bc18] ofd_trans_start at ffffffffc15b01ae [ofd]
             #9 [ffff8804b073bc48] ofd_attr_set at ffffffffc15b3173 [ofd]
            #10 [ffff8804b073bca0] ofd_setattr_hdl at ffffffffc159e8ed [ofd]
            #11 [ffff8804b073bcf8] tgt_request_handle at ffffffffc0e988ba [ptlrpc]
            #12 [ffff8804b073bd40] ptlrpc_server_handle_request at ffffffffc0e3df13 [ptlrpc]
            #13 [ffff8804b073bde0] ptlrpc_main at ffffffffc0e41862 [ptlrpc]
            #14 [ffff8804b073bec8] kthread at ffffffff810b4031
            #15 [ffff8804b073bf50] ret_from_fork at ffffffff816c155d
            
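            For completeness, a rough sketch of how backtraces like the two above can be pulled from a crash dump with the crash utility (the vmlinux and vmcore paths are placeholders):

            # open the dump with the matching debuginfo vmlinux
            crash /usr/lib/debug/lib/modules/3.10.0-862.2.3.el7_lustre.x86_64/vmlinux /var/crash/vmcore
            # inside the crash prompt: dump one task by task_struct address or PID ...
            crash> bt ffff88081329cf10
            # ... or dump every task in uninterruptible sleep
            crash> foreach UN bt
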
            m.magrys Marek Magrys added a comment -

            We will bump the servers' kernel to the latest RHEL 7.6 kernel with the patches from 2.10.6; our 2.10.4 kernels lacked some ext4 patches, which might be the root cause of the stack traces above.


            bzzz Alex Zhuravlev added a comment -

            Hm, I see only writing threads on the OSS (ofd_commitrw_write() and around), but nobody else. Either the log is missing important trace(s) or it's something different. Is it possible to dump all threads (echo t > /proc/sysrq-trigger) and attach them to the ticket?
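
            A minimal way to capture such a dump for attachment might look like this (a sketch; it assumes sysrq is available on the servers and that dmesg output is an acceptable format):

            # allow the 't' sysrq command in case sysrq is restricted
            echo 1 > /proc/sys/kernel/sysrq
            # dump the stacks of all tasks into the kernel log
            echo t > /proc/sysrq-trigger
            # save the kernel log with readable timestamps for attaching to the ticket
            dmesg -T > /tmp/$(hostname -s)_stacks.log
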
            lflis Lukasz Flis added a comment - - edited

            We have updated all servers, MDS + OSS, together.

            Please find attached the stack traces from the servers and the corresponding stacks from the clients affected by the hangups:

            cyf_dec10_server_stacks.log
            cyf_dec10_client_stacks.log

            There were no stack traces on the MDS today. There were some in past days, so I can attach those as well if needed.

             

             


            bzzz Alex Zhuravlev added a comment -

            Have you updated the MDS? Can you provide all stack traces, please?
            lflis Lukasz Flis added a comment - - edited

            Short update:

            After moving to 2.10.6 RC2 with LU-5152 reverted, we still see hung threads on the OSSes:

            [507798.983947] LNet: Service thread pid 17130 was inactive for 200.47s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
             [507799.001471] Pid: 17130, comm: ll_ost_io00_094 3.10.0-862.2.3.el7_lustre.x86_64 #1 SMP Tue May 22 17:36:23 UTC 2018
             [507799.012231] Call Trace:
             [507799.014947] [<ffffffffc0eac085>] wait_transaction_locked+0x85/0xd0 [jbd2]
             [507799.022084] [<ffffffffc0eac368>] add_transaction_credits+0x268/0x2f0 [jbd2]
             [507799.029399] [<ffffffffc0eac5e1>] start_this_handle+0x1a1/0x430 [jbd2]
             [507799.036186] [<ffffffffc0eaca93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
             [507799.043074] [<ffffffffc0f22009>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
             [507799.050909] [<ffffffffc0fecb1e>] osd_trans_start+0x1ae/0x460 [osd_ldiskfs]
             [507799.058151] [<ffffffffc11242ae>] ofd_trans_start+0x6e/0xf0 [ofd]
             [507799.064513] [<ffffffffc112a75b>] ofd_commitrw_write+0x94b/0x1c90 [ofd]
             [507799.071400] [<ffffffffc112e7a9>] ofd_commitrw+0x4c9/0xae0 [ofd]
             [507799.077676] [<ffffffffc0bb7824>] obd_commitrw+0x2f3/0x336 [ptlrpc]
             [507799.084277] [<ffffffffc0b8a38d>] tgt_brw_write+0xffd/0x17d0 [ptlrpc]
             [507799.091046] [<ffffffffc0b8638a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
             [507799.098231] [<ffffffffc0b2ee4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
             [507799.106355] [<ffffffffc0b32592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
             [507799.112928] [<ffffffffbc4bae31>] kthread+0xd1/0xe0
             [507799.118077] [<ffffffffbcb1f5f7>] ret_from_fork_nospec_end+0x0/0x39
             [507799.124610] [<ffffffffffffffff>] 0xffffffffffffffff
             
            

             


            bzzz Alex Zhuravlev added a comment -

            Revert of LU-5152 should be enough.
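
            A sketch of how such a revert might be prepared on a b2_10 checkout, assuming the LU-5152 patch can be located by its ticket tag in the commit messages (the commit hash below is a placeholder):

            # locate the commit(s) that landed for LU-5152 on the current branch
            git log --oneline --grep='LU-5152'
            # revert the identified commit
            git revert <commit-hash>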

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: m.magrys Marek Magrys
              Votes: 3
              Watchers: 19

              Dates

                Created:
                Updated:
                Resolved: