[LU-6306] sanity-lfsck test_15c ldlm lock hung Created: 28/Feb/15  Updated: 28/Feb/20  Resolved: 28/Feb/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 17660

 Description   

When metadata is migrated, the OST-object's PFID still references the old MDT-object, which has been removed. The layout LFSCK then tries to handle this as an unmatched pair: the new MDT-object and the old OST-object.
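
The stale-PFID condition can be illustrated with a toy model. This is only a sketch, not LFSCK code: the `Fid`, `OstObject`, and `find_dangling_parents` names are hypothetical, and the FID values are made up for illustration.

```python
from dataclasses import dataclass

# Toy model only: these types and names are hypothetical and do not
# correspond to real LFSCK data structures.

@dataclass(frozen=True)
class Fid:
    seq: int
    oid: int
    ver: int = 0

@dataclass
class OstObject:
    fid: Fid
    pfid: Fid  # parent FID recorded on the OST-object

def find_dangling_parents(ost_objects, live_mdt_fids):
    """Return OST-objects whose PFID names an MDT-object that no longer exists."""
    return [o for o in ost_objects if o.pfid not in live_mdt_fids]

# Illustrative FID values (assumptions, not taken from the test run).
old_mdt = Fid(0x200000400, 1)
new_mdt = Fid(0x240000400, 1)
ost_obj = OstObject(fid=Fid(0x100010000, 2), pfid=old_mdt)

# After migration only the new MDT-object exists, so the PFID is stale.
live = {new_mdt}
dangling = find_dangling_parents([ost_obj], live)
```

In this model the OST-object is reported as dangling because its PFID still names the removed MDT-object, which is the kind of pair the description says the layout LFSCK tries to repair.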

Unfortunately, the layout LFSCK assistant thread hung on an ldlm lock when trying to

lfsck_layout  S 0000000000000000     0 10745      2 0x00000080
 ffff8800580158c0 0000000000000046 0000000000000000 ffff880058015890
 ffff880058015820 ffff88006bf360f8 00000aaa4cb7f0b6 0000000000000000
 ffff880058015840 0000000100ae5af3 ffff880037d7dab8 ffff880058015fd8
Call Trace:
 [<ffffffffa07de290>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
 [<ffffffffa07e2e7d>] ldlm_completion_ast+0x66d/0x9b0 [ptlrpc]
 [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
 [<ffffffffa07dcf06>] ldlm_cli_enqueue_fini+0x936/0xe30 [ptlrpc]
 [<ffffffffa07fbd7b>] ? ptlrpc_set_destroy+0x26b/0x450 [ptlrpc]
 [<ffffffffa07dd7c1>] ldlm_cli_enqueue+0x3c1/0x870 [ptlrpc]
 [<ffffffffa07e2810>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
 [<ffffffffa07e0f70>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
 [<ffffffffa10980e8>] osp_md_object_lock+0x188/0x210 [osp]
 [<ffffffffa0dfa2a4>] lfsck_ibits_lock+0x1e4/0x2e0 [lfsck]
 [<ffffffffa0e378d8>] lfsck_layout_check_parent+0x698/0xa40 [lfsck]
 [<ffffffffa0e33b97>] ? dt_xattr_get+0x97/0x130 [lfsck]
 [<ffffffffa0e49fc3>] lfsck_layout_assistant_handler_p1+0x683/0x19f0 [lfsck]
 [<ffffffff8152a27e>] ? thread_return+0x4e/0x7d0
 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
 [<ffffffffa0e106e6>] lfsck_assistant_engine+0x496/0x1de0 [lfsck]
 [<ffffffff8105e0d0>] ? __dequeue_entity+0x30/0x50
 [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
 [<ffffffffa0e10250>] ? lfsck_assistant_engine+0x0/0x1de0 [lfsck]
 [<ffffffff8109e66e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

This issue was created by maloo for nasf <fan.yong@intel.com>


This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e809aea6-be91-11e4-ac06-5254006e85c2.



 Comments   
Comment by Andreas Dilger [ 28/Feb/20 ]

Close old bug that hasn't been seen in a long time.
