Lustre / LU-6306

sanity-lfsck test_15c ldlm lock hung

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0, Lustre 2.8.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 17660

    Description

      When metadata is migrated, the OST-object's PFID still references the old MDT-object, which has been removed. The layout LFSCK will try to handle this case as an unmatched pair of MDT-object (new) and OST-object (old).
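The mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the LFSCK implementation: the `OstObject` class, the `classify_ost_object` helper, the result strings, and the FID values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Toy FID: (sequence, object id, version). Illustrative only.
Fid = Tuple[int, int, int]


@dataclass
class OstObject:
    fid: Fid
    pfid: Fid  # parent FID recorded on the OST-object


def classify_ost_object(obj: OstObject,
                        live_mdt_fids: Set[Fid],
                        layout_owner: Dict[Fid, Fid]) -> str:
    """Classify an OST-object against the MDT-object that claims it.

    layout_owner maps an OST-object FID to the FID of the MDT-object
    whose layout references it (hypothetical lookup table).
    """
    owner = layout_owner.get(obj.fid)
    if owner is None:
        # No MDT-object layout references this OST-object.
        return "orphan"
    if owner == obj.pfid:
        # The layout and the PFID agree.
        return "consistent"
    if obj.pfid not in live_mdt_fids:
        # PFID names an MDT-object that no longer exists, e.g. the
        # pre-migration object in the scenario of this ticket.
        return "unmatched-stale-pfid"
    return "unmatched"


# Scenario from the description: the file was migrated, so the new
# MDT-object owns the layout while the PFID still names the old one.
OLD_MDT = (0x200000400, 1, 0)
NEW_MDT = (0x240000400, 1, 0)
OST_OBJ = OstObject(fid=(0x100010000, 2, 0), pfid=OLD_MDT)

result = classify_ost_object(OST_OBJ, {NEW_MDT}, {OST_OBJ.fid: NEW_MDT})
```

In this model the migrated file lands in the "unmatched-stale-pfid" branch, which corresponds to the unmatched MDT-object (new) and OST-object (old) pair that the layout LFSCK tries to handle.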

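      Unfortunately, the layout LFSCK assistant thread hung on an ldlm lock while trying to lock the parent object. The trace below shows it waiting in ldlm_completion_ast() under lfsck_layout_check_parent() -> lfsck_ibits_lock() -> osp_md_object_lock():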

      lfsck_layout  S 0000000000000000     0 10745      2 0x00000080
       ffff8800580158c0 0000000000000046 0000000000000000 ffff880058015890
       ffff880058015820 ffff88006bf360f8 00000aaa4cb7f0b6 0000000000000000
       ffff880058015840 0000000100ae5af3 ffff880037d7dab8 ffff880058015fd8
      Call Trace:
       [<ffffffffa07de290>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
       [<ffffffffa07e2e7d>] ldlm_completion_ast+0x66d/0x9b0 [ptlrpc]
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa07dcf06>] ldlm_cli_enqueue_fini+0x936/0xe30 [ptlrpc]
       [<ffffffffa07fbd7b>] ? ptlrpc_set_destroy+0x26b/0x450 [ptlrpc]
       [<ffffffffa07dd7c1>] ldlm_cli_enqueue+0x3c1/0x870 [ptlrpc]
       [<ffffffffa07e2810>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
       [<ffffffffa07e0f70>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
       [<ffffffffa10980e8>] osp_md_object_lock+0x188/0x210 [osp]
       [<ffffffffa0dfa2a4>] lfsck_ibits_lock+0x1e4/0x2e0 [lfsck]
       [<ffffffffa0e378d8>] lfsck_layout_check_parent+0x698/0xa40 [lfsck]
       [<ffffffffa0e33b97>] ? dt_xattr_get+0x97/0x130 [lfsck]
       [<ffffffffa0e49fc3>] lfsck_layout_assistant_handler_p1+0x683/0x19f0 [lfsck]
       [<ffffffff8152a27e>] ? thread_return+0x4e/0x7d0
       [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
       [<ffffffffa0e106e6>] lfsck_assistant_engine+0x496/0x1de0 [lfsck]
       [<ffffffff8105e0d0>] ? __dequeue_entity+0x30/0x50
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa0e10250>] ? lfsck_assistant_engine+0x0/0x1de0 [lfsck]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
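When reading a trace like the one above, it helps to separate the frames marked with "?" (addresses found on the stack that the unwinder could not confirm as part of the active call chain) from the unmarked ones. The following is a minimal sketch of such a filter; `FRAME_RE`, `Frame`, `parse_trace`, and `reliable_chain` are hypothetical helper names, and the regular expression assumes the frame format shown in this ticket.

```python
import re
from typing import List, NamedTuple

# Matches frames such as:
#  [<ffffffffa07e2e7d>] ldlm_completion_ast+0x66d/0x9b0 [ptlrpc]
#  [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    func: str
    module: str     # "kernel" when no [module] suffix is present
    reliable: bool  # False for frames printed with a leading "?"


def parse_trace(text: str) -> List[Frame]:
    """Extract every stack frame found in the given trace text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(m["func"],
                                m["module"] or "kernel",
                                m["unreliable"] is None))
    return frames


def reliable_chain(text: str) -> List[str]:
    """Return "module:function" for frames not marked with "?"."""
    return [f"{f.module}:{f.func}" for f in parse_trace(text) if f.reliable]


# A few frames copied from the trace in this ticket.
SAMPLE = """\
 [<ffffffffa07de290>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
 [<ffffffffa07e2e7d>] ldlm_completion_ast+0x66d/0x9b0 [ptlrpc]
 [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
 [<ffffffffa10980e8>] osp_md_object_lock+0x188/0x210 [osp]
 [<ffffffffa0dfa2a4>] lfsck_ibits_lock+0x1e4/0x2e0 [lfsck]
"""

chain = reliable_chain(SAMPLE)
```

Applied to the full trace, the unmarked frames give the path from lfsck_assistant_engine down to ldlm_completion_ast, which is where the thread is sleeping.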
      

      This issue was created by maloo for nasf <fan.yong@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e809aea6-be91-11e4-ac06-5254006e85c2.

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Maloo (maloo)