Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.12.2
    • None
    • 2
    • 9223372036854775807

    Description

      MDS deadlocked

      Similar to LU-13073

      [12287155.058187] LNet: Service thread pid 15312 was inactive for 550.48s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [12287155.109703] LNet: Skipped 2 previous similar messages
      [12287155.125609]  [<ffffffffa5f87398>] call_rwsem_down_read_failed+0x18/0x30
      [12287155.130583]  [<ffffffffc144acfc>] osd_read_lock+0x5c/0xe0 [osd_ldiskfs]
      [12287155.130612]  [<ffffffffc16f28ea>] lod_read_lock+0x3a/0xd0 [lod]
      [12287155.130625]  [<ffffffffc17779aa>] mdd_read_lock+0x3a/0xd0 [mdd]
      [12287155.130632]  [<ffffffffc177d730>] mdd_xattr_get+0x70/0x5c0 [mdd]
      [12287155.130648]  [<ffffffffc15e6ea6>] mdt_stripe_get+0xd6/0x400 [mdt]
      [12287155.130657]  [<ffffffffc15e7a2d>] mdt_attr_get_complex+0x46d/0x850 [mdt]
      [12287155.130665]  [<ffffffffc15e800c>] mdt_getattr_internal+0x1fc/0xf60 [mdt]
      [12287155.130673]  [<ffffffffc15ebd60>] mdt_getattr_name_lock+0x950/0x1c30 [mdt]
      [12287155.130681]  [<ffffffffc15f3c05>] mdt_intent_getattr+0x2b5/0x480 [mdt]
      [12287155.130691]  [<ffffffffc15f0a18>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [12287155.130736]  [<ffffffffc0f2dd26>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [12287155.130769]  [<ffffffffc0f56587>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [12287155.130815]  [<ffffffffc0fde882>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [12287155.130853]  [<ffffffffc0fe31da>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [12287155.130887]  [<ffffffffc0f8880b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [12287155.130921]  [<ffffffffc0f8c13c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
      [12287155.130925]  [<ffffffffa5cc1da1>] kthread+0xd1/0xe0
      [12287155.130929]  [<ffffffffa6375c37>] ret_from_fork_nospec_end+0x0/0x39
      [12287155.130947]  [<ffffffffffffffff>] 0xffffffffffffffff
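
When comparing a trace like the one above against another ticket, it can help to reduce each frame to its function name and kernel module. The following is a minimal sketch, assuming the dmesg frame format shown in this description; `FRAME_RE` and `parse_frames` are hypothetical names introduced here for illustration, and labelling frames without a module suffix as "kernel" is an assumption of this sketch, not something the log states.

```python
import re

# Matches frames such as:
#   [12287155.130583]  [<ffffffffc144acfc>] osd_read_lock+0x5c/0xe0 [osd_ldiskfs]
# The trailing "[module]" part is optional (core kernel symbols have none).
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (function, module) pairs for every stack frame in log_text.

    Lines that are not symbolized frames (LNet messages, the bare
    0xffffffffffffffff terminator) are skipped.
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            # Assumption: frames without a module suffix are core kernel code.
            frames.append((m.group("func"), m.group("module") or "kernel"))
    return frames


sample = """\
[12287155.125609]  [<ffffffffa5f87398>] call_rwsem_down_read_failed+0x18/0x30
[12287155.130583]  [<ffffffffc144acfc>] osd_read_lock+0x5c/0xe0 [osd_ldiskfs]
[12287155.130612]  [<ffffffffc16f28ea>] lod_read_lock+0x3a/0xd0 [lod]
[12287155.130947]  [<ffffffffffffffff>] 0xffffffffffffffff
"""

for func, module in parse_frames(sample):
    print(f"{func} ({module})")
```

Listing the innermost frames this way makes it easier to see at a glance where the thread is blocked (here, a read lock taken through `osd_read_lock`) and which Lustre layers it passed through on the way.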
      

          Activity

            [LU-13462] MDS deadlocks in osd_read_lock()
            ys Yang Sheng added a comment -

            Hi, Mahmoud,

            The log you attached really is a duplicate of LU-13073, but it differs from the stack trace you pasted. The pasted log shows a thread stuck in osd_read_lock, which was most likely caused by some local filesystem issue. LU-13073 is not; it is a long-outstanding issue caused by the OSS.

            Thanks,
            YangSheng


            mhanafi Mahmoud Hanafi added a comment -

            Attached the stack trace.

            ys Yang Sheng added a comment -

            Then is it possible to provide sysrq-t info? From the stack trace, I don't think it is the same as LU-13073.


            mhanafi Mahmoud Hanafi added a comment -

            The stack trace for the hung threads is the same as in LU-13073, but in our case we didn't have an OSS crash.

            Our kernel is: 3.10.0-957.21.3.el7_lustre212.x86_64

            ys Yang Sheng added a comment -

            Hi, Mahmoud,

            Could you please provide more info? What do you mean by "similar to LU-13073"?

            Thanks,
            Yangsheng

            pjones Peter Jones added a comment -

            Mahmoud

            Could you please supply details of the kernel version that you are running?

            Yang Sheng

            Could you please advise?

            Thanks

            Peter


            People

              ys Yang Sheng
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 6
