Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • 3
    • 5859

    Description

      replay-single test 48 is consistently hanging for me.
      There is a stack trace for a hung task in the log, and it looks like that task never finishes:

      [246707.608040] LNet: Service thread pid 16278 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [246707.608566] Pid: 16278, comm: mdt00_001
      [246707.608714] 
      [246707.608715] Call Trace:
      [246707.609128]  [<ffffffffa0f9b7ae>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [246707.609381]  [<ffffffffa09d26d4>] osp_precreate_reserve+0x3a4/0x620 [osp]
      [246707.609664]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [246707.609914]  [<ffffffffa09d1633>] osp_declare_object_create+0x163/0x540 [osp]
      [246707.610746]  [<ffffffffa098a4bd>] lod_qos_declare_object_on+0xed/0x4c0 [lod]
      [246707.611049]  [<ffffffffa098c094>] lod_alloc_rr.clone.2+0x624/0xd90 [lod]
      [246707.611313]  [<ffffffffa098db8c>] lod_qos_prep_create+0xe5c/0x1848 [lod]
      [246707.611610]  [<ffffffffa098886b>] lod_declare_striped_object+0x14b/0x920 [lod]
      [246707.612053]  [<ffffffffa0989348>] lod_declare_object_create+0x308/0x4f0 [lod]
      [246707.612465]  [<ffffffffa07364bf>] mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
      [246707.612926]  [<ffffffffa07475ea>] mdd_create+0x39a/0x1550 [mdd]
      [246707.613334]  [<ffffffffa08cd759>] mdt_reint_open+0x1079/0x1860 [mdt]
      [246707.613649]  [<ffffffffa1075140>] ? lu_ucred+0x20/0x30 [obdclass]
      [246707.613897]  [<ffffffffa0898655>] ? mdt_ucred+0x15/0x20 [mdt]
      [246707.614105]  [<ffffffffa08b8651>] mdt_reint_rec+0x41/0xe0 [mdt]
      [246707.614347]  [<ffffffffa08b1b13>] mdt_reint_internal+0x4e3/0x7e0 [mdt]
      [246707.614559]  [<ffffffffa08b20dd>] mdt_intent_reint+0x1ed/0x500 [mdt]
      [246707.614854]  [<ffffffffa08adca5>] mdt_intent_policy+0x3c5/0x800 [mdt]
      [246707.615163]  [<ffffffffa11c643a>] ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc]
      [246707.615486]  [<ffffffffa11ef3b7>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc]
      [246707.615812]  [<ffffffffa08ad7f6>] mdt_enqueue+0x46/0x130 [mdt]
      [246707.616091]  [<ffffffffa08a1822>] mdt_handle_common+0x932/0x1750 [mdt]
      [246707.616327]  [<ffffffffa08a2715>] mdt_regular_handle+0x15/0x20 [mdt]
      [246707.616560]  [<ffffffffa121d953>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc]
      [246707.616994]  [<ffffffffa0f9b66e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [246707.617304]  [<ffffffffa1216621>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [246707.617595]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [246707.617888]  [<ffffffffa122048d>] ptlrpc_main+0xb3d/0x18e0 [ptlrpc]
      [246707.618203]  [<ffffffffa121f950>] ? ptlrpc_main+0x0/0x18e0 [ptlrpc]
      [246707.618431]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [246707.618628]  [<ffffffffa121f950>] ? ptlrpc_main+0x0/0x18e0 [ptlrpc]
      [246707.618944]  [<ffffffffa121f950>] ? ptlrpc_main+0x0/0x18e0 [ptlrpc]
      [246707.619190]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      

      I have a crash dump for this occurrence as well.
      This dump is with the patch from LU-2285 applied, but the hang also happens without the LU-2285 patch.
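
      For context, the trace shows the MDT service thread blocked in cfs_waitq_wait() called from osp_precreate_reserve() while declaring a new striped object, i.e. it appears to be waiting for the OSP precreate pool to be refilled and that wakeup never arrives. Below is a minimal userspace sketch of that producer/consumer wait pattern (an analogy only, not Lustre source; all names are illustrative):

      /*
       * Userspace analogy (NOT Lustre code) of the wait seen in the stack:
       * the consumer blocks until precreated objects are available, so if
       * the producer never replenishes the pool the thread sleeps forever
       * and looks just like the hung mdt00_001 thread above.
       */
      #include <pthread.h>
      #include <stdio.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  more_objects = PTHREAD_COND_INITIALIZER;
      static int precreated;                  /* objects available to reserve */

      /* analogue of osp_precreate_reserve(): sleep until an object exists */
      static void reserve_object(void)
      {
              pthread_mutex_lock(&lock);
              while (precreated == 0)
                      pthread_cond_wait(&more_objects, &lock); /* the cfs_waitq_wait() point */
              precreated--;
              pthread_mutex_unlock(&lock);
      }

      /* analogue of the precreate thread refilling the pool */
      static void replenish(int n)
      {
              pthread_mutex_lock(&lock);
              precreated += n;
              pthread_cond_broadcast(&more_objects);
              pthread_mutex_unlock(&lock);
      }

      int main(void)
      {
              /* If replenish() were never called, as in this hang,
               * reserve_object() would block indefinitely. */
              replenish(1);
              reserve_object();
              printf("reserved one precreated object\n");
              return 0;
      }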

          People

            Assignee: bzzz (Alex Zhuravlev)
            Reporter: green (Oleg Drokin)