Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version: Lustre 2.4.0
    • Fix Version: Lustre 2.4.0
    • 3
    • 5859

    Description

      I am seeing replay-single test 48 consistently hanging.
      There is a stack trace for a hung task in the log, and it looks like that task never finishes:

      [246707.608040] LNet: Service thread pid 16278 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [246707.608566] Pid: 16278, comm: mdt00_001
      [246707.608714] 
      [246707.608715] Call Trace:
      [246707.609128]  [<ffffffffa0f9b7ae>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [246707.609381]  [<ffffffffa09d26d4>] osp_precreate_reserve+0x3a4/0x620 [osp]
      [246707.609664]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [246707.609914]  [<ffffffffa09d1633>] osp_declare_object_create+0x163/0x540 [osp]
      [246707.610746]  [<ffffffffa098a4bd>] lod_qos_declare_object_on+0xed/0x4c0 [lod]
      [246707.611049]  [<ffffffffa098c094>] lod_alloc_rr.clone.2+0x624/0xd90 [lod]
      [246707.611313]  [<ffffffffa098db8c>] lod_qos_prep_create+0xe5c/0x1848 [lod]
      [246707.611610]  [<ffffffffa098886b>] lod_declare_striped_object+0x14b/0x920 [lod]
      [246707.612053]  [<ffffffffa0989348>] lod_declare_object_create+0x308/0x4f0 [lod]
      [246707.612465]  [<ffffffffa07364bf>] mdd_declare_object_create_internal+0xaf/0x1d0 [mdd]
      [246707.612926]  [<ffffffffa07475ea>] mdd_create+0x39a/0x1550 [mdd]
      [246707.613334]  [<ffffffffa08cd759>] mdt_reint_open+0x1079/0x1860 [mdt]
      [246707.613649]  [<ffffffffa1075140>] ? lu_ucred+0x20/0x30 [obdclass]
      [246707.613897]  [<ffffffffa0898655>] ? mdt_ucred+0x15/0x20 [mdt]
      [246707.614105]  [<ffffffffa08b8651>] mdt_reint_rec+0x41/0xe0 [mdt]
      [246707.614347]  [<ffffffffa08b1b13>] mdt_reint_internal+0x4e3/0x7e0 [mdt]
      [246707.614559]  [<ffffffffa08b20dd>] mdt_intent_reint+0x1ed/0x500 [mdt]
      [246707.614854]  [<ffffffffa08adca5>] mdt_intent_policy+0x3c5/0x800 [mdt]
      [246707.615163]  [<ffffffffa11c643a>] ldlm_lock_enqueue+0x2ea/0x890 [ptlrpc]
      [246707.615486]  [<ffffffffa11ef3b7>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc]
      [246707.615812]  [<ffffffffa08ad7f6>] mdt_enqueue+0x46/0x130 [mdt]
      [246707.616091]  [<ffffffffa08a1822>] mdt_handle_common+0x932/0x1750 [mdt]
      [246707.616327]  [<ffffffffa08a2715>] mdt_regular_handle+0x15/0x20 [mdt]
      [246707.616560]  [<ffffffffa121d953>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc]
      [246707.616994]  [<ffffffffa0f9b66e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      [246707.617304]  [<ffffffffa1216621>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
      [246707.617595]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
      [246707.617888]  [<ffffffffa122048d>] ptlrpc_main+0xb3d/0x18e0 [ptlrpc]
      [246707.618203]  [<ffffffffa121f950>] ? ptlrpc_main+0x0/0x18e0 [ptlrpc]
      [246707.618431]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [246707.618628]  [<ffffffffa121f950>] ? ptlrpc_main+0x0/0x18e0 [ptlrpc]
      [246707.618944]  [<ffffffffa121f950>] ? ptlrpc_main+0x0/0x18e0 [ptlrpc]
      [246707.619190]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      

      I have a crash dump for such an occurrence as well.
      The dump is with the patch from LU-2285 applied, but the hang also happens without the LU-2285 patch.
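
      The top frames show the service thread parked in cfs_waitq_wait() inside osp_precreate_reserve(), i.e. waiting for the precreate machinery to hand out an OST object that never arrives. For readers unfamiliar with that wait pattern, here is a minimal userspace sketch (plain pthreads, not Lustre source; every name in it is hypothetical) of the same failure shape: a consumer blocked on a condition that a stalled producer never signals. Built with gcc -pthread, it hangs by design, just like the mdt00_001 thread in the trace above.

      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  more = PTHREAD_COND_INITIALIZER;
      static int precreated;                   /* objects available to reserve */

      /* Consumer: plays the role of the mdt00_001 service thread. */
      static void *reserve_object(void *arg)
      {
              (void)arg;
              pthread_mutex_lock(&lock);
              while (precreated == 0)          /* osp_precreate_reserve()-style loop */
                      pthread_cond_wait(&more, &lock);  /* cfs_waitq_wait() analogue */
              precreated--;
              pthread_mutex_unlock(&lock);
              return NULL;
      }

      /* Producer: stands in for the precreate machinery.  It is deliberately
       * stalled -- it never signals `more` -- so the consumer above sleeps
       * forever, which is the failure shape this ticket reports. */
      static void *precreate_worker(void *arg)
      {
              (void)arg;
              for (;;)
                      sleep(1);
              return NULL;
      }

      int main(void)
      {
              pthread_t producer, consumer;

              pthread_create(&producer, NULL, precreate_worker, NULL);
              pthread_create(&consumer, NULL, reserve_object, NULL);
              printf("consumer blocked waiting for a precreated object...\n");
              pthread_join(consumer, NULL);    /* never returns */
              return 0;
      }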

      Attachments

        Activity

          [LU-2500] replay-single test 48 lockup
          pjones Peter Jones added a comment -

          Landed for 2.4

          bzzz Alex Zhuravlev added a comment -

          please try with http://review.whamcloud.com/4846

          bzzz Alex Zhuravlev added a comment -

          I was able to reproduce this.

          bzzz Alex Zhuravlev added a comment -

          can you attach the full dmesg and lustre logs please?

          People

            Assignee:
            bzzz Alex Zhuravlev
            Reporter:
            green Oleg Drokin
            Votes:
            0
            Watchers:
            4

            Dates

              Created:
              Updated:
              Resolved: