  1. Lustre
  2. LU-4216

lockup in mdt_intent_layout -> lu_object_find_at


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • None
    • None
    • 3
    • 11468

    Description

      I've been hitting this lately in racer.
      Unmount is not able to finish and things hang, but I suspect the lockup is not necessarily related to shutdown.
      I also have a crashdump taken about 30 minutes after the condition was detected, if desired.
      This is a fairly recent master too.

      [ 9489.412732] LNet: Service thread pid 21501 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 9489.414125] Pid: 21501, comm: mdt00_006
      [ 9489.414387] 
      [ 9489.414387] Call Trace:
      [ 9489.414848]  [<ffffffffa056d834>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
      [ 9489.415165]  [<ffffffffa056de4b>] lu_object_find_at+0xab/0x360 [obdclass]
      [ 9489.415522]  [<ffffffffa0695976>] ? lustre_msg_string+0x96/0x290 [ptlrpc]
      [ 9489.415870]  [<ffffffff8105ad30>] ? default_wake_function+0x0/0x20
      [ 9489.416214]  [<ffffffffa06958d5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      [ 9489.416615]  [<ffffffffa056e116>] lu_object_find+0x16/0x20 [obdclass]
      [ 9489.416928]  [<ffffffffa0b1bad6>] mdt_object_find+0x56/0x170 [mdt]
      [ 9489.417224]  [<ffffffffa0b2bac4>] mdt_getattr_name_lock+0x804/0x19a0 [mdt]
      [ 9489.417564]  [<ffffffffa06958d5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      [ 9489.418207]  [<ffffffffa06bc336>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
      [ 9489.418555]  [<ffffffffa0697b84>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
      [ 9489.418870]  [<ffffffffa0b2cef9>] mdt_intent_getattr+0x299/0x480 [mdt]
      [ 9489.419172]  [<ffffffffa0b1d5b9>] mdt_intent_policy+0x499/0xca0 [mdt]
      [ 9489.419492]  [<ffffffffa064e32a>] ldlm_lock_enqueue+0x2ea/0x860 [ptlrpc]
      [ 9489.419814]  [<ffffffffa0676c4f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      [ 9489.420147]  [<ffffffffa06ea772>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      [ 9489.420462]  [<ffffffffa06e8cbf>] tgt_request_handle+0x5ff/0x1200 [ptlrpc]
      [ 9489.420813]  [<ffffffffa06a63d5>] ptlrpc_server_handle_request+0x395/0xc20 [ptlrpc]
      [ 9489.421314]  [<ffffffffa0ec540f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      [ 9489.422763]  [<ffffffffa069dd41>] ? ptlrpc_wait_event+0xc1/0x2e0 [ptlrpc]
      [ 9489.423102]  [<ffffffffa06a76ba>] ptlrpc_main+0xa5a/0x1690 [ptlrpc]
      [ 9489.423445]  [<ffffffffa06a6c60>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
      [ 9489.423801]  [<ffffffff81094726>] kthread+0x96/0xa0
      [ 9489.424090]  [<ffffffff8100c10a>] child_rip+0xa/0x20
      [ 9489.424395]  [<ffffffff81094690>] ? kthread+0x0/0xa0
      [ 9489.424686]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
      [ 9489.424955] 
      [ 9489.425179] LustreError: dumping log to /tmp/lustre-log.1383705973.21501
      [ 9497.613530] LNet: Service thread pid 30740 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 9497.614414] Pid: 30740, comm: mdt01_003
      [ 9497.614680] 
      [ 9497.614681] Call Trace:
      [ 9497.615135]  [<ffffffffa056d834>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
      [ 9497.615456]  [<ffffffffa056de4b>] lu_object_find_at+0xab/0x360 [obdclass]
      [ 9497.615759]  [<ffffffff8105ad30>] ? default_wake_function+0x0/0x20
      [ 9497.616061]  [<ffffffffa056e116>] lu_object_find+0x16/0x20 [obdclass]
      [ 9497.616375]  [<ffffffffa0b1bad6>] mdt_object_find+0x56/0x170 [mdt]
      [ 9497.616676]  [<ffffffffa0b2478d>] mdt_intent_layout+0x12d/0x640 [mdt]
      [ 9497.616975]  [<ffffffffa0b1d5b9>] mdt_intent_policy+0x499/0xca0 [mdt]
      [ 9497.617299]  [<ffffffffa064e32a>] ldlm_lock_enqueue+0x2ea/0x860 [ptlrpc]
      [ 9497.617684]  [<ffffffffa0676c4f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      [ 9497.618028]  [<ffffffffa06ea772>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
      [ 9497.618340]  [<ffffffffa06e8cbf>] tgt_request_handle+0x5ff/0x1200 [ptlrpc]
      [ 9497.618672]  [<ffffffffa06a63d5>] ptlrpc_server_handle_request+0x395/0xc20 [ptlrpc]
      [ 9497.619169]  [<ffffffffa0ec540f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      [ 9497.619496]  [<ffffffffa069dd41>] ? ptlrpc_wait_event+0xc1/0x2e0 [ptlrpc]
      [ 9497.619819]  [<ffffffffa06a76ba>] ptlrpc_main+0xa5a/0x1690 [ptlrpc]
      [ 9497.620131]  [<ffffffffa06a6c60>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
      [ 9497.620422]  [<ffffffff81094726>] kthread+0x96/0xa0
      [ 9497.620692]  [<ffffffff8100c10a>] child_rip+0xa/0x20
      [ 9497.620955]  [<ffffffff81094690>] ? kthread+0x0/0xa0
      [ 9497.621219]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
      [ 9497.621489] 
      [ 9497.622494] LustreError: dumping log to /tmp/lustre-log.1383705981.30740
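
To compare the two hung service threads, a minimal sketch like the following may help. It assumes the usual kernel stack dump convention that frames marked with `?` are addresses the unwinder could not confirm, and it keeps only the unmarked frames. The helper names `reliable_frames` and `common_innermost` are hypothetical and exist only for this illustration; the sample is an excerpt of the traces above.

```python
import re

# Matches the per-thread header, e.g. "Pid: 21501, comm: mdt00_006".
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+)")
# Matches a frame such as "[<ffffffffa056de4b>] lu_object_find_at+0xab/0x360",
# with an optional "? " prefix that marks an unconfirmed frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(log_text):
    """Map each "pid comm" header to its list of frames not marked with '?'."""
    traces = {}
    current = None
    for line in log_text.splitlines():
        header = PID_RE.search(line)
        if header:
            current = f"{header.group(1)} {header.group(2)}"
            traces[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and frame.group(1) is None:
            traces[current].append(frame.group(2))
    return traces


def common_innermost(a, b):
    """Return the leading (innermost) frames that two traces share."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


sample = """\
[ 9489.414125] Pid: 21501, comm: mdt00_006
[ 9489.414848]  [<ffffffffa056d834>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
[ 9489.415165]  [<ffffffffa056de4b>] lu_object_find_at+0xab/0x360 [obdclass]
[ 9489.416615]  [<ffffffffa056e116>] lu_object_find+0x16/0x20 [obdclass]
[ 9489.416928]  [<ffffffffa0b1bad6>] mdt_object_find+0x56/0x170 [mdt]
[ 9489.417224]  [<ffffffffa0b2bac4>] mdt_getattr_name_lock+0x804/0x19a0 [mdt]
[ 9497.614414] Pid: 30740, comm: mdt01_003
[ 9497.615135]  [<ffffffffa056d834>] ? htable_lookup+0x1c4/0x1e0 [obdclass]
[ 9497.615456]  [<ffffffffa056de4b>] lu_object_find_at+0xab/0x360 [obdclass]
[ 9497.616061]  [<ffffffffa056e116>] lu_object_find+0x16/0x20 [obdclass]
[ 9497.616375]  [<ffffffffa0b1bad6>] mdt_object_find+0x56/0x170 [mdt]
[ 9497.616676]  [<ffffffffa0b2478d>] mdt_intent_layout+0x12d/0x640 [mdt]
"""

traces = reliable_frames(sample)
shared = common_innermost(traces["21501 mdt00_006"], traces["30740 mdt01_003"])
print(shared)
```

On this excerpt both threads share the same innermost confirmed frames and differ only in the MDT intent handler that called `mdt_object_find`, which matches the traces as posted. The sketch does not by itself show what either thread is waiting on.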
      

      Attachments

        1. log.21501.gz
          0.2 kB
        2. log.30740.gz
          0.2 kB


            People

              wc-triage WC Triage
              green Oleg Drokin
