Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.9.0
    • 3
    • 9223372036854775807

    Description

      In the latest soak test, one of the MDTs got stuck during umount:

      LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0002 namespace with 1 resources in use, (rc=-110)
      

      The stack trace of the umount process:

      umount        S 0000000000000011     0  8015   8013 0x00000080
       ffff8803d9b33808 0000000000000086 ffff8803d9b337d0 ffff8803d9b337cc
       ffff8803d9b33868 ffff88043fe84000 00001b24f314dc54 ffff880038635a00
       00000000000003ff 0000000101c3089b ffff8803f3c31ad8 ffff8803d9b33fd8
      Call Trace:
       [<ffffffff8153a9b2>] schedule_timeout+0x192/0x2e0
       [<ffffffff81089fa0>] ? process_timeout+0x0/0x10
       [<ffffffffa0abded0>] __ldlm_namespace_free+0x1c0/0x560 [ptlrpc]
       [<ffffffff81067650>] ? default_wake_function+0x0/0x20
       [<ffffffffa0abe2df>] ldlm_namespace_free_prior+0x6f/0x220 [ptlrpc]
       [<ffffffffa13b0db2>] osp_process_config+0x4a2/0x680 [osp]
       [<ffffffff81291947>] ? find_first_bit+0x47/0x80
       [<ffffffffa12c5650>] lod_sub_process_config+0x100/0x1f0 [lod]
       [<ffffffffa12cad66>] lod_process_config+0x646/0x1580 [lod]
       [<ffffffffa113e4ff>] ? lfsck_stop+0x15f/0x4c0 [lfsck]
       [<ffffffffa0801032>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
       [<ffffffffa1343253>] mdd_process_config+0x113/0x5e0 [mdd]
       [<ffffffffa11fee62>] mdt_device_fini+0x482/0x13e0 [mdt]
       [<ffffffffa08df626>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
       [<ffffffffa08f82c2>] class_cleanup+0x582/0xd30 [obdclass]
       [<ffffffffa08dae56>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa08fa5d6>] class_process_config+0x1b66/0x24c0 [obdclass]
       [<ffffffffa07fc151>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffff8117904c>] ? __kmalloc+0x21c/0x230
       [<ffffffffa08fb3ef>] class_manual_cleanup+0x4bf/0xc90 [obdclass]
       [<ffffffffa08dae56>] ? class_name2dev+0x56/0xe0 [obdclass]
       [<ffffffffa092983c>] server_put_super+0x8bc/0xcd0 [obdclass]
       [<ffffffff81194aeb>] generic_shutdown_super+0x5b/0xe0
       [<ffffffff81194bd6>] kill_anon_super+0x16/0x60
       [<ffffffffa08fe596>] lustre_kill_super+0x36/0x60 [obdclass]
       [<ffffffff81195377>] deactivate_super+0x57/0x80
       [<ffffffff811b533f>] mntput_no_expire+0xbf/0x110
       [<ffffffff811b5e8b>] sys_umount+0x7b/0x3a0
       [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
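
      When reading a trace like the one above, it can help to drop the frames marked with "?", which the kernel unwinder reports as unreliable, and look only at the remaining call chain. The following is a minimal sketch of such a filter; the names FRAME_RE, reliable_frames and SAMPLE are hypothetical helpers introduced for illustration and are not part of Lustre or any kernel tool. The sample input is copied from the first frames of the umount trace.

```python
import re

# Matches frames such as:
#   [<ffffffffa0abded0>] __ldlm_namespace_free+0x1c0/0x560 [ptlrpc]
#   [<ffffffff81089fa0>] ? process_timeout+0x0/0x10
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace):
    """Return (symbol, module) pairs for frames that are not marked '?'."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            # Frames without a [module] suffix belong to the core kernel.
            frames.append((m.group("symbol"), m.group("module") or "kernel"))
    return frames


# First frames of the umount trace from this report.
SAMPLE = """\
 [<ffffffff8153a9b2>] schedule_timeout+0x192/0x2e0
 [<ffffffff81089fa0>] ? process_timeout+0x0/0x10
 [<ffffffffa0abded0>] __ldlm_namespace_free+0x1c0/0x560 [ptlrpc]
 [<ffffffff81067650>] ? default_wake_function+0x0/0x20
 [<ffffffffa0abe2df>] ldlm_namespace_free_prior+0x6f/0x220 [ptlrpc]
 [<ffffffffa13b0db2>] osp_process_config+0x4a2/0x680 [osp]
"""

if __name__ == "__main__":
    for symbol, module in reliable_frames(SAMPLE):
        print(f"{module:8s} {symbol}")
```

      Applied to the full trace, this leaves the chain from sys_umount down to __ldlm_namespace_free, where umount sleeps in schedule_timeout.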
      

      It seems there is an MDT handler thread (mdt_rename) that holds the remote lock on soaked-MDT0000-osp-MDT0002 but is then stuck on a local lock enqueue, which in turn blocks the namespace cleanup during umount.

      mdt01_016     S 000000000000000a     0  7405      2 0x00000080
       ffff8804027ab900 0000000000000046 0000000000000000 ffffffff810a1c1c
       ffff880433fef520 ffff8804027ab880 00000a768c137fd5 0000000000000000
       ffff8804027ab8c0 0000000100ab043e ffff880433fefad8 ffff8804027abfd8
      Call Trace:
       [<ffffffff810a1c1c>] ? remove_wait_queue+0x3c/0x50
       [<ffffffffa0ad54b0>] ? ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
       [<ffffffffa0ada07d>] ldlm_completion_ast+0x68d/0x9b0 [ptlrpc]
       [<ffffffff81067650>] ? default_wake_function+0x0/0x20
       [<ffffffffa0ad93fe>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc]
       [<ffffffffa0ad99f0>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
       [<ffffffffa11fa770>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
       [<ffffffffa12074a4>] mdt_object_local_lock+0x3a4/0xb00 [mdt]
       [<ffffffffa11fa770>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
       [<ffffffffa0ad99f0>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
       [<ffffffffa1208103>] mdt_object_lock_internal+0x63/0x320 [mdt]
       [<ffffffffa1218e9e>] ? mdt_lookup_version_check+0x9e/0x350 [mdt]
       [<ffffffffa1208580>] mdt_reint_object_lock+0x20/0x60 [mdt]
       [<ffffffffa121cba7>] mdt_reint_rename_or_migrate+0x1317/0x2690 [mdt]
       [<ffffffffa11fa770>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
       [<ffffffffa0ad99f0>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
       [<ffffffffa09238c0>] ? lu_ucred+0x20/0x30 [obdclass]
       [<ffffffffa0b06b00>] ? lustre_pack_reply_v2+0xf0/0x280 [ptlrpc]
       [<ffffffffa121df53>] mdt_reint_rename+0x13/0x20 [mdt]
       [<ffffffffa121704d>] mdt_reint_rec+0x5d/0x200 [mdt]
       [<ffffffffa1201d5b>] mdt_reint_internal+0x62b/0xa50 [mdt]
       [<ffffffffa120262b>] mdt_reint+0x6b/0x120 [mdt]
       [<ffffffffa0b6b0cc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
       [<ffffffffa0b17821>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
       [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
       [<ffffffffa0b16af0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
       [<ffffffff810a138e>] kthread+0x9e/0xc0
       [<ffffffff8100c28a>] child_rip+0xa/0x20
       [<ffffffff810a12f0>] ? kthread+0x0/0xc0
       [<ffffffff8100c280>] ? child_rip+0x0/0x20
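
      The suspected interaction can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the LDLM implementation: Namespace, rename_handler and simulate are hypothetical names, a plain counter stands in for the "resources in use", and a threading.Lock stands in for the local lock that the handler cannot get. The cleanup wait returns -ETIMEDOUT, which is -110 on Linux and matches the rc in the error message.

```python
import errno
import threading


class Namespace:
    """Toy stand-in for a lock namespace: counts resources still in use."""

    def __init__(self, name):
        self.name = name
        self.in_use = 0
        self.cond = threading.Condition()

    def get(self):
        with self.cond:
            self.in_use += 1

    def put(self):
        with self.cond:
            self.in_use -= 1
            self.cond.notify_all()

    def free(self, timeout):
        """Wait until nothing is in use; return 0 or -ETIMEDOUT."""
        with self.cond:
            done = self.cond.wait_for(lambda: self.in_use == 0, timeout)
        return 0 if done else -errno.ETIMEDOUT


def rename_handler(remote_ns, local_lock, holding_remote):
    remote_ns.get()           # step 1: take the remote lock
    holding_remote.set()
    local_lock.acquire()      # step 2: block on the local lock enqueue
    try:
        pass                  # the rename itself would happen here
    finally:
        local_lock.release()
        remote_ns.put()       # remote lock is dropped only after step 2


def simulate(timeout=0.2):
    remote_ns = Namespace("soaked-MDT0000-osp-MDT0002")
    local_lock = threading.Lock()
    local_lock.acquire()      # assumption: another holder keeps the local lock
    holding_remote = threading.Event()
    worker = threading.Thread(
        target=rename_handler,
        args=(remote_ns, local_lock, holding_remote),
        daemon=True,
    )
    worker.start()
    holding_remote.wait()

    rc_stuck = remote_ns.free(timeout)   # "umount" while the handler is stuck
    in_use = remote_ns.in_use

    local_lock.release()                 # let the handler finish
    worker.join()
    rc_after = remote_ns.free(timeout)   # cleanup succeeds once it is released
    return rc_stuck, in_use, rc_after


if __name__ == "__main__":
    print(simulate())
```

      In the model, cleanup fails with one resource still in use for as long as the handler is blocked, and succeeds as soon as the local lock becomes available. The model does not say why the local lock enqueue never completes in the real system.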
      

      Attachments

        1. lola-10.log
          1.39 MB
        2. lola-8.log
          4.30 MB

          People

            laisiyao Lai Siyao
            di.wang Di Wang