Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.9.0
3
9223372036854775807
Description
In the latest soak test, one of the MDTs got stuck during umount. The forced namespace cleanup timed out (rc=-110, i.e. -ETIMEDOUT):
LustreError: 0-0: Forced cleanup waiting for soaked-MDT0000-osp-MDT0002 namespace with 1 resources in use, (rc=-110)
The stack trace of the hung umount thread:
umount S 0000000000000011 0 8015 8013 0x00000080
ffff8803d9b33808 0000000000000086 ffff8803d9b337d0 ffff8803d9b337cc
ffff8803d9b33868 ffff88043fe84000 00001b24f314dc54 ffff880038635a00
00000000000003ff 0000000101c3089b ffff8803f3c31ad8 ffff8803d9b33fd8
Call Trace:
[<ffffffff8153a9b2>] schedule_timeout+0x192/0x2e0
[<ffffffff81089fa0>] ? process_timeout+0x0/0x10
[<ffffffffa0abded0>] __ldlm_namespace_free+0x1c0/0x560 [ptlrpc]
[<ffffffff81067650>] ? default_wake_function+0x0/0x20
[<ffffffffa0abe2df>] ldlm_namespace_free_prior+0x6f/0x220 [ptlrpc]
[<ffffffffa13b0db2>] osp_process_config+0x4a2/0x680 [osp]
[<ffffffff81291947>] ? find_first_bit+0x47/0x80
[<ffffffffa12c5650>] lod_sub_process_config+0x100/0x1f0 [lod]
[<ffffffffa12cad66>] lod_process_config+0x646/0x1580 [lod]
[<ffffffffa113e4ff>] ? lfsck_stop+0x15f/0x4c0 [lfsck]
[<ffffffffa0801032>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
[<ffffffffa1343253>] mdd_process_config+0x113/0x5e0 [mdd]
[<ffffffffa11fee62>] mdt_device_fini+0x482/0x13e0 [mdt]
[<ffffffffa08df626>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
[<ffffffffa08f82c2>] class_cleanup+0x582/0xd30 [obdclass]
[<ffffffffa08dae56>] ? class_name2dev+0x56/0xe0 [obdclass]
[<ffffffffa08fa5d6>] class_process_config+0x1b66/0x24c0 [obdclass]
[<ffffffffa07fc151>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffff8117904c>] ? __kmalloc+0x21c/0x230
[<ffffffffa08fb3ef>] class_manual_cleanup+0x4bf/0xc90 [obdclass]
[<ffffffffa08dae56>] ? class_name2dev+0x56/0xe0 [obdclass]
[<ffffffffa092983c>] server_put_super+0x8bc/0xcd0 [obdclass]
[<ffffffff81194aeb>] generic_shutdown_super+0x5b/0xe0
[<ffffffff81194bd6>] kill_anon_super+0x16/0x60
[<ffffffffa08fe596>] lustre_kill_super+0x36/0x60 [obdclass]
[<ffffffff81195377>] deactivate_super+0x57/0x80
[<ffffffff811b533f>] mntput_no_expire+0xbf/0x110
[<ffffffff811b5e8b>] sys_umount+0x7b/0x3a0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
It appears that an MDT handler thread (mdt01_016, handling a rename via mdt_reint_rename) holds a remote lock on soaked-MDT0000-osp-MDT0002 but is then stuck enqueueing a local lock. Since the remote lock is never released, the resource it pins stays in use and blocks the namespace cleanup in umount:
mdt01_016 S 000000000000000a 0 7405 2 0x00000080
ffff8804027ab900 0000000000000046 0000000000000000 ffffffff810a1c1c
ffff880433fef520 ffff8804027ab880 00000a768c137fd5 0000000000000000
ffff8804027ab8c0 0000000100ab043e ffff880433fefad8 ffff8804027abfd8
Call Trace:
[<ffffffff810a1c1c>] ? remove_wait_queue+0x3c/0x50
[<ffffffffa0ad54b0>] ? ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
[<ffffffffa0ada07d>] ldlm_completion_ast+0x68d/0x9b0 [ptlrpc]
[<ffffffff81067650>] ? default_wake_function+0x0/0x20
[<ffffffffa0ad93fe>] ldlm_cli_enqueue_local+0x21e/0x810 [ptlrpc]
[<ffffffffa0ad99f0>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa11fa770>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
[<ffffffffa12074a4>] mdt_object_local_lock+0x3a4/0xb00 [mdt]
[<ffffffffa11fa770>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
[<ffffffffa0ad99f0>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa1208103>] mdt_object_lock_internal+0x63/0x320 [mdt]
[<ffffffffa1218e9e>] ? mdt_lookup_version_check+0x9e/0x350 [mdt]
[<ffffffffa1208580>] mdt_reint_object_lock+0x20/0x60 [mdt]
[<ffffffffa121cba7>] mdt_reint_rename_or_migrate+0x1317/0x2690 [mdt]
[<ffffffffa11fa770>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
[<ffffffffa0ad99f0>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa09238c0>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa0b06b00>] ? lustre_pack_reply_v2+0xf0/0x280 [ptlrpc]
[<ffffffffa121df53>] mdt_reint_rename+0x13/0x20 [mdt]
[<ffffffffa121704d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa1201d5b>] mdt_reint_internal+0x62b/0xa50 [mdt]
[<ffffffffa120262b>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0b6b0cc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
[<ffffffffa0b17821>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
[<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
[<ffffffffa0b16af0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
[<ffffffff810a138e>] kthread+0x9e/0xc0
[<ffffffff8100c28a>] child_rip+0xa/0x20
[<ffffffff810a12f0>] ? kthread+0x0/0xc0
[<ffffffff8100c280>] ? child_rip+0x0/0x20
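The dependency between the two threads can be sketched as a small, deterministic userspace model in C. This is not Lustre code: model_ns, model_handler, handler_step and model_forced_cleanup are hypothetical names, a plain counter stands in for the namespace's "resources in use", and the loop bound max_rounds is an assumption added only so the model terminates. It shows why each timed wait in the cleanup ends with -ETIMEDOUT for as long as the handler cannot obtain its local lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical model of an LDLM namespace: only the number of
 * resources still in use is tracked. */
struct model_ns {
	const char *name;
	int refs;
};

/* Hypothetical model of the rename handler: it pins one resource in
 * the OSP namespace by holding a remote lock, then waits for a local
 * lock before it can finish. */
struct model_handler {
	int holds_remote;   /* 1 while the remote lock is held */
	int local_granted;  /* 1 once the local lock has been granted */
};

/* The handler can only finish (and drop its remote lock, releasing the
 * pinned resource) after its local lock has been granted. */
static void handler_step(struct model_handler *h, struct model_ns *ns)
{
	if (h->holds_remote && h->local_granted) {
		h->holds_remote = 0;
		ns->refs--;
		assert(ns->refs >= 0);
	}
}

/* Forced cleanup: repeatedly wait for the namespace to drain.  Each
 * round that ends with resources still in use counts as a timed-out
 * wait; max_rounds is a model-only bound so the function returns. */
static int model_forced_cleanup(struct model_ns *ns, struct model_handler *h,
				int max_rounds)
{
	int round;

	for (round = 0; round < max_rounds; round++) {
		handler_step(h, ns);  /* give the lock holder a chance to run */
		if (ns->refs == 0)
			return 0;
		fprintf(stderr,
			"Forced cleanup waiting for %s namespace with %d resources in use, (rc=%d)\n",
			ns->name, ns->refs, -ETIMEDOUT);
	}
	return -ETIMEDOUT;
}
```

With the handler's local lock never granted, every round times out and the resource count stays at 1; once the local lock is granted, the handler releases its remote lock and the cleanup completes on the next round.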