[LU-12412] mdt: task umount:19248 blocked for more than 120 seconds
Created: 10/Jun/19  Updated: 25/Nov/19  Resolved: 25/Nov/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: Sergey Cheremencev Assignee: Sergey Cheremencev
Resolution: Fixed Votes: 0
Labels: None

Severity: 3

 Description   

A race between umount and lctl abort_recovery can cause the following hung task:

<3>INFO: task umount:19248 blocked for more than 120 seconds.
<3>      Not tainted 2.6.32-431.17.1.x1.6.39.x86_64 #1
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>umount        D 0000000000000007     0 19248  19247 0x00000080
<4> ffff880093fd39a8 0000000000000082 0000000000000000 0000000000000000
<4> ffff880093fd3968 ffffffff81342960 ffffffffa177edea ffffffff00000028
<4> ffff8800bf6c5ab8 ffff880093fd3fd8 000000000000fbc8 ffff8800bf6c5ab8
<4>Call Trace:
<4> [<ffffffff81342960>] ? vt_console_print+0x260/0x330
<4> [<ffffffff81526965>] schedule_timeout+0x215/0x2e0
<4> [<ffffffff810724bf>] ? release_console_sem+0x1cf/0x220
<4> [<ffffffff815265e3>] wait_for_common+0x123/0x180
<4> [<ffffffff81061dc0>] ? default_wake_function+0x0/0x20
<4> [<ffffffff815266fd>] wait_for_completion+0x1d/0x20
<4> [<ffffffffa1682b10>] target_stop_recovery_thread+0x70/0xf0 [ptlrpc]
<4> [<ffffffffa1683cee>] target_recovery_fini+0x1e/0x30 [ptlrpc]
<4> [<ffffffffa099b14c>] mdt_device_fini+0x40c/0xf40 [mdt]
<4> [<ffffffffa0eaa0e2>] class_cleanup+0x572/0xd20 [obdclass]
<4> [<ffffffffa0e8a9b6>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0eac766>] class_process_config+0x1ed6/0x2830 [obdclass]
<4> [<ffffffffa0d5c6c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4> [<ffffffffa0ead57f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
<4> [<ffffffffa0e8a9b6>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0ee5f5c>] server_put_super+0xa0c/0xed0 [obdclass]
<4> [<ffffffff811a62a6>] ? invalidate_inodes+0xf6/0x190
<4> [<ffffffff8118b35b>] generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff8118b446>] kill_anon_super+0x16/0x60
<4> [<ffffffffa0eb0436>] lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff8118bbe7>] deactivate_super+0x57/0x80
<4> [<ffffffff811aabef>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811ab73b>] sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
 


 Comments   
Comment by Gerrit Updater [ 10/Jun/19 ]

Sergey Cheremencev (c17829@cray.com) uploaded a new patch: https://review.whamcloud.com/35141
Subject: LU-12412 recovery: wake all waiters of trd_finishing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4b95af2ca0eb253afd93f9cb6bf1a294cb462415

Comment by Gerrit Updater [ 16/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35141/
Subject: LU-12412 recovery: wake all waiters of trd_finishing
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ea5c43c7d4d582b15783f61f113a4f6e1a057d0f
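The patch subject points at the mechanism. The trace shows umount sleeping in wait_for_completion() inside target_stop_recovery_thread(), and lctl abort_recovery can end up waiting on the same trd_finishing completion. If the recovery thread signals it with a single complete(), only one waiter is released and the other sleeps forever; complete_all() releases both. The sketch below is a userspace pthreads model of that behaviour, not Lustre code. The struct and the helpers complete_one(), complete_all(), wait_for_completion_timeout_s(), stop_recovery_waiter() and run_scenario() are hypothetical stand-ins for the kernel primitives, and the assumption that the fix swaps complete() for complete_all() is inferred from the patch subject rather than read from the patch. A one-second timeout replaces the indefinite sleep so the stuck waiter can be counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace model of a kernel struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;	/* UINT_MAX means "completed for everyone" */
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Like complete(): makes one wakeup available, consumed by one waiter. */
static void complete_one(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Like complete_all(): releases every current and future waiter. */
static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = UINT_MAX;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Returns 0 if a wakeup was taken, ETIMEDOUT if none arrived in time. */
static int wait_for_completion_timeout_s(struct completion *c, time_t secs)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += secs;

	pthread_mutex_lock(&c->lock);
	while (c->done == 0 && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	if (c->done > 0) {
		if (c->done != UINT_MAX)
			c->done--;	/* consume one wakeup, as complete() intends */
		rc = 0;
	}
	pthread_mutex_unlock(&c->lock);
	return rc;
}

static struct completion trd_finishing;	/* stands in for trd->trd_finishing */
static pthread_mutex_t woken_lock = PTHREAD_MUTEX_INITIALIZER;
static int woken;

/* One caller of target_stop_recovery_thread(): umount or abort_recovery. */
static void *stop_recovery_waiter(void *arg)
{
	(void)arg;
	if (wait_for_completion_timeout_s(&trd_finishing, 1) == 0) {
		pthread_mutex_lock(&woken_lock);
		woken++;
		pthread_mutex_unlock(&woken_lock);
	}
	return NULL;
}

/* Returns how many of the two waiters were released. */
int run_scenario(int use_complete_all)
{
	pthread_t waiters[2];
	int i, rc;

	init_completion(&trd_finishing);
	woken = 0;
	for (i = 0; i < 2; i++) {
		rc = pthread_create(&waiters[i], NULL, stop_recovery_waiter, NULL);
		assert(rc == 0);
	}
	/* The recovery thread finishes and signals trd_finishing once. */
	if (use_complete_all)
		complete_all(&trd_finishing);
	else
		complete_one(&trd_finishing);
	for (i = 0; i < 2; i++)
		pthread_join(waiters[i], NULL);
	pthread_cond_destroy(&trd_finishing.cond);
	pthread_mutex_destroy(&trd_finishing.lock);
	return woken;
}
```

In this model, run_scenario(0) uses a single complete() and releases one waiter while the other times out; run_scenario(1) uses complete_all() and releases both. Because the done counter persists, the outcome does not depend on whether the waiters start sleeping before or after the signal. Build with -pthread.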

Generated at Sat Feb 10 02:52:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.