[LU-10992] umount sometimes will get stuck Created: 02/May/18  Updated: 03/Aug/18  Resolved: 12/May/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.3
Fix Version/s: Lustre 2.12.0, Lustre 2.10.5

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

sles12sp3


Attachments: File r511i7n14.dump    
Issue Links:
Related
Severity: 2

 Description   

This looks like a duplicate of LU-7994, but we already have the patch from that LU applied.

Unmounting the filesystem sometimes gets stuck. The stack of the hung umount process:

r511i7n14 /proc/97511 # cat stack 
[<ffffffffa10c5df6>] ll_kill_super+0x86/0x170 [lustre]
[<ffffffffa0c87b5a>] lustre_kill_super+0x3a/0x50 [obdclass]
[<ffffffff8121f4d8>] deactivate_locked_super+0x48/0x80
[<ffffffff8121f556>] deactivate_super+0x46/0x60
[<ffffffff8123b89f>] cleanup_mnt+0x3f/0x80
[<ffffffff8123b932>] __cleanup_mnt+0x12/0x20
[<ffffffff810a20b9>] task_work_run+0x89/0xb0
[<ffffffff8107f3de>] exit_to_usermode_loop+0x95/0xca
[<ffffffff81003b6d>] syscall_return_slowpath+0x8d/0xa0
[<ffffffff8164b167>] int_ret_from_sys_call+0x25/0x9f
[<ffffffffffffffff>] 0xffffffffffffffff

I will upload a client debug dump.
There is one interesting error message on the console:

LustreError: 94086:0:(statahead.c:1591:start_statahead_thread()) can't start ll_sa thread, rc: -4
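The rc in this message is a negated Linux errno value; -4 corresponds to EINTR, which indicates that the call was interrupted. A minimal sketch for decoding such a return code, assuming standard Linux errno numbering (the helper name decode_rc is hypothetical and not part of Lustre):

```python
import errno
import os


def decode_rc(rc):
    """Map a negative kernel-style return code to (symbolic name, message)."""
    code = -rc  # kernel code returns errors as negated errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Decode the rc reported by start_statahead_thread() in the console message.
name, message = decode_rc(-4)
print(name, "-", message)
```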


 Comments   
Comment by Peter Jones [ 03/May/18 ]

Fan Yong

Could you please investigate?

Thanks

Peter

Comment by Gerrit Updater [ 04/May/18 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/32287
Subject: LU-10992 llite: decrease sa_running if fail to start statahead
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1edab5f1c4a9cb3269e6cd6172403de2e0348bc1
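
The patch subject points at the likely failure mode: a counter of running statahead threads (sa_running) is raised when a statahead thread is about to start, and it was not lowered again when the start failed, so an unmount that waits for the counter to reach zero never completes. The following is a toy user-space model of that bookkeeping, not the Lustre code. The class, its methods, and the assumption that the counter is incremented before thread creation are all hypothetical and only illustrate the idea.

```python
class StataheadModel:
    """Toy model of the sa_running bookkeeping (hypothetical, not Lustre code)."""

    EINTR = 4

    def __init__(self, fixed):
        self.sa_running = 0
        self.fixed = fixed  # True models the behavior described by the patch

    def start_statahead_thread(self, thread_start_fails):
        # Assumption: the counter is raised before the thread is created.
        self.sa_running += 1
        if thread_start_fails:
            if self.fixed:
                # The fix: undo the increment when the thread never started.
                self.sa_running -= 1
            return -self.EINTR
        return 0

    def statahead_thread_exit(self):
        # A thread that did start lowers the counter when it finishes.
        self.sa_running -= 1

    def umount_can_finish(self):
        # Assumption: unmount waits until no statahead thread is counted.
        return self.sa_running == 0


buggy = StataheadModel(fixed=False)
buggy.start_statahead_thread(thread_start_fails=True)

patched = StataheadModel(fixed=True)
patched.start_statahead_thread(thread_start_fails=True)

print("buggy can unmount:", buggy.umount_can_finish())
print("patched can unmount:", patched.umount_can_finish())
```

In the unpatched model the failed start leaves the counter at 1 with no thread left to lower it, which matches the symptom of umount waiting indefinitely.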

Comment by Jay Lan (Inactive) [ 09/May/18 ]

This patch has conflicts on lustre/llite/statahead.c in b2_10 and b2_11.

Shall I pick up LU-10165 "llite: disable statahead if starting statahead fail", commit 8b1bd1b88ae, before this patch, or shall I ignore 8b1bd1b88ae?

And if I am going to pick up 8b1bd1b88ae, is there any other commit that I also need?

Comment by nasf (Inactive) [ 11/May/18 ]

> Shall I pick up LU-10165 "llite: disable statahead if starting statahead fail", commit 8b1bd1b88ae before this patch or shall I ignore 8b1bd1b88ae?

You need to apply the patch for LU-10165 first.

> And if I am going to pick up 8b1bd1b88ae, is there any other commit that I need also?

That is enough; no other commits are needed.

Comment by Gerrit Updater [ 12/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32287/
Subject: LU-10992 llite: decrease sa_running if fail to start statahead
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6b8638bf792086d9e4b249e95091095bba4ada02

Comment by Peter Jones [ 12/May/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 23/May/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/32526
Subject: LU-10992 llite: decrease sa_running if fail to start statahead
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 5dc452edc333e1f6a4974e646efef482faf8d139

Comment by Gerrit Updater [ 03/Aug/18 ]

John L. Hammond (jhammond@whamcloud.com) merged in patch https://review.whamcloud.com/32526/
Subject: LU-10992 llite: decrease sa_running if fail to start statahead
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: e92b4e27e5c13e3fc503bb26a96ca229d7a4749e
