[LU-17092] umount stuck Created: 06/Sep/23  Updated: 14/Oct/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.8
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

client: client-2.12.8_ddn12
server: lustre-2.15.3


Issue Links:
Related
is related to LU-15660 parallel-scale-nfsv4 test_racer_on_nf... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Umount is getting stuck on compute nodes. Kernel stack of the hung umount process:

24109 umount
[<0>] ll_kill_super+0x7e/0x150 [lustre]
[<0>] lustre_kill_super+0x2b/0x40 [obdclass]
[<0>] deactivate_locked_super+0x34/0x70
[<0>] cleanup_mnt+0x3b/0x70
[<0>] task_work_run+0xa3/0xe0
[<0>] exit_to_usermode_loop+0xef/0x100
[<0>] do_syscall_64+0x19c/0x1b0
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
 

Looks like a duplicate of LU-14883.

Here are the statahead threads:

11580 ll_sa_11521
[<0>] ll_statahead_thread+0xc87/0x1210 [lustre]
[<0>] kthread+0x124/0x140
[<0>] ret_from_fork+0x35/0x40

11581 ll_agl_11521
[<0>] ll_agl_thread+0x495/0x500 [lustre]
[<0>] kthread+0x124/0x140
[<0>] ret_from_fork+0x35/0x40 


 Comments   
Comment by Peter Jones [ 06/Sep/23 ]

Yingjin

Could you please investigate?

Thanks

Peter

Comment by Qian Yingjin [ 06/Sep/23 ]

This seems to be a duplicate of LU-15660.

Comment by Peter Jones [ 08/Sep/23 ]

Thanks, qian_wc. Mahmoud, have you tested whether https://review.whamcloud.com/#/c/fs/lustre-release/+/52300/ resolves the problem?

Comment by Mahmoud Hanafi [ 08/Sep/23 ]

We are waiting for DDN to provide us with a client that has the patch.

Comment by Peter Jones [ 08/Sep/23 ]

OK. I have not seen that request coming through support channels, but it may well be handled at the support level.

Generated at Sat Feb 10 03:32:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.