[LU-14999] Deadlock on parent during resend Created: 09/Sep/21  Updated: 03/Nov/21  Resolved: 03/Nov/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Andriy Skulysh
Assignee: Andriy Skulysh
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The parent-child lock order is broken during a resend: the child lock from the original request already exists, but the parent lock does not, so the MDS tries to take the parent lock again while the child lock is still held.
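
To make the ordering problem concrete, here is a minimal Python sketch of a wait-for graph. It is not Lustre code. It assumes two parties: the resent request, which still holds its child lock and now asks for the parent lock, and a second thread that took the parent lock first and is waiting for the child. The names `resent_request`, `other_thread`, `parent` and `child` are illustrative only, and what the second thread is actually waiting on is an assumption, since the log below does not show it.

```python
def find_cycle(holds, waits):
    """Return a list of owners forming a wait-for cycle, or None.

    holds: dict mapping lock name -> owner currently holding it
    waits: dict mapping owner -> lock name that owner is blocked on
    """
    for start in waits:
        path = [start]
        owner = start
        while owner in waits:
            # Follow one edge: owner waits on a lock, which someone holds.
            holder = holds.get(waits[owner])
            if holder is None:
                break  # the lock is free, so this chain is not blocked
            if holder in path:
                return path[path.index(holder):]
            path.append(holder)
            owner = holder
    return None


# Assumed scenario; names are illustrative, not taken from the Lustre source.
holds = {"child": "resent_request", "parent": "other_thread"}
waits = {"resent_request": "parent", "other_thread": "child"}

print(find_cycle(holds, waits))
```

If both parties take the parent lock before the child lock, the resent request holds nothing while it waits, and `find_cycle` returns `None`.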

0000100:00100000:0.0:1599226831.703924:0:2504:0:(service.c:2228:ptlrpc_server_handle_request()) Handling RPC req@ffffa04290d6e400 pname:cluuid+ref:pid:xid:nid:opc:job mdt00_001:b27d9eb7-ca8e-29e4-858d-0d45cca94acc+1112:18485:x1676906578347584:12345-192.168.3.10@tcp:101:truncate.0
00010000:00010000:0.0:1599226831.703932:0:2504:0:(ldlm_lockd.c:1257:ldlm_handle_enqueue0()) @@@ found existing lock cookie 0x74c6f4ca7618826a  req@ffffa04290d6e400 x1676906578347584/t0(0) o101->b27d9eb7-ca8e-29e4-858d-0d45cca94acc@192.168.3.10@tcp:18/0 lens 576/0 e 0 to 0 dl 1599226892 ref 1 fl Interpret:/2/ffffffff rc 0/-1 job:'truncate.0' 
00000004:00010000:0.0:1599226831.703952:0:2504:0:(mdt_handler.c:3932:mdt_intent_fixup_resent()) ### Restoring lock cookie ns: mdt-lustre-MDT0002_UUID lock: ffffa041cd4d9b00/0x74c6f4ca7618826a lrc: 5/0,0 mode: PR/PR res: [0x280000bd6:0x2eae:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT gid 0 flags: 0x60200400000020 nid: 192.168.3.10@tcp remote: 0xe6cfce9760eb26a2 expref: 1112 pid: 3364 timeout: 2462 lvb_type: 0
00000004:00010000:0.0:1599226831.703959:0:2504:0:(mdt_handler.c:3934:mdt_intent_fixup_resent()) @@@ restoring lock cookie 0x74c6f4ca7618826a  req@ffffa04290d6e400 x1676906578347584/t0(0) o101->b27d9eb7-ca8e-29e4-858d-0d45cca94acc@192.168.3.10@tcp:18/0 lens 576/3272 e 0 to 0 dl 1599226892 ref 1 fl Interpret:/2/0 rc 0/0 job:'truncate.0' 
 
00010000:00010000:0.0:1599226831.704118:0:2504:0:(ldlm_lock.c:1074:ldlm_granted_list_add_lock()) ### About to add lock: ns: mdt-lustre-MDT0002_UUID lock: ffffa04291206900/0x74c6f4ca761897fd lrc: 3/1,0 mode: CR/CR res: [0x280000bd0:0x1f63:0x0].0x0 bits 0x2/0x0 rrc: 22 type: IBT gid 0 flags: 0x50210000000000 nid: local remote: 0x0 expref: -99 pid: 2504 timeout: 0 lvb_type: 0
00010000:00010000:0.0:1599226831.704125:0:2504:0:(ldlm_request.c:522:ldlm_cli_enqueue_local()) ### client-side local enqueue handler, new lock created ns: mdt-lustre-MDT0002_UUID lock: ffffa04291206900/0x74c6f4ca761897fd lrc: 3/1,0 mode: CR/CR res: [0x280000bd0:0x1f63:0x0].0x0 bits 0x2/0x0 rrc: 22 type: IBT gid 0 flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 2504 timeout: 0 lvb_type: 0
00000020:00000040:0.0:1599226831.704132:0:2504:0:(lustre_handles.c:99:class_handle_hash()) added object ffffa04291207440 with handle 0x74c6f4ca76189804 to hash 
00010000:00010000:0.0:1599226831.704137:0:2504:0:(ldlm_lock.c:775:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: mdt-lustre-MDT0002_UUID lock: ffffa04291207440/0x74c6f4ca76189804 lrc: 3/1,0 mode: --/PR res: [0x280000bd0:0x1f63:0x0].0x33 bits 0x0/0x0 rrc: 3 type: IBT gid 0 flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 2504 timeout: 0 lvb_type: 0 
00010000:00010000:0.0:1599226831.704144:0:2504:0:(ldlm_lock.c:684:ldlm_add_bl_work_item()) ### lock incompatible; sending blocking AST. ns: mdt-lustre-MDT0002_UUID lock: ffffa041cadf1440/0x74c6f4ca761897ef lrc: 2/0,1 mode: PW/PW res: [0x280000bd0:0x1f63:0x0].0x33 bits 0x2/0x0 rrc: 3 type: IBT gid 0 flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 3286 timeout: 0 lvb_type: 0
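
For anyone reading similar traces, the following Python sketch pulls the lock handle, mode, resource and pid out of a lock dump like the ones above. The regular expression is inferred only from the lines in this ticket and is not a documented format. Reading the `mode:` pair as granted/requested is an assumption based on the `--/PR` shown for the newly created lock. `parse_lock` and `SAMPLE` are hypothetical names.

```python
import re

# Pattern inferred from the lock dumps in this ticket only; real Lustre
# debug logs have more variants than this handles.
LOCK_RE = re.compile(
    r"lock: (?P<addr>[0-9a-f]+)/(?P<cookie>0x[0-9a-f]+) "
    r"lrc: \S+ mode: (?P<granted>\S+)/(?P<requested>\S+) "
    r"res: (?P<res>\S+) bits (?P<bits>\S+) .*? pid: (?P<pid>\d+)"
)


def parse_lock(line):
    """Return a dict of lock fields found in a log line, or None."""
    m = LOCK_RE.search(line)
    return m.groupdict() if m else None


# Tail of the "lock incompatible" line from the trace above.
SAMPLE = (
    "### lock incompatible; sending blocking AST. "
    "ns: mdt-lustre-MDT0002_UUID lock: ffffa041cadf1440/0x74c6f4ca761897ef "
    "lrc: 2/0,1 mode: PW/PW res: [0x280000bd0:0x1f63:0x0].0x33 bits 0x2/0x0 "
    "rrc: 3 type: IBT gid 0 flags: 0x40210000000000 nid: local remote: 0x0 "
    "expref: -99 pid: 3286 timeout: 0 lvb_type: 0"
)

info = parse_lock(SAMPLE)
print(info["pid"], info["granted"], info["res"])
```

Grouping the parsed locks by `res` shows which pids hold or request locks on the same resource. In the trace above, pid 3286 holds a PW lock and pid 2504 requests a PR lock on `[0x280000bd0:0x1f63:0x0].0x33`.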


 Comments   
Comment by Gerrit Updater [ 09/Sep/21 ]

"Andriy Skulysh <andriy.skulysh@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44885
Subject: LU-14999 mdt: Deadlock on parent during resend
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f52e2264fdc0ea6e012eafd439eb176b1e118e1e

Comment by Gerrit Updater [ 03/Nov/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44885/
Subject: LU-14999 mdt: Deadlock on parent during resend
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a3b3d91b740466feea54d0fe9a397ba79c001aa7

Comment by Peter Jones [ 03/Nov/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:14:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.