[LU-15118] There isn't any free thread to process resend request Created: 16/Oct/21  Updated: 27/Sep/23  Resolved: 27/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Andriy Skulysh Assignee: Andriy Skulysh
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-17149 TBF: req_capsule_extend() ASSERTION( ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The open request was processed, but the reply was lost, so the client keeps resending it:

00010000:00010000:0.0:1602492290.594418:0:14205:0:(ldlm_lockd.c:1465:ldlm_handle_enqueue0()) ### server-side enqueue handler, sending reply(err=0, rc=0) ns: mdt-lustre-MDT0002_UUID lock: ffff8c0a8d93cb40/0xecdbb64c9917086c lrc: 3/0,0 mode: CR/CR res: [0x280000be0:0xea4:0x0].0x0 bits 0x9/0x0 rrc: 2 type: IBT gid 0 flags: 0x40200000000000 nid: 192.168.3.8@tcp remote: 0xfb5bd69624c1a4d4 expref: 881 pid: 14205 timeout: 0 lvb_type: 0
00010000:00010000:0.0:1602492290.594434:0:14205:0:(ldlm_lockd.c:1544:ldlm_handle_enqueue0()) ### server-side enqueue handler END (lock ffff8c0a8d93cb40, rc 0)
00010000:00000200:0.0:1602492290.594440:0:14205:0:(ldlm_lib.c:2989:target_send_reply_msg()) @@@ sending reply  req@ffff8c0a93630d80 x1680332092323520/t8590872076(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:25/0 lens 808/680 e 0 to 0 dl 1602492351 ref 1 fl Interpret:/0/0 rc 0/0 job:'cp.0'
00000100:00100000:0.0:1602492290.594465:0:14205:0:(service.c:2278:ptlrpc_server_handle_request()) Handled RPC req@ffff8c0a93630d80 pname:cluuid+ref:pid:xid:nid:opc:job mdt00_039:649f59bb-d9ab-a7a1-b0ba-b64b3f540924+882:15694:x1680332092323520:12345-192.168.3.8@tcp:101:cp.0 Request processed in 56792us (56898us total) trans 8590872076 rc 0/0
00000100:00100000:1.0:1602492353.522406:0:14193:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
  req@ffff8c0b83694d80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:27/0 lens 808/0 e 0 to 0 dl 1602492414 ref 2 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
00000100:00100000:1.0:1602492415.536875:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
00000100:00080000:1.1:1602492415.536881:0:14178:0:(service.c:1628:ptlrpc_server_check_resend_in_progress()) @@@ Found duplicate req in processing  req@ffff8c0a91752400 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:28/0 lens 808/0 e 0 to 0 dl 1602492476 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
00000100:00080000:1.1:1602492415.536888:0:14178:0:(service.c:1629:ptlrpc_server_check_resend_in_progress()) @@@ Request being processed  req@ffff8c0b83694d80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:27/0 lens 808/0 e 0 to 0 dl 1602492414 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
00000100:00100000:1.0:1602492477.550365:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
00000100:00080000:1.1:1602492477.550388:0:14178:0:(service.c:1628:ptlrpc_server_check_resend_in_progress()) @@@ Found duplicate req in processing  req@ffff8c0a8e7f7a80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:29/0 lens 808/0 e 0 to 0 dl 1602492538 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
00000100:00080000:1.1:1602492477.550398:0:14178:0:(service.c:1629:ptlrpc_server_check_resend_in_progress()) @@@ Request being processed  req@ffff8c0b83694d80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:27/0 lens 808/0 e 0 to 0 dl 1602492476 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
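
To make the resend cadence in the log above easier to see, here is a minimal sketch that pulls the arrival time of each "got req" line for one xid and prints the gap between consecutive arrivals. The header layout assumed by the regex (subsystem:mask:cpu:sec.usec:stack:pid:extpid:(file:line:func())) is inferred from the lines in this ticket, the names LOG, LINE_RE, arrivals and gaps are made up for illustration, and the log messages are truncated.

```python
import re
from decimal import Decimal

# Excerpt of the debug log above; messages are truncated for brevity.
LOG = """\
00000100:00100000:0.0:1602492290.594465:0:14205:0:(service.c:2278:ptlrpc_server_handle_request()) Handled RPC req@ffff8c0a93630d80
00000100:00100000:1.0:1602492353.522406:0:14193:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
00000100:00100000:1.0:1602492415.536875:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
00000100:00100000:1.0:1602492477.550365:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
"""

# Assumed layout: subsys:mask:cpu:sec.usec:stack:pid:extpid:(file:line:func()) message
LINE_RE = re.compile(
    r"^[0-9a-f]+:[0-9a-f]+:[\d.]+:(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+(?P<msg>.*)$"
)


def arrivals(log, xid):
    """Return (timestamp, pid) for every 'got req' line carrying the given xid."""
    found = []
    for raw in log.splitlines():
        m = LINE_RE.match(raw)
        if not m or m["func"] != "ptlrpc_server_handle_req_in":
            continue
        if m["msg"].startswith("got req x" + xid):
            # Decimal keeps the microsecond digits exact.
            found.append((Decimal(m["ts"]), int(m["pid"])))
    return found


def gaps(times):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(times, times[1:])]


seen = arrivals(LOG, "1680332092323520")
for delta in gaps([ts for ts, _ in seen]):
    print("resend gap:", delta, "s")
```

On this excerpt the same xid arrives three times after the original request was handled, roughly 62 seconds apart each time.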

File unlink is blocked by open in resend state:

__schedule
schedule
schedule_timeout
ldlm_completion_ast
ldlm_cli_enqueue_local
mdt_object_local_lock
mdt_object_lock_internal
mdt_reint_object_lock
mdt_reint_striped_lock
mdt_reint_unlink
mdt_reint_rec
mdt_reint_internal
mdt_reint
tgt_request_handle
ptlrpc_server_handle_request
ptlrpc_main
kthread
Progs:  14668 "mdt00_060"

All other mdt threads are waiting for getattr on the parent directory:

__schedule
schedule
schedule_timeout
ldlm_completion_ast
ldlm_cli_enqueue_local
mdt_object_local_lock
mdt_object_lock_internal
mdt_getattr_name_lock
mdt_intent_getattr
mdt_intent_opc
mdt_intent_policy
ldlm_lock_enqueue
ldlm_handle_enqueue0
tgt_enqueue
tgt_request_handle
ptlrpc_server_handle_request
ptlrpc_main
kthread
Progs:  13713 "mdt00_000" 13714 "mdt00_001" 13715 "mdt00_002" 14101 "mdt00_003" 14155 "mdt00_004" 14156 "mdt00_005" 14157 "mdt00_006" 14158 "mdt00_007" 14160 "mdt00_008" 14161 "mdt00_009" 14162 "mdt00_010" 14163 "mdt00_011" 14165 "mdt00_012" 14166 "mdt00_013" 14170 "mdt00_014" 14171 "mdt00_015" 14172 "mdt00_016" 14173 "mdt00_017" 14174 "mdt00_018" 14175 "mdt00_019" 14176 "mdt00_020" 14177 "mdt00_021" 14179 "mdt00_023" 14180 "mdt00_024" 14181 "mdt00_025" 14182 "mdt00_026" 14183 "mdt00_027" 14184 "mdt00_028" 14189 "mdt00_029" 14190 "mdt00_030" 14191 "mdt00_031" 14192 "mdt00_032" 14194 "mdt00_034" 14195 "mdt00_035" 14196 "mdt00_036" 14203 "mdt00_037" 14204 "mdt00_038" 14205 "mdt00_039" 14206 "mdt00_040" 14210 "mdt00_041" 14211 "mdt00_042" 14212 "mdt00_043" 14213 "mdt00_044" 14214 "mdt00_045" 14215 "mdt00_046" 14216 "mdt00_047" 14217 "mdt00_048" 14655 "mdt00_049" 14656 "mdt00_050" 14658 "mdt00_051" 14659 "mdt00_052" 14660 "mdt00_053" 14661 "mdt00_054" 14662 "mdt00_055" 14663 "mdt00_056" 14664 "mdt00_057" 14665 "mdt00_058" 14667 "mdt00_059" 14669 "mdt00_061" 14670 "mdt00_062" 14671 "mdt00_063" 14672 "mdt00_064" 14673 "mdt00_065" 14674 "mdt00_066" 14675 "mdt00_067" 14676 "mdt00_068" 14677 "mdt00_069" 14678 "mdt00_070" 14679 "mdt00_071" 14680 "mdt00_072" 14700 "mdt00_073" 14701 "mdt00_074" 14702 "mdt00_075" 14703 "mdt00_076" 14704 "mdt00_077" 14705 "mdt00_078" 14706 "mdt00_079"
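
A quick way to see which service threads are absent from the "Progs:" list above is to look for holes in the thread numbering. The sketch below assumes thread names follow the mdtNN_MMM pattern seen in this trace; PROGS and missing_indices are illustrative names, and PROGS holds only a short contiguous excerpt of the line.

```python
import re

# Contiguous excerpt of the "Progs:" line above; paste the full line to
# check the whole pool.
PROGS = '14176 "mdt00_020" 14177 "mdt00_021" 14179 "mdt00_023" 14180 "mdt00_024"'


def missing_indices(progs):
    """Return thread indices skipped between the lowest and highest listed."""
    listed = sorted(int(n) for n in re.findall(r'"mdt\d+_(\d+)"', progs))
    present = set(listed)
    return [i for i in range(listed[0], listed[-1] + 1) if i not in present]


print(missing_indices(PROGS))
```

Applied to the full line, the numbering skips 022, 033 and 060. mdt00_060 is the unlink thread shown earlier. Pids 14178 and 14193, which also do not appear in the list, are the ones logging "got req" for the resent open in the log above.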


 Comments   
Comment by Gerrit Updater [ 16/Oct/21 ]

"Andriy Skulysh <andriy.skulysh@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45272
Subject: LU-15118 ldlm: no free thread to process resend request
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bf8de8020f656aed69af163cace68fbb7ad1a0b3

Comment by Gerrit Updater [ 27/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45272/
Subject: LU-15118 ldlm: no free thread to process resend request
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7e8f25ed3cd8b5435f92ba8b343aabfe0a180c5b

Comment by Peter Jones [ 27/Oct/21 ]

Landed for 2.15
