[LU-16450] Cancel outstanding kfilnd transactions if handshake fails
Created: 06/Jan/23  Updated: 19/Jan/23  Resolved: 19/Jan/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

 Description   

If a handshake is sent to a peer that is up but does not have kfilnd started, the handshake fails quickly:

00000800:00000200:14.0:1666723382.289200:0:14469:0:(kfilnd_tn.c:621:kfilnd_tn_state_idle()) KFILND_MSG_HELLO_REQ Transaction ID 000000005bef6f4e: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 TN_EVENT_TX_HELLO event status 0
00000800:00000200:14.0:1666723382.289204:0:14469:0:(kfilnd_tn.c:663:kfilnd_tn_state_idle()) KFILND_MSG_HELLO_REQ Transaction ID 000000005bef6f4e: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 Using peer 0@kfi(0x0)
00000800:00000200:14.0:1666723382.289215:0:14469:0:(kfilnd_ep.c:420:kfilnd_ep_post_send()) 1@kfi:1 Transaction ID 000000005bef6f4e: Posted send of 38 bytes to peer 0x0: rc=0
00000800:00000200:14.0:1666723382.289219:0:14469:0:(kfilnd_tn.c:285:kfilnd_tn_state_change()) KFILND_MSG_HELLO_REQ Transaction ID 000000005bef6f4e: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 TN_STATE_IDLE -> TN_STATE_IMM_SEND state change
00000800:00000200:8.0F:1666723382.289294:0:12476:0:(kfilnd_tn.c:866:kfilnd_tn_state_imm_send()) KFILND_MSG_HELLO_REQ Transaction ID 000000005bef6f4e: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 TN_EVENT_TX_FAIL event status -5
00000800:00000200:8.0:1666723382.289302:0:12476:0:(kfilnd_tn.c:299:kfilnd_tn_status_update()) KFILND_MSG_HELLO_REQ Transaction ID 000000005bef6f4e: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 0 -> -5 status change
00000800:00000200:8.0:1666723382.289305:0:12476:0:(kfilnd_tn.c:305:kfilnd_tn_status_update()) KFILND_MSG_HELLO_REQ Transaction ID 000000005bef6f4e: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 0 -> 10 health status change
00000800:00000200:8.0:1666723382.289310:0:12476:0:(kfilnd_tn.c:1381:kfilnd_tn_free()) KFILND_MSG_HELLO_REQ Transaction ID 000000005bef6f4e: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 Transaction freed

However, the transaction that precipitated the handshake continues to wait for the full LND timeout instead of failing along with the handshake:

...
00000800:00000200:8.0:1666723423.060053:0:12476:0:(kfilnd_tn.c:645:kfilnd_tn_state_idle()) KFILND_MSG_IMMEDIATE Transaction ID 00000000d0315dad: 1@kfi:1 -> 0@kfi(0000000045a72da0):0x0 0@kfi hello response pending
...
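
The behaviour requested here can be illustrated with a minimal sketch. It is not the code from the patch: the sketch_* types and functions are hypothetical, simplified stand-ins for the kfilnd peer and transaction structures. It assumes that each peer tracks the transactions blocked on its hello response, and that a failed handshake finalizes all of them with the handshake's error status rather than leaving them to the LND timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real kfilnd structures. */
enum sketch_tn_state {
	SKETCH_TN_WAIT_HELLO,	/* blocked on a pending hello response */
	SKETCH_TN_CANCELLED,	/* finalized early with an error status */
};

struct sketch_tn {
	enum sketch_tn_state state;
	int status;			/* 0 or a negative errno value */
	struct sketch_tn *next;		/* next transaction waiting on the peer */
};

struct sketch_peer {
	int hello_pending;		/* non-zero while a handshake is in flight */
	struct sketch_tn *waiting;	/* transactions blocked on the handshake */
};

/* Queue a transaction behind the peer's in-flight handshake. */
static void sketch_peer_queue_tn(struct sketch_peer *peer,
				 struct sketch_tn *tn)
{
	assert(peer != NULL && tn != NULL);

	tn->state = SKETCH_TN_WAIT_HELLO;
	tn->status = 0;
	tn->next = peer->waiting;
	peer->waiting = tn;
}

/*
 * Handle a failed hello request (for example a TX failure with status -5).
 * Every waiting transaction is cancelled with the same status, so none of
 * them is left waiting for the LND timeout. Returns the number cancelled.
 */
static int sketch_peer_handshake_failed(struct sketch_peer *peer, int status)
{
	int cancelled = 0;

	assert(peer != NULL && status < 0);

	peer->hello_pending = 0;
	while (peer->waiting != NULL) {
		struct sketch_tn *tn = peer->waiting;

		peer->waiting = tn->next;
		tn->next = NULL;
		tn->state = SKETCH_TN_CANCELLED;
		tn->status = status;
		cancelled++;
	}

	return cancelled;
}
```

In this sketch a caller would invoke sketch_peer_handshake_failed(&peer, -5) from the handshake's failure path. The real driver would also need locking and would have to deliver an event to each transaction's state machine, both of which are omitted here.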


 Comments   
Comment by Gerrit Updater [ 10/Jan/23 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49590
Subject: LU-16450 kfilnd: Cancel TNs if handshake fails
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 68d1df92c55103f12382bbd4ac9d06ad2f09f11a

Comment by Gerrit Updater [ 19/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49590/
Subject: LU-16450 kfilnd: Cancel TNs if handshake fails
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6e5909b72ff0b21a328c0aefbab931033f539eb7

Comment by Peter Jones [ 19/Jan/23 ]

Landed for 2.16
