[LU-11117] Client eviction due to a lock blocking callback time out: rc -107 Created: 04/Jul/18  Updated: 18/Jul/18  Resolved: 18/Jul/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Critical
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Timeline of the eviction for the .35 client

13:50:20 1526043020 client lost connection to OST0018, starting reconnect
13:50:33 1526043033.253491 client got a blocking AST from the server and answered -107
13:50:33 1526043033.253536:0:50388:0:(ldlm_lockd.c:2156:ldlm_callback_errmsg()) @@@ Operate on unconnected server: [nid 12345-136.250.40.86@tcp] [rc 0] [lock 0x0]  req@ffff8804be15c9c0
13:51:12 server evicts the client because of the -107 reply
13:52:50 1526043170 ln0210-OST0018-osc-ffff88181ea7c000: connect to target with instance 20
13:52:50 1526043170 ln0210-OST0018_UUID: changing import state from CONNECTING to EVICTED

The eviction happened because the client answered with -107. The client sets -107 only when rq_export is NULL, which means that the client didn't find an export for the request handle.
A possible root cause of the wrong request handle is a race between ptlrpc_resend_req and an LNet retransmit. ptlrpc_resend_req zeroed the request handle and ptlrpc_send_rpc set the handle again. If LNet retransmits the request between these two calls, the client receives a request with a zero handle.
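
The suspected window can be sketched as a small single-threaded C model. This is only an illustration of the race described above, not Lustre code: the names sketch_req, sketch_resend, sketch_handle_request and sketch_race_rc, the cookie value, and the SKETCH_ENOTCONN constant are all hypothetical, and the "retransmit" is simulated by copying the handle at the point between the zeroing and the re-setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 107 is the Linux value of ENOTCONN, matching rc -107 in the ticket. */
#define SKETCH_ENOTCONN 107

/* Hypothetical stand-in for a request; only the handle cookie matters here. */
struct sketch_req {
	uint64_t handle_cookie; /* 0 means "no handle" */
};

/* Hypothetical receiver-side lookup: a zero or unknown cookie finds no export. */
static int sketch_handle_request(uint64_t cookie_on_wire, uint64_t known_cookie)
{
	if (cookie_on_wire == 0 || cookie_on_wire != known_cookie)
		return -SKETCH_ENOTCONN;
	return 0;
}

/*
 * Models a resend. When zero_first is nonzero the handle is cleared before
 * it is set again. *retransmitted records what a retransmit firing between
 * the two steps would put on the wire.
 */
static void sketch_resend(struct sketch_req *req, uint64_t cookie,
			  int zero_first, uint64_t *retransmitted)
{
	assert(req != NULL && retransmitted != NULL);

	if (zero_first)
		req->handle_cookie = 0;          /* step 1: handle zeroed */

	*retransmitted = req->handle_cookie;     /* retransmit in the window */

	req->handle_cookie = cookie;             /* step 2: handle set again */
}

/* Returns the rc the receiver would produce for the retransmitted copy. */
static int sketch_race_rc(int zero_first)
{
	const uint64_t cookie = 0x1234abcdULL;   /* arbitrary illustrative cookie */
	struct sketch_req req = { .handle_cookie = cookie };
	uint64_t on_wire = 0;

	sketch_resend(&req, cookie, zero_first, &on_wire);
	return sketch_handle_request(on_wire, cookie);
}
```

In this model, sketch_race_rc(1) yields -107 because the retransmitted copy carries a zero handle, while sketch_race_rc(0), where the handle is never zeroed, yields 0.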



 Comments   
Comment by Gerrit Updater [ 04/Jul/18 ]

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/32781
Subject: LU-11117 ptlrpc: don't zero request handle
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0aefb0379950062e117926abc03e87d1c72b1ff5

Comment by Gerrit Updater [ 18/Jul/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32781/
Subject: LU-11117 ptlrpc: don't zero request handle
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 00c72ab6bb432ee1312282eed3dfae23ab8d0b42

Comment by Peter Jones [ 18/Jul/18 ]

Landed for 2.12
