Details
Bug
Resolution: Fixed
Critical
None
3
Description
Timeline of the eviction of the .35 client:
13:50:20 1526043020 client lost connection to OST0018 and started reconnecting
13:50:33 1526043033.253491 client got a blocking AST from the server and answered -107
13:50:33 1526043033.253536:0:50388:0:(ldlm_lockd.c:2156:ldlm_callback_errmsg()) @@@ Operate on unconnected server: [nid 12345-136.250.40.86@tcp] [rc 0] [lock 0x0] req@ffff8804be15c9c0
13:51:12 server evicted the client because of the -107 reply
13:52:50 1526043170 ln0210-OST0018-osc-ffff88181ea7c000: connect to target with instance 20
13:52:50 1526043170 ln0210-OST0018_UUID: changing import state from CONNECTING to EVICTED
The eviction happened because the client answered the blocking AST with -107 (-ENOTCONN). The client returns -107 only when rq_export is NULL, which means the client did not find an export matching the request's handle.
A possible root cause of the wrong request handle is a race between ptlrpc_resend_req and LNet retransmission. ptlrpc_resend_req zeroes the request handle, and ptlrpc_send_rpc then sets it again. If LNet resends the request between these two calls, the client receives a request with a zero handle.
Landed for 2.12