Lustre / LU-5569

Recreating a reverse import produces various failures.

Details


    Description

      Don't reallocate a new reverse import on each client reconnect.
      Disconnecting the reverse import on every client reconnect opens
      several races in the request-sending (mostly AST) code.

      The first problem is a race between send_rpc and
      class_destroy_import(). If an RPC send (or resend) is issued after
      class_destroy_import() has been called, the send will fail the
      import generation check.

      The second problem: target_handle_connect() stops updating the
      connection information for the older reverse import, so an RPC
      cannot be delivered from the server to the client because the
      connection information is stale or the security flavor has
      changed.

      The third problem: connection flags are not updated atomically for
      an import. target_handle_connect() links the new import before the
      message-header flags are set, so an RPC sent at the same time can
      carry the wrong flags.

      The fourth problem: when a client reconnects after a network flap,
      no wakeup event is sent to the RPCs waiting in the import queues.
      This adds a noticeable timeout in the case where the server did
      not send the request before the network flap.

      Some example log lines:

      00000100:00100000:1.0:1407845348.937766:0:62024:0:(service.c:1929:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_419:4960df0f-75ed-07a2-cee7-063090dc59cd+4:19257:x1475700821793316:12345-1748@gni1:8 Request procesed in 55us (106us total) trans 0 rc 0/0
      
      00000100:00020000:1.0:1407845393.600747:0:81897:0:(client.c:1115:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff880304e39800 x1475078782385806/t0(0) o105->snx11063-OST0070@1748@gni1:15/16 lens 344/192 e 0 to 1 dl 1407845389 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
      


          Activity

            adilger Andreas Dilger added a comment - - edited

            This patch caused a regression on master. See LU-7221 for details.

            pjones Peter Jones added a comment -

            Landed for 2.8


            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11750/
            Subject: LU-5569 ptlrpc: change reverse import life cycle
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 892078e3b566c04471e7dcf2c28e66f2f3584f93


            simmonsja James A Simmons added a comment -

            Alex, email me your latest LU-4134 patch and I will push it for you.

            shadow Alexey Lyashkov added a comment -

            Oleg's backtraces are related to a different bug: the wrong obd
            device release process, where we may hold an export alive after
            the obd device is freed, aka LU-4134.
            A patch exists at
            http://review.whamcloud.com/#/c/8045/

            but I am still not able to update it, since I lack access to
            Intel Gerrit with a Google account and password login is blocked
            by the Intel admins.

            simmonsja James A Simmons added a comment -

            It failed Oleg's review process. He listed the backtraces he got.
            spitzcor Cory Spitz added a comment -

            http://review.whamcloud.com/#/c/11750 still needs help getting through review.

            green Oleg Drokin added a comment -

            Ok. So this is a problem that has been there for quite a while, I see?
            In that case I imagine there's no super-critical rush to get this into 2.7, as there is still some discussion about this patch and the surrounding areas. The reality is that the 2.7 code freeze is right around the corner and we cannot drag this ticket out indefinitely.
            I don't think Cray or anybody else plans to jump to 2.7 the second it gets released. So if we need more time to hash out a solution that satisfies everyone, and as a result it slips into 2.8 (and, if 2.7 happens to be a maintenance release, is also then backported to 2.7.1 or something like that, and to 2.5.5 or whatever the going thing is then, and Cray will backport it to their tree anyway no matter where it comes from...), that should be totally ok in my view?

            In other words, I guess I am asking: does anybody have a compelling reason why this should be treated differently than what's described above (i.e. as a blocker, because I am overlooking something and it's a new disastrous failure of epic proportions instead)?

            hornc Chris Horn added a comment -

            We discovered the bugs addressed by this change while investigating some non-POSIX compliant behavior exhibited by Lustre. Below is a description of the problem based on my own understanding and the fixes that were proposed to address it.

            Higher layers of Lustre (CLIO) generally rely on lower layers to enforce POSIX compliance. In this case, the Lustre Distributed Lock Manager (LDLM) and ptlrpc layers are interacting in such a way that results in inappropriate errors being returned to the client. The interaction revolves around a pair of clients performing I/O.

            One client (the writer) creates and writes data to a single file striped across eight OSTs. A second client (the reader) reads the data written by the writer. The reader requests a protected read lock for each stripe of the file from the corresponding OST. Upon receipt of the lock enqueue request, the OSS notes that it has already granted a conflicting lock to the writer. As a result the server sends a blocking AST (BL AST) to the writer. This is a request that the writer cancel its lock so that it may be granted to the reader. Upon receipt of the BL AST the writer should first reply that it has received the BL AST, and then, after flushing any pages to the server, send a lock cancel request back to the server. When the server receives both the reply to the BL AST and the lock cancel request it can then grant the lock to the reader via a completion AST (CP AST). The server sends a CP AST to the reader who must then acknowledge receipt of the CP AST before it can use the resource covered by the requested lock.

            In Lustre, different components communicate via import and export pairs. An import is for sending requests and receiving replies, and an export is for receiving requests and sending replies. However, it is not possible to send a request via an export. As a result, servers utilize a reverse import to send AST requests to clients. A reverse import converts an import and export pair into a corresponding export and import pair. Currently, a new reverse import is created whenever the server creates an export for a client (re)connection. This prevents us from being able to re-send requests that were linked to the old reverse import. Historically this has not been problematic as servers did not have the ability to resend ASTs. With LU-5520 servers are now able to resend ASTs.

            This particular bug (there are other potential flavors) arises if the reader reconnects to an OST granting the lock while the OSS is trying to deliver the CP AST (or, equivalently, if the client is unable to acknowledge receipt of the CP AST). Based on Lustre trace data we determined that an OSS was unable to deliver a CP AST to the reader. While the OSS was waiting for the CP AST acknowledgement the reader reconnected to the OST granting the lock. As mentioned above, this created a new reverse import for this client. When the OSS attempted to resend the CP AST (after a timeout) it found that the old import for the reader had been destroyed. It was thus unable to re-send the request, and the request was immediately failed with a status of -EIO. When LDLM interpreted the failed request, it did not handle the -EIO request status appropriately. LDLM converted the -EIO error into -EAGAIN which was then returned to the client.

            Two fixes are proposed to address different aspects of this bug:
            1. Server-side: evict clients returning errors on ASTs

            • Fix tracked by LU-5581
            • This immediately fixes the POSIX non-compliance issue and helps prevent potential data-corruption cases.

            2. Server-side: change reverse import life cycle:

            • Fix tracked by LU-5569
            • Fixes a race between send_rpc and class_destroy_import() that results in the import generation mismatch.
            • Properly update connection information for “older” reverse import in the event that client nid changes.
            • Ensure connection flags on the reverse import are updated prior to sending any RPCs on the reverse import after reconnect.
            • Wakeup (resend) requests on the reverse import sending list when a client reconnects rather than waiting for the original requests to timeout.

            shadow Alexey Lyashkov added a comment -

            Oleg,

            which patch are you pointing to? The first patch (test only)
            should fail in the newly added tests; the second patch should
            not fail, since it fixes the issue.

            People

              yujian Jian Yu
              shadow Alexey Lyashkov
