Details
-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
Lustre 2.0.0, Lustre 2.1.0, Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 1.8.x (1.8.0 - 1.8.5), Lustre 2.5.0
-
None
-
Any Lustre release from 1.6.0 onward with mountconf and an OST prepared with the --index option.
-
3
-
8645
Description
A client may lose the reply to a target registration request. The MGS assumes the reply was delivered and marks the target index as used, but the client never accepted a reply and, after reconnecting, restarts registration from the beginning.
OOPS.
The MGS then responds that the index is already in use.
[ 2619.730706] Lustre: MGC172.18.1.2@tcp: Reactivating import
[ 2626.816706] Lustre: 56551:0:(client.c:1819:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1370729827/real 1370729827] req@ffff8806014d5c00 x1437314393833474/t0(0) o253->MGC172.18.1.2@tcp@172.18.1.2@tcp:26/25 lens 4736/4736 e 0 to 1 dl 1370729834 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[ 2626.844904] LustreError: 166-1: MGC172.18.1.2@tcp: Connection to MGS (at 172.18.1.2@tcp) was lost; in progress operations using this service will fail
[ 2626.896533] Lustre: MGC172.18.1.2@tcp: Reactivating import
[ 2626.902111] Lustre: MGC172.18.1.2@tcp: Connection restored to MGS (at 172.18.1.2@tcp)
[ 2629.380926] Lustre: MGC172.18.1.2@tcp: Reactivating import
[ 2632.367220] LustreError: 15f-b: Communication to the MGS return error -98. Is the MGS running?
[ 2632.376077] LustreError: 58337:0:(obd_mount.c:1834:server_fill_super()) Unable to start targets: -98
The attached logs describe the bug in detail (log1 is from the MGS side, log2 is from the OSS side; the initial registration xid is x1437620344193026).
The bug is hit because the MGS does not schedule a reply to the target register command and assumes the client always receives the reply. It was originally hit on the Xyratex b_neo_stable branch (mostly the 2.1 codebase), but a quick look suggests the bug exists in 2.4 as well.
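The failure sequence can be illustrated with a small toy model. This is only a sketch, not Lustre code: `ToyMGS`, `register`, `register_with_lost_reply` and the `idempotent` flag are hypothetical names invented for the illustration. It assumes that the -98 in the log is -EADDRINUSE (true on Linux) and that one possible remedy is for the server to remember which target owns an index, so that a retry from the same target gets the original success again. The actual fix in Lustre may take a different approach.

```python
import errno


class ToyMGS:
    """Toy index registry; a hypothetical stand-in, not the real MGS."""

    def __init__(self, idempotent):
        # idempotent=False models the behaviour described in this ticket.
        self.idempotent = idempotent
        self.index_owner = {}  # index -> UUID of the target that registered it

    def register(self, target_uuid, index):
        owner = self.index_owner.get(index)
        if owner is None:
            # First registration: commit the index and report success.
            self.index_owner[index] = target_uuid
            return 0
        if self.idempotent and owner == target_uuid:
            # Same target retrying after a lost reply: repeat the success.
            return 0
        return -errno.EADDRINUSE


def register_with_lost_reply(mgs, target_uuid, index):
    """The first reply is lost in transit; the target reconnects and retries."""
    mgs.register(target_uuid, index)         # MGS commits, reply never arrives
    return mgs.register(target_uuid, index)  # registration restarted from scratch
```

With `ToyMGS(idempotent=False)` the retry fails with -EADDRINUSE, mirroring the "Unable to start targets: -98" message above. With `ToyMGS(idempotent=True)` the retry succeeds, while a different target asking for the same index is still rejected.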
Attachments
Issue Links
- is related to
-
LU-16475 Reusing OST indexes after lctl del_ost
- Open
-
LU-17716 add 'tunefs.lustre --replace' to allow OST to be skip MGS registration
- Open
-
LU-14 live replacement of OST
- Resolved