Details
- Bug
- Resolution: Fixed
- Minor
- None
- Lustre 2.8.0
- lustre-2.8.0_9.chaos
- 3
- 9223372036854775807
Description
An OSS evicted a client on Aug 1 during a planned network outage.
[Tue Aug 1 16:43:54 2017] Lustre: lsh-OST0005: haven't heard from client d40a30fc-ef66-94ff-e318-77d2c23e45f8 (at 192.168.137.212@o2ib27) in 227 seconds. I think it's dead, and I am evicting it. exp ffff881d1aaa8c00, cur 1501631035 expire 1501630885 last 1501630808
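The numbers at the end of the eviction message can be cross-checked against the reported 227 seconds. The sketch below is only an illustration: the helper name `parse_eviction` is hypothetical, and reading `cur`, `expire`, and `last` as the current time, the eviction cutoff, and the time of the client's last request (all in epoch seconds) is an assumption based on the field names, not something stated in this ticket.

```python
import re
from datetime import datetime, timezone

# Eviction message copied from the OSS console log above (dmesg timestamp dropped).
LOG = (
    "Lustre: lsh-OST0005: haven't heard from client "
    "d40a30fc-ef66-94ff-e318-77d2c23e45f8 (at 192.168.137.212@o2ib27) "
    "in 227 seconds. I think it's dead, and I am evicting it. "
    "exp ffff881d1aaa8c00, cur 1501631035 expire 1501630885 last 1501630808"
)


def parse_eviction(line):
    """Extract the silence interval and the three epoch values (hypothetical helper)."""
    m = re.search(r"in (\d+) seconds.*cur (\d+) expire (\d+) last (\d+)", line)
    if m is None:
        raise ValueError("not an eviction message")
    silent, cur, expire, last = (int(g) for g in m.groups())
    return {
        "silent": silent,                  # seconds of silence reported in the message
        "cur": cur,
        "expire": expire,
        "last": last,
        "cur_minus_last": cur - last,      # should match the reported silence
        "expire_minus_last": expire - last,  # how far 'last' falls before the assumed cutoff
        "cur_utc": datetime.fromtimestamp(cur, tz=timezone.utc).isoformat(),
    }


info = parse_eviction(LOG)
print(info["cur_minus_last"], info["expire_minus_last"], info["cur_utc"])
```

Under these assumptions, `cur - last` reproduces the 227 seconds in the message, and `last` falls before `expire`.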
Two days later the client had still not reconnected, although both sides could lctl ping each other. The client logged the following on its console:
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:336:ptlrpc_invalidate_import()) lsh-OST0005_UUID: rc = -110 waiting for callback (1 != 0)
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:336:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:362:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880168fc3800 x1574573141267560/t0(0) o3->lsh-OST0005-osc-ffff88203c63f800@172.19.3.22@o2ib600:6/4 lens 488/432 e 0 to 0 dl 1501630582 ref 2 fl Unregistering:ES/0/ffffffff rc -5/-1
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:362:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:378:ptlrpc_invalidate_import()) lsh-OST0005_UUID: RPCs in "Unregistering" phase found (1). Network is sluggish? Waiting them to error out.
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:378:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
This seems quite similar to LU-8511, which was closed as a duplicate of LU-7434. That issue had two associated patches, but only https://review.whamcloud.com/#/c/18934/ was landed to 2.8 FE, whereas https://review.whamcloud.com/#/c/19953/ was not.
Issue Links
- duplicates: LU-7434 lost bulk leads to a hang (Resolved)