[LU-9861] Client not reconnecting to OST Created: 10/Aug/17 Updated: 18/Sep/17 Resolved: 15/Aug/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Ned Bass | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
lustre-2.8.0_9.chaos |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
An OSS evicted a client on Aug 1 during a planned network outage.
[Tue Aug 1 16:43:54 2017] Lustre: lsh-OST0005: haven't heard from client d40a30fc-ef66-94ff-e318-77d2c23e45f8 (at 192.168.137.212@o2ib27) in 227 seconds. I think it's dead, and I am evicting it. exp ffff881d1aaa8c00, cur 1501631035 expire 1501630885 last 1501630808

Two days later the client had still not reconnected, although both sides could lctl ping each other. The client logged this on the console:

[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:336:ptlrpc_invalidate_import()) lsh-OST0005_UUID: rc = -110 waiting for callback (1 != 0)
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:336:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:362:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880168fc3800 x1574573141267560/t0(0) o3->lsh-OST0005-osc-ffff88203c63f800@172.19.3.22@o2ib600:6/4 lens 488/432 e 0 to 0 dl 1501630582 ref 2 fl Unregistering:ES/0/ffffffff rc -5/-1
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:362:ptlrpc_invalidate_import()) Skipped 5 previous similar messages
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:378:ptlrpc_invalidate_import()) lsh-OST0005_UUID: RPCs in "Unregistering" phase found (1). Network is sluggish? Waiting them to error out.
[Thu Aug 3 12:03:52 2017] LustreError: 11704:0:(import.c:378:ptlrpc_invalidate_import()) Skipped 5 previous similar messages

This seems quite similar to |
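The timestamps in the OSS eviction message can be cross-checked against the reported "227 seconds". The following is a minimal sketch, not part of the original report: the regular expression and the helper name `parse_eviction` are illustrative, and the reading of the fields (`cur` as the time of the check, `expire` as the eviction deadline, `last` as the time the client was last heard from) is an assumption based on the field names in the message.

```python
import re

# Eviction message copied from the OSS console log in the description.
EVICTION_LINE = (
    "[Tue Aug 1 16:43:54 2017] Lustre: lsh-OST0005: haven't heard from client "
    "d40a30fc-ef66-94ff-e318-77d2c23e45f8 (at 192.168.137.212@o2ib27) in 227 seconds. "
    "I think it's dead, and I am evicting it. exp ffff881d1aaa8c00, "
    "cur 1501631035 expire 1501630885 last 1501630808"
)

# Hypothetical pattern written for this one message format.
# Assumed meanings: cur = time of the check, expire = eviction deadline,
# last = time the client was last heard from (all in epoch seconds).
PATTERN = re.compile(
    r"in (?P<silent>\d+) seconds.*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Return the numeric fields of an eviction message as integers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like an eviction message")
    return {name: int(value) for name, value in match.groupdict().items()}


fields = parse_eviction(EVICTION_LINE)
print(fields["cur"] - fields["last"])    # 227, matches the "in 227 seconds" text
print(fields["cur"] - fields["expire"])  # 150, seconds past the assumed deadline
```

Under these assumptions, the message is internally consistent: the gap between `cur` and `last` equals the silence interval the OSS reports.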
| Comments |
| Comment by Peter Jones [ 11/Aug/17 ] |
|
Jinshan, in your opinion, could the described behaviour be due to this patch missing from 2.8 FE: https://git.hpdd.intel.com/?p=fs/lustre-release.git;a=commit;h=ac5044566b97c7f6881bed817c2ed9752a0c6d63 ? If not, what is your alternative theory? Peter |
| Comment by Jinshan Xiong (Inactive) [ 14/Aug/17 ] |
|
Yes, I agree it looks like the symptom of |
| Comment by Peter Jones [ 15/Aug/17 ] |
|
The mentioned fix has been ported, reviewed, and landed on the 2.8 FE branch, so I am closing this ticket for now. We can reopen it if the same issue is hit again with a release that includes this change. |