Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
Lustre 2.4.2
-
SLES 11 SP2
Lustre 2.4.2
-
3
-
12978
Description
We have applied the patch provided in LU-3645, but the customer still reports that the issue can be reproduced.
Attaching the latest set of logs.
The issue recurred on 18th Feb.
In pfscn3,
In pfscn4
In pfscn3
The device "dm-9" was mounted at 16:20:04 as pfscdat2-OST0000. During recovery, Lustre did find 27 clients (26 normal clients and
1 client from the MDT), but it seems the 26 normal clients did not recover with pfscn3 (a client is evicted after the recovery timeout if it either
did not need recovery or had no queued replay request). These clients were then deleted, and pfscdat2-OST0000 was unmounted at 16:25:29.
In pfscn4
The device "dm-11" was mounted at 16:25:43 as pfscdat2-OST0000, but it did not contain the client records, so these clients concluded that they had been evicted.
So the question is: why didn't these clients connect to pfscn3 to recover?
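
To make the timeline above easier to follow, here is a small Python sketch that computes the intervals between the three events. It uses only the timestamps quoted in this description. The event labels and the helper name `seconds_between` are illustrative and not taken from the logs, and the year is not stated in the ticket, so only times of day are compared.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"

# Timestamps quoted in the description (18th Feb). Labels are illustrative.
events = [
    ("pfscn3", "mount dm-9 as pfscdat2-OST0000", "16:20:04"),
    ("pfscn3", "unmount pfscdat2-OST0000", "16:25:29"),
    ("pfscn4", "mount dm-11 as pfscdat2-OST0000", "16:25:43"),
]


def seconds_between(start, end):
    """Return whole seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


# How long the OST stayed mounted on pfscn3 before it was unmounted.
mounted_on_pfscn3 = seconds_between(events[0][2], events[1][2])

# Gap between the unmount on pfscn3 and the mount on pfscn4.
failover_gap = seconds_between(events[1][2], events[2][2])

print(f"Mounted on pfscn3 for {mounted_on_pfscn3} s")
print(f"Remounted on pfscn4 after {failover_gap} s")
```

With the times above, the OST was mounted on pfscn3 for 325 seconds (5 min 25 s) and was mounted on pfscn4 14 seconds after the unmount.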