Details
New Feature
Resolution: Unresolved
Major
None
Lustre 2.7.0, Lustre 2.8.0
Description
In a suppress-ping environment, an evicted client cannot recover from the evicted state until it first accesses the server that evicted it. That access fails with -EIO and returns immediately, which can cause a user job to terminate with an error.
We could avoid the situation by running "lfs df" before every single operation, but that is troublesome and not something we can actually do.
The eviction notifier that this patch provides is one solution to the problem. It works as follows.
First, the target (MDT or OST) that evicted a client notifies the MGS of the eviction event.
Then the MGS sends a request to the evicted client.
Finally, on receiving the request, the client sends a ping to the target server and finds out that it has been evicted.
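The three steps above can be sketched as a small simulation. This is only an illustrative model, not the patch itself: the class and method names (Target, MGS, Client, report_eviction, on_mgs_notice) are hypothetical, and the reconnect after the ping is an assumption about what recovery would look like.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical client that tracks which targets it knows evicted it."""
    name: str
    aware_of_eviction: set = field(default_factory=set)

    def on_mgs_notice(self, target):
        # Step 3: the request from the MGS triggers a ping to the target.
        if target.ping(self) == "EVICTED":
            self.aware_of_eviction.add(target.name)
            # Assumption: once the client knows, it reconnects right away.
            target.reconnect(self)


@dataclass
class MGS:
    """Hypothetical MGS that relays eviction events to clients."""
    clients: dict = field(default_factory=dict)

    def report_eviction(self, target, client_name):
        # Step 2: the MGS sends a request to the evicted client.
        self.clients[client_name].on_mgs_notice(target)


@dataclass
class Target:
    """Hypothetical MDT or OST."""
    name: str
    mgs: MGS
    evicted: set = field(default_factory=set)

    def evict(self, client_name):
        # Step 1: the target evicts the client and notifies the MGS.
        self.evicted.add(client_name)
        self.mgs.report_eviction(self, client_name)

    def ping(self, client):
        return "EVICTED" if client.name in self.evicted else "OK"

    def reconnect(self, client):
        self.evicted.discard(client.name)


mgs = MGS()
client = Client("client0")
mgs.clients[client.name] = client
ost = Target("OST0000", mgs)
ost.evict("client0")  # the client learns of the eviction without user I/O
```

In this model the client discovers the eviction as a side effect of the notification, so no user operation has to absorb the first -EIO.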
There are a finite number of operations that a client will do after being evicted and idle for some time. Just walk through them and figure out which work and which do not.
For instance, I would hope that an open() call would work after eviction. Hopefully when the open() fails, the client reconnects and retries the open(), and the application is none the wiser that this occurred. Is that the case?
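For comparison, the retry behaviour asked about above can be approximated at application level. The sketch below is not what the Lustre client does and does not answer the question. It is a hypothetical wrapper (open_with_retry and its parameters are made up for illustration) that assumes a later attempt succeeds once the client has reconnected.

```python
import errno
import time


def open_with_retry(path, mode="r", retries=3, delay=1.0, opener=open):
    """Retry an open that fails with EIO; re-raise any other error."""
    for attempt in range(retries + 1):
        try:
            return opener(path, mode)
        except OSError as exc:
            # Give up on non-EIO errors or when the retry budget is spent.
            if exc.errno != errno.EIO or attempt == retries:
                raise
            time.sleep(delay)
```

The opener parameter exists only so the wrapper can be exercised without a real file system.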