Details
- Improvement
- Resolution: Unresolved
- Major
Description
There have been periodic complaints that a Lustre client does not really know when it has been evicted from a server node, because it can only find out when it sends an RPC.
This used to be handled mostly by the periodic ping, but as pinging is being scaled back to rarer cases, the situation increasingly becomes one where an application initiates an RPC and is hit, all of a sudden, by an eviction that happened quite a while ago.
As such, we probably need a better way of notifying clients of their eviction, so that they can reconnect more eagerly and with less damage to whatever is running in userspace.
Issue Links
- is related to: LU-2467 ABILITY TO DISABLE PINGING (Resolved)
Recovering servers retrieve the clients' information from the last_rcvd files to see which clients had been connected. The servers then send callback pings to those clients in order to make them reconnect.
This is the basic recovery behavior in FEFS, though lots of trivial functions are included in it.
Oh, and I want you to know one thing: when we handle a large system like K, ping often eats up LNet resources such as credits. I'm not an expert in that area, though, so I think you'll need a measure against the problem. That is why we restrict the number of callback ping retries to 5.
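To make the flow above concrete, here is a rough sketch in Python of a recovering server walking its list of previously connected clients and sending each one a callback ping with a bounded number of tries. This is only an illustration, not FEFS or Lustre code: `recover_clients`, `send_callback_ping`, and `RecoveryResult` are hypothetical names, the client list stands in for whatever is read from last_rcvd, and the limit of 5 is interpreted here as at most 5 attempts per client, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Cap taken from the comment above; treated here as "at most 5 attempts
# per client" (an assumption about what "retry 5 times" means).
MAX_CALLBACK_PING_RETRIES = 5


@dataclass
class RecoveryResult:
    reconnected: List[str] = field(default_factory=list)
    gave_up: List[str] = field(default_factory=list)
    pings_sent: int = 0


def recover_clients(
    known_clients: Iterable[str],
    send_callback_ping: Callable[[str], bool],
    max_retries: int = MAX_CALLBACK_PING_RETRIES,
) -> RecoveryResult:
    """Ping every client recorded before the restart, with a bounded retry count.

    known_clients      -- hypothetical stand-in for the entries read from last_rcvd
    send_callback_ping -- hypothetical transport hook; returns True once the
                          client has answered and reconnected
    """
    result = RecoveryResult()
    for client in known_clients:
        for _attempt in range(max_retries):
            result.pings_sent += 1
            if send_callback_ping(client):
                result.reconnected.append(client)
                break
        else:
            # Retry budget exhausted: stop pinging so that network
            # resources (e.g. credits) are not spent on a dead client.
            result.gave_up.append(client)
    return result
```

The point of the cap is that the total number of callback pings is bounded by the number of clients times `max_retries`, so an unreachable client cannot keep consuming network resources on a large system.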