[LU-2898] More timely notification of clients in case of eviction - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels: None

Rank (Obsolete): 6985

Description

There have been periodic complaints that a Lustre client does not really know it has been evicted from a server node, because it can only find out when it sends an RPC.
In the past this was usually caught by the periodic ping, but with pinging being scaled back to run in fewer cases, it increasingly happens that an application initiates an RPC and abruptly discovers an eviction that happened quite a while ago.

As such, we probably need a better way of notifying clients of their eviction, so that they can reconnect more eagerly and with less damage to whatever might be running in userspace.

Attachments

Issue Links

is related to

LU-2467 ABILITY TO DISABLE PINGING

Resolved

Activity

Hiroya Nozaki (Inactive) added a comment - 14/Mar/13 3:38 AM - edited

Recovering servers try to retrieve clients' information from the last_rcvd files to see which clients had been connected. Next, the servers send callback pings to those clients in order to make them reconnect.
This is the basic recovery behavior in FEFS, though many minor functions are also involved in it.

Oh, and I want you to know one thing: when we handle a large system like K, pings often eat up LNet resources such as credits. I'm not very familiar with that area, though, so I think you'll need a measure against this problem. That is why we limit the callback ping to 5 retries.
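To make the retry limit concrete, here is a minimal sketch of a bounded callback-ping loop. It is an illustration only, not FEFS or Lustre code: the names `send_callback_pings` and `send_ping` are hypothetical, and it assumes that "5 times" means at most 5 attempts in total per client.

```python
from typing import Callable, Dict, Iterable

# Assumption: the limit quoted in the comment is a cap on total attempts.
MAX_CALLBACK_PING_ATTEMPTS = 5


def send_callback_pings(
    clients: Iterable[str],
    send_ping: Callable[[str], bool],
    max_attempts: int = MAX_CALLBACK_PING_ATTEMPTS,
) -> Dict[str, bool]:
    """Ping each client until it acknowledges or the attempt budget is spent.

    `send_ping` is a hypothetical transport hook that returns True when the
    client acknowledged the ping. The result maps each client to whether it
    acknowledged within the budget.
    """
    results: Dict[str, bool] = {}
    for client in clients:
        acknowledged = False
        for _attempt in range(max_attempts):
            if send_ping(client):
                acknowledged = True
                break  # stop early so no further network credits are spent
        results[client] = acknowledged
    return results
```

Capping the attempts bounds the worst-case number of messages at `max_attempts` per client, which is the property that matters when the client count is very large.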

Robert Read added a comment - 14/Mar/13 2:04 AM

I see. Well, that's not ideal, but at least we know what the reason is.

BTW, if the clients are not pinging, how did they all know to reconnect to the recovering server?

Hiroya Nozaki (Inactive) added a comment - 14/Mar/13 1:18 AM - edited

Hi, Robert.

I've often seen many clients evicted during server recovery. It appears that a server cannot keep up with the large number of incoming reconnect requests, about 90k * (target disks).
As a result, clients whose reconnect requests have not been handled by the server are evicted.

Robert Read added a comment - 04/Mar/13 9:14 PM

I agree that reconnects are probably valid, but I'm not sure all evicts are necessarily valid or unavoidable. If they are occurring frequently then we should at least try to find out what is causing them.

Oleg Drokin added a comment - 04/Mar/13 7:26 PM

There might be many reasons for reconnect, I guess. All of them are valid one way or another. Like one-off AST loss or such.

Robert Read added a comment - 04/Mar/13 11:42 AM - edited

My first thought was that this does seem like a special case of imperative recovery, but limited to a specific client, and we could call it "imperative reconnect." But perhaps a simpler ldlm callback is sufficient since if there is a network split the client wouldn't be able to reconnect anyway.

Do we understand why these seemingly idle clients are being evicted in the first place? Is there an issue there?

Oleg Drokin added a comment - 03/Mar/13 11:59 PM

Fujitsu, as the first site to disable pinging in most cases, hit this especially often, so they created a patch to address the issue: servers notify the MGS of an eviction, and the MGS in turn sends messages to clients telling them to contact the servers and reconnect as needed (sort of like reverse imperative recovery, I guess).
The contributed patch against FEFS is here: http://review.whamcloud.com/#change,5457 (it is not directly applicable to the master tree, but it gives an idea of how they did it).

I imagine it would have been much easier to just send a specially crafted ldlm callback to the client to let it know we are evicting it (this would require far fewer infrastructure changes), but that would not handle the case where communication between this particular server and client is severed while the MGS connectivity of both remains unaffected.
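The trade-off between the two notification paths can be sketched with a toy reachability model. This is only an illustration of the argument above, not how Lustre or the FEFS patch is implemented; the node names and the functions `direct_callback_reaches` and `mgs_relay_reaches` are hypothetical.

```python
from typing import FrozenSet, Set


def link(a: str, b: str) -> FrozenSet[str]:
    """An undirected network link between two nodes."""
    return frozenset((a, b))


def direct_callback_reaches(server: str, client: str,
                            broken: Set[FrozenSet[str]]) -> bool:
    """A direct server-to-client callback needs the server-client link."""
    return link(server, client) not in broken


def mgs_relay_reaches(server: str, client: str, mgs: str,
                      broken: Set[FrozenSet[str]]) -> bool:
    """An MGS relay needs server-MGS and MGS-client, not server-client."""
    return (link(server, mgs) not in broken
            and link(mgs, client) not in broken)


# Case from the comment: only the server-client link is severed.
broken_links = {link("oss1", "client7")}
print(direct_callback_reaches("oss1", "client7", broken_links))
print(mgs_relay_reaches("oss1", "client7", "mgs", broken_links))
```

In this model the direct callback fails while the relay through the MGS still delivers the notice, which is the one case where the heavier MGS-based design pays off.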

Additionally, since the case Fujitsu outlined as most severe is that of a newly started application, a possible workaround is to have whatever job scheduling framework is in use run "df" before a new job starts. Still, there is a feeling that this case should be handled more transparently inside Lustre.
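A job prologue could do the equivalent of "df" with a single statfs call, as in the sketch below. The function name `poke_filesystem` and the mount point in the usage note are hypothetical. Whether one call actually reaches every server is an assumption here, since a client may answer from recently cached statfs data, and the call can block if the mount is unresponsive.

```python
import os


def poke_filesystem(mount_point: str) -> int:
    """Issue a statfs on `mount_point`, which is what "df" does.

    Returns the bytes available to unprivileged users. The point of the
    call in a job prologue is the side effect: on a network filesystem it
    may force the client to talk to its servers before the job starts.
    """
    st = os.statvfs(mount_point)
    return st.f_bavail * st.f_frsize
```

A scheduler prologue might call `poke_filesystem("/mnt/lustre")` (path assumed) and ignore the result, so that any pending eviction is noticed and repaired before the application issues its first RPC.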

People

Assignee: WC Triage

Reporter: Oleg Drokin

Votes: 0

Watchers: 7

Dates

Created: 03/Mar/13 11:51 PM

Updated: 25/Jul/24 1:45 PM