[2/5/14, 9:19:26 AM] wangdi: if we just evict the LWP, it might cause some in-flight RPCs to fail
[2/5/14, 9:19:40 AM] Johann Lombardi: yup
[2/5/14, 9:19:41 AM] wangdi: which then causes application errors during recovery
[2/5/14, 9:19:44 AM] Johann Lombardi: yes
[2/5/14, 9:19:54 AM] Johann Lombardi: that's what the quota code expects
[2/5/14, 9:20:03 AM] wangdi: which then causes some test failures.
[2/5/14, 9:20:13 AM] Johann Lombardi: all the states associated with the previous "connection" should go away
[2/5/14, 9:20:47 AM] Johann Lombardi: could you please remind me how you use LWP?
[2/5/14, 9:20:59 AM] Johann Lombardi: for seq allocation mostly, right?
[2/5/14, 9:21:15 AM] wangdi: no, mostly for fld (seq) lookups
[2/5/14, 9:21:35 AM] wangdi: seq allocation is rare actually
[2/5/14, 9:21:38 AM] Johann Lombardi: ah, then the lack of recovery should not be a problem
[2/5/14, 9:21:45 AM] Johann Lombardi: you can just resend the request yourself
[2/5/14, 9:22:06 AM] wangdi: that would complicate things
[2/5/14, 9:22:24 AM] wangdi: I am thinking of some easier way. :)
[2/5/14, 9:22:38 AM] Johann Lombardi: well, the purpose of lwp is to be lightweight
[2/5/14, 9:22:52 AM] Johann Lombardi: as such, it has no entry in last_rcvd
[2/5/14, 9:22:58 AM] Johann Lombardi: and gets evicted upon restart
[2/5/14, 9:23:07 AM] wangdi: I understand this.
[2/5/14, 9:23:14 AM] Johann Lombardi: it is like this by design
[2/5/14, 9:23:54 AM] wangdi: does quota need this eviction? or is it just the design?
[2/5/14, 9:24:01 AM] wangdi: if there is no replay,
[2/5/14, 9:25:06 AM] wangdi: why do we need this eviction?
[2/5/14, 9:25:24 AM] Johann Lombardi: quota needs all outstanding requests associated with previous connection to fail, yes
[2/5/14, 9:25:31 AM] wangdi: I see
[2/5/14, 9:25:36 AM] wangdi: thanks
[2/5/14, 9:25:46 AM] Johann Lombardi: well, the eviction is a side effect of not being in the last_rcvd file
[2/5/14, 9:26:15 AM] Johann Lombardi: quota updates are not synchronous to disk and not replayed
[2/5/14, 9:26:28 AM] wangdi: if the eviction could choose not to fail those read-only in-flight RPCs
[2/5/14, 9:26:32 AM] wangdi: that would be best
[2/5/14, 9:26:35 AM] Johann Lombardi: when the mdt restarts, it might have a different view of allocation
[2/5/14, 9:26:49 AM] Johann Lombardi: so we should restart from scratch each time
[2/5/14, 9:27:00 AM] Johann Lombardi: and forget about previous requests that did not complete
[2/5/14, 9:27:09 AM] Johann Lombardi: because they are not relevant any more
[2/5/14, 9:27:22 AM] Johann Lombardi: i guess we could change the behavior a bit
[2/5/14, 9:27:42 AM] Johann Lombardi: and not fail requests which are in the delayed list
[2/5/14, 9:28:16 AM] Johann Lombardi: quota could then set the no_delay flag (if not done already) to bypass this behavior
[2/5/14, 9:28:47 AM] Johann Lombardi: however, i guess it will require patching ptlrpc not to fail those requests on eviction, which will happen anyway since we have no entry in the last_rcvd file
[2/5/14, 9:28:50 AM] Johann Lombardi: what do you think?
[2/5/14, 9:29:31 AM] wangdi: the purpose of eviction is for the "replay" reqs, not for the delayed reqs in this case.
[2/5/14, 9:29:55 AM] wangdi: so changing ptlrpc probably makes sense, but that would complicate the protocol
[2/5/14, 9:29:57 AM] wangdi: I hate that
[2/5/14, 9:30:00 AM] Johann Lombardi: well, requests from the delayed list will be failed too
[2/5/14, 9:30:08 AM] Johann Lombardi: in the case of eviction
[2/5/14, 9:30:24 AM] wangdi: yes, but they should not be for a lightweight connection
[2/5/14, 9:30:54 AM] Johann Lombardi: yes, so you could change this behavior and set rq_no_delay on quota requests (again, if not done already)
[2/5/14, 9:31:12 AM] Johann Lombardi: yours will be handled
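The proposal in this exchange is to keep requests parked on the delayed list when a lightweight connection is evicted, while quota opts out by setting no_delay. It can be modelled in a few lines of C. This is a hypothetical, simplified sketch: `struct model_req`, `model_send_while_down()` and `model_evict()` are illustrative names, not ptlrpc code. Only the `rq_no_delay` flag, the delayed list, and the fact that eviction currently fails delayed requests too come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a request on an LWP import;
 * not the real ptlrpc_request structure. */
enum model_req_state {
    REQ_SENT,     /* already on the wire (in flight) */
    REQ_DELAYED,  /* parked on the delayed list, waiting for recovery */
    REQ_FAILED    /* completed with an error */
};

struct model_req {
    bool no_delay;              /* stands in for rq_no_delay */
    enum model_req_state state;
};

/* A request issued while the connection is unhealthy (e.g. MDT restart):
 * no_delay requests fail at once, others wait on the delayed list. */
static void model_send_while_down(struct model_req *r)
{
    assert(r != NULL);
    r->state = r->no_delay ? REQ_FAILED : REQ_DELAYED;
}

/* Eviction of the connection. In-flight requests always fail. Today the
 * delayed list is flushed as well; the proposed change keeps delayed
 * requests when the connection is lightweight, so they can be sent after
 * the reconnect. */
static void model_evict(struct model_req *reqs, size_t n, bool lightweight)
{
    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct model_req *r = &reqs[i];

        if (r->state == REQ_SENT)
            r->state = REQ_FAILED;
        else if (r->state == REQ_DELAYED && !lightweight)
            r->state = REQ_FAILED;
    }
}
```

Under this model, an fld lookup waiting on the delayed list survives the eviction of a lightweight connection, while a quota request with no_delay set has already failed and never reaches the delayed list, which matches what quota expects.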
[2/5/14, 9:32:15 AM] wangdi: yes, probably
[2/5/14, 9:32:27 AM] wangdi: need to think a bit. thanks for the discussion.
[2/5/14, 9:32:27 AM] Johann Lombardi: i don't see any other way to do that
[2/5/14, 9:32:44 AM] Johann Lombardi: except adding lwp to last_rcvd
[2/5/14, 9:32:52 AM] Johann Lombardi: which would defeat the original purpose
[2/5/14, 9:33:03 AM] Johann Lombardi: np
[2/5/14, 9:33:40 AM] wangdi: the other way is to resend the fld rpc as you said, but that would complicate things
[2/5/14, 9:34:17 AM] Johann Lombardi: it might, indeed
[2/5/14, 10:32:30 AM] wangdi: setting no_delay will cause quota RPCs to fail on any unhealthy connection, not just on restart.
[2/5/14, 10:32:35 AM] wangdi: is it ok for quota?
[2/5/14, 10:48:13 AM] wangdi: hmm, we already set no_delay flag for quota request
[2/5/14, 10:48:27 AM] wangdi: so I guess this eviction is no longer necessary?
[2/5/14, 10:48:49 AM] wangdi: I will disable part of recovery-small test then
discussion on Skype