Client eviction on lock callback timeout (LU-874)

[LU-918] ensure that BRW requests prevent lock timeout Created: 13/Dec/11  Updated: 30/Jan/12  Resolved: 30/Jan/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Technical task Priority: Minor
Reporter: Andreas Dilger Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15 strange slow IO messages and bad perf... Resolved
is related to LU-410 Performance concern with Shrink file_... Resolved
Rank (Obsolete): 10133

 Description   

To avoid lock timeouts in LU-874, one point of discussion was to ensure that BRW requests under a DLM extent lock prevent the client from being evicted, even under heavy OST load. The best option would be that when a client sends a BRW request under a DLM lock, the lock timeout is stopped entirely for that lock until the IO has completed. That would avoid the overhead of continually refreshing the DLM lock during operation, but may be more complex to implement than having the DLM lock timeout first check for BRW requests, either in the hpreq queue or in progress on an OSS thread, that would refresh it.
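The second alternative in the description (check for covering BRW requests before letting the lock time out) can be illustrated with a toy model. This is only a sketch and not Lustre code: `ExtentLock`, `BrwRequest`, `covers` and `check_lock_timeout` are hypothetical names, time is a plain number, and the `pending` list stands in for whatever is queued in the hpreq queue or being processed by an OSS thread.

```python
from dataclasses import dataclass


@dataclass
class ExtentLock:
    """Toy stand-in for a DLM extent lock (hypothetical, not the Lustre struct)."""
    start: int
    end: int
    deadline: float  # time at which the callback timeout would fire


@dataclass
class BrwRequest:
    """Toy stand-in for a bulk read/write request over a byte range."""
    start: int
    end: int


def covers(lock: ExtentLock, req: BrwRequest) -> bool:
    # A request is covered when it lies entirely inside the lock's extent.
    return lock.start <= req.start and req.end <= lock.end


def check_lock_timeout(lock, pending, now, timeout):
    """Return True if the lock has really timed out (client would be evicted).

    Before expiring the lock, look for queued or in-progress requests that
    it covers; if one exists, push the deadline out instead of expiring.
    """
    if now < lock.deadline:
        return False
    if any(covers(lock, req) for req in pending):
        lock.deadline = now + timeout
        return False
    return True
```

In this model the extra work happens only when a deadline is actually reached, so normal IO pays nothing for it.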



 Comments   
Comment by Andreas Dilger [ 13/Dec/11 ]

Add dependency on OSS read cache issues

Comment by Peter Jones [ 30/Jan/12 ]

Jinshan

Did you cover this already under one of your LU-874 patches?

Peter

Comment by Jinshan Xiong (Inactive) [ 30/Jan/12 ]

I solved this problem by refreshing the lock timeout each time.

Yes, I did once have a patch that takes locks off the waiting list when a covering RPC arrives. However, that approach has to modify the state of the DLM lock, for example to remember how many active RPCs exist. That is a stateful implementation.

After thinking about it, I decided to stay with the current stateless implementation because:
1. it leaves less room for bugs;
2. a lock timeout is a rare event in the system, so performance shouldn't be a problem.

What do you think?
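As an illustration of the trade-off discussed in this comment (not the actual patch), the two approaches might be modelled as below. All names are hypothetical, and the sketch assumes that "each time" means each time a covering RPC arrives. The stateless variant only moves a deadline; the stateful variant has to count active RPCs and keep that count correct on every completion path.

```python
class StatelessLock:
    """Each covering RPC pushes the deadline out; no per-lock IO state is kept."""

    def __init__(self, now, timeout):
        self.timeout = timeout
        self.deadline = now + timeout

    def on_covering_rpc(self, now):
        # Assumed refresh point: arrival of an RPC covered by this lock.
        self.deadline = now + self.timeout

    def expired(self, now):
        return now >= self.deadline


class StatefulLock:
    """The timer is suspended while covering RPCs are active, so they must be counted."""

    def __init__(self, now, timeout):
        self.timeout = timeout
        self.deadline = now + timeout
        self.active_rpcs = 0

    def rpc_start(self):
        self.active_rpcs += 1

    def rpc_done(self, now):
        self.active_rpcs -= 1
        if self.active_rpcs == 0:
            # Restart the timer only once the last covering RPC has finished.
            self.deadline = now + self.timeout

    def expired(self, now):
        return self.active_rpcs == 0 and now >= self.deadline
```

A missed `rpc_done` call in the stateful model would leave the lock unable to expire, which is the kind of bug the stateless approach avoids.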

Comment by Andreas Dilger [ 30/Jan/12 ]

I think if we have a simple solution to the problem that works, then we don't need a complex solution to the problem. If there is nothing here left to be fixed, then this bug can be closed.
