[LU-3891] leases for HSM - some questions Created: 05/Sep/13 Updated: 02/Dec/16 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Vitaly Fertman | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 10148 |
| Description |
|
AFAICS, some lease code has landed for HSM needs. On eviction, locks are cancelled on both the MDS and the client. However, a new lease may conflict with open files: after a client is evicted and later reconnects, it does not re-open its files, yet they remain open on the client and it is able to proceed with its IO. HSM has a layout lock as well, which is supposed to block such new IO; if it does not, the lease lock gives no guarantee against recently evicted clients. The 2nd problem is that the evicted state is propagated from the MDS to the client with some latency: the client may not know it has connection problems while it is already evicted. This could take up to obd_timeout, which can be pretty long. The layout lock will not help here. The solution could be the same as with SOM: just deny all HSM releases for a period of X*obd_timeout after the last eviction, to be sure clients are aware of their evictions and have cancelled their layout locks. Are these lease lock issues known and somehow resolved? |
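The "deny releases for X*obd_timeout after the last eviction" idea could be sketched roughly as below. This is only an illustration, not Lustre code: `struct release_guard`, its fields and `release_allowed()` are hypothetical names, and `factor` stands in for the unspecified multiplier X.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical bookkeeping; none of these names exist in Lustre. */
struct release_guard {
	time_t       last_eviction; /* time of the most recent eviction, 0 if none */
	unsigned int obd_timeout;   /* seconds, mirrors the obd_timeout tunable */
	unsigned int factor;        /* the "X" multiplier from the description */
};

/*
 * Return true only when at least factor * obd_timeout seconds have passed
 * since the last eviction, i.e. when every evicted client should already
 * have noticed its eviction and dropped its layout locks.
 */
bool release_allowed(const struct release_guard *g, time_t now)
{
	long long elapsed, window;

	assert(g != NULL);

	if (g->last_eviction == 0)
		return true; /* no eviction recorded, nothing to wait for */

	elapsed = (long long)now - (long long)g->last_eviction;
	window = (long long)g->factor * (long long)g->obd_timeout;

	/* A negative elapsed time (clock went backwards) also denies. */
	return elapsed >= window;
}
```

With `factor = 3` and `obd_timeout = 100`, a release requested 299 seconds after an eviction would be denied and one requested at 300 seconds would be allowed. Where the last eviction time would be tracked (per target, per export, or globally) is left open here.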
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 06/Sep/13 ] |
|
I thought about this problem, heh. The lease implementation is okay for HSM. Check out mdt_hsm_release(): you will find that lease_broken is actually checked on the MDT side. So the HSM release operation is performed as follows: 1. open + lease the file |
| Comment by Vitaly Fertman [ 10/Sep/13 ] |
|
This does not answer the original question, because the lease lock will be there but will still guarantee nothing. I looked at mdt_hsm_release() and see that it takes an exclusive layout lock, which is good as it covers the 1st issue. |
| Comment by Jinshan Xiong (Inactive) [ 24/Sep/13 ] |
|
Indeed. The problem is that when the MDT grants an open lease to the client performing the release, an evicted client may still keep writing to the file, so the file may lose some state after the release. But this problem should be minor, because IO on the file still open on the evicted client will eventually fail with EIO after the release, since the OST objects have already disappeared. |
| Comment by Vitaly Fertman [ 24/Sep/13 ] |
|
AFAICS, the following is possible:
Although the window between the check and the release is relatively small, this still seems possible, and it would lead to data loss. |
| Comment by Vitaly Fertman [ 02/Dec/16 ] |
|
Summarising the current state of the HSM locking: the main question is whether the copy is valid after the release. The whole logic can be viewed starting from ll_hsm_release:
Also, the release happens ~2 weeks or more after the last access. Therefore, everything is protected if:
However, it is not protected even with the 2-week delay if a new IO and an eviction have happened just before the release and:
Possible improvements could be: |