[LU-11284] Full lock convert conflicts with HSM Created: 25/Aug/18  Updated: 06/Apr/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Unresolved Votes: 0
Labels: DoM2

Issue Links:
Related
is related to LU-5216 HSM: restore doesn't restart for new ... Reopened
is related to LU-10175 DoM:Full support for the LDLM lock co... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

Full lock convert is still not enabled because of a conflict with HSM. Tests 33-36 and 201 in sanity-hsm.sh fail when lock convert is enabled. This looks related to LU-5216 and has to be investigated further.



 Comments   
Comment by Mikhail Pershin [ 14/Nov/19 ]

The latest patch from LU-11276 fixes this issue. In short, the problem was in the HSM code, which holds an EX LAYOUT lock on the server for the duration of a restore. During that time the client can try to get file attributes, which in turn causes ll_glimpse_size0() to request a new CR LAYOUT lock from the server, and that request is blocked by the restore. There is a workaround that prevents the client from glimpsing a file that is being restored: the server returns the MS_RESTORE state in the MD attributes. The problem wasn't seen before lock convert because the EX LAYOUT lock cancelled the client lock with all of its bits, and since the LAYOUT bit was usually combined with the UPDATE bit, the client fetched new attributes from the server and updated the restore state as needed. With lock convert, the client keeps the UPDATE bit, continues to use its local copy of the attributes, and never sees the MS_RESTORE state update. Strictly speaking this is an HSM bug, because we can't guarantee that LAYOUT is always combined with UPDATE, so the problem may exist without lock convert as well.
The proposed solution is to take the EX lock with LAYOUT+UPDATE bits when the restore starts, so the client's cached state is reset and the client has to ask the server for attributes again, which updates the restore state as expected. Note that the UPDATE bit should be dropped right after the EX lock is taken on the server, and only the LAYOUT bit should remain until the restore finishes.

Comment by Gerrit Updater [ 15/Nov/19 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36766
Subject: LU-11284 ldlm: enable lock convert for all bits
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 345e4807cbf2fdc11dd891f4a9c1391c83c72c83

Comment by Mikhail Pershin [ 15/Nov/19 ]

More fixes are needed, so I've split the related fixes out of the LU-11276 patch and pushed them under this ticket.

Generated at Sat Feb 10 02:42:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.