[LU-13920] HSM: hsm_actions are not processed after MDS failover Created: 24/Aug/20  Updated: 19/Jan/22  Resolved: 12/Oct/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Major
Reporter: Sergey Cheremencev Assignee: Sergey Cheremencev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14399 mount MDT takes very long with hsm en... Resolved
is related to LU-13651 Conditionally skip finding compatible... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

After each MDS failover, the following message can be seen:

(mdt_coordinator.c:1116:mdt_hsm_cdt_start()) lustre-MDT0000: cannot take the layout locks needed for registered restore: -2  

This error means that the coordinator does not process the hsm_actions list after failover.
In short, the problem is caused by a race in config llog processing during MDS mount. The config params that cause the coordinator to start and handle the hsm_actions list are processed before the MDD initializes the hsm llog (mdd_prepare->mdd_hsm_actions_llog_init).

The error message above can be seen after sanity-hsm_407, which performs an MDS failover.
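
To illustrate the ordering problem, here is a minimal toy model in C. It is only a sketch under stated assumptions: every name in it (toy_mdt, toy_cdt_start, toy_config_enable_cdt, toy_mdd_prepare) is hypothetical and is not a Lustre symbol, and the deferral shown is one possible way to enforce "start only after MDD setup", not necessarily what the actual patch does. The only detail taken from the report is the error code: -2 is -ENOENT, which the model returns when the coordinator start runs before the llog exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of an MDT; none of these names are Lustre symbols. */
struct toy_mdt {
	bool llog_ready;      /* set once the hsm_actions llog is initialized */
	bool start_pending;   /* config requested the coordinator too early */
	bool cdt_running;     /* coordinator is running */
	int  actions_seen;    /* records replayed from the llog at start */
	int  actions_in_llog; /* records left in the llog before failover */
};

/* Start the coordinator: fails with -ENOENT (-2) if the llog is missing. */
static int toy_cdt_start(struct toy_mdt *m)
{
	if (!m->llog_ready)
		return -ENOENT;
	m->actions_seen = m->actions_in_llog;
	m->cdt_running = true;
	return 0;
}

/* Config handler: if the llog is not ready, only remember the request. */
static int toy_config_enable_cdt(struct toy_mdt *m)
{
	if (!m->llog_ready) {
		m->start_pending = true;
		return 0;
	}
	return toy_cdt_start(m);
}

/* Stand-in for the MDD setup step: create the llog, then run a deferred start. */
static int toy_mdd_prepare(struct toy_mdt *m)
{
	assert(!m->llog_ready); /* setup is expected to run once per mount */
	m->llog_ready = true;
	if (m->start_pending) {
		m->start_pending = false;
		return toy_cdt_start(m);
	}
	return 0;
}
```

In this model, calling toy_cdt_start() directly from the config path reproduces the -2 failure, while going through toy_config_enable_cdt() followed by toy_mdd_prepare() starts the coordinator with all pending actions visible, regardless of which of the two runs first.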



 Comments   
Comment by Etienne Aujames [ 27/Aug/20 ]

Hello,

Could you please give the version of Lustre you use?

Comment by Sergey Cheremencev [ 27/Aug/20 ]

Hello,

It exists on the latest master (d7e6b6d2) - 2.13.

Comment by Gerrit Updater [ 24/Sep/20 ]

Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/40028
Subject: LU-13920 hsm: process hsm_actions only after mdd setup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 888e8dc6ea70b1fd54d8c88a02a127d3a850938c

Comment by Gerrit Updater [ 12/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40028/
Subject: LU-13920 hsm: process hsm_actions only after mdd setup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a558006b83dfe32798cce644aa888c37e805d50b

Comment by Peter Jones [ 12/Oct/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 19/Jan/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46207
Subject: LU-13920 hsm: process hsm_actions only after mdd setup
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: e0c811ffb5bdfb2e5bce785f25da0f60bf2f510b

Generated at Sat Feb 10 03:05:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.