[LU-15108] Wrong changelog issued during hsm_restore Created: 15/Oct/21  Updated: 02/Nov/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Etienne Aujames Assignee: Etienne Aujames
Resolution: Unresolved Votes: 1
Labels: None
Environment:

VMs on Lustre master (2.14.54_98_g3db8e31)


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Here are the changelogs that we can see while restoring a hsm file:

30 10OPEN  10:43:26.912661570 2021.10.15 0xcb t=[0x200000402:0xa:0x0] j=lhsmtool_posix.0 ef=0xf u=0:0 nid=10.0.2.6@tcp m=rw-
31 10OPEN  10:43:26.920553291 2021.10.15 0x2 t=[0x200000402:0xa:0x0] j=lhsmtool_posix.0 ef=0xf u=0:0 nid=10.0.2.6@tcp m=-w-
32 16HSM   10:43:26.966607124 2021.10.15 0x0 t=[0x200000402:0x1:0x0] j=lhsmtool_posix.0 ef=0xf u=0:0 nid=10.0.2.6@tcp
33 12LYOUT 10:43:26.967682430 2021.10.15 0x0 t=[0x200000402:0x1:0x0] j=lhsmtool_posix.0 ef=0xf u=0:0 nid=10.0.2.6@tcp
34 16HSM   10:43:26.969154222 2021.10.15 0x80 t=[0x200000402:0x1:0x0] j=lhsmtool_posix.0 ef=0xf u=0:0 nid=10.0.2.6@tcp

The changelog 32 is wrong: flag: 0x0 -> err=0 hsm_event: HE_ARCHIVE hsm_flag=0
The changelog 34 is the right one: 0x80 -> err=0 hsm_event: HE_RESTORE hsm_flag=0

I think changelog 32 should be set flag to 0x280: err=0 hsm_event: HE_STATE hsm_flag=0

The wrong flag is positioned because of the following code (when updating hsm state attr):

static int mdd_hsm_update_locked(const struct lu_env *env,
                                 struct md_object *obj,
                                 const struct lu_buf *buf,
                                 struct thandle *handle,
                                 enum changelog_rec_flags *clf_flags)
{
.....
       /* Flags differ, set flags for the changelog that will be added */
        if (current_mh->mh_flags != new_mh->mh_flags) {                  <----------
                hsm_set_cl_event(clf_flags, HE_STATE);
                if (new_mh->mh_flags & HS_DIRTY)
                        hsm_set_cl_flags(clf_flags, CLF_HSM_DIRTY);
        }

When restoring a file, hsm xattr is updated by hsm_swap_layouts on the volatile (so no changelog are emitted there):

static int hsm_swap_layouts(struct mdt_thread_info *mti,
                            struct mdt_object *obj, const struct lu_fid *dfid,
                            struct md_hsm *mh_common)
{
.....
        mh_common->mh_flags &= ~(HS_RELEASED | HS_DIRTY);
        rc = mdt_hsm_attr_set(mti, dobj, mh_common);

So empty HSM changelog is issued by hsm_cdt_request_completed() -> mdt_hsm_attr_set() because hsm state is not modified by this function.



 Comments   
Comment by Gerrit Updater [ 15/Oct/21 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45263
Subject: LU-15108 hsm: HE_STATE should be issued when necessary
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c2e6e60b64847cc184b02ff22c53012453037803

Comment by Thomas Leibovici [ 28/Oct/21 ]

A side issue is: it is not really expected to have multiple changelog records for a common HSM operation.

  • "Archive" , "Release", "Restore", "Remove" and "Cancel" should trigger the corresponding HSM changelog ("HE_ARCHIVE", "HE_RELEASE", "HE_RESTORE", "HE_REMOVE", "HE_CANCEL")
  • Other hsm state changes should trigger "HE_STATE" (eg. a file is set "dirty", "lost"...)

Triggering HE_STATE for the common HSM operations (archive...) is redundant. From the Robinhood point of view, it will unnecessarily increase the changelog processing load, and trigger unnecessary "llapi_hsm_state_get()" operation on the filesystem to determine the new HSM state.
Calling llapi_hsm_state_get() is not necessary for other changelogs (archive etc). as these changelogs do include information about the resulting state in the changelog flags.

In the reported issue as well as in the proposed patch, these redundant HE_STATE records remain.

Comment by Etienne Aujames [ 02/Nov/21 ]

I have updated the patch above to take into account Thomas's remark.

Generated at Sat Feb 10 03:15:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.