HSM _not only_ small fixes and to do list goes here (LU-3647)

[LU-3882] mounting a Lustre FS when already running an HSM CT causes the new mount to register as a CT Created: 04/Sep/13  Updated: 02/Oct/13  Resolved: 02/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: Technical task Priority: Major
Reporter: John Hammond Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: HSM

Rank (Obsolete): 10080

 Description   

Due to the global KUC lists, seeing IMP_EVENT_ACTIVE on any MDC import will cause any already registered CT archive masks to be registered with the MDT behind that import.

mdc_import_event(..., ..., imp, IMP_EVENT_ACTIVE)
    mdc_kuc_reregister(imp)
        libcfs_kkuc_group_foreach(KUC_GRP_HSM, mdc_hsm_ct_reregister,
                                  (void *)imp)
            cfs_list_for_each_entry(reg, ... KUC_GRP_HSM, ...)
                mdc_hsm_ct_reregister(reg->kr_data = archives, imp)
                    mdc_ioc_hsm_ct_register(imp, archives)
                        /* Send MDS_HSM_CT_REGISTER. */
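
To make the call chain above concrete, here is a minimal toy model of the behaviour, not Lustre code. Every name in it (toy_import, toy_ct_register, toy_import_active and so on) is hypothetical. It assumes only what the description states: the registration list is global, and every import that becomes active walks all of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these types and names are hypothetical, not Lustre's. */
#define TOY_MAX 8

struct toy_import {
	const char *mdt;          /* MDT reached through this import */
};

struct toy_sent {
	const char *mdt;          /* where the register request went */
	uint32_t    archives;     /* archive mask carried by the request */
};

/* One global list of copytool archive masks, shared by every mount. */
static uint32_t toy_regs[TOY_MAX];
static size_t toy_nr_regs;

/* Log of simulated MDS_HSM_CT_REGISTER requests. */
static struct toy_sent toy_log[TOY_MAX];
static size_t toy_nr_sent;

/* A copytool registers; nothing records which mount it used. */
int toy_ct_register(uint32_t archives)
{
	if (toy_nr_regs == TOY_MAX)
		return -1;
	toy_regs[toy_nr_regs++] = archives;
	return 0;
}

/* Models the IMP_EVENT_ACTIVE path: walk the global list unconditionally. */
void toy_import_active(const struct toy_import *imp)
{
	assert(imp != NULL);
	for (size_t i = 0; i < toy_nr_regs && toy_nr_sent < TOY_MAX; i++) {
		toy_log[toy_nr_sent].mdt = imp->mdt;
		toy_log[toy_nr_sent].archives = toy_regs[i];
		toy_nr_sent++;
	}
}
```

In this model, registering a mask for one filesystem and then calling toy_import_active() for an import of a different filesystem logs a register request against the second filesystem's MDT, which is the symptom reported here.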


 Comments   
Comment by Thomas LEIBOVICI - CEA (Inactive) [ 05/Sep/13 ]

It could be fixed like this:

  • Add a mount point identifier as a new argument of libcfs_ukuc_start() so that it is stored in the kkuc_reg structure.
  • Add a mount point identifier as a new argument of libcfs_kkuc_group_foreach() so that it only runs the re-registration for copytools that registered on this mount point.
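
The two points above could look roughly like the sketch below. It is an illustration under assumptions, not the libcfs API: the integer mnt_id, the sk_ prefixed names and the callback signature are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: field and function names are illustrative only. */
#define SK_MAX 8

struct sk_reg {
	int      mnt_id;     /* assumed mount point identifier saved at registration */
	uint32_t archives;   /* archive mask of the copytool */
};

static struct sk_reg sk_regs[SK_MAX];
static size_t sk_nr_regs;

typedef int (*sk_cb_t)(uint32_t archives, void *cb_arg);

/* Record the mount point together with the archive mask. */
int sk_register(int mnt_id, uint32_t archives)
{
	if (sk_nr_regs == SK_MAX)
		return -1;
	sk_regs[sk_nr_regs].mnt_id = mnt_id;
	sk_regs[sk_nr_regs].archives = archives;
	sk_nr_regs++;
	return 0;
}

/* Run cb only for registrations made on mnt_id; stop at the first error. */
int sk_group_foreach(int mnt_id, sk_cb_t cb, void *cb_arg)
{
	assert(cb != NULL);
	for (size_t i = 0; i < sk_nr_regs; i++) {
		int rc;

		if (sk_regs[i].mnt_id != mnt_id)
			continue;
		rc = cb(sk_regs[i].archives, cb_arg);
		if (rc != 0)
			return rc;
	}
	return 0;
}

/* Example callback: OR together the masks that would be re-registered. */
int sk_collect(uint32_t archives, void *cb_arg)
{
	*(uint32_t *)cb_arg |= archives;
	return 0;
}
```

With registrations on two different mount points, calling sk_group_foreach() for one of them only visits the matching entry, so a newly active import no longer picks up copytools from other mounts.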

I see this comment in kuc:

/* Broadcast groups are global across all mounted filesystems;
 * i.e. registering for a group on 1 fs will get messages for that
 * group from any fs */

And indeed it appears that a copytool registered for one filesystem will get requests for other filesystems.
In mdc:

        /* Broadcast to HSM listeners */
        rc = libcfs_kkuc_group_put(KUC_GRP_HSM, lh);

The only check is done in the copytool code itself, based on hsm action list contents:

 if (strcmp(hal->hal_fsname, fs_name) != 0) {
         CT_ERROR("'%s' invalid fs name, expecting: %s\n",
                  hal->hal_fsname, fs_name);

It would be better to filter earlier, in the kuc layer, by also adding the mount point parameter to libcfs_kkuc_group_put().
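
As a rough illustration of filtering on the delivery side, the sketch below drops messages for listeners on other mount points before they ever reach the copytool. It is hypothetical throughout: the fl_ names, the integer mount identifier and the per-listener counter stand in for whatever the kuc layer would really use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical listener record; not the real kkuc_reg layout. */
struct fl_listener {
	int      mnt_id;     /* assumed mount point identifier */
	unsigned received;   /* number of messages delivered so far */
};

/*
 * Deliver one message that originated on src_mnt_id.
 * Only listeners registered on the same mount point receive it.
 * Returns the number of listeners the message was delivered to.
 */
size_t fl_group_put(struct fl_listener *ls, size_t n, int src_mnt_id)
{
	size_t delivered = 0;

	assert(ls != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ls[i].mnt_id != src_mnt_id)
			continue;   /* filtered before reaching the copytool */
		ls[i].received++;
		delivered++;
	}
	return delivered;
}
```

Filtering at this point would leave the fsname comparison in the copytool as a second line of defence rather than the only check.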

Comment by Jodi Levi (Inactive) [ 06/Sep/13 ]

Thomas,
Are you planning to submit a patch for this?

Comment by Henri Doreau (Inactive) [ 09/Sep/13 ]

Thomas is off for the next few days. I can work on a patch if needed.
We would appreciate comments/suggestions from Intel on the proposed approach though.

Comment by John Hammond [ 10/Sep/13 ]

I agree with Thomas' approach but think it can be refined somewhat. Here is what I suggest:

  1. In struct kkuc_reg, change kr_data from __u32 to void *.
  2. Define struct kkuc_ct_data to hold some magic, an obd_uuid, and the __u32 archive formerly placed in kr_data.
  3. In lmv_hsm_ct_register() allocate a kkuc_ct_data and initialize it with the UUID of the LMV obd and with the passed-in archives.
  4. In libcfs_kkuc_group_rem() add a void **data parameter to receive the data on removal.
  5. In lmv_hsm_ct_unregister() recover the kkuc_ct_data and free it.
  6. Adjust mdc_hsm_ct_reregister() to use kkuc_ct_data and check its UUID against that of the MDC import.
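
The data structure and the UUID check from the steps above can be sketched as follows. This is a standalone approximation, not the landed patch: the sk_ names, the magic value, the UUID buffer size and the sample UUID strings are all assumptions, and how the UUID is obtained from an MDC import is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_CT_MAGIC 0x43544d47u   /* arbitrary value chosen for this sketch */
#define SK_UUID_MAX 40            /* assumed buffer size */

struct sk_uuid {
	char uuid[SK_UUID_MAX];
};

/* Stand-in for the proposed kkuc_ct_data: magic, UUID, archive mask. */
struct sk_ct_data {
	uint32_t       magic;
	struct sk_uuid obd_uuid;
	uint32_t       archives;
};

/* Fill in the per-registration data (step 3 in the list above). */
void sk_ct_data_init(struct sk_ct_data *d, const char *uuid, uint32_t archives)
{
	assert(d != NULL && uuid != NULL);
	memset(d, 0, sizeof(*d));
	d->magic = SK_CT_MAGIC;
	/* The buffer was zeroed, so copying at most size - 1 bytes keeps it terminated. */
	strncpy(d->obd_uuid.uuid, uuid, SK_UUID_MAX - 1);
	d->archives = archives;
}

/*
 * Decide whether a registration belongs to the import being reactivated
 * (step 6). Returns 1 to re-register, 0 to skip, -1 if data is not a
 * valid sk_ct_data.
 */
int sk_should_reregister(const void *data, const char *import_uuid)
{
	const struct sk_ct_data *d = data;

	if (d == NULL || import_uuid == NULL || d->magic != SK_CT_MAGIC)
		return -1;
	return strcmp(d->obd_uuid.uuid, import_uuid) == 0;
}
```

The magic field is what lets the callback reject a void * that does not point at copytool data, which matters once kr_data is no longer a plain __u32.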

Comment by Henri Doreau (Inactive) [ 11/Sep/13 ]

Thanks a lot John.

Patch is at http://review.whamcloud.com/7612

Comment by Peter Jones [ 02/Oct/13 ]

Landed for 2.5.0
