mounting a Lustre FS when already running an HSM CT causes the new mount to register as a CT

Details

    • Technical task
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 10080

    Description

      Because the KUC lists are global, seeing IMP_EVENT_ACTIVE on any MDC import causes the archive masks of every already registered CT to be registered with the MDT behind that import, including the MDT of a newly mounted filesystem.

      mdc_import_event(..., ..., imp, IMP_EVENT_ACTIVE)
          mdc_kuc_reregister(imp)
              libcfs_kkuc_group_foreach(KUC_GRP_HSM, mdc_hsm_ct_reregister,
                                        (void *)imp)
                  cfs_list_for_each_entry(reg, ... KUC_GRP_HSM, ...)
                      mdc_hsm_ct_reregister(reg->kr_data = archives, imp)
                          mdc_ioc_hsm_ct_register(imp, archives)
                              /* Send MDS_HSM_CT_REGISTER. */
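
      The effect of the call chain above can be reproduced in a small userspace model. This is only a sketch: the toy_* types and functions are hypothetical stand-ins invented for illustration, not the Lustre structures, and the single static array plays the role of the global KUC_GRP_HSM list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model; none of these are the real Lustre types. */
struct toy_import {
    const char *fsname;     /* filesystem behind this MDC import */
    uint32_t registered;    /* archive mask its MDT has been sent */
};

struct toy_reg {
    uint32_t archives;      /* stands in for the __u32 kept in kr_data */
};

#define TOY_MAX_REGS 8

/* One list shared by every mount, as with the global KUC group list. */
static struct toy_reg toy_regs[TOY_MAX_REGS];
static size_t toy_nregs;

/* A copytool registers; nothing records which mount it registered on. */
static int toy_ct_register(uint32_t archives)
{
    if (toy_nregs == TOY_MAX_REGS)
        return -1;
    toy_regs[toy_nregs++].archives = archives;
    return 0;
}

/* Models IMP_EVENT_ACTIVE: every global entry is re-sent to this import. */
static void toy_import_active(struct toy_import *imp)
{
    assert(imp != NULL);
    for (size_t i = 0; i < toy_nregs; i++)
        imp->registered |= toy_regs[i].archives;
}
```

      In this model, a copytool that registered once ends up registered behind every import that later becomes active, which is the behaviour reported here.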
      

        Activity

          [LU-3882] mounting a Lustre FS when already running an HSM CT causes the new mount to register as a CT
          pjones Peter Jones added a comment -

          Landed for 2.5.0


          hdoreau Henri Doreau (Inactive) added a comment -

          Thanks a lot John.

          Patch is at http://review.whamcloud.com/7612

          jhammond John Hammond added a comment -

          I agree with Thomas' approach but think it can be refined somewhat. Here is what I suggest:

          1. In struct kkuc_reg change kr_data from __u32 to void *.
          2. Define struct kkuc_ct_data to hold some magic, an obd_uuid, and the __u32 archive mask formerly placed in kr_data.
          3. In lmv_hsm_ct_register() allocate a kkuc_ct_data and initialize it with the UUID of the LMV obd and with the passed-in archives.
          4. In libcfs_kkuc_group_rem() add a void **data parameter to receive the data on removal.
          5. In lmv_hsm_ct_unregister() recover the kkuc_ct_data and free it.
          6. Adjust mdc_hsm_ct_reregister() to use kkuc_ct_data and check its UUID against that of the MDC import.
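
          The six steps above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the landed patch: every sketch_* name, the UUID buffer size, and the magic value are hypothetical, and the UUID comparison is reduced to a plain string compare.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_UUID_LEN 40            /* assumed size, not taken from Lustre */
#define SKETCH_CT_MAGIC 0x43544441u   /* arbitrary value for this sketch */

/* Step 2: payload replacing the bare __u32 archive mask. */
struct sketch_ct_data {
    uint32_t magic;
    char obd_uuid[SKETCH_UUID_LEN];   /* UUID of the registering LMV obd */
    uint32_t archives;                /* mask formerly stored in kr_data */
};

/* Step 1: kr_data becomes an opaque pointer. */
struct sketch_reg {
    void *kr_data;
};

/* Step 3: allocate and fill the payload at registration time. */
static int sketch_ct_register(struct sketch_reg *reg, const char *lmv_uuid,
                              uint32_t archives)
{
    struct sketch_ct_data *d = calloc(1, sizeof(*d));

    if (d == NULL)
        return -1;
    d->magic = SKETCH_CT_MAGIC;
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(d->obd_uuid, lmv_uuid, sizeof(d->obd_uuid) - 1);
    d->archives = archives;
    reg->kr_data = d;
    return 0;
}

/* Steps 4 and 5: removal recovers the payload so it can be freed. */
static void sketch_ct_unregister(struct sketch_reg *reg)
{
    struct sketch_ct_data *d = reg->kr_data;

    assert(d == NULL || d->magic == SKETCH_CT_MAGIC);
    reg->kr_data = NULL;
    free(d);
}

/* Step 6: returns 1 and stores the mask only if the UUIDs match. */
static int sketch_ct_matches(const struct sketch_reg *reg,
                             const char *import_uuid, uint32_t *archives)
{
    const struct sketch_ct_data *d = reg->kr_data;

    if (d == NULL || d->magic != SKETCH_CT_MAGIC)
        return 0;
    if (strcmp(d->obd_uuid, import_uuid) != 0)
        return 0;
    *archives = d->archives;
    return 1;
}
```

          With this shape, a re-registration callback would skip any entry whose UUID does not match the import being activated, so a second mount would no longer inherit the first mount's copytool registration.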

          hdoreau Henri Doreau (Inactive) added a comment -

          Thomas is off for the next few days. I can work on a patch if needed.
          We would appreciate comments/suggestions from Intel on the proposed approach though.

          jlevi Jodi Levi (Inactive) added a comment -

          Thomas,
          Are you planning to submit a patch for this?

          leibovici-cea Thomas LEIBOVICI - CEA (Inactive) added a comment - edited

          It could be fixed like this:

          • Add a mount point identifier as a new argument to libcfs_ukuc_start(), to be stored in the kkuc_reg structure.
          • Add a mount point identifier as a new argument to libcfs_kkuc_group_foreach(), so that it only runs the re-registration for copytools that registered on this mount point.

          I see this comment in kuc:

          /* Broadcast groups are global across all mounted filesystems;
           * i.e. registering for a group on 1 fs will get messages for that
           * group from any fs */

          And indeed it appears that a copytool registered for one filesystem will get requests for other filesystems.
          In mdc:

                  /* Broadcast to HSM listeners */
                  rc = libcfs_kkuc_group_put(KUC_GRP_HSM, lh);

          The only check is done in the copytool code itself, based on the HSM action list contents:

           if (strcmp(hal->hal_fsname, fs_name) != 0) {
                   CT_ERROR("'%s' invalid fs name, expecting: %s\n",
                            hal->hal_fsname, fs_name);

          It would be better to filter earlier, in the kuc layer, by also adding the mount point parameter to libcfs_kkuc_group_put().
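
          The filtering idea in this comment could look roughly like the userspace sketch below. It is an assumption-laden illustration: the filt_* names, the string mount identifier, and the fixed-size listener table are all hypothetical and do not correspond to the libcfs API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILT_MAX_LISTENERS 8
#define FILT_MNT_LEN 64

struct filt_listener {
    char mnt_id[FILT_MNT_LEN];  /* hypothetical mount point identifier */
    int delivered;              /* messages delivered to this listener */
};

static struct filt_listener filt_listeners[FILT_MAX_LISTENERS];
static size_t filt_nlisteners;

/* Registration stores the mount identifier with the listener.
 * Returns the listener's index, or -1 if the table is full. */
static int filt_group_add(const char *mnt_id)
{
    struct filt_listener *l;

    assert(mnt_id != NULL);
    if (filt_nlisteners == FILT_MAX_LISTENERS)
        return -1;
    l = &filt_listeners[filt_nlisteners];
    /* The table is static and zeroed, so the copy stays NUL-terminated. */
    strncpy(l->mnt_id, mnt_id, sizeof(l->mnt_id) - 1);
    l->delivered = 0;
    return (int)filt_nlisteners++;
}

/* Broadcast restricted to listeners registered on the same mount.
 * Returns the number of listeners reached. */
static int filt_group_put(const char *mnt_id)
{
    int reached = 0;

    for (size_t i = 0; i < filt_nlisteners; i++) {
        if (strcmp(filt_listeners[i].mnt_id, mnt_id) != 0)
            continue;   /* listener belongs to another filesystem */
        filt_listeners[i].delivered++;
        reached++;
    }
    return reached;
}
```

          Filtering at this point would mean a copytool never sees action lists for other filesystems, so the fsname check in the copytool becomes a safety net rather than the only guard.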

          People

            jhammond John Hammond
            jhammond John Hammond