Feasibility of increasing upper limit of maximum HSM backends registered with MDT

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0

    Description

      Hello,
      As mentioned in Xue Wei's LAD'17 talk: https://www.eofs.eu/_media/events/lad17/08_li_xi_lcoc_lad_2017.pdf (LCOC: Lustre Cache on Client based on SSD – Xue Wei, NSCC-Wuxi and Li Xi, DDN), there is currently an upper limit of 32 HSM archives that can be registered.

      We also have a potential use case for HSM that would benefit greatly from raising this limit. For example, we could then allocate an HSM archive to each individual customer's project and so colocate their archived files more logically on our particular HSM backend (a tape filesystem), which would greatly improve our ability to restore large numbers of files.

      My question is simply: is this upper limit relatively simple to increase without impacting much else?

      I've emailed Li Xi from DDN, who was also listed on the talk, to ask whether he or Xue Wei could comment on this too (I haven't managed to find an email address for Xue Wei at NSCC-Wuxi). I'll update this ticket if I hear back from them.

      Thanks,
      Matt

          Activity

            [LU-10114] Feasibility of increasing upper limit of maximum HSM backends registered with MDT
            pjones Peter Jones added a comment -

            Landed for 2.12


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32197/
            Subject: LU-10114 hsm: increase upper limit of maximum HSM backends registered with MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3bfb6107ba4e92d8aa02e842502bc44bac7b8b43

            gerrit Gerrit Updater added a comment -

            John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33631
            Subject: LU-10114 hsm: noop chaneg for introp baselining
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2b02bf4747f0e6cb5d523471a4a6df226c7a5e86

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32806/
            Subject: LU-10114 hsm: add OBD_CONNECT2_ARCHIVE_ID_ARRAY to pass archive_id lists in array
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1c7e7d1243f78c72210a0ba3c22d5c84838a416e

            gerrit Gerrit Updater added a comment -

            Teddy Zheng (jjkky@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32197
            Subject: LU-10114 hsm: increasing upper limit of maximum HSM backends registered with MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 733344c1d871ae7925822de8135b32900ea2c776

            jhammond John Hammond added a comment -

            Hi Li Xi, could you make Xue Wei aware of this ticket?


            adilger Andreas Dilger added a comment -

            Also, the copytool registration to the kernel passes only a 32-bit lk_data mask to indicate which archives it is in charge of:

            struct lustre_kernelcomm {
                    __u32 lk_wfd;
                    __u32 lk_rfd;
                    __u32 lk_uid;
                    __u32 lk_group;
                    __u32 lk_data;
                    __u32 lk_flags;
            } __attribute__((packed));

            adilger Andreas Dilger added a comment -

            John, typically it should be possible to increase the size of a buffer without breaking backward compatibility, so long as old clients aren't expected to access any of the fields beyond the old size of the buffer. Unfortunately, the MDS_HSM_ARCHIVE RPC is used between the client and MDS, so there needs to be some compatibility in place (probably a struct obd_connect_data feature flag and a max archive count) to negotiate the limits between them.

            I also see that the archive ID is used as a 32-bit value in a few structs, in particular those in lustre_user.h:

            struct hsm_user_state {
                    /** Current HSM states, from enum hsm_states. */
                    __u32                   hus_states;
                    __u32                   hus_archive_id;
                    /**  The current undergoing action, if there is one */
                    __u32                   hus_in_progress_state;
                    __u32                   hus_in_progress_action;
                    struct hsm_extent       hus_in_progress_location;
                    char                    hus_extended_info[];
            };

            struct hsm_request {
                    __u32 hr_action;        /* enum hsm_user_action */
                    __u32 hr_archive_id;    /* archive id, used only with HUA_ARCHIVE */
                    __u64 hr_flags;         /* request flags */
                    __u32 hr_itemcount;     /* item count in hur_user_item vector */
                    __u32 hr_data_len;
            };

            struct hsm_action_list {
                    __u32 hal_version;
                    __u32 hal_count;       /* number of hai's to follow */
                    __u64 hal_compound_id; /* returned by coordinator */
                    __u64 hal_flags;
                    __u32 hal_archive_id; /* which archive backend */
                    __u32 padding1;
                    char  hal_fsname[0];   /* null-terminated */
                    /* struct hsm_action_item[hal_count] follows, aligned on 8-byte
                       boundaries. See hai_zero */
            } __attribute__((packed));

            struct hsm_user_import {
                    __u64           hui_size;
                    __u64           hui_atime;
                    __u64           hui_mtime;
                    __u32           hui_atime_ns;
                    __u32           hui_mtime_ns;
                    __u32           hui_uid;
                    __u32           hui_gid;
                    __u32           hui_mode;
                    __u32           hui_archive_id;
            };

            but these all appear to be used as integer values and not bitmaps, so they should be OK without any changes.

            Also, sanity-hsm.sh test_50() and test_51() are testing that up to 32 archives can be used, so these tests would need to be updated if we allow more archives.
            jhammond John Hammond added a comment -

            Unfortunately the limit of 32 archives is part of the wire protocol:

            struct req_msg_field RMF_MDS_HSM_ARCHIVE =
                    DEFINE_MSGF("hsm_archive", 0,
                                sizeof(__u32), lustre_swab_generic_32s, NULL);
            EXPORT_SYMBOL(RMF_MDS_HSM_ARCHIVE);
            

            So it can be changed but it will take some time for this to be seen in a production release.
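
One way to move beyond a fixed-size 32-bit field is to send a list of archive IDs instead of a mask, which is the direction suggested by the subject of the OBD_CONNECT2_ARCHIVE_ID_ARRAY patch on this ticket. The sketch below only shows the conversion from a legacy mask to such a list. It assumes 1-based IDs with ID n in bit n - 1, and ex_mask_to_ids is a hypothetical helper, not a Lustre function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: expand a legacy 32-bit archive mask into an array
 * of 1-based archive IDs. Writes at most cap entries and returns the
 * number written.
 */
static size_t ex_mask_to_ids(uint32_t mask, uint32_t *ids, size_t cap)
{
        size_t n = 0;

        assert(ids != NULL || cap == 0);
        for (uint32_t bit = 0; bit < 32 && n < cap; bit++) {
                if (mask & (UINT32_C(1) << bit))
                        ids[n++] = bit + 1; /* bit 0 is archive ID 1 */
        }
        return n;
}
```

An array built this way can also carry IDs above 32, which a single __u32 mask cannot express. The cost is that the message size is no longer a fixed sizeof(__u32).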


            People

              Teddy Teddy
              mrb Matt Rásó-Barnett (Inactive)