permanently remove deactivated OSTs from configuration log

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      When an OST is permanently removed from the filesystem, the current process is to store a conf_param in the configuration log that marks the OSC permanently inactive, but this doesn't remove the actual OSC records from the config llog. The conf_param is needed for clients that already have the filesystem mounted, so that they don't wait for the OST to be recovered, since already-mounted clients only process new records added to the end of the config llog.

      However, for clients newly mounting the filesystem it may be desirable, if the OST is permanently deleted, to remove the OST record from the configuration log completely so that new client mounts don't even try to connect to it. This can be done relatively easily by cancelling the llog record(s) for the removed OST(s) so that the client doesn't process them at all.

      The other area that may need fixing is lfs df, since it currently iterates over OST and MDT indices sequentially until it gets a -ENODEV return code that indicates no more OST/MDT devices are available. This results in temporarily inactive OSTs being printed, which is intended since the admin should know when there are offline OSTs, but it should not result in unconfigured OSTs being printed. It looks like -EAGAIN being returned from IOC_OBD_STATFS will result in the OST/MDT being silently skipped, as we would want in this case. This needs to be verified.

      It looks like lov_iocontrol() is returning -EAGAIN correctly, but lmv_iocontrol() is incorrectly returning -ENODATA for tgt == NULL instead of -EAGAIN, and not handling the OBD_STATFS_NODELAY flag in uarg at all.
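
      The iteration described above can be modelled with a small shell sketch. This is only an illustration of the intended control flow, not the lfs code itself (which is C and calls IOC_OBD_STATFS). The list_targets function and its symbolic arguments are hypothetical: each argument stands in for the result of the statfs call at target index 0, 1, 2, and so on, and treating every other error as "inactive" is an assumption.

```shell
#!/bin/sh
# Sketch only: models the "lfs df" target loop, not the real implementation.
# Each argument is a stand-in for the statfs result at index 0, 1, 2, ...
#   ok      -> statfs succeeded, print the target
#   EAGAIN  -> index is not configured, skip it silently
#   ENODEV  -> no more targets, stop iterating
#   other   -> any other error (assumption: shown as an inactive target)
list_targets() {
    idx=0
    for rc in "$@"; do
        case $rc in
            ok)     echo "OST:$idx active" ;;
            EAGAIN) ;;                              # unconfigured: no output
            ENODEV) break ;;                        # end of the target list
            *)      echo "OST:$idx inactive ($rc)" ;;
        esac
        idx=$((idx + 1))
    done
}
```

      For example, "list_targets ok ok EIO EAGAIN ok ENODEV" (with EIO as an arbitrary stand-in for an offline target) prints indices 0, 1 and 4 as active and index 2 as inactive, and prints nothing for the unconfigured index 3.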

          Activity

            [LU-7668] permanently remove deactivated OSTs from configuration log

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41453/
            Subject: LU-7668 admin: remove OST from config logs
            Project: doc/manual
            Branch: master
            Current Patch Set:
            Commit: c7a2886198b7273165d90d5054e7da669ba97843

            Posted by Gerrit Updater (gerrit).

            Hello Stephane,

            I think that LU-14090 describes your issue.

            The patch https://review.whamcloud.com/40448 ("LU-14090 mgs: no local logs flag") adds a tunefs.lustre option to ignore the local log.

            Posted by Etienne Aujames (eaujames).

            I'm still working on this, but we hit an unusual problem with the llog_cancel / del_ost prototype. It worked fine at first, but after a few days we restarted an MDS while the MGS was down, and two MDTs started from a cached version of their config and tried to connect to the old OSTs again, and we hit an MDS crash, the same as in LU-9699. This cached version didn't even have the latest OSTs that we added to the filesystem. I'm wondering if there is a way to make sure the MDTs have up-to-date versions of their config, especially after adding or removing OSTs.

            Posted by Stephane Thiell (sthiell).

            Etienne, I haven't looked into the details, but it would leave an unused network connection to that OSS, which may cause spurious connection attempts, though it may be the connection is never used if there are no targets on it. Definitely worthwhile to test out at least.

            Posted by Andreas Dilger (adilger).

            Hello Andreas,

            What are the consequences of keeping the "add_uuid" records when removing all the OSTs of a node?

            Should we add an option to "lctl del_ost" to remove the "add_uuid" records if all the OSTs of a node have been deleted?

            Posted by Etienne Aujames (eaujames).

            Essentially, yes. The llog_cancel command doesn't actually remove the records from the config log, but they are marked as "already processed" so they are skipped by the client and MDS during mount.

            Posted by Andreas Dilger (adilger).

            Hello Andreas,

            Instead of rewriting the Lustre configuration logs using "tunefs.lustre --writeconf", we could use llog_cancel to completely remove the OST from the configuration and get the same result for our case.

            Am I understanding this correctly?

            Posted by Cedric Castagnede (castagnede, Inactive).

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41453
            Subject: LU-7668 admin: remove OST from config logs
            Project: doc/manual
            Branch: master
            Current Patch Set: 1
            Commit: 4e66495d8bc07dc7714e3f300c06b9390d3271c0

            Posted by Andreas Dilger (adilger).

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41454
            Subject: LU-7668 admin: reference the del_ost command
            Project: doc/manual
            Branch: master
            Current Patch Set: 1
            Commit: eae54ec42263b49de8358706d76cdcfd3d05cf8f

            Posted by Gerrit Updater (gerrit).

            Stephane Thiell (sthiell@stanford.edu) uploaded a new patch: https://review.whamcloud.com/41449
            Subject: LU-7668 utils: add lctl del_ost
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 85f353cefe428eb0c9c0e183f3475853a6324523

            Posted by Gerrit Updater (gerrit).

            Posted by Andreas Dilger (adilger), edited:

            To permanently remove one or more OSTs from the configuration logs, there are two possible approaches.

            • rewrite the configuration with "tunefs.lustre --writeconf" on all the filesystem targets, so that the regenerated config llog does not even list the OSTs that are no longer connected. This has the benefit of being relatively straightforward, but needs a filesystem outage to unmount, run --writeconf, and remount all of the remaining targets.
            • cancel the records for the removed OST(s) in the existing config llogs, in two steps:
              • use "lctl --device MGS llog_print $fsname-client" (and also "... $fsname-MDTxxxx" for all the MDTs) to list all attach, setup, add_osc, add_pool, and other records related to the removed OST(s), as well as potentially add_uuid records for the removed OSS nodes (if any):
                mgs# lctl --device MGS llog_print testfs-client | egrep "192.168.10.99@tcp|OST0003"
                - { index: 135, event: add_uuid, nid: 192.168.10.99@tcp(0x20000c0a80a63), node: 192.168.10.99@tcp }
                - { index: 136, event: attach, device: testfs-OST0003-osc, type: osc, UUID: testfs-clilov_UUID }
                - { index: 137, event: setup, device: testfs-OST0003-osc, UUID: testfs-OST0003_UUID, node: 192.168.10.99@tcp }
                - { index: 138, event: add_osc, device: testfs-clilov, ost: testfs-OST0003_UUID, index: 3, gen: 1 }

              • use "lctl --device MGS llog_cancel $fsname-client -i <index>" for each of those OST records to disable their processing. This must be done in each of the $fsname-client and $fsname-MDTxxxx config logs, and the record indices will differ between the client and MDT logs:
                mgs# lctl --device MGS llog_cancel testfs-client -i 138
                mgs# lctl --device MGS llog_cancel testfs-client -i 137
                mgs# lctl --device MGS llog_cancel testfs-client -i 136


            The second option is a bit more effort to do today, but could be done while the filesystem is mounted as it only affects records that the mounted clients have already processed. The next time that a client or MDT is mounted, it should skip those OST records completely, and not try to connect to those OSTs at all.

            It would definitely be useful to have a wrapper script/command (something like "lctl [--device MGS] del_ost <$fsname-OSTxxxx>") to do the llog_cancel commands with a minimum of user interaction. For safety, it would be best to cancel the config records from the higher to lower index, so that there are no errors if a client mounts in the middle of the changes.
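
            As a rough illustration of the wrapper idea above, the following shell sketch turns llog_print output into llog_cancel commands ordered from the highest to the lowest record index. It is not the lctl del_ost implementation: gen_cancel_cmds is a hypothetical helper name, it assumes the llog_print output format shown in the example above, and it only prints the commands for review rather than running them.

```shell
#!/bin/sh
# Sketch only: gen_cancel_cmds is a hypothetical helper, not part of lctl.
# It reads "lctl --device MGS llog_print <logname>" output on stdin and
# prints one llog_cancel command per record that mentions the given OST.
gen_cancel_cmds() {
    logname=$1   # config log name, e.g. testfs-client or testfs-MDT0000
    ost=$2       # OST label to match, e.g. OST0003
    # Keep only records naming the OST, extract the leading "index: N"
    # field (the record index, not the later OST index), sort highest first.
    grep -F -- "$ost" |
        sed -n 's/^[^{]*{ index: \([0-9][0-9]*\),.*/\1/p' |
        sort -rn |
        while read -r idx; do
            echo "lctl --device MGS llog_cancel $logname -i $idx"
        done
}
```

            It could be used as "lctl --device MGS llog_print testfs-client | gen_cancel_cmds testfs-client OST0003", once per config log. With the example records above it prints the cancel commands for indices 138, 137 and 136, in that order. The add_uuid record (index 135) is not matched because it does not name the OST, so it would still need to be handled by hand if the OSS node is also being removed.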


            People

              sthiell Stephane Thiell
              adilger Andreas Dilger