permanently remove deactivated OSTs from configuration log

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      When an OST is permanently removed from the filesystem, the current process is to store a conf_param that marks the OSC permanently inactive in the configuration log, but this does not remove the actual OSC records from the config llog. The conf_param is needed for clients that have already mounted the filesystem, so that they don't wait for the OST to be recovered, since already-mounted clients only process new records added to the end of the config llog.

      However, for clients newly mounting the filesystem it may be desirable, if the OST is permanently deleted, to remove the OST record from the configuration log completely so that new client mounts don't even try to connect to it. This can be done relatively easily by cancelling the llog record(s) for the removed OST(s) so that the client doesn't process them at all.
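To illustrate the difference between the two kinds of clients, here is a minimal sketch in Python. It is a toy model only, not Lustre code: the `ConfigLlog` class, its methods, and the record tuples are hypothetical names introduced for this example. It assumes an append-only log in which an already-mounted client only reads records past the point it has already processed, while a new mount replays the whole log and skips cancelled records.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigLlog:
    """Toy append-only config log (hypothetical, not the real llog format)."""
    records: list = field(default_factory=list)   # (target, action) tuples
    cancelled: set = field(default_factory=set)   # indices of cancelled records

    def append(self, target, action):
        """Add a record at the end of the log and return its index."""
        self.records.append((target, action))
        return len(self.records) - 1

    def cancel_target(self, target):
        """Mark every record that refers to 'target' as cancelled."""
        for idx, (tgt, _) in enumerate(self.records):
            if tgt == target:
                self.cancelled.add(idx)

    def replay(self, start=0):
        """Return the non-cancelled records at or after index 'start'."""
        return [rec for idx, rec in enumerate(self.records)
                if idx >= start and idx not in self.cancelled]


log = ConfigLlog()
for name in ("OST0000", "OST0001", "OST0002"):
    log.append(name, "setup")

# An already-mounted client has processed everything appended so far.
mounted_cursor = len(log.records)

# Appending a deactivation record reaches the already-mounted client.
log.append("OST0001", "deactivate")
seen_by_mounted = log.replay(start=mounted_cursor)

# Cancelling the OST's records hides it from clients that mount afterwards.
log.cancel_target("OST0001")
seen_by_new = log.replay()

print(seen_by_mounted)
print(seen_by_new)
```

In this model the mounted client sees only the deactivation record, and a new mount never sees OST0001 at all, which is the behaviour the description asks for.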

      The other area that may need fixing is lfs df, since it currently iterates over OST and MDT indices sequentially until it gets a -ENODEV return code, which indicates that no more OST/MDT devices are available. This causes temporarily inactive OSTs to be printed, which is intended since the admin should know when there are offline OSTs, but it should not cause unconfigured OSTs to be printed. It looks like -EAGAIN being returned from IOC_OBD_STATFS will result in the OST/MDT being silently skipped, as we would want in this case. This needs to be verified.
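The iteration described above can be sketched as follows. This is a hedged Python model, not the actual lfs source: `list_targets` and `fake_statfs` are hypothetical names, and the use of -ENOTCONN to stand for a temporarily offline target is an assumption made for the example. Only the roles of -ENODEV (stop) and -EAGAIN (skip silently) come from the description.

```python
import errno


def list_targets(statfs, max_index=65536):
    """Walk target indices in order, the way the description says lfs df does.

    'statfs' is a callable taking an index and returning (rc, name).
    """
    shown = []
    for index in range(max_index):
        rc, name = statfs(index)
        if rc == -errno.ENODEV:
            break                      # no more devices: stop iterating
        if rc == -errno.EAGAIN:
            continue                   # unconfigured index: skip silently
        if rc < 0:
            shown.append((index, name, "inactive"))   # offline, still reported
        else:
            shown.append((index, name, "ok"))
    return shown


def fake_statfs(index):
    """Hypothetical stand-in for the IOC_OBD_STATFS call."""
    table = {
        0: (0, "OST0000"),
        1: (-errno.EAGAIN, None),          # permanently removed OST
        2: (-errno.ENOTCONN, "OST0002"),   # temporarily offline (assumed code)
        3: (0, "OST0003"),
    }
    return table.get(index, (-errno.ENODEV, None))


targets = list_targets(fake_statfs)
print(targets)
```

Under these assumptions index 1 never appears in the output, index 2 is reported as inactive, and the loop ends at index 4.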

      It looks like lov_iocontrol() is returning -EAGAIN correctly, but lmv_iocontrol() is incorrectly returning -ENODATA for tgt == NULL instead of -EAGAIN, and not handling the OBD_STATFS_NODELAY flag in uarg at all.

          Activity

            [LU-7668] permanently remove deactivated OSTs from configuration log

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50221/
            Subject: LU-7668 tests: skip conf-sanity test_33a for old MGS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ba5346b050fb395844252a706d4dba2ef0e0d8dc

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50221
            Subject: LU-7668 tests: skip conf-sanity test_33a for old MGS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ed0dc54cb946884ff2d5225a56146ff15ffce51d

            "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/48237/
            Subject: LU-7668 misc: add support for versions 2.17/2.18
            Project: doc/manual
            Branch: master
            Current Patch Set:
            Commit: 5313e6797fdf7e861fe01fc1885f966ca6d86ed7

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48237
            Subject: LU-7668 misc: add support for versions 2.17/2.18
            Project: doc/manual
            Branch: master
            Current Patch Set: 1
            Commit: 6632007c3a4b9ce912593133df6546fc128433ff

            "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/41454/
            Subject: LU-7668 admin: reference the del_ost command
            Project: doc/manual
            Branch: master
            Current Patch Set:
            Commit: 805e8a34afc5002bb1d843faeb067dfc4d70444b

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41449/
            Subject: LU-7668 utils: add lctl del_ost
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1121816c4a4e1bb2ef097c4a9802362181c43800

            It appears that the method described in https://review.whamcloud.com/41453/ is not safe (for now).

            lctl llog_cancel cancels the record directly in the llog header bitmap. This creates index gaps in the config llog. llog_backup() is used to create a local copy of the MGS config when mounting a target, and this function does not preserve the index gaps in the copy.
            As a result, the indexes in the copy no longer match those in the original, which breaks the config update mechanism (e.g. when adding a new target).

            For more information see LU-15000.

            Comment by Etienne Aujames (eaujames).
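The index mismatch reported in the comment above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the real llog_backup() code: `backup_compacting` is a hypothetical function that copies only the live records and renumbers them consecutively, which is how the comment describes the gaps being lost.

```python
def backup_compacting(live_records):
    """Copy live (index, payload) records, renumbering them from 1.

    Hypothetical model of a backup that does not preserve index gaps.
    """
    return [(new_idx, payload)
            for new_idx, (_, payload) in enumerate(live_records, start=1)]


# Original config log: five records, indices 1..5.
original = [(1, "setup OST0000"), (2, "setup OST0001"), (3, "setup OST0002"),
            (4, "setup OST0003"), (5, "setup OST0004")]

# Cancelling record 3 leaves a gap in the original's index sequence.
live = [(idx, payload) for idx, payload in original if idx != 3]

copy = backup_compacting(live)

# Records whose index differs between the original and the local copy.
mismatched = [(orig_idx, copy_idx, payload)
              for (orig_idx, payload), (copy_idx, _) in zip(live, copy)
              if orig_idx != copy_idx]

next_in_original = live[-1][0] + 1
next_in_copy = copy[-1][0] + 1

print(mismatched)
print(next_in_original, next_in_copy)
```

In this model every record after the gap is shifted by one in the copy, so the two logs disagree about where the next appended record belongs.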

            Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41453/
            Subject: LU-7668 admin: remove OST from config logs
            Project: doc/manual
            Branch: master
            Current Patch Set:
            Commit: c7a2886198b7273165d90d5054e7da669ba97843

            Hello Stephane,

            I think that LU-14090 describes your issue.

            The patch https://review.whamcloud.com/40448 ("LU-14090 mgs: no local logs flag") adds a tunefs.lustre option to ignore the local logs.

            Comment by Etienne Aujames (eaujames).

            I'm still working on this, but we hit an unusual problem with the llog_cancel / del_ost prototype. It worked fine at first, but after a few days we restarted an MDS while the MGS was down, and two MDTs used a cached version of their config to start. They tried to connect to the old OSTs again and we hit an MDS crash, the same as LU-9699. This cached version didn't even have the latest OSTs that we added to the filesystem. I'm wondering if there is a way to make sure the MDTs have up-to-date versions of their config, especially after adding or removing OSTs.

            Comment by Stephane Thiell (sthiell).

            People

              sthiell Stephane Thiell
              adilger Andreas Dilger