[LU-7668] permanently remove deactivated OSTs from configuration log Created: 14/Jan/16 Updated: 19/Nov/23 Resolved: 12/Aug/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Stephane Thiell |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When an OST is permanently removed from the filesystem, the current process is to store a conf_param that marks the OSC permanently inactive in the configuration log, but this doesn't remove the actual OSC records from the config llog. The conf_param is needed for clients that have already mounted the filesystem so that they don't wait for the OST to be recovered, since already-mounted clients only process new records added to the end of the config llog.

However, for clients newly mounting the filesystem it may be desirable, if the OST is permanently deleted, to remove the OST record from the configuration log completely so that new client mounts don't even try to connect to it. This can be done relatively easily by cancelling the llog record(s) for the removed OST(s) so that the client doesn't process them at all.

The other area that may need fixing is lfs df, since it currently iterates over OST and MDT indices sequentially until it gets a -ENODEV return code that indicates no more OST/MDT devices are available. This results in temporarily inactive OSTs being printed, which is intended since the admin should know when there are offline OSTs, but it should not result in unconfigured OSTs being printed. It looks like -EAGAIN being returned from IOC_OBD_STATFS will result in the OST/MDT being silently skipped, as we would want in this case. This needs to be verified. It looks like lov_iocontrol() is returning -EAGAIN correctly, but lmv_iocontrol() is incorrectly returning -ENODATA for tgt == NULL instead of -EAGAIN, and not handling the OBD_STATFS_NODELAY flag in uarg at all. |
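The iteration described for lfs df can be pictured with a small Python model. This is only a sketch of the return-code handling discussed above, not the lfs source: `statfs` is a hypothetical stand-in for the IOC_OBD_STATFS ioctl, `max_index` is an arbitrary bound, and reporting every other error as an "error" entry is an assumption made for illustration.

```python
import errno


def list_targets(statfs, max_index=1024):
    """Walk target indices in order, the way the description says lfs df does.

    `statfs` is a hypothetical callable standing in for IOC_OBD_STATFS:
    it returns 0 on success or a negative errno value.
    """
    shown = []
    for index in range(max_index):
        rc = statfs(index)
        if rc == -errno.ENODEV:
            break  # no more OST/MDT devices are available
        if rc == -errno.EAGAIN:
            continue  # unconfigured index: skip silently
        # Assumption: any other result is still reported to the admin.
        shown.append((index, "ok" if rc == 0 else "error"))
    return shown


def fake_statfs(index):
    """Toy target table: index 1 is unconfigured, index 2 returns -ENODATA."""
    table = {0: 0, 1: -errno.EAGAIN, 2: -errno.ENODATA, 3: 0}
    return table.get(index, -errno.ENODEV)


result = list_targets(fake_statfs)
print(result)
```

In this model an index that returns -ENODATA is still listed, which is why the wrong return code from lmv_iocontrol() matters for unconfigured targets.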
| Comments |
| Comment by Nathan Dauchy (Inactive) [ 30/Nov/17 ] |
|
We are currently draining multiple OSTs from one of our file systems (to use as spare hardware) and are interested in this work. We have used both methods of "lctl --device XX deactivate" and "lctl conf_param lfs-OST-00XX.osc.active=0", and are still seeing the targets show up on client remount. Has anything changed in the last ~2 years such that this task is still relevant, or (hopefully) easier to implement? What workarounds are available while this task is waiting in the wings? Are there tricks to mount as ldiskfs and tweak the config files directly? Can we use a "writeconf" procedure to at least in part clean up the OST entry? |
| Comment by Andreas Dilger [ 01/Dec/17 ] |
|
I landed a patch for “lfs df” in 2.10 ( |
| Comment by Andreas Dilger [ 30/Sep/20 ] |
|
To permanently remove one or more OSTs from the configuration logs, there are two possible approaches.
The second option is a bit more effort to do today, but could be done while the filesystem is mounted as it only affects records that the mounted clients have already processed. The next time that a client or MDT is mounted, it should skip those OST records completely, and not try to connect to them at all. It would definitely be useful to have a wrapper script/command (something like "lctl [--device MGS] del_ost <$fsname-OSTxxxx>") to do the llog_cancel commands with a minimum of user interaction. For safety, it would be best to cancel the config records from the higher to lower index, so that there are no errors if a client mounts in the middle of the changes. |
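The "higher to lower index" ordering can be sketched in Python as follows. This is a toy illustration, not the del_ost implementation: the record texts and the `cancel_order` helper are hypothetical, and real config records would come from the MGS config log rather than from a hard-coded list.

```python
def cancel_order(records, ost_name):
    """Return the indices of records that mention ost_name, highest first.

    `records` is a list of (index, text) pairs; the texts used below are
    made-up examples, not real config llog output.
    """
    matching = [idx for idx, text in records if ost_name in text]
    return sorted(matching, reverse=True)


records = [
    (10, "attach lfs-OST0001-osc"),
    (11, "setup lfs-OST0001-osc"),
    (12, "lov_modify_tgts add lfs-OST0001_UUID"),
    (13, "attach lfs-OST0002-osc"),
]

order = cancel_order(records, "lfs-OST0001")
print(order)
```

Cancelling in this order means a client that mounts partway through sees the later records for the OST already gone before the earlier ones are touched.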
| Comment by Gerrit Updater [ 09/Feb/21 ] |
|
Stephane Thiell (sthiell@stanford.edu) uploaded a new patch: https://review.whamcloud.com/41449 |
| Comment by Gerrit Updater [ 09/Feb/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41454 |
| Comment by Andreas Dilger [ 09/Feb/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41453 |
| Comment by Cedric Castagnede [ 26/Feb/21 ] |
|
Hello Andreas, Instead of rewriting the Lustre configuration logs using "tunefs.lustre --writeconf", we could use llog_cancel to completely remove the OST from the configuration and get the same result for our case. Am I understanding this correctly?
|
| Comment by Andreas Dilger [ 28/Feb/21 ] |
|
Essentially, yes. The llog_cancel command doesn't actually remove the records from the config log, but they are marked as "already processed" so they are skipped by the client and MDS during mount. |
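A minimal Python model of that behaviour, assuming a log is just a set of indexed records plus a set of "live" indices standing in for the llog bitmap. The `ConfigLog` class and its methods are hypothetical names for illustration, not Lustre APIs.

```python
class ConfigLog:
    """Toy config log: cancelled records stay stored but are skipped."""

    def __init__(self, records):
        self.records = dict(records)   # index -> record text
        self.live = set(self.records)  # indices still marked for processing

    def cancel(self, index):
        # The record itself is kept; only its "live" marker is cleared.
        self.live.discard(index)

    def process(self):
        """Return the records a newly mounting client would act on."""
        return [self.records[i] for i in sorted(self.records) if i in self.live]


log = ConfigLog([(1, "add OST0000"), (2, "add OST0001"), (3, "add OST0002")])
log.cancel(2)
print(log.process())
```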
| Comment by Etienne Aujames [ 05/Mar/21 ] |
|
Hello Andreas, What are the consequences of keeping the "add_uuid" records when removing all the OSTs of a node? Should we add an option to "lctl del_ost" to remove the "add_uuid" records if all the OSTs of a node have been deleted? |
| Comment by Andreas Dilger [ 07/Mar/21 ] |
|
Etienne, I haven't looked into the details, but it would leave an unused network connection to that OSS, which may cause spurious connection attempts, though it may be the connection is never used if there are no targets on it. Definitely worthwhile to test out at least. |
| Comment by Stephane Thiell [ 22/Mar/21 ] |
|
I'm still working on this, but we hit an unusual problem with the llog_cancel / del_ost prototype. It worked fine at first, but after a few days, we restarted an MDS while the MGS was down, and two MDTs used a cached version of their config to start, trying to connect to the old OSTs again, and we hit an MDS crash same as |
| Comment by Etienne Aujames [ 23/Mar/21 ] |
|
Hello Stephane, I think that the The patch https://review.whamcloud.com/40448 (" |
| Comment by Gerrit Updater [ 17/Apr/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41453/ |
| Comment by Etienne Aujames [ 18/Feb/22 ] |
|
It appears that the method described in https://review.whamcloud.com/41453/ is not safe (for now). lctl llog_cancel cancels the record directly in the llog bitmap header. This creates index gaps in the config llog. llog_backup() is used to create a local copy of the MGS config when mounting a target. This function does not keep the index gaps in the llog copy. For more information see |
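The effect of losing the index gaps can be shown with a toy Python model. It assumes that the copy simply packs the surviving records into consecutive indices; that renumbering is an assumption for illustration, and `backup_without_gaps` is a hypothetical helper, not the llog_backup() code.

```python
def backup_without_gaps(live_records):
    """Copy surviving records into a new log with consecutive indices.

    `live_records` maps original index -> record text. The renumbering
    from 1 is an assumed simplification of a copy that drops gaps.
    """
    ordered = sorted(live_records.items())
    return {new_idx: text for new_idx, (_, text) in enumerate(ordered, start=1)}


# Index 3 was cancelled on the MGS, leaving a gap in the source log.
source = {1: "rec-a", 2: "rec-b", 4: "rec-d"}
copy = backup_without_gaps(source)
print(copy)
```

In this model "rec-d" sits at index 4 in the source but at index 3 in the copy, so the two logs no longer agree on record positions.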
| Comment by Gerrit Updater [ 11/Jul/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41449/ |
| Comment by Gerrit Updater [ 12/Aug/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/41454/ |
| Comment by Gerrit Updater [ 17/Aug/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48237 |
| Comment by Gerrit Updater [ 17/Aug/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/48237/ |
| Comment by Gerrit Updater [ 07/Mar/23 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50221 |
| Comment by Gerrit Updater [ 21/Mar/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50221/ |