[LU-7668] permanently remove deactivated OSTs from configuration log Created: 14/Jan/16  Updated: 19/Nov/23  Resolved: 12/Aug/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Stephane Thiell
Resolution: Fixed Votes: 2
Labels: None

Issue Links:
Blocker
is blocked by LU-15000 MDS crashes with (osp_dev.c:1404:osp_... Resolved
Related
is related to LU-4295 removing files on deactivated OST doe... Resolved
is related to LU-4397 Permanently disabled OST causes clien... Resolved
is related to LU-6601 deactivated OSTs do not appear to be ... Resolved
is related to LU-16024 Allow permanently removing an MDT fro... Open
is related to LU-6818 quiet permanently deactivated OSTs in... Resolved
is related to LU-8920 don't print permanently deactivated O... Resolved
is related to LU-7731 lctl dl - command don't report correc... Resolved
is related to LU-14090 lctl replace_nids and starting target... Resolved
is related to LU-14403 lctl dl UP and lfs df problem with co... Resolved
is related to LUDOC-352 document lctl llog_print Resolved
is related to LU-17299 DNE3: disable new regular file creati... Open
is related to LU-16475 Reusing OST indexes after lctl del_ost Open
is related to LU-12998 DNE3: tunable to disable directory cr... Resolved

 Description   

When an OST is permanently removed from the filesystem, the current process is to store a conf_param that marks the OSC permanently inactive in the configuration log, but it doesn't remove the actual OSC records from the config llog. That is needed for clients already mounting the filesystem so that they don't wait for the OST to be recovered, since only new records added to the end of config llog are processed by already-mounted clients.

However, for clients newly mounting the filesystem it may be desirable, if the OST is permanently deleted, to remove the OST record from the configuration log completely so that new client mounts don't even try to connect to it. This can be done relatively easily by cancelling the llog record(s) for the removed OST(s) so that the client doesn't process them at all.

The other area that may need fixing is lfs df, since it currently iterates over OST and MDT indices sequentially until it gets a -ENODEV return code that indicates no more OST/MDT devices are available. This results in temporarily inactive OSTs being printed, which is intended since the admin should know when there are offline OSTs, but it should not result in unconfigured OSTs being printed. It looks like -EAGAIN being returned from IOC_OBD_STATFS will result in the OST/MDT being silently skipped, as we would want in this case. This needs to be verified.

It looks like lov_iocontrol() is returning -EAGAIN correctly, but lmv_iocontrol() is incorrectly returning -ENODATA for tgt == NULL instead of -EAGAIN, and not handling the OBD_STATFS_NODELAY flag in uarg at all.
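The index walk described above can be modelled in a few lines. This is only a simplified sketch of the described behaviour, not the lfs source: `list_targets` and `fake_statfs` are hypothetical names, and the callback stands in for the IOC_OBD_STATFS call by returning 0 or a negative errno.

```python
import errno


def list_targets(statfs_by_index, max_index=65536):
    """Walk target indices the way the description says lfs df does.

    statfs_by_index is a hypothetical callback that returns 0 on success
    or a negative errno value, standing in for IOC_OBD_STATFS.
    """
    rows = []
    for index in range(max_index):
        rc = statfs_by_index(index)
        if rc == -errno.ENODEV:
            break  # no more OST/MDT devices: stop iterating
        if rc == -errno.EAGAIN:
            continue  # unconfigured target: skip silently
        # Any other error is reported, so the admin sees offline targets.
        rows.append((index, "ok" if rc == 0 else "inactive"))
    return rows


def fake_statfs(index):
    """Assumed layout: index 1 unconfigured, index 2 temporarily offline."""
    table = {0: 0, 1: -errno.EAGAIN, 2: -errno.ENOTCONN, 3: 0}
    return table.get(index, -errno.ENODEV)


for index, status in list_targets(fake_statfs):
    print(index, status)
```

In this model an -ENODATA return for a missing target would fall into the "inactive" branch and be printed, which is why the return code chosen by lmv_iocontrol() matters.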



 Comments   
Comment by Nathan Dauchy (Inactive) [ 30/Nov/17 ]

We are currently draining multiple OSTs from one of our file systems (to use as spare hardware) and are interested in this work. We have used both methods of "lctl --device XX deactivate" and "lctl conf_param lfs-OST-00XX.osc.active=0", and are still seeing the targets show up on client remount.

Has anything changed in the last ~2 years such that this task is still relevant, or (hopefully) easier to implement?

What workarounds are available while this task is waiting in the wings? Are there tricks to mount as ldiskfs and tweak the config files directly? Can we use a "writeconf" procedure to at least in part clean up the OST entry?

Comment by Andreas Dilger [ 01/Dec/17 ]

I landed a patch for “lfs df” in 2.10 (LU-8920) to skip printing of deactivated OSTs. This is where most users see inactive targets. It does not cancel the configuration records permanently.

Comment by Andreas Dilger [ 30/Sep/20 ]

To permanently remove one or more OSTs from the configuration logs, there are two possible approaches.

  • rewrite the configuration with "tunefs.lustre --writeconf" on all the filesystem targets, and then the regenerated config llog will not even list the OSTs that are not connected. This has the benefit of being relatively straightforward to do, but needs a filesystem outage to unmount, run --writeconf, and remount all of the remaining targets.
  • use "lctl --device MGS llog_print $fsname-client" (and also "... $fsname-MDTxxxx" for all the MDTs) to list all attach, setup, add_osc, add_pool, and other records related to the removed OST(s), as well as potentially add_uuid records for the removed OSS nodes (if any):
    mgs# lctl --device MGS llog_print testfs-client | egrep "192.168.10.99@tcp|OST0003"
    - { index: 135, event: add_uuid, nid: 192.168.10.99@tcp(0x20000c0a80a63), node: 192.168.10.99@tcp }
    - { index: 136, event: attach, device: testfs-OST0003-osc, type: osc, UUID: testfs-clilov_UUID }
    - { index: 137, event: setup, device: testfs-OST0003-osc, UUID: testfs-OST0003_UUID, node: 192.168.10.99@tcp }
    - { index: 138, event: add_osc, device: testfs-clilov, ost: testfs-OST0003_UUID, index: 3, gen: 1 }
    
  • use "lctl --device MGS llog_cancel $fsname-client -i <index>" for each of those OST records in each of the config llogs to disable the processing, e.g. for each of the $fsname-client and $fsname-MDTxxxx config logs the appropriate record indices need to be cancelled (they will be different between the client and MDTs):
    mgs# lctl --device MGS llog_cancel testfs-client -i 138
    mgs# lctl --device MGS llog_cancel testfs-client -i 137
    mgs# lctl --device MGS llog_cancel testfs-client -i 136
    

The second option is a bit more effort to do today, but could be done while the filesystem is mounted as it only affects records that the mounted clients have already processed. The next time that a client or MDT is mounted, it should skip those OST records completely, and not try to connect to the removed OSTs at all.

It would definitely be useful to have a wrapper script/command (something like "lctl [--device MGS] del_ost <$fsname-OSTxxxx>") to do the llog_cancel commands with a minimum of user interaction. For safety, it would be best to cancel the config records from the higher to lower index, so that there are no errors if a client mounts in the middle of the changes.
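As a rough illustration of what such a wrapper would have to do, the sketch below parses llog_print output in the format shown above and prints the matching llog_cancel commands from the highest index to the lowest. It is not the del_ost implementation: `cancel_commands` and `SAMPLE` are hypothetical names, the record format is assumed to match the example output exactly, and nothing is executed.

```python
import re

# The record index is the first "index:" key on each llog_print line.
RECORD_RE = re.compile(r"^\s*-\s*\{\s*index:\s*(\d+),")

SAMPLE = """\
- { index: 135, event: add_uuid, nid: 192.168.10.99@tcp(0x20000c0a80a63), node: 192.168.10.99@tcp }
- { index: 136, event: attach, device: testfs-OST0003-osc, type: osc, UUID: testfs-clilov_UUID }
- { index: 137, event: setup, device: testfs-OST0003-osc, UUID: testfs-OST0003_UUID, node: 192.168.10.99@tcp }
- { index: 138, event: add_osc, device: testfs-clilov, ost: testfs-OST0003_UUID, index: 3, gen: 1 }
"""


def cancel_commands(llog_text, logname, ost_name):
    """Return llog_cancel command lines for records naming ost_name.

    Commands are ordered from the highest record index to the lowest,
    as suggested above for safety.
    """
    indices = []
    for line in llog_text.splitlines():
        match = RECORD_RE.match(line)
        if match and ost_name in line:
            indices.append(int(match.group(1)))
    return [
        f"lctl --device MGS llog_cancel {logname} -i {index}"
        for index in sorted(indices, reverse=True)
    ]


for command in cancel_commands(SAMPLE, "testfs-client", "testfs-OST0003"):
    print(command)
```

The add_uuid record is left alone because it names only the OSS NID, not the OST. The same pass would need to be repeated for each $fsname-MDTxxxx log, since the record indices differ between logs.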

Comment by Gerrit Updater [ 09/Feb/21 ]

Stephane Thiell (sthiell@stanford.edu) uploaded a new patch: https://review.whamcloud.com/41449
Subject: LU-7668 utils: add lctl del_ost
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 85f353cefe428eb0c9c0e183f3475853a6324523

Comment by Gerrit Updater [ 09/Feb/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41454
Subject: LU-7668 admin: reference the del_ost command
Project: doc/manual
Branch: master
Current Patch Set: 1
Commit: eae54ec42263b49de8358706d76cdcfd3d05cf8f

Comment by Andreas Dilger [ 09/Feb/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41453
Subject: LU-7668 admin: remove OST from config logs
Project: doc/manual
Branch: master
Current Patch Set: 1
Commit: 4e66495d8bc07dc7714e3f300c06b9390d3271c0

Comment by Cedric Castagnede [ 26/Feb/21 ]

Hello Andreas,

Instead of rewriting the Lustre configuration logs using "tunefs.lustre --writeconf", we could use llog_cancel to completely remove an OST from the configuration and get the same result in our case.

Am I understanding this correctly?

 

Comment by Andreas Dilger [ 28/Feb/21 ]

Essentially, yes. The llog_cancel command doesn't actually remove the records from the config log, but they are marked as "already processed" so they are skipped by the client and MDS during mount.
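A toy model of that behaviour, assuming nothing about the on-disk llog format: the `ConfigLog` class below is hypothetical and only shows that cancelled records stay in the log but are skipped when the log is processed at mount time.

```python
class ConfigLog:
    """Hypothetical in-memory stand-in for a config llog."""

    def __init__(self, records):
        self.records = dict(records)  # record index -> event name
        self.cancelled = set()        # indices marked "already processed"

    def cancel(self, index):
        """Mark a record as processed without deleting it."""
        if index not in self.records:
            raise KeyError(f"no record with index {index}")
        self.cancelled.add(index)

    def process(self):
        """Return the records a newly mounting client or MDS would apply."""
        return [
            (index, event)
            for index, event in sorted(self.records.items())
            if index not in self.cancelled
        ]


log = ConfigLog({135: "add_uuid", 136: "attach", 137: "setup", 138: "add_osc"})
for index in (138, 137, 136):  # highest index first
    log.cancel(index)

print(log.process())
print(len(log.records))
```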

Comment by Etienne Aujames [ 05/Mar/21 ]

Hello Andreas,

What are the consequences of keeping the "add_uuid" records when removing all the OSTs of a node?

Should we add an option to "lctl del_ost" to remove the "add_uuid" records if all the OSTs of a node have been deleted?

Comment by Andreas Dilger [ 07/Mar/21 ]

Etienne, I haven't looked into the details, but it would leave an unused network connection to that OSS, which may cause spurious connection attempts, though it may be that the connection is never used if there are no targets on it. Definitely worthwhile to test out at least.

Comment by Stephane Thiell [ 22/Mar/21 ]

I'm still working on this, but we hit an unusual problem with the llog_cancel / del_ost prototype. It worked fine at first, but after a few days we restarted an MDS while the MGS was down, and two MDTs used a cached version of their config to start. They tried to connect to the old OSTs again and we hit an MDS crash, the same as LU-9699. This cached version didn't even have the latest OSTs that we added to the filesystem. I'm wondering if there is a way to make sure the MDTs have up-to-date versions of their config, especially after adding or removing OSTs.

Comment by Etienne Aujames [ 23/Mar/21 ]

Hello Stephane,

I think LU-14090 describes your issue.

The patch https://review.whamcloud.com/40448 ("LU-14090 mgs: no local logs flag") adds a tunefs.lustre option to ignore the local logs.

Comment by Gerrit Updater [ 17/Apr/21 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/41453/
Subject: LU-7668 admin: remove OST from config logs
Project: doc/manual
Branch: master
Current Patch Set:
Commit: c7a2886198b7273165d90d5054e7da669ba97843

Comment by Etienne Aujames [ 18/Feb/22 ]

It appears that the method described in https://review.whamcloud.com/41453/ is not safe (for now).

lctl llog_cancel cancels the record directly in the llog header bitmap. This creates index gaps in the config llog. llog_backup() is used to create a local copy of the MGS config when mounting a target, and this function does not preserve the index gaps in the copy.
As a result, the indexes no longer match between the copy and the original, which breaks the config update mechanism (e.g. when adding a new target).

For more information see LU-15000.
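The mismatch can be illustrated with a small model. This is a sketch under stated assumptions, not the llog_backup() code: `backup_without_gaps` and `mismatched` are hypothetical helpers, and the copy is assumed to renumber live records consecutively from the first index.

```python
def backup_without_gaps(records, cancelled):
    """Copy live records, renumbering them consecutively (assumed behaviour)."""
    live = [records[i] for i in sorted(records) if i not in cancelled]
    first = min(records)
    return {first + offset: event for offset, event in enumerate(live)}


def mismatched(records, cancelled, copy):
    """Live original indices whose record differs in, or is absent from, the copy."""
    return sorted(
        i for i in records
        if i not in cancelled and copy.get(i) != records[i]
    )


original = {
    1: "attach OST0003",
    2: "setup OST0003",
    3: "attach OST0004",
    4: "setup OST0004",
    5: "attach OST0005",
    6: "setup OST0005",
}
cancelled = {3, 4}  # OST0004 records cancelled with llog_cancel

local_copy = backup_without_gaps(original, cancelled)
print(local_copy)
print(mismatched(original, cancelled, local_copy))
```

In this model the OST0005 records sit at indices 5 and 6 on the MGS but at 3 and 4 in the local copy, so a record later appended on the MGS would not line up with what the target expects next.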

Comment by Gerrit Updater [ 11/Jul/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41449/
Subject: LU-7668 utils: add lctl del_ost
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1121816c4a4e1bb2ef097c4a9802362181c43800

Comment by Gerrit Updater [ 12/Aug/22 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/41454/
Subject: LU-7668 admin: reference the del_ost command
Project: doc/manual
Branch: master
Current Patch Set:
Commit: 805e8a34afc5002bb1d843faeb067dfc4d70444b

Comment by Gerrit Updater [ 17/Aug/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48237
Subject: LU-7668 misc: add support for versions 2.17/2.18
Project: doc/manual
Branch: master
Current Patch Set: 1
Commit: 6632007c3a4b9ce912593133df6546fc128433ff

Comment by Gerrit Updater [ 17/Aug/22 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/48237/
Subject: LU-7668 misc: add support for versions 2.17/2.18
Project: doc/manual
Branch: master
Current Patch Set:
Commit: 5313e6797fdf7e861fe01fc1885f966ca6d86ed7

Comment by Gerrit Updater [ 07/Mar/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50221
Subject: LU-7668 tests: skip conf-sanity test_33a for old MGS
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ed0dc54cb946884ff2d5225a56146ff15ffce51d

Comment by Gerrit Updater [ 21/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50221/
Subject: LU-7668 tests: skip conf-sanity test_33a for old MGS
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ba5346b050fb395844252a706d4dba2ef0e0d8dc
