[LU-14090] lctl replace_nids and starting target with local copy of logs Created: 29/Oct/20  Updated: 06/Apr/21  Resolved: 06/Apr/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Artem Blagodarenko (Inactive) Assignee: Artem Blagodarenko (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-10360 use Imperative Recovery logs for clie... Open
is related to LU-7668 permanently remove deactivated OSTs f... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There is a feature that starts a target with a local copy of
the config log, in order to avoid a delay in communicating
with the MGS, and loads MGS log updates later on.
However, that feature is not always useful.

When replace_nids adds records with new NIDs, it does not
append to the remote config logs but overwrites the
corresponding records in place. If a target then starts from
its local config log, it gets confused by the outdated NIDs.

Let's add a tunefs.lustre --nolocallogs option that sets the
nolocallogs flag, which tells the target to ignore its local
copy of the config logs.
The flag is reset once new logs are retrieved from the MGS.

It is suggested that tunefs.lustre --nolocallogs be run on the
targets whenever replace_nids is run on the MGS.
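
The intended flag behaviour can be illustrated with a small toy model.
This is only a sketch under stated assumptions: the Target class, its
start() method and the string records are hypothetical and do not
correspond to actual Lustre code or to the landed patch. The config log
is modelled as a plain list, and replace_nids as an in-place overwrite
of one record.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Toy stand-in for a Lustre target; not an actual Lustre structure."""
    local_log: list = field(default_factory=list)
    nolocallogs: bool = False

    def start(self, mgs_log):
        """Return the config records the target starts with."""
        if self.nolocallogs:
            # Flag set: ignore the local copy and take the full log from the MGS.
            self.local_log = list(mgs_log)
            self.nolocallogs = False  # reset once new logs have arrived
            return list(self.local_log)
        # Flag clear: start from the local copy right away, then pick up
        # only the records appended on the MGS since the copy was made.
        startup = list(self.local_log)
        self.local_log.extend(mgs_log[len(self.local_log):])
        return startup


mgs_log = ["nid=10.0.0.1@tcp"]
target = Target(local_log=list(mgs_log))

# Model replace_nids: the record is overwritten in place, nothing is appended.
mgs_log[0] = "nid=10.0.0.2@tcp"

stale = target.start(mgs_log)   # starts with the outdated NID
target.nolocallogs = True       # models tunefs.lustre --nolocallogs
fresh = target.start(mgs_log)   # starts with the current NID
```

In this model the first start keeps the outdated NID even after the
later update step, because nothing was appended to the MGS log. Only
the start with the flag set picks up the rewritten record.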



 Comments   
Comment by Gerrit Updater [ 29/Oct/20 ]

Artem Blagodarenko (artem.blagodarenko@hpe.com) uploaded a new patch: https://review.whamcloud.com/40448
Subject: LU-14090 mgs: no local logs flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 88b959844c018aba667b384087c3559c7925d8b0

Comment by Andreas Dilger [ 23/Mar/21 ]

The root of this problem is that the "replace_nids" code is breaking one of the assumptions of the config llog - that changes are added to the end, and old config records are not rewritten. That is why replace_nids needs all of the servers and clients to be unmounted when rewriting the logs, which is inconvenient. Something like LU-7668 "del_ost" is OK because it is only cancelling the old records, and the next time that the config llog is processed it will be as if the records never existed, which is better than adding OSTs and later removing them.
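
To make the append-only assumption concrete, here is a simplified sketch of a reader that remembers how far it has processed a log. It is not how the config llog is implemented; process_new(), the record fields and the NID values are hypothetical and only illustrate why appended and cancelled records are safe while in-place rewrites are not.

```python
def process_new(log, last_index):
    """Return (NIDs of non-cancelled records after last_index, new last index)."""
    new = [rec["nid"] for rec in log[last_index:] if not rec["cancelled"]]
    return new, len(log)


log = [{"nid": "10.0.0.1@tcp", "cancelled": False}]
seen, idx = process_new(log, 0)

# Appending a record: an incremental reader picks it up.
log.append({"nid": "10.0.0.2@tcp", "cancelled": False})
appended, idx = process_new(log, idx)

# Rewriting an old record in place: the same reader never sees the change.
log[0]["nid"] = "10.0.0.9@tcp"
missed, idx = process_new(log, idx)

# Cancelling a record: on the next full processing it is as if it never existed.
log[1]["cancelled"] = True
replay, _ = process_new(log, 0)
```

Here the reader sees the appended record, gets nothing back after the in-place rewrite, and only learns the rewritten NID when it reprocesses the log from the start, which is the situation that forces a full unmount in practice.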

I think a better long-term solution for changing the server NIDs is to get patch https://review.whamcloud.com/40736 "LU-10360 mgs: Mount to dynamically added networks" updated and landed. That avoids the need to manually update the config logs if the server NIDs are changing, and allows clients to keep using the filesystem even if the server NIDs are changing at runtime (with a timeout and recovery if the NID change is sudden, but it could be transparent if both the old and new NID were valid for some time).

Comment by Gerrit Updater [ 06/Apr/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40448/
Subject: LU-14090 mgs: no local logs flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f38f09e02a05c82718344ad86f80a4a0f399af9d

Comment by Peter Jones [ 06/Apr/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:06:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.