[LU-15708] --replace does not update target NIDs in MGS configuration. Created: 30/Mar/22  Updated: 30/Mar/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Aurelien Degremont (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

When mkfs.lustre --replace is used to replace an existing OST, the OST NIDs recorded in the MGS configuration are not updated. If the replacement OST is started on a different OSS than the original one, the filesystem configuration is wrong: it still refers to the old NIDs instead of the new ones.

You end up with a freshly reformatted OST that is started but unreachable, because the OSS it runs on has a NID that does not match the one declared for this OST in the MGS configuration.

The following comment in mgs_write_log_target() dates back to Lustre 1.5:

        if (rc == EALREADY) {
                LCONSOLE_WARN("Found index %d for %s, updating log\n",
                              mti->mti_stripe_index, mti->mti_svname);
                /* We would like to mark old log sections as invalid
                   and add new log sections in the client and mdt logs.
                   But if we add new sections, then live clients will
                   get repeat setup instructions for already running
                   osc's. So don't update the client/mdt logs. */
                mti->mti_flags &= ~LDD_F_UPDATE;
                rc = 0;
        }
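
The consequence of the branch above can be illustrated with a toy model. This is a hypothetical sketch, not Lustre code: every name in it (toy_register_target, toy_client_log, toy_replace_scenario) is invented for illustration, and it ignores everything else the MGS does. It only models the behavior described in the comment: once an index is already known, re-registering that index does not rewrite the client/MDT log, so the NID recorded at first registration is kept.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of target registration on the MGS.
 * None of these names exist in Lustre. The only behavior borrowed from
 * mgs_write_log_target() is: "index already known => do not add new
 * sections to the client/MDT logs". */

#define TOY_MAX_TARGETS 16

struct toy_log_entry {
    int  in_use;   /* this index has already been registered */
    char nid[64];  /* NID recorded for this target in the toy client log */
};

static struct toy_log_entry toy_client_log[TOY_MAX_TARGETS];

/* Register target `index`, served from `nid`.
 * Returns 1 if the toy client log was written, 0 if it was left
 * unchanged, -1 for an out-of-range index. */
static int toy_register_target(int index, const char *nid)
{
    struct toy_log_entry *e;

    if (index < 0 || index >= TOY_MAX_TARGETS)
        return -1;

    e = &toy_client_log[index];
    if (e->in_use) {
        /* Counterpart of the EALREADY branch: warn, but leave the
         * client log alone, so the old NID stays in it. */
        printf("Found index %d, client log left unchanged (still %s)\n",
               index, e->nid);
        return 0;
    }

    e->in_use = 1;
    snprintf(e->nid, sizeof(e->nid), "%s", nid);
    return 1;
}

/* Simulate an OST first formatted on one OSS, then re-created with
 * mkfs.lustre --replace on another OSS (example NIDs are made up).
 * Returns the NID that clients would still be told to use. */
static const char *toy_replace_scenario(void)
{
    toy_register_target(3, "192.168.1.10@tcp"); /* original OSS */
    toy_register_target(3, "192.168.1.20@tcp"); /* replacement OSS */
    return toy_client_log[3].nid;
}
```

In this model, toy_replace_scenario() returns "192.168.1.10@tcp": the replacement OSS's NID never reaches the log, which matches the mismatch described in this ticket.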

As far as I understand, there is currently no mechanism to properly update the llogs of a live filesystem in this situation. The only existing solutions (writeconf, replace_nids) require taking the filesystem out of production.

I'm opening this ticket as a reference for anyone struggling with mkfs.lustre --replace, and in case this is not the intended behavior. If it is intended, the Lustre documentation should at least be updated to clarify this point.


Generated at Sat Feb 10 03:20:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.