[LU-15708] --replace does not update target NIDs in MGS configuration. Created: 30/Mar/22 Updated: 30/Mar/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Aurelien Degremont (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Description |
|
When using mkfs.lustre --replace to replace an existing OST, the OST NIDs are not updated. If the OST ends up being located on a different OSS than the original one it was started on, the filesystem configuration is wrong, as it will still refer to the old NIDs, and not the new ones.
You end up with a freshly reformatted OST that is started but unreachable, running on an OSS whose NID does not match the one declared in the MGS configuration for this OST.
Comment from mgs_write_log_target(), dating from Lustre v1.5:
        if (rc == EALREADY) {
                LCONSOLE_WARN("Found index %d for %s, updating log\n",
                              mti->mti_stripe_index, mti->mti_svname);
                /* We would like to mark old log sections as invalid
                   and add new log sections in the client and mdt logs.
                   But if we add new sections, then live clients will
                   get repeat setup instructions for already running
                   osc's. So don't update the client/mdt logs. */
                mti->mti_flags &= ~LDD_F_UPDATE;
                rc = 0;
        }
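To make the consequence of that branch concrete, here is a minimal standalone sketch of the behavior, not the actual MGS code. All names in it (sketch_target, sketch_log, SKETCH_F_UPDATE, sketch_write_log_target, sketch_lookup_nid) are hypothetical stand-ins, and the "log" is just a small in-memory table. It only illustrates that once the index is already known, the update flag is cleared and the recorded NID is left as it was.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real Lustre definitions. */
#define SKETCH_F_UPDATE 0x1u
#define SKETCH_LOG_SIZE 8

struct sketch_target {
	char svname[32];    /* target name, e.g. "lustre-OST0001" */
	char nid[32];       /* NID the target is registering from */
	unsigned int flags;
};

/* Toy replacement for the configuration log: one NID per target name. */
struct sketch_log {
	struct {
		int  in_use;
		char svname[32];
		char nid[32];
	} entry[SKETCH_LOG_SIZE];
};

/* Return the NID recorded for svname, or NULL if the target is unknown. */
const char *sketch_lookup_nid(const struct sketch_log *log, const char *svname)
{
	for (int i = 0; i < SKETCH_LOG_SIZE; i++)
		if (log->entry[i].in_use &&
		    strcmp(log->entry[i].svname, svname) == 0)
			return log->entry[i].nid;
	return NULL;
}

/* Register a target. Returns 0 on success, ENOSPC if the toy log is full. */
int sketch_write_log_target(struct sketch_log *log, struct sketch_target *t)
{
	assert(log != NULL && t != NULL);

	if (sketch_lookup_nid(log, t->svname) != NULL) {
		/* Index already known (the EALREADY case in the snippet above):
		 * clear the update flag and leave the recorded NID untouched. */
		t->flags &= ~SKETCH_F_UPDATE;
		return 0;
	}

	/* First registration: record the name and NID in the first free slot. */
	for (int i = 0; i < SKETCH_LOG_SIZE; i++) {
		if (!log->entry[i].in_use) {
			log->entry[i].in_use = 1;
			snprintf(log->entry[i].svname,
				 sizeof(log->entry[i].svname), "%s", t->svname);
			snprintf(log->entry[i].nid,
				 sizeof(log->entry[i].nid), "%s", t->nid);
			return 0;
		}
	}
	return ENOSPC;
}
```

With a zero-initialized sketch_log, registering "lustre-OST0001" from one NID and then registering the same name again from a different NID succeeds both times, but sketch_lookup_nid() keeps returning the first NID. That mirrors the situation described in this ticket: the replaced OST starts, while the recorded configuration still points at the old OSS.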
I understand there is no existing mechanism for properly updating llogs live in that situation. Only offline solutions exist (writeconf, replace_nids).
I'm opening this ticket as a reference for anybody struggling with mkfs.lustre --replace, and in case this is not the expected behavior. If it is, we should at least update the Lustre doc to clarify that point. |