[LUDOC-520] Minor detail in section 14.9.3 (Removing an OST from the File System) Created: 17/Nov/23 Updated: 17/Nov/23 |
|
| Status: | Open |
| Project: | Lustre Documentation |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shane Nehring | Assignee: | Lustre Manual Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre 2.15.2 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This is an extremely minor issue that I bring up only because it confused me for a bit. The manual outlines the steps necessary to completely remove an OST from a file system. It explicitly says to delete any attach, setup, add_osc, add_pool, and other records, and the example shows the add_uuid, attach, setup, and add_osc events being removed from the logs. However, if you followed the entire section, you likely also have a conf_param event setting osc.active=0 for the OST(s). If, like me, you didn't include those events in the list of llog_cancel commands you ran, you'd run into errors like the following when it came time to have a client mount the file system again:

    LustreError: 4459:0:(obd_config.c:1526:class_process_config()) no device for: work-OST0000-osc-ffff93ebd7f3e000
    LustreError: 4459:0:(obd_config.c:1998:class_config_llog_handler()) MGC172.16.200.250@o2ib: cfg command failed: rc = -22
    Lustre: cmd=cf00f 0:work-OST0000-osc 1:osc.active=0
    LustreError: 15b-f: MGC172.16.200.250@o2ib: Configuration from log work-client failed from MGS -22. Check client and MGS are on compatible version.
    Lustre: Unmounted work-client
    LustreError: 4447:0:(super25.c:182:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -22

It might be good to change the wording to "...all records related to the removed OST(s)." or to include "conf_param" in the list of records that should also be removed, just to be as clear as possible. I don't disagree that this is already implied by the inclusion of "other records" in the existing documentation; it's just that a flu-addled admin (such as my current self) might assume that record should be retained since it wasn't explicitly called out. |
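To illustrate the point of this report, here is a minimal Python sketch that scans `llog_print` output for every record naming a given OST, so a conf_param record such as `osc.active=0` is listed next to the attach, setup, and add_osc records instead of being overlooked. The helper names (`parse_record`, `records_for_ost`, `cancel_commands`), the sample records, and the exact `- { index: N, event: E, key: value, ... }` layout are assumptions modeled on the manual's examples; the `--log_idx` form of `llog_cancel` should be confirmed with `lctl help llog_cancel` on your version. The script only prints candidate commands and executes nothing.

```python
import re
from typing import List, Tuple

# ASSUMPTION: llog_print lines look like "- { index: N, event: E, key: value, ... }".
RECORD_RE = re.compile(r"^-\s*\{\s*(.*?)\s*\}\s*$")


def parse_record(line: str) -> List[Tuple[str, str]]:
    """Split one llog_print line into ordered (key, value) pairs.

    A list is used instead of a dict because add_osc records carry two
    'index' keys: the record index first, then the OST index.
    """
    m = RECORD_RE.match(line.strip())
    if not m:
        return []
    pairs = []
    for field in m.group(1).split(", "):
        key, sep, value = field.partition(": ")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def records_for_ost(llog_text: str, ost: str) -> List[Tuple[int, str]]:
    """Return (record index, event) for every record whose values name `ost`.

    `ost` is the bare target name, e.g. "work-OST0000"; a value matches if it
    equals that name or continues with "-" or "_" (work-OST0000-osc,
    work-OST0000_UUID, ...).
    """
    name_re = re.compile(re.escape(ost) + r"(?:$|[-_])")
    hits = []
    for line in llog_text.splitlines():
        pairs = parse_record(line)
        # The first "index" key is the record index, not the OST index.
        index = next((int(v) for k, v in pairs if k == "index"), None)
        if index is None:
            continue
        event = next((v for k, v in pairs if k == "event"), "")
        if any(name_re.match(v) for k, v in pairs if k not in ("index", "event")):
            hits.append((index, event))
    return hits


def cancel_commands(log_name: str, hits: List[Tuple[int, str]]) -> List[str]:
    """Build llog_cancel command lines for review; nothing is executed here."""
    return [f"lctl --device MGS llog_cancel {log_name} --log_idx={i}" for i, _ in hits]


# Hypothetical sample records (indices and NIDs are illustrative only).
sample = """\
- { index: 22, event: add_uuid, nid: 10.0.0.1@o2ib, node: 10.0.0.1@o2ib }
- { index: 23, event: attach, device: work-OST0000-osc, type: osc, UUID: work-clilov_UUID }
- { index: 24, event: setup, device: work-OST0000-osc, UUID: work-OST0000_UUID, node: 10.0.0.1@o2ib }
- { index: 25, event: add_osc, device: work-clilov, ost: work-OST0000_UUID, index: 0, gen: 1 }
- { index: 40, event: conf_param, device: work-OST0000-osc, parameter: osc.active=0 }
"""

if __name__ == "__main__":
    for cmd in cancel_commands("work-client", records_for_ost(sample, "work-OST0000")):
        print(cmd)
```

Records that identify the server only by NID, such as the add_uuid record at index 22 in the sample, are deliberately not matched, because the NID alone does not name the OST; those need to be checked by hand against the neighbouring attach and setup records.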