Details
Bug
Resolution: Unresolved
Minor
Lustre 2.15.2
Description
This is an extremely minor issue that I bring up only because it confused me for a bit.
The manual outlines the steps necessary to completely remove an OST from a file system. It explicitly says to delete any attach, setup, add_osc, add_pool, and other records, and its example shows the add_uuid, attach, setup, and add_osc events being removed from the logs.
However, if you followed the entire section, you likely also have a conf_param event setting osc.active=0 for the OST(s). If, like me, you didn't include those events in the list of llog_cancels you ran, you'd run into errors like:
LustreError: 4459:0:(obd_config.c:1526:class_process_config()) no device for: work-OST0000-osc-ffff93ebd7f3e000
LustreError: 4459:0:(obd_config.c:1998:class_config_llog_handler()) MGC172.16.200.250@o2ib: cfg command failed: rc = -22
Lustre: cmd=cf00f 0:work-OST0000-osc 1:osc.active=0
LustreError: 15b-f: MGC172.16.200.250@o2ib: Configuration from log work-client failed from MGS -22. Check client and MGS are on compatible version.
Lustre: Unmounted work-client
LustreError: 4447:0:(super25.c:182:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -22
when it came time to have a client mount the file system again.
It might be good to change the wording to "...all records related to the removed OST(s)." or to include "conf_param" in the list of records that should be removed, just to be as clear as possible. I don't disagree that this is already implied by "other records" in the existing documentation, only that a flu-addled admin (such as my current self) might assume the record should be retained since it wasn't explicitly called out.
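To illustrate the kind of check that would have caught the leftover record, here is a minimal sketch that scans saved llog_print output for every record naming a given OST, conf_param included. The sample text, its index numbers, the NID, and the helper name records_for_ost are all hypothetical, and the one-record-per-line layout is an assumption about what llog_print emits on this release; compare it against the real output before relying on it.

```python
import re

# Hypothetical sample in the one-record-per-line style assumed for
# `lctl --device MGS llog_print work-client`; indices and values are invented.
SAMPLE_LLOG = """\
- { index: 10, event: attach, device: work-OST0000-osc, type: osc, UUID: work-clilov_UUID }
- { index: 11, event: setup, device: work-OST0000-osc, UUID: work-OST0000_UUID, node: 172.16.200.10@o2ib }
- { index: 12, event: add_osc, device: work-clilov, ost: work-OST0000_UUID, index: 0, gen: 1 }
- { index: 20, event: attach, device: work-OST0001-osc, type: osc, UUID: work-clilov_UUID }
- { index: 31, event: conf_param, device: work-OST0000-osc, parameter: osc.active=0 }
"""

# The record index is the "index:" field immediately followed by "event:".
RECORD_RE = re.compile(r"index:\s*(\d+),\s*event:\s*(\w+)")


def records_for_ost(llog_text, ost_name):
    """Return (index, event) for each record line that mentions ost_name."""
    hits = []
    for line in llog_text.splitlines():
        if ost_name not in line:
            continue
        match = RECORD_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


if __name__ == "__main__":
    for index, event in records_for_ost(SAMPLE_LLOG, "work-OST0000"):
        print(index, event)
```

Each index reported this way is a candidate for llog_cancel. A plain name match like this will not find add_uuid records, which name a NID rather than the OST, so those still need to be identified by hand.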