Affects Version/s: None
Fix Version/s: None
In Lustre manual Section 19.1, Handling Full OSTs, the lctl deactivate and activate commands are documented as the way to take a full OST offline and to return an inactive OST back online.
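For reference, that previously documented procedure looks roughly like the following sketch (the file system name testfs and OST index are illustrative, and the device number must first be looked up with lctl dl on the MDS):

    # On the MDS, find the device number of the OSC/OSP device for the full OST
    mds# lctl dl | grep OST0002
    # Stop new allocations to the full OST (old documented method)
    mds# lctl --device <devno> deactivate
    # ... migrate data off the OST, e.g. with lfs_migrate ...
    # Return the OST to normal service
    mds# lctl --device <devno> activate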
However, several bugs (LU-4825, LU-4295, LU-5931, DDN-172 and LU-7012) showed that after deactivating the full OST, migrating its data, and reactivating it, the orphan objects on the OST were not destroyed.
As per the comments from Andreas:
One problem here is that the documented procedure for migrating objects off of an OST is to use "lctl --device XXX deactivate" on the MDS for the OST(s), but this disconnects the MDS from the OST entirely and disables RPC sending at a low level in the code (RPC layer) so it isn't necessarily practical to special-case that code to allow only OST_DESTROY RPCs through from the MDS, since the MDS doesn't even know whether the OST is alive or dead at that point.
And he suggested:
One option that works on a variety of different Lustre versions is to mark an OST as degraded:
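For example (a sketch; the file system name testfs and OST index are illustrative), the degraded flag is set on the OSS that serves the OST:

    # On the OSS, mark the OST degraded so the MDS avoids it for most new allocations
    oss# lctl set_param obdfilter.testfs-OST0002.degraded=1
    # Clear the flag once the OST should take new objects again
    oss# lctl set_param obdfilter.testfs-OST0002.degraded=0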
This means that the MDS will skip the degraded OST(s) during most allocations, but will not skip them if someone requests a widely striped file and there are not enough non-degraded OSTs to fill the request.
I think we need to allow setting osp.*.max_create_count=0 to inform the MDS to skip object precreation on the OST(s), instead of using the old lctl --device * deactivate method, so that the MDS can still destroy OST objects for unlinked files. While it appears possible to set max_create_count=0 today, the MDS still tries to create objects on that OST if specified via lfs setstripe -i <idx> and it waits for a timeout (100s) trying to create files there before moving to the next OST (at <idx + 1>).
If max_create_count==0 then the LOD/OSP should skip this OSP immediately instead of waiting for a full timeout.
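Assuming that change lands, the updated manual procedure would look roughly like this sketch (the file system name testfs, OST index 0002, mount point /mnt/testfs, and the restored value 20000 are illustrative; the original max_create_count should be recorded with lctl get_param before it is changed):

    # On the MDS, record the current value, then stop object precreation on the full OST
    mds# lctl get_param osp.testfs-OST0002*.max_create_count
    mds# lctl set_param osp.testfs-OST0002*.max_create_count=0
    # On a client, migrate files off the full OST
    client# lfs find --ost testfs-OST0002_UUID /mnt/testfs | lfs_migrate -y
    # After space is freed, restore the recorded value to re-enable precreation
    mds# lctl set_param osp.testfs-OST0002*.max_create_count=20000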
So, after patch http://review.whamcloud.com/16105 lands, could you please update the Lustre manual accordingly?