Lustre Documentation / LUDOC-305

"lctl deactivate/activate" does not work as expected in 19.1. Handling Full OSTs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

      Description

      Section 19.1, "Handling Full OSTs", of the Lustre manual documents the lctl deactivate and lctl activate commands as the way to take a full OST offline and to bring an inactive OST back online.

      However, several bugs (LU-4825, LU-4295, LU-5931, DDN-172 and LU-7012) showed that after a full OST was deactivated, its data migrated, and the OST reactivated, the orphan objects left on the OST were not destroyed.

      As Andreas commented in LU-4825:

      One problem here is that the documented procedure for migrating objects off of an OST is to use "lctl --device XXX deactivate" on the MDS for the OST(s), but this disconnects the MDS from the OST entirely and disables RPC sending at a low level in the code (RPC layer) so it isn't necessarily practical to special-case that code to allow only OST_DESTROY RPCs through from the MDS, since the MDS doesn't even know whether the OST is alive or dead at that point.
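To make the distinction in this comment concrete, here is a toy Python model of the two states. It is not Lustre code: the class `ToyOsp` and its fields are hypothetical, and object precreation is modelled simply as an `OST_CREATE` request. A deactivated connection refuses every RPC, so the `OST_DESTROY` for an unlinked file is never sent and the object is left as an orphan; with `max_create_count=0` (the alternative discussed further on) only creation is refused.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyOsp:
    """Hypothetical model of the MDS's connection (OSP) to one OST."""
    name: str
    active: bool = True         # False models "lctl --device ... deactivate"
    max_create_count: int = 1   # any positive value allows precreation here
    objects: Set[str] = field(default_factory=set)

    def can_send(self, rpc: str) -> bool:
        # Deactivation blocks RPCs at the RPC layer, so nothing gets through,
        # not even OST_DESTROY for files that were unlinked.
        if not self.active:
            return False
        # max_create_count == 0 only refuses object creation.
        if rpc == "OST_CREATE":
            return self.max_create_count > 0
        return True

    def destroy(self, obj: str) -> bool:
        """Destroy the object of an unlinked file; False leaves an orphan."""
        if not self.can_send("OST_DESTROY"):
            return False
        self.objects.discard(obj)
        return True


deactivated = ToyOsp("testfs-OST0002", active=False, objects={"obj1"})
no_create = ToyOsp("testfs-OST0002", max_create_count=0, objects={"obj1"})
print(deactivated.destroy("obj1"), sorted(deactivated.objects))  # False ['obj1']
print(no_create.can_send("OST_CREATE"), no_create.destroy("obj1"))  # False True
```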

      Andreas also suggested:

      One option that works on a variety of different Lustre versions is to mark an OST as degraded:

      lctl set_param obdfilter.{OST_name}.degraded=1
      

      This means that the MDS will skip the degraded OST(s) during most allocations, but will not skip them if someone requested a widely striped file and not enough non-degraded OSTs to fill the request.

      I think we need to allow setting osp.*.max_create_count=0 to inform the MDS to skip object precreation on the OST(s), instead of using the old lctl --device * deactivate method, so that the MDS can still destroy OST objects for unlinked files. While it appears possible to set max_create_count=0 today, the MDS still tries to create objects on that OST if specified via lfs setstripe -i <idx> and it waits for a timeout (100s) trying to create files there before moving to the next OST (at <idx + 1>).

      If max_create_count==0 then the LOD/OSP should skip this OSP immediately instead of waiting for a full timeout.
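The allocation behaviour described in these comments can be sketched as a toy Python function. This is a simplified sketch under stated assumptions, not the LOD allocator (which also considers factors such as free space and round-robin ordering): `ToyOst` and `choose_osts` are hypothetical names, OSTs with `max_create_count == 0` are skipped immediately rather than after a timeout, and degraded OSTs are used only when there are not enough non-degraded OSTs for the requested stripe count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyOst:
    """Hypothetical per-OST state seen by a toy allocator."""
    index: int
    degraded: bool = False     # models obdfilter.<OST>.degraded=1
    max_create_count: int = 1  # 0 models osp.*.max_create_count=0


def choose_osts(osts: List[ToyOst], stripe_count: int) -> List[int]:
    """Pick OST indices for a new file's stripes."""
    # OSTs with max_create_count == 0 are skipped at once, with no timeout.
    usable = [o for o in osts if o.max_create_count > 0]
    healthy = [o.index for o in usable if not o.degraded]
    degraded = [o.index for o in usable if o.degraded]
    if stripe_count > len(usable):
        raise ValueError("not enough OSTs accept new objects")
    # Degraded OSTs are used only to make up a shortfall of healthy ones.
    return (healthy + degraded)[:stripe_count]


osts = [ToyOst(0), ToyOst(1, degraded=True), ToyOst(2, max_create_count=0), ToyOst(3)]
print(choose_osts(osts, 2))  # [0, 3]
print(choose_osts(osts, 3))  # [0, 3, 1]
```

A request for a widely striped file (three stripes here) still lands on the degraded OST 1, while OST 2 is never tried at all.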

      So, once patch http://review.whamcloud.com/16105 lands, could you please update the Lustre manual accordingly?
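For the manual update, the commands run on the MDS take the OSP device name for the OST. The Python helper below is a hypothetical sketch that builds those command strings, assuming the usual `<fsname>-OST<index>-osc-MDT<index>` OSP device naming with the index written as four lowercase hex digits. The file system name `testfs` and OST index 2 are placeholders. Recording the current value with `lctl get_param` first lets it be restored afterwards.

```python
def osp_device(fsname: str, ost_index: int, mdt_index: int = 0) -> str:
    """OSP device name on the MDS for one OST, e.g. testfs-OST0002-osc-MDT0000.

    Target indices are written as four lowercase hex digits (OST 10 -> OST000a).
    """
    return f"{fsname}-OST{ost_index:04x}-osc-MDT{mdt_index:04x}"


def max_create_count_param(fsname: str, ost_index: int, mdt_index: int = 0) -> str:
    """Full parameter name of the OSP's max_create_count tunable."""
    return f"osp.{osp_device(fsname, ost_index, mdt_index)}.max_create_count"


def get_param_cmd(param: str) -> str:
    return f"lctl get_param {param}"


def set_param_cmd(param: str, value: int) -> str:
    return f"lctl set_param {param}={value}"


# "testfs" and OST index 2 are placeholders for a real file system.
param = max_create_count_param("testfs", 2)
print(get_param_cmd(param))     # record the current value before changing it
print(set_param_cmd(param, 0))  # stop new object creation on this OST
```

Once the data has been migrated off the OST, `set_param_cmd(param, saved_value)` with the recorded value re-enables object creation. Per the comments above, the MDS should still be able to destroy objects of unlinked files while the value is 0.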

              People

              • Assignee: LM-Triage Lustre Manual Triage
              • Reporter: yujian Jian Yu
              • Votes: 0
              • Watchers: 8
