[LUDOC-305] "lctl deactivate/activate" does not work as expected in 19.1. Handling Full OSTs - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- easy

Severity:
3
Description

Section 19.1, "Handling Full OSTs", of the Lustre manual documents the lctl deactivate and activate commands as the way to take a full OST offline and to bring an inactive OST back online.

However, several tickets (LU-4825, LU-4295, LU-5931, DDN-172 and LU-7012) showed that after deactivating the full OST, migrating data off it and reactivating it, the orphan objects on the OST were not destroyed.

As per the comments from Andreas in LU-4825:

One problem here is that the documented procedure for migrating objects off of an OST is to use "lctl --device XXX deactivate" on the MDS for the OST(s), but this disconnects the MDS from the OST entirely and disables RPC sending at a low level in the code (RPC layer) so it isn't necessarily practical to special-case that code to allow only OST_DESTROY RPCs through from the MDS, since the MDS doesn't even know whether the OST is alive or dead at that point.

And he suggested:

One option that works on a variety of different Lustre versions is to mark an OST as degraded:
lctl set_param obdfilter.{OST_name}.degraded=1
This means that the MDS will skip the degraded OST(s) during most allocations, but will not skip them if someone requested a widely striped file and there are not enough non-degraded OSTs to fill the request.

I think we need to allow setting osp.*.max_create_count=0 to inform the MDS to skip object precreation on the OST(s), instead of using the old lctl --device * deactivate method, so that the MDS can still destroy OST objects for unlinked files. While it appears possible to set max_create_count=0 today, the MDS still tries to create objects on that OST if specified via lfs setstripe -i <idx> and it waits for a timeout (100s) trying to create files there before moving to the next OST (at <idx + 1>).

If max_create_count==0 then the LOD/OSP should skip this OSP immediately instead of waiting for a full timeout.
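
For illustration, the two approaches quoted above might be scripted roughly as follows. This is a minimal sketch, not the wording of the manual: the file system name testfs and the OST name OST0002 are hypothetical, the run, mark_degraded and stop_precreate helpers are made up for this example, and the osp pattern assumes the usual fsname-OSTxxxx-osc-MDTxxxx device naming. Commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. FSNAME and OST are hypothetical example values.
FSNAME="testfs"
OST="OST0002"

# Print each command instead of executing it, unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# On the OSS: mark the OST degraded so the MDS skips it for most allocations.
mark_degraded() {
    run lctl set_param "obdfilter.$1-$2.degraded=1"
}

# On the MDS: stop object precreation on the OST without disconnecting from
# it, so the MDS can still destroy OST objects for unlinked files.
# The trailing * is passed to lctl unexpanded (assumed to match the osp device).
stop_precreate() {
    run lctl set_param "osp.$1-$2*.max_create_count=0"
}

mark_degraded "$FSNAME" "$OST"
stop_precreate "$FSNAME" "$OST"
```

Before changing max_create_count it may be worth recording the current value with lctl get_param, so that the same value can be set again once the OST should receive new objects.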

So, after patch http://review.whamcloud.com/16105 lands, could you please update the Lustre manual accordingly?

Issue Links

is related to: LU-4825 lfs migrate not freeing space on OST (Resolved)

Activity

Joseph Gmitter (Inactive) added a comment - 23/Jan/18 9:55 PM

Patch has landed.

Gerrit Updater added a comment - 23/Jan/18 9:50 PM

Joseph Gmitter (joseph.gmitter@intel.com) merged in patch https://review.whamcloud.com/30864/
Subject: LUDOC-305 maintenance: handling full/failed OSTs
Project: doc/manual
Branch: master
Current Patch Set:
Commit: feb018cdf25683c6ebbb0982f6b5c12040c0b9ec

Gerrit Updater added a comment - 14/Jan/18 12:44 AM

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/30864
Subject: LUDOC-305 maintenance: handling full/failed OSTs
Project: doc/manual
Branch: master
Current Patch Set: 1
Commit: 0d0aa56251b435cbc81fb25f6ca62353e19e47b8

Andreas Dilger added a comment - 20/Mar/17 8:15 PM

Joe or Lai, could you please update the manual for this?

Andreas Dilger added a comment - 07/Dec/16 9:31 PM

This was fixed for Lustre 2.9 and the manual should be updated to describe setting max_create_count=0 instead of deactivating the OST on the MDS.