[LU-8698] DNE3: check unknown hash type properly in lmv Created: 12/Oct/16  Updated: 01/Apr/22  Resolved: 01/Apr/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Di Wang Assignee: WC Triage
Resolution: Done Votes: 0
Labels: dne3

Issue Links:
Related
is related to LU-11025 DNE3: directory restripe Resolved
is related to LU-4921 DNE clients should fall back to "try ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The patch 893ab7479263: "staging: lustre: lmv: try all stripes for
unknown hash functions" from Aug 16, 2016, leads to the following
static checker warning:

	drivers/staging/lustre/lustre/lmv/lmv_intent.c:397 lmv_intent_lookup()
	error: 'tgt' dereferencing possible ERR_PTR()
drivers/staging/lustre/lustre/lmv/lmv_intent.c
  361                               struct ptlrpc_request **reqp,
   362                               ldlm_blocking_callback cb_blocking,
   363                               __u64 extra_lock_flags)
   364  {
   365          struct lmv_stripe_md *lsm = op_data->op_mea1;
   366          struct obd_device      *obd = exp->exp_obd;
   367          struct lmv_obd   *lmv = &obd->u.lmv;
   368          struct lmv_tgt_desc    *tgt = NULL;
   369          struct mdt_body *body;
   370          int                  rc = 0;
   371  
   372          /*
   373           * If it returns ERR_PTR(-EBADFD) then it is an unknown hash type
   374           * it will try all stripes to locate the object
   375           */

I have read this comment so I assume that this warning is a false
positive.

   376          tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
   377          if (IS_ERR(tgt) && (PTR_ERR(tgt) != -EBADFD))
   378                  return PTR_ERR(tgt);
   379  
   380          /*
   381           * Both migrating dir and unknown hash dir need to try
   382           * all of sub-stripes
   383           */
   384          if (lsm && !lmv_is_known_hash_type(lsm->lsm_md_hash_type)) {

But really, it's pretty ugly to assume that -EBADFD implies that lsm is
non-NULL and lmv_is_known_hash_type() will return false.

Yes, it might be a bit improper to use -EBADFD here to indicate this
special scenario, or it can be

if (IS_ERR(tgt)) {
      if (PTR_ERR(tgt) != -EBADFD)  /* hmm, this might not needed, needs
further check */
           return PTR_ERR(tgt);

      if (lsm && !lmv_is_known_hash_type(lsm->lsm_md_hash_type)) {
           ŠŠ..
      } else {
           return PTR_ERR(tgt);
      }
}

   385                  struct lmv_oinfo *oinfo = &lsm->lsm_md_oinfo[0];
   386  
   387                  op_data->op_fid1 = oinfo->lmo_fid;
   388                  op_data->op_mds = oinfo->lmo_mds;
   389                  tgt = lmv_get_target(lmv, oinfo->lmo_mds, NULL);
   390                  if (IS_ERR(tgt))
   391                          return PTR_ERR(tgt);
   392          }
   393  
   394          if (!fid_is_sane(&op_data->op_fid2))
   395                  fid_zero(&op_data->op_fid2);
   396  
   397          CDEBUG(D_INODE, "LOOKUP_INTENT with fid1=" DFID ", fid2=" DFID ", name='%s' -> mds #%u lsm=%p lsm_magic=%x\n",
   398                 PFID(&op_data->op_fid1), PFID(&op_data->op_fid2),
   399                 op_data->op_name ? op_data->op_name : "<NULL>",
   400                 tgt->ltd_idx, lsm, !lsm ? -1 : lsm->lsm_md_magic);
   401  
   402          op_data->op_bias &= ~MDS_CROSS_REF;


 Comments   
Comment by Andreas Dilger [ 07/Apr/20 ]

Lai, I think this is already handled with the directory restriping code, maybe even in 2.13? If yes, please link to the ticket/patch that fixed this problem and close it.

Is it possible to migrate a directory with an unknown hash type to a new type? This should work for any hash type, and just result in all directory stripes being walked and all entries being rehashed to new stripes. I think this is done with patch https://review.whamcloud.com/36898 "LU-11025 dne: support directory restripe", but it would be good to confirm (maybe add a fail_loc that allows setting an arbitrary lmv_hash_type=fail_val for testing).

This will allow recovering from a corrupted LMV EA, or if someone does upgrade, create directory with new hash, then downgrade, or if someone has a custom hash function that is not landed to mainline Lustre.

Generated at Sat Feb 10 02:19:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.