Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4945

req_capsule_get: Wrong buffer for field `name' (5 of 6) in format `LDLM_INTENT_GETATTR': 3 vs. 0 (client)

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.6.0
    • Lustre 2.6.0
    • None
    • 3
    • 13685

    Description

      I found this problem when I tried to run racer with MDSCOUNT=4

      LustreError: 8391:0:(pack_generic.c:815:lustre_msg_string()) can't unpack short string in msg ffffc90017980658 buffer[5] len 3: strlen 0
      LustreError: 8391:0:(layout.c:2060:__req_capsule_get()) @@@ Wrong buffer for field `name' (5 of 6) in format `LDLM_INTENT_GETATTR': 3 vs. 0 (client)
      req@ffff8802055e9800 x1466142079352512/t0(0) o101->203a5d79-0163-0009-b60a-805a18baffe6@0@lo:0/0 lens 576/3384 e 0 to 0 dl 1398222210 ref 1 fl Interpret:/0/0 rc 0/0

      Attachments

        Issue Links

          Activity

            [LU-4945] req_capsule_get: Wrong buffer for field `name' (5 of 6) in format `LDLM_INTENT_GETATTR': 3 vs. 0 (client)

            The patch is already merged to the fix of LU-4603. I will close this for now.

            di.wang Di Wang (Inactive) added a comment - The patch is already merged to the fix of LU-4603 . I will close this for now.

            Fix was merged into http://review.whamcloud.com/9191 under LU-4603.

            adilger Andreas Dilger added a comment - Fix was merged into http://review.whamcloud.com/9191 under LU-4603 .
            jlevi Jodi Levi (Inactive) added a comment - Changes merged into http://review.whamcloud.com/#/c/9191/

            Sorry, John, I did not include your changes into this patch. I will try to add it later.

            di.wang Di Wang (Inactive) added a comment - Sorry, John, I did not include your changes into this patch. I will try to add it later.
            di.wang Di Wang (Inactive) added a comment - http://review.whamcloud.com/#/c/10109/

            This is a good excuse/opportunity to kill all the uses of LOGL0() to pack names:

            +static void mdc_pack_name(struct ptlrpc_request *req,
            +		   const struct req_msg_field *field,
            +		   const char *name, size_t name_len)
            +{
            +	char *buf;
            +	size_t buf_size;
            +
            +	buf = req_capsule_client_get(&req->rq_pill, field);
            +	buf_size = req_capsule_get_size(&req->rq_pill, field, RCL_CLIENT);
            +
            +	LASSERT(buf != NULL &&
            +		buf_size == name_len + 1 &&
            +		name != NULL &&
            +		name_len != 0 &&
            +		strnlen(name, name_len) == name_len &&
            +		name[name_len] == '\0');
            +
            +	strlcpy(buf, name, buf_size);
            +
            +       LASSERT(strlen(buf) == name_len);
            +}
            +
            ...
            -	tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
            -	LOGL0(op_data->op_name, op_data->op_namelen, tmp);
            +	mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen);
            
            jhammond John Hammond added a comment - This is a good excuse/opportunity to kill all the uses of LOGL0() to pack names: +static void mdc_pack_name(struct ptlrpc_request *req, + const struct req_msg_field *field, + const char *name, size_t name_len) +{ + char *buf; + size_t buf_size; + + buf = req_capsule_client_get(&req->rq_pill, field); + buf_size = req_capsule_get_size(&req->rq_pill, field, RCL_CLIENT); + + LASSERT(buf != NULL && + buf_size == name_len + 1 && + name != NULL && + name_len != 0 && + strnlen(name, name_len) == name_len && + name[name_len] == '\0'); + + strlcpy(buf, name, buf_size); + + LASSERT(strlen(buf) == name_len); +} + ... - tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME); - LOGL0(op_data->op_name, op_data->op_namelen, tmp); + mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen);

            Hmm, it turns out using the dir_ent pointer to locate next entry is not very safe without holding ldlm lock.

            mdc_read_entry()
            {
            ....
                   /* If op_data->op_ent != NULL(see ll_dir_entry_next), try to get
                     * next ent directly */
                    if (likely(op_data->op_ent != NULL)) {
                            ent = lu_dirent_next(op_data->op_ent);
                            if (likely(ent != NULL))
                                    GOTO(out, rc);
                    } else {
            
            .....
            

            So we either find a new way to resolve the hash conflict or hold the ldlm lock during iteration. I will cook a patch.

            di.wang Di Wang (Inactive) added a comment - Hmm, it turns out using the dir_ent pointer to locate next entry is not very safe without holding ldlm lock. mdc_read_entry() { .... /* If op_data->op_ent != NULL(see ll_dir_entry_next), try to get * next ent directly */ if (likely(op_data->op_ent != NULL)) { ent = lu_dirent_next(op_data->op_ent); if (likely(ent != NULL)) GOTO(out, rc); } else { ..... So we either find a new way to resolve the hash conflict or hold the ldlm lock during iteration. I will cook a patch.

            Andreas: this bug is pretty serious to me, which seems related with the new readdir change. I am investigating it right now, no obvious clue yet. Thanks.

            di.wang Di Wang (Inactive) added a comment - Andreas: this bug is pretty serious to me, which seems related with the new readdir change. I am investigating it right now, no obvious clue yet. Thanks.

            Di, how serious is this bug, and what problem would be visible to the client?

            Is the source of this bug obvious, and could Lai create the patch?

            adilger Andreas Dilger added a comment - Di, how serious is this bug, and what problem would be visible to the client? Is the source of this bug obvious, and could Lai create the patch?

            People

              di.wang Di Wang (Inactive)
              di.wang Di Wang (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: