[LU-4945]  req_capsule_get: Wrong buffer for field `name' (5 of 6) in format `LDLM_INTENT_GETATTR': 3 vs. 0 (client) Created: 23/Apr/14  Updated: 01/May/14  Resolved: 01/May/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Critical
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4603 NFS reexport leads to problems of "ls" Resolved
is related to LU-3531 DNE2: striped directory Resolved
Epic/Theme: dne
Severity: 3
Rank (Obsolete): 13685

 Description   

I found this problem when I ran racer with MDSCOUNT=4.

LustreError: 8391:0:(pack_generic.c:815:lustre_msg_string()) can't unpack short string in msg ffffc90017980658 buffer[5] len 3: strlen 0
LustreError: 8391:0:(layout.c:2060:__req_capsule_get()) @@@ Wrong buffer for field `name' (5 of 6) in format `LDLM_INTENT_GETATTR': 3 vs. 0 (client)
req@ffff8802055e9800 x1466142079352512/t0(0) o101->203a5d79-0163-0009-b60a-805a18baffe6@0@lo:0/0 lens 576/3384 e 0 to 0 dl 1398222210 ref 1 fl Interpret:/0/0 rc 0/0
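
Read together, the two log lines say the following. Field 5 of the LDLM_INTENT_GETATTR request is the `name' buffer. The client sized that buffer at 3 bytes, which fits a 2-character name plus its terminating NUL. The bytes that arrived, however, start with a NUL, so strlen() is 0. The sketch below is a hypothetical user-space model of that kind of length check. The name demo_msg_string() is invented here, and it is not the real lustre_msg_string(). It only assumes one rule: a string field must fill its buffer exactly, leaving room for the NUL.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical approximation (not lustre_msg_string()) of the rule implied
 * by "can't unpack short string ... len 3: strlen 0": a string field that
 * occupies buf_len bytes must hold buf_len - 1 non-NUL bytes followed by
 * a terminating NUL. Returns the string, or NULL if the buffer is bad.
 */
static const char *demo_msg_string(const char *buf, size_t buf_len)
{
	size_t slen;

	if (buf == NULL || buf_len == 0)
		return NULL;

	/* strnlen() never reads more than buf_len bytes. */
	slen = strnlen(buf, buf_len);

	/* Either unterminated (slen == buf_len) or "short" (early NUL). */
	if (slen != buf_len - 1)
		return NULL;

	return buf;
}
```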



 Comments   
Comment by Andreas Dilger [ 24/Apr/14 ]

Di, how serious is this bug, and what problem would be visible to the client?

Is the source of this bug obvious, and could Lai create the patch?

Comment by Di Wang [ 24/Apr/14 ]

Andreas: this bug seems pretty serious to me, and it seems related to the new readdir change. I am investigating it right now, but there is no obvious clue yet. Thanks.

Comment by Di Wang [ 24/Apr/14 ]

Hmm, it turns out that using the dir_ent pointer to locate the next entry is not safe without holding the LDLM lock.

mdc_read_entry()
{
	...
	/* If op_data->op_ent != NULL (see ll_dir_entry_next), try to get
	 * next ent directly */
	if (likely(op_data->op_ent != NULL)) {
		ent = lu_dirent_next(op_data->op_ent);
		if (likely(ent != NULL))
			GOTO(out, rc);
	} else {
	...

So we either need to find a new way to resolve hash collisions, or hold the LDLM lock during iteration. I will prepare a patch.

Comment by John Hammond [ 25/Apr/14 ]

This is a good excuse/opportunity to kill all the uses of LOGL0() to pack names:

+static void mdc_pack_name(struct ptlrpc_request *req,
+		   const struct req_msg_field *field,
+		   const char *name, size_t name_len)
+{
+	char *buf;
+	size_t buf_size;
+
+	buf = req_capsule_client_get(&req->rq_pill, field);
+	buf_size = req_capsule_get_size(&req->rq_pill, field, RCL_CLIENT);
+
+	LASSERT(buf != NULL &&
+		buf_size == name_len + 1 &&
+		name != NULL &&
+		name_len != 0 &&
+		strnlen(name, name_len) == name_len &&
+		name[name_len] == '\0');
+
+	strlcpy(buf, name, buf_size);
+
+	LASSERT(strlen(buf) == name_len);
+}
+
...
-	tmp = req_capsule_client_get(&req->rq_pill, &RMF_NAME);
-	LOGL0(op_data->op_name, op_data->op_namelen, tmp);
+	mdc_pack_name(req, &RMF_NAME, op_data->op_name, op_data->op_namelen);
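
The following user-space model makes the invariants in the proposed helper concrete. demo_pack_name() is a hypothetical stand-in with a few substitutions. The caller passes the buffer and its size directly, rather than going through req_capsule_client_get() and req_capsule_get_size(). memcpy() plus an explicit NUL replaces strlcpy(), which older C libraries lack. A -1 return replaces the LASSERT() so that the failure can be observed.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of the proposed mdc_pack_name(): same
 * preconditions as its LASSERT(), but returns -1 instead of asserting.
 */
static int demo_pack_name(char *buf, size_t buf_size,
			  const char *name, size_t name_len)
{
	/* The buffer must hold exactly the name plus its NUL, and the name
	 * must have no embedded NUL and must be NUL-terminated. */
	if (buf == NULL || name == NULL || name_len == 0 ||
	    buf_size != name_len + 1 ||
	    strnlen(name, name_len) != name_len ||
	    name[name_len] != '\0')
		return -1;

	memcpy(buf, name, name_len);
	buf[name_len] = '\0';
	return 0;
}
```

A call with a 3-byte buffer and name_len 2, where every byte of the name is NUL, returns -1. That resembles the bad request reported in the description. In the kernel helper, the same input would trip the LASSERT() on the client instead of reaching the server.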

Comment by Di Wang [ 25/Apr/14 ]

http://review.whamcloud.com/#/c/10109/

Comment by Di Wang [ 25/Apr/14 ]

Sorry, John, I did not include your changes in this patch. I will try to add them later.

Comment by Jodi Levi (Inactive) [ 30/Apr/14 ]

Changes merged into http://review.whamcloud.com/#/c/9191/

Comment by Andreas Dilger [ 30/Apr/14 ]

Fix was merged into http://review.whamcloud.com/9191 under LU-4603.

Comment by Di Wang [ 01/May/14 ]

The patch has already been merged into the fix for LU-4603. I will close this for now.

Comment by Di Wang [ 01/May/14 ]

John: sorry, could you please create a new ticket for your suggestion? Thanks.

Comment by John Hammond [ 01/May/14 ]

Done.

Generated at Sat Feb 10 01:47:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.