[LU-2867] 2.1.4<->2.4.0 interop: parallel-scale test_compilebench: IOError: [Errno 71] Protocol error Created: 26/Feb/13 Updated: 07/Mar/13 Resolved: 07/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.1.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | nasf (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | HB | ||
| Environment: |
Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/176 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 6933 | ||||||||
| Description |
|
The parallel-scale test compilebench failed as follows:

IOError: [Errno 71] Protocol error
parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1

Console log on the client node client-19vm1 showed that:

16:48:59:Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench -i 2 -r 2 --makej
16:54:22:LustreError: 23937:0:(pack_generic.c:413:lustre_msg_buf_v2()) msg ffff880042c05cc0 buffer[2] size 0 too small (required 40, opc=101)
16:54:22:LustreError: 23937:0:(layout.c:1659:__req_capsule_get()) @@@ Wrong buffer for field `dlm_lvb' (2 of 3) in format `LDLM_ENQUEUE_LVB': 0 vs. 40 (server)
16:54:22: req@ffff880042c05800 x1427708753202028/t0(0) o101->lustre-OST0001-osc-ffff880075eeb000@10.10.2.223@tcp:28/4 lens 296/312 e 0 to 0 dl 1361581007 ref 1 fl Interpret:R/0/0 rc 0/0
16:54:22:LustreError: 23937:0:(pack_generic.c:413:lustre_msg_buf_v2()) msg ffff880042c05cc0 buffer[2] size 0 too small (required 40, opc=101)
16:54:22:LustreError: 23937:0:(pack_generic.c:413:lustre_msg_buf_v2()) Skipped 1 previous similar message
16:54:22:LustreError: 23937:0:(layout.c:1659:__req_capsule_get()) @@@ Wrong buffer for field `dlm_lvb' (2 of 3) in format `LDLM_ENQUEUE_LVB': 0 vs. 40 (server)
16:54:22: req@ffff880042c05800 x1427708753202030/t0(0) o101->lustre-OST0001-osc-ffff880075eeb000@10.10.2.223@tcp:28/4 lens 296/312 e 0 to 0 dl 1361581023 ref 1 fl Interpret:R/0/0 rc 0/0
16:54:22:LustreError: 23937:0:(layout.c:1659:__req_capsule_get()) Skipped 1 previous similar message
16:54:22:Lustre: DEBUG MARKER: /usr/sbin/lctl mark parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1

Maloo report: https://maloo.whamcloud.com/test_sets/2bef1a90-7d79-11e2-85d0-52540035b04c |
| Comments |
| Comment by Jian Yu [ 27/Feb/13 ] |
|
Hi Nasf, Is this a duplicate of |
| Comment by nasf (Inactive) [ 27/Feb/13 ] |
|
I do not think so. |
| Comment by Zhenyu Xu [ 28/Feb/13 ] |
|
I think it must have something to do with variable-sized LVB support (http://review.whamcloud.com/3965): on the server side, mdt_intent_policy() sets the DLM_LVB field of the server buffer to size 0.
mdt_intent_policy()
} else {
/* No intent was provided */
LASSERT(pill->rc_fmt == &RQF_LDLM_ENQUEUE);
req_capsule_set_size(pill, &RMF_DLM_LVB, RCL_SERVER, 0);
rc = req_capsule_server_pack(pill);
|
| Comment by nasf (Inactive) [ 06/Mar/13 ] |
|
Bobijam, I do not think the code section you mentioned above caused the failure, because the failed lock was an EXT lock, not an IBITS lock. In fact, the direct reason for the failure was that the OST ran out of memory.

The log on client:

Searching for the failed lock "0xde77b732306ea406" in the OST log, we can find that:

As you can see, "ldlm_resource_get()" failed with "-ENOMEM": ldlm_resource_get() ==> ofd_lvbo_init() ==> OBD_ALLOC_PTR(lvb). That means the "lvb" on the lock resource was NULL because there was not enough memory at that moment. However, ldlm_handle_enqueue0() ignored this "lvb" error and went ahead, and later packed a zero-sized "lvb" into the reply. So it was NOT a variable-sized LVB issue, but a memory issue. |
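The failure chain described in the comment above can be sketched roughly as follows. This is a simplified, standalone model and not Lustre code: every `sketch_*` name, the `simulate_oom` flag, and the struct layouts are hypothetical, and the 40-byte size is taken only from the "required 40" value in the client log. It assumes the client reports a too-small reply buffer as EPROTO, which matches the "Protocol error" (errno 71 on Linux) seen by compilebench. The `sketch_enqueue_checked()` variant shows one possible stricter handling and is not necessarily the fix that was applied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* LVB size the client expects, from "size 0 too small (required 40, ...)". */
#define SKETCH_LVB_SIZE 40

/* Hypothetical stand-in for a lock resource carrying an LVB. */
struct sketch_resource {
	void   *lvb;      /* NULL when the allocation failed */
	size_t  lvb_len;  /* 0 when there is no LVB */
};

/* Hypothetical stand-in for the reply buffer layout. */
struct sketch_reply {
	size_t lvb_buf_size;
};

/* Models the resource init step; simulate_oom forces the allocation failure. */
int sketch_resource_init(struct sketch_resource *res, int simulate_oom)
{
	res->lvb = simulate_oom ? NULL : calloc(1, SKETCH_LVB_SIZE);
	if (res->lvb == NULL) {
		res->lvb_len = 0;
		return -ENOMEM;
	}
	res->lvb_len = SKETCH_LVB_SIZE;
	return 0;
}

/* Problematic server path: the init error is dropped, so a zero-sized
 * LVB buffer is packed into an otherwise successful reply. */
int sketch_enqueue_ignoring_error(int simulate_oom, struct sketch_reply *reply)
{
	struct sketch_resource res;

	assert(reply != NULL);
	(void)sketch_resource_init(&res, simulate_oom); /* error ignored */
	reply->lvb_buf_size = res.lvb_len;
	free(res.lvb);
	return 0;
}

/* Stricter server path (an assumption): fail the enqueue with -ENOMEM
 * instead of sending a reply the client cannot unpack. */
int sketch_enqueue_checked(int simulate_oom, struct sketch_reply *reply)
{
	struct sketch_resource res;
	int rc;

	assert(reply != NULL);
	rc = sketch_resource_init(&res, simulate_oom);
	if (rc != 0)
		return rc;
	reply->lvb_buf_size = res.lvb_len;
	free(res.lvb);
	return 0;
}

/* Client side: a reply buffer smaller than the expected LVB size is
 * treated as a protocol error. */
int sketch_client_unpack(const struct sketch_reply *reply)
{
	return reply->lvb_buf_size < SKETCH_LVB_SIZE ? -EPROTO : 0;
}
```

In this model, calling `sketch_enqueue_ignoring_error(1, &reply)` returns 0 while leaving `reply.lvb_buf_size` at 0, so `sketch_client_unpack(&reply)` returns `-EPROTO`. The checked variant returns `-ENOMEM` to the caller instead, so the memory shortage is reported as such rather than as a protocol mismatch.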
| Comment by nasf (Inactive) [ 07/Mar/13 ] |
|
Cannot allocate "lvb" for ext lock |
| Comment by nasf (Inactive) [ 07/Mar/13 ] |
|
It is a duplicate of |