[LU-386] Test failure on test suite replay-single Created: 02/Jun/11 Updated: 07/Jul/11 Resolved: 07/Jul/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4971 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/27ae3e58-8d47-11e0-aab9-52540025f9af. |
| Comments |
| Comment by Sarah Liu [ 02/Jun/11 ] |
|
this one can be reproduced |
| Comment by Peter Jones [ 08/Jun/11 ] |
|
Bobi, I understand that you believe this is related to bz 10821. Could you please elaborate as to why? Thanks, Peter |
| Comment by Zhenyu Xu [ 08/Jun/11 ] |
|
I think this is due to a misunderstanding: someone associated bug 10821 with this error, which made me want to check it out to see what history this issue has. |
| Comment by Zhenyu Xu [ 09/Jun/11 ] |
|
Sarah, would you mind applying this debug patch, reproducing the issue, and uploading the /tmp/test_65a file that the debug patch generates, as well as the MDS debug logs? Thanks. |
| Comment by Sarah Liu [ 09/Jun/11 ] |
|
sure, will do it tomorrow. |
| Comment by Zhenyu Xu [ 09/Jun/11 ] |
|
Is debug.log the MDS log? I ask because I cannot find the following message in it:
(fail.c:126:__cfs_fail_timeout_set()) cfs_fail_timeout id 50a sleeping for 6000ms
It should be there, since the test script sets:
do_facet $SINGLEMDS lctl set_param fail_val=$((${REQ_DELAY} * 1000)) |
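A minimal sketch of the check described in this comment: it scans log text for the cfs_fail_timeout message and compares the logged sleep against the fail_val the test script would set. The REQ_DELAY value of 6 seconds is an assumption inferred from the 6000ms in the expected message, and fail_timeout_sleeps is a hypothetical helper, not part of the Lustre test framework.

```python
import re

# Assumption: REQ_DELAY is 6 seconds, inferred from the 6000ms in the expected message.
REQ_DELAY = 6


def expected_fail_val_ms(req_delay_s):
    """Mirror the test script's fail_val=$((${REQ_DELAY} * 1000))."""
    return req_delay_s * 1000


_FAIL_TIMEOUT = re.compile(r"cfs_fail_timeout id ([0-9a-f]+) sleeping for (\d+)ms")


def fail_timeout_sleeps(log_text, fail_id="50a"):
    """Hypothetical helper: sleep durations (ms) logged for one fail id."""
    return [int(ms) for fid, ms in _FAIL_TIMEOUT.findall(log_text) if fid == fail_id]


# The message quoted in the comment above, used as sample input.
sample = "(fail.c:126:__cfs_fail_timeout_set()) cfs_fail_timeout id 50a sleeping for 6000ms"
found = fail_timeout_sleeps(sample)
print("expected sleep present:", expected_fail_val_ms(REQ_DELAY) in found)
```

If the helper returns an empty list for a given log, that log does not contain the fail-timeout message for id 50a, which is the situation this comment reports for debug.log.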
| Comment by Sarah Liu [ 10/Jun/11 ] |
|
It is the debug log of both the MDS and OST. I can separate the MDS and OST and test it again. |
| Comment by Zhenyu Xu [ 12/Jun/11 ] |
|
Better to get -1 logs from the MDS side if possible (and the test_65a file from the client side as well). |
| Comment by Sarah Liu [ 13/Jun/11 ] |
|
The test_65a file is already in the attachments. |
| Comment by Zhenyu Xu [ 13/Jun/11 ] |
|
In test_65a-2 I found that the client got an early reply:
00000100:00001000:3.0:1308021631.187436:0:9231:0:(events.c:140:reply_in_callback()) @@@ Early reply received: mlen=192 offset=0 replen=216 replied=0 unlinked=0 req@ffff81032c4a9400 x1371560011104309/t0(0) o-1->lustre-MDT0000_UUID@192.168.4.18@o2ib:30/10 lens 232/216 e 0 to 0 dl 1308021637 ref 2 fl Rpc:/ffffffff/ffffffff rc 0/-1
but I don't know why the client didn't call ptlrpc_at_recv_early_reply(). As a last resort, please reproduce the issue and gather -1 logs from the client and MDS (so the test_65a file will also contain -1 logs from the client); you can zip and upload them. |
| Comment by Zhenyu Xu [ 15/Jun/11 ] |
|
From the last comment (also visible in the client log updated in logs.tar.gz), the client consumes the early reply on the 30/10 portal pair, which is SEQ_METADATA_PORTAL/MDC_REPLY_PORTAL. My local test shows different portal usage (12/10, MDS_REQUEST_PORTAL/MDC_REPLY_PORTAL). I don't know the FID sequence mechanism well, and I wonder whether the sequence allocation handling is related to the issue. |
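To illustrate the portal comparison in this comment, here is a rough sketch that pulls the request/reply portal pair out of a request dump like the one quoted earlier and maps it to names. It assumes the "N/M" field just before "lens" is the portal pair, as this comment reads it. The name table holds only the three portals mentioned in this ticket, and portal_pair is a hypothetical helper.

```python
import re

# Only the portal numbers named in this ticket; not a complete table.
PORTAL_NAMES = {
    10: "MDC_REPLY_PORTAL",
    12: "MDS_REQUEST_PORTAL",
    30: "SEQ_METADATA_PORTAL",
}

# Assumption: the "N/M" field right before "lens" is request/reply portal.
_PORTALS = re.compile(r":(\d+)/(\d+) lens ")


def portal_pair(log_line):
    """Hypothetical helper: return (request, reply) portal names, or None."""
    match = _PORTALS.search(log_line)
    if match is None:
        return None
    req, rep = int(match.group(1)), int(match.group(2))
    # Fall back to the raw number for portals not listed above.
    return (PORTAL_NAMES.get(req, str(req)), PORTAL_NAMES.get(rep, str(rep)))


# Fragment of the request dump quoted in the earlier comment.
line = ("req@ffff81032c4a9400 x1371560011104309/t0(0) "
        "o-1->lustre-MDT0000_UUID@192.168.4.18@o2ib:30/10 lens 232/216 "
        "e 0 to 0 dl 1308021637 ref 2")
print(portal_pair(line))
```

Running the same helper over a log from a passing run would show whether that run used the 12/10 pair seen in the local test.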
| Comment by Zhenyu Xu [ 16/Jun/11 ] |
|
would you mind trying this patch? |
| Comment by Sarah Liu [ 16/Jun/11 ] |
|
waiting for build: http://review.whamcloud.com/#change,957 |
| Comment by Sarah Liu [ 20/Jun/11 ] |
|
Here is the Maloo result with your patch; please find the other logs in the attachments. |
| Comment by Zhenyu Xu [ 20/Jun/11 ] |
|
still need client test_65 log |
| Comment by Sarah Liu [ 21/Jun/11 ] |
|
I've uploaded 65a.tar.gz, please check it. |
| Comment by Zhenyu Xu [ 24/Jun/11 ] |
|
Sarah, I cannot reserve any node, so I still need your help testing with the latest patch set in http://review.whamcloud.com/#change,957. TIA |
| Comment by Sarah Liu [ 24/Jun/11 ] |
|
Yes, it is reproducible by running just test_65a. I will try this patch tomorrow and give you feedback. |
| Comment by Sarah Liu [ 24/Jun/11 ] |
|
here is the log |
| Comment by Zhenyu Xu [ 28/Jun/11 ] |
|
patch tracking at http://review.whamcloud.com/1025 |
| Comment by Sarah Liu [ 30/Jun/11 ] |
|
this patch works. https://maloo.whamcloud.com/test_sets/9232fa84-a2dc-11e0-aee5-52540025f9af |
| Comment by James A Simmons [ 06/Jul/11 ] |
|
|
| Comment by Zhenyu Xu [ 06/Jul/11 ] |
|
Updated the patch per James' suggestion. |
| Comment by Sarah Liu [ 06/Jul/11 ] |
|
verified. https://maloo.whamcloud.com/test_sets/341a370c-a84d-11e0-bd2a-52540025f9af |
| Comment by Build Master (Inactive) [ 07/Jul/11 ] |
|
Integrated in Oleg Drokin : e0ee0aacd358893dad5c9f0da0dc19ba3ddf08a0
|
| Comment by Zhenyu Xu [ 07/Jul/11 ] |
|
landed on master for 2.1.0 |