[LU-1458] lustre-rsync-test test_2b: old lustre_rsync does not work with new llog_changelog_ext_rec remove changelog Created: 31/May/12 Updated: 18/Mar/14 Resolved: 18/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.1.2, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.5.1 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | yuc2 |
| Severity: | 3 |
| Rank (Obsolete): | 4107 |
| Description |
|
This issue was created by maloo for yujian <yujian@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/eb3d7ed4-ab13-11e1-8e7f-52540035b04c. The sub-test test_2b failed with the following error:
Info required for matching: lustre-rsync-test 2b |
| Comments |
| Comment by Peter Jones [ 31/May/12 ] |
|
Bobijam Could you please look into this one? Thanks Peter |
| Comment by Zhenyu Xu [ 01/Jun/12 ] |
|
I tried this on my VM but cannot hit the issue. P.S. The test process is: 1. run dbench on the Lustre directory. This automated test case shows that /mnt/lustre/d0.lustre-rsync-test/d2/clients/client1/~dmtmp/PM/PMD394.TMP differs from its copy in the lustre_rsync destination directory /tmp/target. |
| Comment by Sarah Liu [ 05/Jun/12 ] |
|
I am not sure if this is the same issue: https://maloo.whamcloud.com/test_sets/81757bd0-ad72-11e1-8152-52540035b04c client: 2.1.1-rhel6 |
| Comment by Zhenyu Xu [ 05/Jun/12 ] |
|
Sarah, it's a different issue, and I've created a ticket for it ( |
| Comment by Li Wei (Inactive) [ 23/Jun/12 ] |
|
https://maloo.whamcloud.com/test_sets/70d74642-bc21-11e1-8a1f-52540035b04c |
| Comment by Zhenyu Xu [ 23/Jul/12 ] |
|
could possibly be related. |
| Comment by Oleg Drokin [ 02/Aug/12 ] |
|
Bobi, so the problem is not in the file content difference. The problem is this file only exists in the source dir, so it was not copied to target dir at all. |
| Comment by Zhenyu Xu [ 06/Aug/12 ] |
|
I'll upload a debug patch to dump the changelog in plain text. When we encounter another hit, would it be possible to upload the changelog file as well? It contains all the changelog records that lustre_rsync uses to replicate the Lustre source dir. |
| Comment by Zhenyu Xu [ 07/Aug/12 ] |
|
debug improvement patch tracking at http://review.whamcloud.com/3551
patch description: LU-1458 test: dump changelog for lustre-rsync-test
Dump plain text format changelog records for failed lustre-rsync-test test case to help debugging. |
| Comment by Peter Jones [ 08/Aug/12 ] |
|
Diagnostic patch landed to master so should be in the next tag. |
| Comment by Peter Jones [ 22/Aug/12 ] |
|
Sarah Have you been able to test whether this issue still occurs since the diagnostic patch was landed or have you been blocked in doing so by another issue? Peter |
| Comment by Sarah Liu [ 22/Aug/12 ] |
|
Hi Peter, due to TT-832, I cannot provision a 2.1.1 client to verify this. |
| Comment by Sarah Liu [ 24/Aug/12 ] |
|
Hit a similar issue with the client running 2.1.1 and the server running lustre-master-tag2.2.93. The debug patch is for master only, while this error was seen in interop testing, which actually uses the script on the client. I am trying to port the changes to 2.1.1 and rerun the test. Unfortunately, this is the report without the debug patch. |
| Comment by Sarah Liu [ 24/Aug/12 ] |
|
https://maloo.whamcloud.com/test_sets/0d1e405e-ee19-11e1-8649-52540035b04c |
| Comment by Zhenyu Xu [ 27/Aug/12 ] |
|
Sorry Sarah, please try http://review.whamcloud.com/3795 and reproduce it. patch description LU-1458 test: enable lustre_rsync debug log dump
Make lustre_rsync dump its debug log to help debugging.
|
| Comment by Sarah Liu [ 27/Aug/12 ] |
|
https://maloo.whamcloud.com/test_sets/fdfec3da-f08b-11e1-8816-52540035b04c |
| Comment by Zhenyu Xu [ 27/Aug/12 ] |
|
Sarah, the new patch changes lustre/utils/lustre_rsync.c, so we need to deploy new images so that lustre_rsync supports the new -D option. |
| Comment by Sarah Liu [ 28/Aug/12 ] |
|
Bobi, the new build failed: http://build.whamcloud.com/job/lustre-reviews/8680/ |
| Comment by Zhenyu Xu [ 29/Aug/12 ] |
|
done the rebuild |
| Comment by Sarah Liu [ 29/Aug/12 ] |
The patch is for master while this error occurs during interop testing between master and 2.1.x. I can manually port the script changes to 2.1.x but not the lustre_rsync.c. Could you please change that on 2.1.x so I can have a review build to test? |
| Comment by Zhenyu Xu [ 29/Aug/12 ] |
|
b2_1 patch tracking at http://review.whamcloud.com/3822 |
| Comment by Sarah Liu [ 30/Aug/12 ] |
|
https://maloo.whamcloud.com/test_sets/5f1d7e18-f2e4-11e1-b39f-52540035b04c |
| Comment by Sarah Liu [ 30/Aug/12 ] |
|
changelog |
| Comment by Zhenyu Xu [ 30/Aug/12 ] |
|
from test_1.changelog:
8 08RNMFM 17:53:31.834990148 2012.08.30 0x0 t=[0:0x0:0x0] p=[0x200000400:0x4:0x0]
9 01CREAT 17:53:31.838991641 2012.08.30 0x0 t=[0x200000400:0x9:0x0] p=[0x200000400:0x3:0x0] file4
and from lrsync_log.client_1.log:
***** Start 8 RNMFM (8) [0:0x0:0x0] [0x200000400:0x4:0x0] *****
move: /tmp/target/d0.lustre-rsync-test/d1/d2/ [to] /tmp/target/d0.lustre-rsync-test/d1/d1/file4 rc1=0, errno=95
move: /tmp/target2/d0.lustre-rsync-test/d1/d2/ [to] /tmp/target2/d0.lustre-rsync-test/d1/d1/file4 rc1=0, errno=95
##### End 8 RNMFM (8) [0:0x0:0x0] [0x200000400:0x4:0x0] rc=0 #####
and the test_log error shows:
Only in /tmp/target/d0.lustre-rsync-test/d1/d1: file4
Only in /mnt/lustre/d0.lustre-rsync-test/d1: d2
The error must happen in lr_move(); lustre_rsync does not handle rename properly. Still investigating. |
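To make the comparison between the two rename records easier, here is a minimal Python sketch that splits a plain-text changelog line into fields. It assumes only the layout visible in the records quoted in this ticket (index, two-digit type code fused with the type name, time, date, flags, `t=` target FID, `p=` parent FID, optional name). The names `RECORD_RE`, `parse_record` and `is_nameless_rename` are hypothetical and are not part of lustre_rsync or the test suite.

```python
import re

# Hypothetical pattern, derived only from the changelog lines quoted in this ticket.
RECORD_RE = re.compile(
    r"^(?P<index>\d+)\s+"
    r"(?P<type_num>\d{2})(?P<type_name>[A-Z]+)\s+"
    r"(?P<time>\S+)\s+(?P<date>\S+)\s+"
    r"(?P<flags>0x[0-9a-fA-F]+)\s+"
    r"t=\[(?P<target>[^\]]+)\]\s+"
    r"p=\[(?P<parent>[^\]]+)\]"
    r"(?:\s+(?P<name>\S+))?"
)


def parse_record(line):
    """Return a dict of fields for one changelog line, or None if it does not match."""
    m = RECORD_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["index"] = int(rec["index"])
    rec["type_num"] = int(rec["type_num"])
    return rec


def is_nameless_rename(rec):
    """True for a type-08 (rename) record that carries no file name.

    This is just the shape of record 8 in the failing run; it is an
    observation from this ticket, not a documented rule.
    """
    return rec["type_num"] == 8 and rec["name"] is None
```

Applied to record 8 above, `is_nameless_rename` returns True, while the CREAT record 9 parses with the name `file4`.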
| Comment by Zhenyu Xu [ 31/Aug/12 ] |
|
I did a rename operation in master branch, the changelog shows:
19 08RENME 06:30:12.444665393 2012.08.31 0x0 t=[0:0x0:0x0] p=[0x200000400:0xc:0x0] file4 s=[0x200000400:0xd:0x0] sp=[0x200000400:0xb:0x0] d2
Sarah, what's the server version in your test? I think the client is b2_1. |
| Comment by Zhenyu Xu [ 31/Aug/12 ] |
|
I guess it's related to http://review.whamcloud.com/2577: an old lustre_rsync does not work with a newer MDS server with regard to rename operations. |
| Comment by Zhenyu Xu [ 31/Aug/12 ] |
|
b2_2 port of review#2577 tracking at http://review.whamcloud.com/3834 |
| Comment by Sarah Liu [ 31/Aug/12 ] |
|
server uses build 8694 from this review http://review.whamcloud.com/#change,3795 |
| Comment by Jian Yu [ 17/Sep/12 ] |
|
Lustre client build: http://build.whamcloud.com/job/lustre-b2_1/121 lustre-rsync-test failed: https://maloo.whamcloud.com/test_sets/7075cfac-008c-11e2-860a-52540035b04c |
| Comment by Zhenyu Xu [ 17/Sep/12 ] |
|
yujian, b2_1 patch http://review.whamcloud.com/#change,3835 hasn't landed yet. So lustre_rsync still does not work with b2_3 server. |
| Comment by Jian Yu [ 10/Oct/12 ] |
|
Lustre Tag: v2_3_0_RC2 This issue occurred again: https://maloo.whamcloud.com/test_sets/fa89fd64-12b4-11e2-a23c-52540035b04c Bobi, could you please check the above report? The failure occurred on a non-interop environment. |
| Comment by Zhenyu Xu [ 10/Oct/12 ] |
|
can you upload the $LOGDIR/${TESTSUITE}.test_2b.changelog? (it should be generated on checkdiff error) |
| Comment by Jian Yu [ 11/Oct/12 ] |
Attached. Please check. Thanks. |
| Comment by Zhenyu Xu [ 11/Oct/12 ] |
|
Hmm, there are 1511 records in lustre-rsync-test.test_2b.changelog, and the test log shows that lustre_rsync consumed 1510 records,
and the 1511th record in lustre-rsync-test.test_2b.changelog is just the creation of the INV.PRN file.
There might be a changelog read/write race here. |
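The record-count comparison above can be sketched in Python as follows. This assumes the plain-text changelog dump starts each record with its index, and that the lustre_rsync debug log marks each replayed record with a `***** Start <index> ...` line, as in the lrsync_log excerpt quoted earlier in this ticket. The function names are hypothetical.

```python
import re

# Marker format taken from the lrsync_log excerpt in this ticket (assumed stable).
START_RE = re.compile(r"^\*{5} Start (\d+) ")


def changelog_indices(changelog_lines):
    """Collect the leading record index of every changelog line that has one."""
    indices = []
    for line in changelog_lines:
        fields = line.split()
        if fields and fields[0].isdigit():
            indices.append(int(fields[0]))
    return indices


def replayed_indices(debug_log_lines):
    """Collect the record indices that lustre_rsync reported starting to replay."""
    seen = set()
    for line in debug_log_lines:
        m = START_RE.match(line.strip())
        if m:
            seen.add(int(m.group(1)))
    return seen


def missing_records(changelog_lines, debug_log_lines):
    """Return changelog indices that never show up in the debug log, in order."""
    seen = replayed_indices(debug_log_lines)
    return [i for i in changelog_indices(changelog_lines) if i not in seen]
```

In the case described above, such a comparison would be expected to report only the last record, the creation of INV.PRN.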
| Comment by Zhenyu Xu [ 14/Oct/12 ] |
|
WangDi, is there a way to make sure all changelog records are synced on its dt object? |
| Comment by Jian Yu [ 18/Dec/12 ] |
|
RHEL6.3/x86_64 (2.3.0 Server + 2.1.4 RC1 Client): |
| Comment by Keith Mannthey (Inactive) [ 15/Jun/13 ] |
|
Fresh Master error with logs: https://maloo.whamcloud.com/test_sets/04386912-d54d-11e2-bcd8-52540035b04c test_2b
Error: 'test failed to respond and timed out'
Failure Rate: 22.00% of last 100 executions [all branches]
There is plenty of this:
Replication of operation failed(-17): 4123 CREAT (1) [0x200000bd0:0x767:0x0] [0x200000bd0:0x766:0x0] client.txt
Replication of operation failed(-17): 4124 CREAT (1) [0x200000bd0:0x768:0x0] [0x200000bd0:0x766:0x0] dbench
Replication of operation failed(-17): 4125 MKDIR (2) [0x200000bd0:0x769:0x0] [0x200000bd0:0x766:0x0] lib64
Replication of operation failed(-17): 4126 CREAT (1) [0x200000bd0:0x76a:0x0] [0x200000bd0:0x769:0x0] libpopt.so.0
Replication of operation failed(-17): 4129 CREAT (1) [0x200000bd0:0x76b:0x0] [0x200000bd0:0x769:0x0] libc.so.6
17 is EEXIST. It is not clear if this is the same exact issue but it fails with the same errors. |
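For triage, the error numbers in these messages can be decoded with Python's standard `errno` module. The sketch below assumes the message format shown above; `FAILED_RE` and `decode_failures` are hypothetical helper names.

```python
import errno
import re

# Message format copied from the log lines quoted in this ticket.
FAILED_RE = re.compile(r"Replication of operation failed\((-?\d+)\): (\d+) (\w+)")


def decode_failures(lines):
    """Return (record index, operation, errno name) for each failure line."""
    results = []
    for line in lines:
        m = FAILED_RE.search(line)
        if m is None:
            continue
        code = abs(int(m.group(1)))  # the log prints the negated errno
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((int(m.group(2)), m.group(3), name))
    return results
```

For the first line above this gives record 4123, operation CREAT, and the symbolic name EEXIST.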
| Comment by Keith Mannthey (Inactive) [ 17/Jun/13 ] |
|
Another one: https://maloo.whamcloud.com/sub_tests/0530d228-d54d-11e2-bcd8-52540035b04c |
| Comment by Keith Mannthey (Inactive) [ 26/Jun/13 ] |
|
https://maloo.whamcloud.com/test_sets/652535cc-ddfc-11e2-a20c-52540035b04c lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found. |
| Comment by Bruno Faccini (Inactive) [ 04/Jul/13 ] |
|
https://maloo.whamcloud.com/test_sets/d9ef75f0-e416-11e2-8f78-52540035b04c : lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found. |
| Comment by Bob Glossman (Inactive) [ 12/Aug/13 ] |
|
another: https://maloo.whamcloud.com/test_sets/bcadb2aa-035a-11e3-9f24-52540035b04c lustre-rsync-test test_2b: @@@@@@ FAIL: Failure in replication; differences found. |
| Comment by Bob Glossman (Inactive) [ 13/Aug/13 ] |
|
another: https://maloo.whamcloud.com/test_sets/2207a0c8-0439-11e3-a8e9-52540035b04c |
| Comment by Bruno Faccini (Inactive) [ 28/Aug/13 ] |
|
+1 at https://maloo.whamcloud.com/test_sets/35af817e-0f54-11e3-9bce-52540035b04c |
| Comment by Jian Yu [ 04/Sep/13 ] |
|
Lustre client: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1) lustre-rsync-test test 2b failed: |
| Comment by Bob Glossman (Inactive) [ 16/Sep/13 ] |
|
another |
| Comment by Nathaniel Clark [ 10/Oct/13 ] |
|
The spate of ZFS failures seems to be related to dbench not having started within the given 20s at the beginning of test 2b. Here is a patch to wait longer if necessary: |
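The idea of waiting longer only when necessary is a poll-with-deadline loop. The sketch below is a generic Python illustration of that pattern, not the actual patch (which changes the test script); `wait_until` and its parameters are hypothetical, and the 20 second default merely echoes the limit mentioned above.

```python
import time


def wait_until(predicate, timeout=20.0, interval=1.0):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True as soon as the condition holds, False on timeout.
    A caller would pass a check such as "has dbench started yet?".
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Returning as soon as the condition holds keeps fast configurations fast, while a larger timeout gives slow backends more room.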
| Comment by Andreas Dilger [ 30/Oct/13 ] |
|
It seems this bug has been subverted from its original purpose of tracking a 2.1/2.4 interop problem into something unrelated that also causes test_2b to fail (dbench not starting quickly enough). It would be better to fix that problem in a separate bug, so that when the patch lands that bug can be closed, and this one is not closed. |
| Comment by Jian Yu [ 03/Dec/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/59/ The same failure occurred: |
| Comment by Jian Yu [ 13/Dec/13 ] |
|
More instances on Lustre b2_4 branch: |
| Comment by Jian Yu [ 08/Jan/14 ] |
|
An instance on Lustre b2_5 branch: |
| Comment by Bob Glossman (Inactive) [ 15/Jan/14 ] |
|
an instance in master: |
| Comment by Jian Yu [ 07/Feb/14 ] |
|
More instances on Lustre b2_5 branch: |
| Comment by Bruno Faccini (Inactive) [ 28/Feb/14 ] |
|
+1 on b2_5 branch : https://maloo.whamcloud.com/test_sessions/26ab637c-9b91-11e3-95f0-52540035b04c |
| Comment by Andreas Dilger [ 18/Mar/14 ] |
|
I'm using |