[LU-891] 1.8<->2.2 interop Test failure on test suite replay-vbr:setattr of UID succeeded unexpectedly Created: 02/Dec/11 Updated: 27/Feb/13 Resolved: 20/Apr/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | Lustre 1.8.8 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 6511 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com> This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/18cfb3c6-1591-11e1-b189-52540025f9af. |
| Comments |
| Comment by Oleg Drokin [ 03/Jan/12 ] |
|
The problem at hand is that 1.8 client failed to reconnect to the server. |
| Comment by Sarah Liu [ 30/Jan/12 ] |
|
Hit a similar issue on test_0e; unfortunately, the server logs are missing. I will try to reproduce it and see if I can get more information. https://maloo.whamcloud.com/test_sets/f3376fc4-4b0e-11e1-915b-5254004bbbd3 |
| Comment by Sarah Liu [ 30/Jan/12 ] |
|
Debug and dmesg logs from the MDS and OST. |
| Comment by Peter Jones [ 09/Feb/12 ] |
|
Yangsheng will look into this one |
| Comment by Yang Sheng [ 16/Feb/12 ] |
|
Hi Sarah, it looks like the attachment contains only the MDS and OST logs. Could you also upload the client log, please? The MDS log shows that the request had already failed with a version mismatch:
00000004:00000001:12.0:1327950497.962291:0:8347:0:(mdt_open.c:1206:mdt_reint_open()) Process entered
00000004:00000002:12.0:1327950497.962297:0:8347:0:(mdt_open.c:1240:mdt_reint_open()) I am going to open [0x200000400:0x3:0x0]/(f0b->[0x200000400:0x4:0x0]) cr_flag=0103 mode=0100644 msg_flag=0x4
00000004:00000001:12.0:1327950497.962304:0:8347:0:(mdt_open.c:1010:mdt_open_by_fid()) Process entered
00000004:00000001:12.0:1327950497.962306:0:8347:0:(mdt_handler.c:2082:mdt_object_find()) Process entered
00000004:00000040:12.0:1327950497.962308:0:8347:0:(mdt_handler.c:2084:mdt_object_find()) Find object for [0x200000400:0x4:0x0]
00000020:00000001:12.0:1327950497.962313:0:8347:0:(lustre_fid.h:402:fid_flatten32()) Process leaving (rc=4194308 : 4194308 : 400004)
........
00000004:00000002:12.0:1327950497.988319:0:8347:0:(osd_handler.c:2315:osd_object_version_get()) Get version 0x10000000c for inode 522885
00000004:00000002:12.0:1327950497.988322:0:8347:0:(mdt_reint.c:117:mdt_obj_version_get()) FID [0x200000400:0x3:0x0] version is 0x10000000c
00000004:00000001:12.0:1327950497.988327:0:8347:0:(mdt_reint.c:129:mdt_version_check()) Process entered
00000004:00000002:12.0:1327950497.988329:0:8347:0:(mdt_reint.c:146:mdt_version_check()) Version mismatch 0x10000000d != 0x10000000c
00000004:00000001:12.0:1327950497.988332:0:8347:0:(mdt_reint.c:150:mdt_version_check()) Process leaving (rc=18446744073709551541 : -75 : ffffffffffffffb5)
00000004:00000001:12.0:1327950497.988335:0:8347:0:(mdt_open.c:1296:mdt_reint_open()) Process leaving via out_parent (rc=18446744073709551541 : -75 : 0xffffffffffffffb5) |
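For readers unfamiliar with the `rc=` triple in the trace above, here is a minimal Python sketch of how the three values relate and how a version comparison could yield -75. It is an illustration only, not the MDT code: `version_check`, `format_rc` and `decode_rc` are hypothetical names, the sketch assumes that the first value in the "Version mismatch" message is the version the replayed request expects, and 75 is hardcoded as the Linux value of EOVERFLOW.

```python
# Linux errno value for EOVERFLOW, hardcoded so the sketch does not
# depend on the platform's errno table (assumption: the server runs Linux).
EOVERFLOW = 75

MASK64 = 0xFFFFFFFFFFFFFFFF


def version_check(expected_version, current_version):
    """Hypothetical stand-in for the check seen in the log:
    return 0 when the versions agree, -EOVERFLOW otherwise."""
    if expected_version != current_version:
        return -EOVERFLOW
    return 0


def format_rc(rc):
    """Render rc as 'unsigned 64-bit : signed : hex', the layout
    used by the 'Process leaving' lines."""
    unsigned = rc & MASK64  # two's complement view of a negative rc
    return "rc=%d : %d : %x" % (unsigned, rc, unsigned)


def decode_rc(unsigned):
    """Convert the unsigned 64-bit form back to the signed value."""
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned


if __name__ == "__main__":
    rc = version_check(0x10000000D, 0x10000000C)
    print(format_rc(rc))
    print(decode_rc(18446744073709551541))
```

The first field of the triple is just the negative errno seen through an unsigned 64-bit type, so only the middle value needs to be looked up.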
| Comment by Sarah Liu [ 16/Feb/12 ] |
|
Hi Yang Sheng, you can find the client log via the Maloo link above. I only uploaded the server logs since they were missing from the report. https://maloo.whamcloud.com/test_sets/f3376fc4-4b0e-11e1-915b-5254004bbbd3 |
| Comment by Yang Sheng [ 19/Feb/12 ] |
|
Hi Sarah, the logs do not come from the same failed test. It looks like the client had already failed on the getattr call. Anyway, I'll try to reproduce it in my local VMs.
00000080:00000001:0:1322008320.334549:0:12107:0:(namei.c:704:ll_lookup_nd()) Process entered
..........
00000002:00000001:0:1322008320.334829:0:12107:0:(mdc_locks.c:658:mdc_enqueue()) Process leaving (rc=18446744073709551508 : -108 : ffffffffffffff94) |
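One way to check whether two debug logs belong to the same run is to compare their timestamps. The Python sketch below parses lines in the layout shown in this ticket and converts the epoch field to a UTC date. The regular expression and the names `parse_line` and `utc_date` are illustrative assumptions, not part of any Lustre tool, and only the fields needed here are extracted.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the lines quoted in this ticket:
# subsys:mask:cpu:seconds.micros:n:pid:n:(file:line:func()) message
LINE_RE = re.compile(
    r"^[0-9a-f]{8}:[0-9a-f]{8}:[^:]+:(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)


def parse_line(text):
    """Return a dict of the extracted fields, or None if the line
    does not match the assumed layout (e.g. an elision marker)."""
    m = LINE_RE.match(text.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["ts"] = float(fields["ts"])
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


def utc_date(ts):
    """Convert an epoch timestamp to an ISO date string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()


CLIENT = ("00000080:00000001:0:1322008320.334549:0:12107:0:"
          "(namei.c:704:ll_lookup_nd()) Process entered")
SERVER = ("00000004:00000001:12.0:1327950497.962291:0:8347:0:"
          "(mdt_open.c:1206:mdt_reint_open()) Process entered")

if __name__ == "__main__":
    for raw in (CLIENT, SERVER):
        rec = parse_line(raw)
        print(rec["func"], utc_date(rec["ts"]))
```

Applied to the client line here and the first MDS line quoted earlier in the ticket, the dates are roughly two months apart, which is consistent with the observation that the logs are from different runs.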
| Comment by Yang Sheng [ 23/Feb/12 ] |
|
The client log shows:
00000100:00080000:2:1322008320.383168:0:12103:0:(import.c:1224:ptlrpc_invalidate_import_thread()) ffff880315836000 lustre-MDT0000_UUID: changing import state from EVICTED to RECOVER
00000100:00000001:2:1322008320.383171:0:12103:0:(import.c:1239:ptlrpc_import_recovery_state_machine()) Process entered
00000100:00080000:2:1322008320.383172:0:12103:0:(import.c:1314:ptlrpc_import_recovery_state_machine()) reconnected to lustre-MDT0000_UUID@192.168.4.2@o2ib
00000100:00000001:2:1322008320.383174:0:12103:0:(recover.c:148:ptlrpc_resend()) Process entered
00000100:00000001:2:1322008320.383175:0:12103:0:(recover.c:171:ptlrpc_resend()) Process leaving (rc=0 : 0 : 0)
00000100:00080000:2:1322008320.383177:0:12103:0:(import.c:1319:ptlrpc_import_recovery_state_machine()) ffff880315836000 lustre-MDT0000_UUID: changing import state from RECOVER to FULL
This indicates that the client did eventually reconnect to the MDS. However, checkstat ran before the reconnect completed (the mdc_enqueue failure at 1322008320.334829 precedes the transition to FULL at 1322008320.383177), so it failed with -108. We should consider running checkstat only after MDS recovery has completed. |
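The ordering problem described above can be sketched in Python: read the import state transitions from the client log and poll until the import reaches FULL before running the check. This is an illustration of the idea only, not the patch and not the test framework's API. `final_import_state` and `wait_for` are hypothetical helpers, and `SAMPLE` holds the two transition messages quoted in this comment.

```python
import re
import time

# Matches "<target>: changing import state from <OLD> to <NEW>".
TRANSITION_RE = re.compile(r"(\S+): changing import state from (\w+) to (\w+)")

SAMPLE = """\
ffff880315836000 lustre-MDT0000_UUID: changing import state from EVICTED to RECOVER
ffff880315836000 lustre-MDT0000_UUID: changing import state from RECOVER to FULL
"""


def final_import_state(log_text, target):
    """Return the last state reported for `target`, or None if the
    log contains no transition for it."""
    state = None
    for name, _old, new in TRANSITION_RE.findall(log_text):
        if name == target:
            state = new
    return state


def wait_for(predicate, timeout_s=60.0, interval_s=1.0,
             sleep=time.sleep, now=time.monotonic):
    """Poll `predicate` until it returns True or the timeout expires.
    Returns True on success, False on timeout."""
    deadline = now() + timeout_s
    while True:
        if predicate():
            return True
        if now() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    ready = wait_for(
        lambda: final_import_state(SAMPLE, "lustre-MDT0000_UUID") == "FULL",
        timeout_s=5.0,
    )
    print("import FULL, safe to run the check" if ready else "timed out")
```

In a real test the predicate would query the live import state rather than a captured log; how that query is made is left open here.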
| Comment by Yang Sheng [ 27/Feb/12 ] |
|
Hi Peter, I think this failure is caused by the client test script. Since this is an interop test, the fix needs to go into the 1.8 code. Please reconsider whether this should be a 2.x blocker. |
| Comment by Peter Jones [ 27/Feb/12 ] |
|
Thanks YangSheng. Oleg and Andreas both agree to drop this as a 2.2 blocker |
| Comment by Yang Sheng [ 05/Mar/12 ] |
|
Uploaded a patch to http://review.whamcloud.com/2248 |
| Comment by Build Master (Inactive) [ 20/Apr/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Peter Jones [ 20/Apr/12 ] |
|
Landed for 1.8.8 |