[LU-891] 1.8<->2.2 interop Test failure on test suite replay-vbr:setattr of UID succeeded unexpectedly Created: 02/Dec/11  Updated: 27/Feb/13  Resolved: 20/Apr/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: Lustre 1.8.8

Type: Bug Priority: Major
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Attachments: File 891.tar.gz    
Issue Links:
Related
is related to LU-951 Test failure on test suite replay-sin... Resolved
Severity: 3
Rank (Obsolete): 6511

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/18cfb3c6-1591-11e1-b189-52540025f9af.



 Comments   
Comment by Oleg Drokin [ 03/Jan/12 ]

The problem at hand is that 1.8 client failed to reconnect to the server.
Unfortunately the only logs are from client side too.

Comment by Sarah Liu [ 30/Jan/12 ]

Hit the similar issue on test_0e, unfortunately lack of server's logs. I will try to reproduce it and see if I can get more information.

https://maloo.whamcloud.com/test_sets/f3376fc4-4b0e-11e1-915b-5254004bbbd3

Comment by Sarah Liu [ 30/Jan/12 ]

debug and dmesg log from MDS and OST

Comment by Peter Jones [ 09/Feb/12 ]

Yangsheng will look into this one

Comment by Yang Sheng [ 16/Feb/12 ]

Hi, Sarah, looks like the attachment just contain mds & ost log. Could you also upload client log please?

From the MDS log found the request already failed by version mismatch.

00000004:00000001:12.0:1327950497.962291:0:8347:0:(mdt_open.c:1206:mdt_reint_open()) Process entered
00000004:00000002:12.0:1327950497.962297:0:8347:0:(mdt_open.c:1240:mdt_reint_open()) I am going to open [0x200000400:0x3:0x0]/(f0b->[0x200000400:0x4:0x0]) cr_flag=0103 mode=0100644 msg_flag=0x4
00000004:00000001:12.0:1327950497.962304:0:8347:0:(mdt_open.c:1010:mdt_open_by_fid()) Process entered
00000004:00000001:12.0:1327950497.962306:0:8347:0:(mdt_handler.c:2082:mdt_object_find()) Process entered
00000004:00000040:12.0:1327950497.962308:0:8347:0:(mdt_handler.c:2084:mdt_object_find()) Find object for [0x200000400:0x4:0x0]
00000020:00000001:12.0:1327950497.962313:0:8347:0:(lustre_fid.h:402:fid_flatten32()) Process leaving (rc=4194308 : 4194308 : 400004)

........

00000004:00000002:12.0:1327950497.988319:0:8347:0:(osd_handler.c:2315:osd_object_version_get()) Get version 0x10000000c for inode 522885
00000004:00000002:12.0:1327950497.988322:0:8347:0:(mdt_reint.c:117:mdt_obj_version_get()) FID [0x200000400:0x3:0x0] version is 0x10000000c
00000004:00000001:12.0:1327950497.988327:0:8347:0:(mdt_reint.c:129:mdt_version_check()) Process entered
00000004:00000002:12.0:1327950497.988329:0:8347:0:(mdt_reint.c:146:mdt_version_check()) Version mismatch 0x10000000d != 0x10000000c
00000004:00000001:12.0:1327950497.988332:0:8347:0:(mdt_reint.c:150:mdt_version_check()) Process leaving (rc=18446744073709551541 : -75 : ffffffffffffffb5)
00000004:00000001:12.0:1327950497.988335:0:8347:0:(mdt_open.c:1296:mdt_reint_open()) Process leaving via out_parent (rc=18446744073709551541 : -75 : 0xffffffffffffffb5)

Comment by Sarah Liu [ 16/Feb/12 ]

Hi Yang Sheng, for the client log, you can find it from the Maloo link above.I only uploaded the server logs since they were missing in the report.

https://maloo.whamcloud.com/test_sets/f3376fc4-4b0e-11e1-915b-5254004bbbd3

Comment by Yang Sheng [ 19/Feb/12 ]

Hi, Sarah, The logs aren't come from same test failed. Looks like client already failed on getattr call. Anyway, I'll trying to reproduce it in my local VMs.

00000080:00000001:0:1322008320.334549:0:12107:0:(namei.c:704:ll_lookup_nd()) Process entered
00000080:00000010:0:1322008320.334551:0:12107:0:(namei.c:675:ll_convert_intent()) kmalloced 'it': 48 at ffff880314367980.
00000080:00000001:0:1322008320.334552:0:12107:0:(namei.c:584:ll_lookup_it()) Process entered
00000080:00200000:0:1322008320.334553:0:12107:0:(namei.c:588:ll_lookup_it()) VFS Op:name=f0j,dir=2614401/1909664258(ffff880316096950),intent=getattr
00000002:00000001:0:1322008320.334556:0:12107:0:(mdc_locks.c:837:mdc_intent_lock()) Process entered
00000002:00010000:0:1322008320.334557:0:12107:0:(mdc_locks.c:846:mdc_intent_lock()) name: f0j([0x0:0x0:0x0]) in inode ([0x27e481:0x71d32a02:0x0]), intent: getattr flags 00
00000002:00000001:0:1322008320.334560:0:12107:0:(mdc_locks.c:607:mdc_enqueue()) Process entered

..........

00000002:00000001:0:1322008320.334829:0:12107:0:(mdc_locks.c:658:mdc_enqueue()) Process leaving (rc=18446744073709551508 : -108 : ffffffffffffff94)
00000002:00000001:0:1322008320.334830:0:12107:0:(mdc_locks.c:874:mdc_intent_lock()) Process leaving (rc=18446744073709551508 : -108 : ffffffffffffff94)
00000080:00000001:0:1322008320.334832:0:12107:0:(namei.c:627:ll_lookup_it()) Process leaving via out (rc=18446744073709551508 : -108 : ffffffffffffff94)
00000080:00000001:0:1322008320.334834:0:12107:0:(dcache.c:214:ll_intent_release()) Process entered
00000080:00000001:0:1322008320.334835:0:12107:0:(dcache.c:232:ll_intent_release()) Process leaving
00000080:00000010:0:1322008320.334836:0:12107:0:(namei.c:784:ll_lookup_nd()) kfreed 'it': 48 at ffff880314367980.
00000080:00000001:0:1322008320.334837:0:12107:0:(namei.c:790:ll_lookup_nd()) Process leaving (rc=18446744073709551508 : -108 : ffffffffffffff94)
00000100:00000001:1:1322008320.382455:0:2391:0:(import.c:1145:ptlrpc_connect_interpret()) Process leaving (rc=0 : 0 : 0)
00000100:00000040:1:1322008320.382459:0:2391:0:(lustre_net.h:1023:ptlrpc_rqphase_move()) @@@ move req "Interpret" > "Complete" req@ffff880315d71400 x1386221492776041/t0 o38>lustre-MDT0000_UUID@192.168.4.2@o2ib:12/10 lens 368/584 e 0 to 1 dl 1322008327 ref 1 fl Interpret:REN/0/0 rc 0/0
00000100:00100000:1:1322008320.382468:0:2391:0:(client.c:1452:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd-recov:478ea2b0-1ee6-2ad9-50e6-66d1b78c537b:2391:x1386221492776041:192.168.4.2@o2ib:38

Comment by Yang Sheng [ 23/Feb/12 ]

From client log we can found:

00000100:00080000:2:1322008320.383168:0:12103:0:(import.c:1224:ptlrpc_invalidate_import_thread()) ffff880315836000 lustre-MDT0000_UUID: changing import state from EVICTED to RECOVER
00000100:00000001:2:1322008320.383171:0:12103:0:(import.c:1239:ptlrpc_import_recovery_state_machine()) Process entered
00000100:00080000:2:1322008320.383172:0:12103:0:(import.c:1314:ptlrpc_import_recovery_state_machine()) reconnected to lustre-MDT0000_UUID@192.168.4.2@o2ib
00000100:00000001:2:1322008320.383174:0:12103:0:(recover.c:148:ptlrpc_resend()) Process entered
00000100:00000001:2:1322008320.383175:0:12103:0:(recover.c:171:ptlrpc_resend()) Process leaving (rc=0 : 0 : 0)
00000100:00080000:2:1322008320.383177:0:12103:0:(import.c:1319:ptlrpc_import_recovery_state_machine()) ffff880315836000 lustre-MDT0000_UUID: changing import state from RECOVER to FULL

This indicate client reconnect to mds finally. But checkstat run before, so it failed on -108. Looks like we may consider run checkstat after waiting MDS recovery completed.

Comment by Yang Sheng [ 27/Feb/12 ]

Hi, Peter, I think this test failed by client test script. Since it is a interop test, So we need change the 1.8 code to fixed. So please reconsider whether it can be a 2.x blocker.

Comment by Peter Jones [ 27/Feb/12 ]

Thanks YangSheng. Oleg and Andreas both agree to drop this as a 2.2 blocker

Comment by Yang Sheng [ 05/Mar/12 ]

Upload patch to http://review.whamcloud.com/2248

Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Comment by Peter Jones [ 20/Apr/12 ]

Landed for 1.8.8

Comment by Build Master (Inactive) [ 20/Apr/12 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #186
LU-891 test: waiting import state for next step. (Revision 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a)

Result = SUCCESS
Johann Lombardi : 4d59b9eac6b961d89c4f0e8ce9f342bc01c58f4a
Files :

  • lustre/tests/replay-vbr.sh
Generated at Sat Feb 10 01:11:25 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.