[LU-3310] Interop 2.1.5<->2.4 failure on test suite sanity test_17o: stat file should fail Created: 10/May/13  Updated: 10/Aug/13  Resolved: 10/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

server: 2.1.5
client: lustre-master build# 1481


Severity: 3
Rank (Obsolete): 8197

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/a00f384c-b88e-11e2-891d-52540035b04c.

The sub-test test_17o failed with the following error:

stat file should fail

client dmesg:

Lustre: DEBUG MARKER: == sanity test 17o: stat file with incompat LMA feature == 01:16:21 (1368000981)
LustreError: 11-0: lustre-MDT0000-mdc-ffff88007cc28400: Communicating with 10.10.4.122@tcp, operation ldlm_enqueue failed with -107.
Lustre: lustre-MDT0000-mdc-ffff88007cc28400: Connection to lustre-MDT0000 (at 10.10.4.122@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 8637:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1368000984/real 1368000984]  req@ffff88006e13ec00 x1434452628717600/t0(0) o400->MGC10.10.4.122@tcp@10.10.4.122@tcp:26/25 lens 224/224 e 0 to 1 dl 1368000991 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 166-1: MGC10.10.4.122@tcp: Connection to MGS (at 10.10.4.122@tcp) was lost; in progress operations using this service will fail
Lustre: Evicted from MGS (at 10.10.4.122@tcp) after server handle changed from 0xe26621f21f318fc6 to 0xe26621f21f31920b
Lustre: MGC10.10.4.122@tcp: Connection restored to MGS (at 10.10.4.122@tcp)
LustreError: 8635:0:(client.c:2696:ptlrpc_replay_interpret()) @@@ status -116, old was 0  req@ffff88007a953c00 x1434452628717596/t17179869193(17179869193) o35->lustre-MDT0000-mdc-ffff88007cc28400@10.10.4.122@tcp:23/10 lens 392/424 e 0 to 0 dl 1368000997 ref 2 fl Interpret:R/4/0 rc -116/-116
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_17o: @@@@@@ FAIL: stat file should fail 
Lustre: DEBUG MARKER: sanity test_17o: @@@@@@ FAIL: stat file should fail
Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-05-08/lustre-b2_1-el6-x86_64-vs-lustre-master-el6-x86_64--full--1_4_1__1481__-70235156306540-003534/sanity.test_17o.debug_log.$(hostname -s).1368000991.log;
         dmesg > /logdir/test_logs/2013-05-08/lustre-b2

OST console:

01:16:23:Lustre: DEBUG MARKER: == sanity test 17o: stat file with incompat LMA feature == 01:16:21 (1368000981)
01:16:34:Lustre: lustre-OST0000: Received new MDS connection from 10.10.4.122@tcp, removing former export from same NID
01:16:34:Lustre: Skipped 5 previous similar messages
01:16:34:LustreError: 11-0: an error occurred while communicating with 10.10.4.122@tcp. The obd_ping operation failed with -107
01:16:34:LustreError: 166-1: MGC10.10.4.122@tcp: Connection to MGS (at 10.10.4.122@tcp) was lost; in progress operations using this service will fail
01:16:34:LustreError: 24007:0:(client.c:1060:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff88007aced400 x1434452657963472/t0(0) o101->MGC10.10.4.122@tcp@10.10.4.122@tcp:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
01:16:34:LustreError: 24007:0:(client.c:1060:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff88007aced400 x1434452657963473/t0(0) o101->MGC10.10.4.122@tcp@10.10.4.122@tcp:26/25 lens 296/352 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
01:16:34:Lustre: Evicted from MGS (at MGC10.10.4.122@tcp_0) after server handle changed from 0xe26621f21f318ff0 to 0xe26621f21f3191f6
01:16:34:Lustre: MGC10.10.4.122@tcp: Reactivating import
01:16:34:Lustre: MGC10.10.4.122@tcp: Connection restored to MGS (at 10.10.4.122@tcp)
01:16:34:Lustre: 14527:0:(lustre_log.h:476:llog_group_set_export()) lustre-OST0002: export for group 0 is changed: 0xffff88007bcaac00 -> 0xffff88007ab82400
01:16:34:Lustre: 14527:0:(lustre_log.h:476:llog_group_set_export()) Skipped 12 previous similar messages
01:16:34:Lustre: 14527:0:(llog_net.c:168:llog_receptor_accept()) changing the import ffff880067f80000 - ffff88007a285800
01:16:34:Lustre: 14527:0:(llog_net.c:168:llog_receptor_accept()) Skipped 12 previous similar messages
01:16:34:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_17o: @@@@@@ FAIL: stat file should fail 
01:16:34:Lustre: DEBUG MARKER: sanity test_17o: @@@@@@ FAIL: stat file should fail

MDS console:

01:16:25:Lustre: DEBUG MARKER: == sanity test 17o: stat file with incompat LMA feature == 01:16:21 (1368000981)
01:16:25:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
01:16:25:Lustre: DEBUG MARKER: umount -d /mnt/mds1
01:16:25:Lustre: Failing over lustre-MDT0000
01:16:25:Lustre: 14221:0:(quota_master.c:795:close_quota_files()) quota[0] is off already
01:16:25:Lustre: 14221:0:(quota_master.c:795:close_quota_files()) Skipped 1 previous similar message
01:16:25:Lustre: Failing over mdd_obd-lustre-MDT0000
01:16:25:Lustre: mdd_obd-lustre-MDT0000: shutting down for failover; client state will be preserved.
01:16:25:Lustre: MGS has stopped.
01:16:25:LustreError: 14221:0:(ldlm_request.c:1174:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
01:16:25:LustreError: 14221:0:(ldlm_request.c:1801:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
01:16:37:Lustre: 14221:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1368000982/real 1368000982]  req@ffff880063e5f000 x1434452653771035/t0(0) o251->MGC10.10.4.122@tcp@0@lo:26/25 lens 192/192 e 0 to 1 dl 1368000988 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
01:16:37:Lustre: server umount lustre-MDT0000 complete
01:16:37:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
01:16:37:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
01:16:37:Lustre: DEBUG MARKER: test -b /dev/lvm-MDS/P1
01:16:37:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
01:16:37:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
01:16:37:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
01:16:37:Lustre: MGS MGS started
01:16:37:Lustre: MGC10.10.4.122@tcp: Reactivating import
01:16:37:Lustre: Enabling ACL
01:16:37:Lustre: Enabling user_xattr
01:16:37:Lustre: lustre-MDT0000: used disk, loading
01:16:37:Lustre: 14436:0:(ldlm_lib.c:2028:target_recovery_init()) RECOVERY: service lustre-MDT0000, 2 recoverable clients, last_transno 17179869192
01:16:37:Lustre: 14436:0:(mdt_lproc.c:416:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /usr/sbin/l_getidentity
01:16:37:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
01:16:37:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2>/dev/null
01:16:37:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x194
01:16:37:LustreError: 14442:0:(mdt_handler.c:2817:mdt_recovery()) operation 101 on unconnected MDS from 12345-10.10.4.121@tcp
01:16:37:Lustre: lustre-MDT0000: sending delayed replies to recovered clients


 Comments   
Comment by Jodi Levi (Inactive) [ 10/May/13 ]

BobiJam,
Could you please have a look at this one?
Thank you!

Comment by Zhenyu Xu [ 11/May/13 ]

The 2.1.5 server does not have this LMA incompat check, so sanity test_17o is sure to fail against it. Can we exclude this test case from the 2.4 client <-> older server interop test?

Comment by Jian Yu [ 12/Jul/13 ]

Can we exclude this test case for 2.4 client<-> older server interop test?

Yes, please add a Lustre version check to the test on the master branch. The patch also needs to be cherry-picked to the Lustre b2_4 branch.
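The version-check guard suggested above can be sketched roughly as follows. This is only an illustration, not the landed patch: `version_code` is modeled on the helper of the same name in Lustre's test-framework.sh, while `mds_version_code` and `check_17o_prereq` are hypothetical stand-ins for querying the server version (test-framework.sh uses `lustre_version_code`) and for the skip logic inside test_17o.

```shell
#!/bin/bash
# Sketch of an interop guard for test_17o: skip when the MDS predates
# the LMA incompat check (landed in 2.4.0).

# Pack "major.minor.patch" into one comparable integer, in the style of
# test-framework.sh's version_code() helper.
version_code() {
    local IFS='.'
    set -- $1
    echo $(( ($1 << 16) | ($2 << 8) | ${3:-0} ))
}

# Hypothetical stand-in for lustre_version_code $SINGLEMDS, which would
# query the running MDS; here we just read an environment variable.
mds_version_code() {
    version_code "${MDS_VERSION:-2.1.5}"
}

# test_17o expects stat to fail on a file whose LMA carries an unknown
# incompat flag; servers older than 2.4.0 never reject it, so skip.
check_17o_prereq() {
    if (( $(mds_version_code) < $(version_code 2.4.0) )); then
        echo "SKIP: need MDS >= 2.4.0 for LMA incompat check"
        return 1
    fi
    return 0
}
```

With `MDS_VERSION=2.1.5` (the interop setup in this ticket) the guard skips; with a 2.4+ server it lets the test run.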

Comment by Jian Yu [ 17/Jul/13 ]

Patch for Lustre master branch: http://review.whamcloud.com/7017. The patch also needs to be cherry-picked to the Lustre b2_4 branch.

Comment by Jian Yu [ 10/Aug/13 ]

Patch was landed on Lustre b2_4 and master branches.
