[LU-3360] Interop 2.1.5 <-> 2.4 failure on test suite runtests: Stale file handle
Created: 20/May/13  Updated: 16/Oct/13  Resolved: 31/May/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.4.0, Lustre 2.5.0

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Hongchao Zhang
Resolution: Fixed
Votes: 0
Labels: None
Environment:

server: 2.1.5
client: tag-2.4.0-RC1


Severity: 3
Rank (Obsolete): 8322

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/335f5472-bf03-11e2-88e0-52540035b04c.

client-16vm3: debug=0x33f0404
client-16vm4: debug=0x33f0404
client-16vm3: subsystem_debug=0xffb7e3ff
client-16vm3: debug_mb=32
client-16vm4: subsystem_debug=0xffb7e3ff
client-16vm4: debug_mb=32
touching /mnt/lustre at Thu May 16 10:37:20 PDT 2013
create an empty file /mnt/lustre/hosts.12774
copying /etc/hosts to /mnt/lustre/hosts.12774
cp: cannot create regular file `/mnt/lustre/hosts.12774': Stale file handle


 Comments   
Comment by Sarah Liu [ 20/May/13 ]

Another failure in sanity.sh:
https://maloo.whamcloud.com/test_sets/34048384-bf03-11e2-88e0-52540035b04c

Comment by Peter Jones [ 21/May/13 ]

Hongchao

Could you please look into this one?

Thanks

Peter

Comment by Sarah Liu [ 21/May/13 ]

sanity-benchmark test_dbench hit a similar error:

https://maloo.whamcloud.com/test_sets/34ce8152-bf03-11e2-88e0-52540035b04c
OST console shows:

10:37:48:Lustre: DEBUG MARKER: == sanity-benchmark test dbench: dbench == 10:37:47 (1368725867)
10:37:48:LustreError: 4924:0:(filter.c:1484:filter_fid2dentry()) fatal: invalid object id 0
10:37:48:LustreError: 4924:0:(filter.c:3129:__filter_oa2dentry()) filter_setattr error looking up object: 0:2
10:37:48:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-benchmark test_dbench: @@@@@@ FAIL: dbench failed! 

client console shows:

10:37:55:Lustre: DEBUG MARKER: == sanity-benchmark test dbench: dbench == 10:37:47 (1368725867)
10:37:55:LustreError: 11-0: lustre-OST0005-osc-ffff88007ace2800: Communicating with 10.10.4.123@tcp, operation ost_destroy failed with -71.
10:37:55:LustreError: 22071:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 512 of inode ffff88007c71cb38 failed -116
10:37:55:LustreError: 22071:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 512 of inode ffff88007c71cb38 failed -116
10:37:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-benchmark test_dbench: @@@@@@ FAIL: dbench failed! 
10:37:55:Lustre: DEBUG MARKER: sanity-benchmark test_dbench: @@@@@@ FAIL: dbench failed!
10:37:55:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-05-16/lustre-b2_1-el6-x86_64-vs-lustre-master-el6-x86_64--full--1_4_1__1501__-70235162192180-100105/sanity-benchmark.test_dbench.debug_log.$(hostname -s).1368725868.log;
10:37:55:         dmesg > /logdir/test_logs/2013-05
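The numeric codes in the client log are negated Linux errno values, following the usual kernel convention: -71 (ost_destroy) is EPROTO, and -116 (vvp_io_commit_write) is ESTALE, whose strerror() text is the "Stale file handle" message that cp printed in the description. The sketch below decodes them. rc_name() and rc_text() are hypothetical helpers, not Lustre functions, and the numeric values are Linux-specific, so the compile-time checks are only enabled on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#ifdef __linux__
/* Linux errno numbering; other platforms use different values. */
static_assert(EPROTO == 71, "EPROTO is 71 on Linux");
static_assert(ESTALE == 116, "ESTALE is 116 on Linux");
#endif

/* Hypothetical helper: map a negated errno from a console message
 * (e.g. "failed with -71") to its symbolic name. */
const char *rc_name(int rc)
{
        switch (-rc) {
        case EPROTO:
                return "EPROTO";
        case ESTALE:
                return "ESTALE";
        default:
                return "other";
        }
}

/* Hypothetical helper: the human-readable text, as strerror() (and
 * therefore cp) reports it. */
const char *rc_text(int rc)
{
        return strerror(-rc);
}
```

On a Linux client, rc_name(-116) gives "ESTALE", which ties the dbench write failure to the same "Stale file handle" error seen in runtests.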
Comment by Hongchao Zhang [ 22/May/13 ]

It could be related to the patch in LU-3187, http://review.whamcloud.com/#change,6287:

static inline void lustre_set_wire_obdo(struct obd_connect_data *ocd,
                                        struct obdo *wobdo, struct obdo *lobdo)
{
        memcpy(wobdo, lobdo, sizeof(*lobdo));
        wobdo->o_flags &= ~OBD_FL_LOCAL_MASK;
        if (ocd == NULL)
                return;

        if (unlikely(!(ocd->ocd_connect_flags & OBD_CONNECT_FID)) &&
            fid_seq_is_echo(fid_seq(&lobdo->o_oi.oi_fid))) {
                /* Currently OBD_FL_OSTID will only be used when 2.4 echo
                 * client communicate with pre-2.4 server */
                wobdo->o_oi.oi.oi_id = fid_oid(&lobdo->o_oi.oi_fid);
                wobdo->o_oi.oi.oi_seq = fid_seq(&lobdo->o_oi.oi_fid);
        }
}

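If the group (oi_seq) is 0 and the id (oi_id) is 2, the obdo sent to the OST is changed to group (oi_seq) = fid_seq(&lobdo->o_oi.oi_fid) = 2 and id (oi_id) = fid_oid(&lobdo->o_oi.oi_fid) = 0. In other words, the id and group are swapped, which is consistent with the OST console errors above ("invalid object id 0", "looking up object: 0:2").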
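The swap follows from how the two views of an object id share storage. The sketch below assumes, based on the Lustre 2.4 headers, that struct ost_id is a union of an old-style {oi_id, oi_seq} pair and a struct lu_fid {f_seq, f_oid, f_ver}, so f_seq overlays oi_id and f_oid overlays the first four bytes of oi_seq. The structs are simplified stand-ins, not the real headers, and convert_as_fid() is a hypothetical function that repeats only the two assignments from lustre_set_wire_obdo() without its OBD_CONNECT_FID / fid_seq_is_echo() guard; it is not the fix from change 6426.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in modeled on struct lu_fid (assumed layout). */
struct lu_fid {
        uint64_t f_seq;
        uint32_t f_oid;
        uint32_t f_ver;
};

/* Old-style object id: id plus group (sequence). */
struct ostid {
        uint64_t oi_id;
        uint64_t oi_seq;
};

/* Assumed layout: both views alias the same 16 bytes. */
struct ost_id {
        union {
                struct ostid oi;
                struct lu_fid oi_fid;
        };
};

static_assert(sizeof(struct lu_fid) == sizeof(struct ostid),
              "both views must cover the same bytes");

/* Hypothetical: read the FID view of src and store it back through the
 * id/group view, as the two assignments in lustre_set_wire_obdo() do. */
struct ost_id convert_as_fid(struct ost_id src)
{
        struct ost_id dst = src;

        dst.oi.oi_id = src.oi_fid.f_oid;
        dst.oi.oi_seq = src.oi_fid.f_seq;
        return dst;
}
```

Applied to an old-style id (oi_id = 2, oi_seq = 0), convert_as_fid() yields oi_id = 0 and oi_seq = 2; applied to a value that really holds a FID, it produces the intended id = f_oid, group = f_seq mapping. The conversion is therefore only safe when the source is known to hold a FID.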

Comment by James Nunez (Inactive) [ 22/May/13 ]

I'm seeing something similar with sanityn, but there are no OST or client logs to look at:

https://maloo.whamcloud.com/test_sets/313058ec-c294-11e2-b2eb-52540035b04c
https://maloo.whamcloud.com/test_sets/71dde390-c2b8-11e2-b2eb-52540035b04c
https://maloo.whamcloud.com/test_sets/9a36508a-c1d6-11e2-ada8-52540035b04c

Comment by Hongchao Zhang [ 23/May/13 ]

The patch is tracked at http://review.whamcloud.com/#change,6426

Comment by James Nunez (Inactive) [ 24/May/13 ]

Ran interop testing with the 6426 patch. Some subtests failed; some of those are known failures, and I am still going through the rest. With the patch, runtests runs without the stale file handle error.

2.1 clients with 6426 patched master servers:
https://maloo.whamcloud.com/test_sessions/a0a1ca60-c485-11e2-ac71-52540035b04c

6426 patched master clients with 2.1 servers:
https://maloo.whamcloud.com/test_sessions/d7b668fa-c483-11e2-ac71-52540035b04c

Comment by Jodi Levi (Inactive) [ 31/May/13 ]

Patch landed to master. Let me know if more patches are needed and I will reopen the ticket.

Generated at Sat Feb 10 01:33:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.