
Interop 2.1.5 <-> 2.4 failure on test suite runtests: Stale file handle

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0, Lustre 2.5.0
    • None
    • None
    • server: 2.1.5
      client: tag-2.4.0-RC1
    • 3
    • 8322

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/335f5472-bf03-11e2-88e0-52540035b04c.

      client-16vm3: debug=0x33f0404
      client-16vm4: debug=0x33f0404
      client-16vm3: subsystem_debug=0xffb7e3ff
      client-16vm3: debug_mb=32
      client-16vm4: subsystem_debug=0xffb7e3ff
      client-16vm4: debug_mb=32
      touching /mnt/lustre at Thu May 16 10:37:20 PDT 2013
      create an empty file /mnt/lustre/hosts.12774
      copying /etc/hosts to /mnt/lustre/hosts.12774
      cp: cannot create regular file `/mnt/lustre/hosts.12774': Stale file handle
      

        Activity

          [LU-3360] Interop 2.1.5 <-> 2.4 failure on test suite runtests: Stale file handle

          jlevi Jodi Levi (Inactive) added a comment - Patch landed to master. Let me know if more patches are needed and I will reopen the ticket.

          jamesanunez James Nunez (Inactive) added a comment -

          Ran interop testing with the 6426 patch applied. The tests ran with some subtest failures; some are known, and I am still going through the rest. With the patch, runtests runs without the stale file handle error.

          2.1 clients with 6426 patched master servers:
          https://maloo.whamcloud.com/test_sessions/a0a1ca60-c485-11e2-ac71-52540035b04c

          6426 patched master clients with 2.1 servers:
          https://maloo.whamcloud.com/test_sessions/d7b668fa-c483-11e2-ac71-52540035b04c

          hongchao.zhang Hongchao Zhang added a comment - the patch is tracked at http://review.whamcloud.com/#change,6426
          jamesanunez James Nunez (Inactive) added a comment - edited

          I'm seeing something similar with sanityn, but there are no OST or client logs to look at:
          https://maloo.whamcloud.com/test_sets/313058ec-c294-11e2-b2eb-52540035b04c
          https://maloo.whamcloud.com/test_sets/71dde390-c2b8-11e2-b2eb-52540035b04c
          https://maloo.whamcloud.com/test_sets/9a36508a-c1d6-11e2-ada8-52540035b04c

          hongchao.zhang Hongchao Zhang added a comment - edited

          It could be related to the patch in LU-3187, http://review.whamcloud.com/#change,6287

          static inline void lustre_set_wire_obdo(struct obd_connect_data *ocd,
                                                  struct obdo *wobdo, struct obdo *lobdo)
          {
                  memcpy(wobdo, lobdo, sizeof(*lobdo));
                  wobdo->o_flags &= ~OBD_FL_LOCAL_MASK;
                  if (ocd == NULL)
                          return;
          
                  if (unlikely(!(ocd->ocd_connect_flags & OBD_CONNECT_FID)) &&
                      fid_seq_is_echo(fid_seq(&lobdo->o_oi.oi_fid))) {
                          /* Currently OBD_FL_OSTID will only be used when 2.4 echo
                           * client communicate with pre-2.4 server */
                          wobdo->o_oi.oi.oi_id = fid_oid(&lobdo->o_oi.oi_fid);
                          wobdo->o_oi.oi.oi_seq = fid_seq(&lobdo->o_oi.oi_fid);
                  }
          }
          

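          If the group (oi_seq) is 0 and the id (oi_id) is 2, the obdo sent to the OST is changed to group (oi_seq) = fid_seq(&lobdo->o_oi.oi_fid) = 2 and
          id (oi_id) = fid_oid(&lobdo->o_oi.oi_fid) = 0. The id and group are effectively swapped, which matches the object 0:2 and the "invalid object id 0" error reported on the OST console.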
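
The swap described above can be reproduced outside the kernel with a minimal sketch. The struct layouts below are simplified stand-ins, assuming that the 2.4 object id is a union of the (oi_id, oi_seq) pair and a FID, so that the FID sequence occupies the same bytes as oi_id. The value of FID_SEQ_ECHO (2) and the helper name set_wire_ostid() are likewise assumptions made for illustration; this is not the Lustre source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of the echo-client sequence; not taken from the ticket. */
#define FID_SEQ_ECHO 0x0002ULL

/* Simplified stand-in for a FID: sequence, object id, version. */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Simplified stand-in for the object id: the legacy (id, seq) pair and
 * the FID share the same storage, so f_seq aliases oi_id. */
struct ost_id {
	union {
		struct {
			uint64_t oi_id;
			uint64_t oi_seq;
		} oi;
		struct lu_fid oi_fid;
	};
};

/* Hypothetical helper mirroring only the conversion branch quoted above:
 * when the peer lacks the FID connect flag and the aliased sequence looks
 * like the echo sequence, rewrite id/seq from the FID view. */
void set_wire_ostid(int peer_has_connect_fid, struct ost_id *wire,
		    const struct ost_id *local)
{
	memcpy(wire, local, sizeof(*local));
	if (!peer_has_connect_fid && local->oi_fid.f_seq == FID_SEQ_ECHO) {
		wire->oi.oi_id = local->oi_fid.f_oid;
		wire->oi.oi_seq = local->oi_fid.f_seq;
	}
}

/* Print an object id in the same id:group order as the OST console log. */
void print_ostid(const char *label, const struct ost_id *id)
{
	printf("%s id=%llu group=%llu\n", label,
	       (unsigned long long)id->oi.oi_id,
	       (unsigned long long)id->oi.oi_seq);
}
```

Calling set_wire_ostid(0, &wire, &local) with local.oi.oi_id = 2 and local.oi.oi_seq = 0 leaves wire with id 0 and group 2, because a plain object with id 2 in group 0 is indistinguishable from an echo-sequence FID when read through the FID view. With the FID connect flag set, the object id passes through unchanged.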

          sarah Sarah Liu added a comment -

          sanity-benchmark test_dbench hit a similar error:

          https://maloo.whamcloud.com/test_sets/34ce8152-bf03-11e2-88e0-52540035b04c
          OST console shows:

          10:37:48:Lustre: DEBUG MARKER: == sanity-benchmark test dbench: dbench == 10:37:47 (1368725867)
          10:37:48:LustreError: 4924:0:(filter.c:1484:filter_fid2dentry()) fatal: invalid object id 0
          10:37:48:LustreError: 4924:0:(filter.c:3129:__filter_oa2dentry()) filter_setattr error looking up object: 0:2
          10:37:48:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-benchmark test_dbench: @@@@@@ FAIL: dbench failed! 
          

          client console shows:

          10:37:55:Lustre: DEBUG MARKER: == sanity-benchmark test dbench: dbench == 10:37:47 (1368725867)
          10:37:55:LustreError: 11-0: lustre-OST0005-osc-ffff88007ace2800: Communicating with 10.10.4.123@tcp, operation ost_destroy failed with -71.
          10:37:55:LustreError: 22071:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 512 of inode ffff88007c71cb38 failed -116
          10:37:55:LustreError: 22071:0:(vvp_io.c:1086:vvp_io_commit_write()) Write page 512 of inode ffff88007c71cb38 failed -116
          10:37:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-benchmark test_dbench: @@@@@@ FAIL: dbench failed! 
          10:37:55:Lustre: DEBUG MARKER: sanity-benchmark test_dbench: @@@@@@ FAIL: dbench failed!
          10:37:55:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-05-16/lustre-b2_1-el6-x86_64-vs-lustre-master-el6-x86_64--full--1_4_1__1501__-70235162192180-100105/sanity-benchmark.test_dbench.debug_log.$(hostname -s).1368725868.log;
          10:37:55:         dmesg > /logdir/test_logs/2013-05
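
For reading the return codes in the client console output, a small lookup can help. The sketch below hard-codes the two values seen in the log using the generic Linux errno numbering (71 is EPROTO, 116 is ESTALE); the table and the function name linux_rc_name() are illustrative only, and other platforms number these errors differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry per return code of interest (generic Linux numbering). */
struct rc_name {
	int rc;
	const char *name;
	const char *text;
};

/* Only the two codes that appear in the client console log above. */
static const struct rc_name linux_rc_names[] = {
	{ 71,  "EPROTO", "Protocol error" },
	{ 116, "ESTALE", "Stale file handle" },
};

/* Map a kernel-style negative return code (or its positive value) to the
 * symbolic errno name; returns "unknown" for codes not in the table. */
const char *linux_rc_name(int rc)
{
	size_t i;

	if (rc < 0)
		rc = -rc;
	for (i = 0; i < sizeof(linux_rc_names) / sizeof(linux_rc_names[0]); i++)
		if (linux_rc_names[i].rc == rc)
			return linux_rc_names[i].name;
	return "unknown";
}
```

Under that numbering, the ost_destroy failure (-71) is a protocol error returned by the OST, and the failed page writes (-116) correspond to the "Stale file handle" message that cp printed in the original report.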
          
          pjones Peter Jones added a comment -

          Hongchao

          Could you please look into this one?

          Thanks

          Peter

          sarah Sarah Liu added a comment - another failure in sanity.sh https://maloo.whamcloud.com/test_sets/34048384-bf03-11e2-88e0-52540035b04c

          People

            hongchao.zhang Hongchao Zhang
            maloo Maloo