Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-565

1.8<->2.1 interop: Stale NFS file handle

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.1.0, Lustre 1.8.7
    • Lustre 2.1.0, Lustre 1.8.6
    • None
    • 3
    • 5186

    Description

      While running compilebench on NFSv3 client (NFSv3 server was Lustre 1.8.6-wc1 client, and the Lustre server version was v2_0_66_0), it failed as follows:

      NFSCLIENT mode: setup, cleanup, check config skipped
      client-13-ib
      fat-amd-3-ib:/mnt/lustre /mnt/lustre nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.134,mountvers=3,mountport=59661,mountproto=udp,local_lock=none,addr=192.168.4.134 0 0
      client-12-ib
      fat-amd-3-ib:/mnt/lustre /mnt/lustre nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.134,mountvers=3,mountport=59661,mountproto=udp,local_lock=none,addr=192.168.4.134 0 0
      fat-amd-3-ib
      Filesystem    Type   1K-blocks      Used Available Use% Mounted on
      fat-amd-1-ib@o2ib:/lustre
                  lustre    11811168    615432  10595352   6% /mnt/lustre
      
      == test compilebench: compilebench == 23:10:50
      OPTIONS:
      cbench_DIR=/usr/bin
      cbench_IDIRS=10
      cbench_RUNS=10
      client-12-ib
      client-13-ib
      ./compilebench -D /mnt/lustre/d0.compilebench -i 10 -r 10 --makej
      using working directory /mnt/lustre/d0.compilebench, 10 intial dirs 10 runs
      Traceback (most recent call last):
        File "./compilebench", line 567, in <module>
          dset = dataset(options.sources, rnd)
        File "./compilebench", line 319, in __init__
          self.unpatched = native_order(self.unpatched, "unpatched")
        File "./compilebench", line 97, in native_order
          run_directory(tmplist, dirname, "native %s" % tag)
        File "./compilebench", line 225, in run_directory
          fp = file(fname, 'a+')
      IOError: [Errno 116] Stale NFS file handle: '/mnt/lustre/d0.compilebench/native-0/COPYING'
       parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1 
      Dumping lctl log to /home/yujian/test_logs/2011-08-02/231046/parallel-scale.test_compilebench.*.1312351852.log
      

      Syslog on the Lustre client (NFSv3 server) fat-amd-3-ib showed that:

      Aug  2 23:10:50 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 1 or version  0 1
      Aug  2 23:10:50 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 1 or version  0 1
      Aug  2 23:10:52 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version  0 1
      Aug  2 23:10:52 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Skipped 1 previous similar message
      Aug  2 23:11:06 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version  0 1
      Aug  2 23:11:06 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Skipped 5 previous similar messages
      

      Debug log on the Lustre client (NFSv3 server) fat-amd-3-ib showed that:

      00000080:00000001:2:1312351850.904050:0:6705:0:(llite_lib.c:2286:ll_prep_inode()) Process entered
      00000002:00000001:2:1312351850.904052:0:6705:0:(mdc_request.c:513:mdc_req2lustre_md()) Process entered
      00000002:00000001:2:1312351850.904055:0:6705:0:(mdc_request.c:581:mdc_req2lustre_md()) Process leaving (rc=0 : 0 : 0)
      00000080:00000001:2:1312351850.904058:0:6705:0:(namei.c:88:fid_flatten()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001)
      00000080:00000001:2:1312351850.904061:0:6705:0:(namei.c:124:ll_fid_build_ino()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001)
      00000080:00000001:2:1312351850.904065:0:6705:0:(namei.c:165:ll_iget()) Process entered
      00000080:00000001:2:1312351850.904089:0:6705:0:(llite_lib.c:1900:ll_update_inode()) Process entered
      00000080:00000002:2:1312351850.904091:0:6705:0:(llite_lib.c:1902:ll_update_inode()) body->valid = 0x16fbf
      00000080:00000001:2:1312351850.904093:0:6705:0:(namei.c:88:fid_flatten()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001)
      00000080:00000001:2:1312351850.904098:0:6705:0:(namei.c:124:ll_fid_build_ino()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001)
      00000080:00000001:2:1312351850.904101:0:6705:0:(namei.c:130:ll_fid_build_gen()) Process entered
      00000080:00000001:2:1312351850.904103:0:6705:0:(namei.c:135:ll_fid_build_gen()) Process leaving (rc=0 : 0 : 0)
      00000080:00000001:2:1312351850.904106:0:6705:0:(llite_lib.c:2028:ll_update_inode()) Process leaving
      00000080:00200000:2:1312351850.904108:0:6705:0:(namei.c:180:ll_iget()) got inode: 144115205255725057/0(ffff8800377cc190) for [0x200000400:0x1:0x0]
      00000080:00000001:2:1312351850.904112:0:6705:0:(namei.c:183:ll_iget()) Process leaving (rc=18446612133245141392 : -131940464410224 : ffff8800377cc190)
      00000080:00000001:2:1312351850.904116:0:6705:0:(obd_class.h:539:obd_checkmd()) Process entered
      00020000:00000001:2:1312351850.904118:0:6705:0:(lov_obd.c:3139:lov_checkmd()) Process entered
      00020000:00000001:2:1312351850.904119:0:6705:0:(lov_obd.c:3142:lov_checkmd()) Process leaving (rc=0 : 0 : 0)
      00000080:00000001:2:1312351850.904122:0:6705:0:(obd_class.h:545:obd_checkmd()) Process leaving (rc=0 : 0 : 0)
      00000080:00000001:2:1312351850.904124:0:6705:0:(llite_lib.c:2314:ll_prep_inode()) Process leaving (rc=0 : 0 : 0)
      00000100:00000001:2:1312351850.904127:0:6705:0:(client.c:1806:__ptlrpc_req_finished()) Process entered
      00000100:00000040:2:1312351850.904129:0:6705:0:(client.c:1818:__ptlrpc_req_finished()) @@@ refcount now 0  req@ffff880118ebcc00 x1376100389224929/t0 o33->lustre-MDT0000_UUID@192.168.4.132@o2ib:12/10 lens 408/808 e 0 to 1 dl 1312351857 ref 1 fl Interpret:R/0/0 rc 0/0
      00000100:00000001:2:1312351850.904141:0:6705:0:(client.c:1739:__ptlrpc_free_req()) Process entered
      00000100:00000010:2:1312351850.904143:0:6705:0:(client.c:1770:__ptlrpc_free_req()) kfreed 'request->rq_repbuf': 808 at ffff880118f43400.
      00000100:00000040:2:1312351850.904147:0:6705:0:(client.c:1775:__ptlrpc_free_req()) PUTting export ffff8801006dc200 : new refcount 4
      00000020:00000001:2:1312351850.904150:0:6705:0:(genops.c:816:class_import_put()) Process entered
      00000020:00000040:2:1312351850.904152:0:6705:0:(genops.c:824:class_import_put()) import ffff88021a3b5800 refcount=5 obd=lustre-MDT0000-mdc-ffff880106982400
      00000020:00000001:2:1312351850.904155:0:6705:0:(genops.c:835:class_import_put()) Process leaving
      00000100:00000010:2:1312351850.904156:0:6705:0:(client.c:1789:__ptlrpc_free_req()) kfreed 'request->rq_reqmsg': 408 at ffff88011853e200.
      00000100:00000010:2:1312351850.904159:0:6705:0:(client.c:1792:__ptlrpc_free_req()) kfreed 'request': 584 at ffff880118ebcc00.
      00000100:00000001:2:1312351850.904162:0:6705:0:(client.c:1794:__ptlrpc_free_req()) Process leaving
      00000100:00000001:2:1312351850.904164:0:6705:0:(client.c:1822:__ptlrpc_req_finished()) Process leaving (rc=1 : 1 : 1)
      00000080:00000001:2:1312351850.904167:0:6705:0:(llite_nfs.c:107:search_inode_for_lustre()) Process leaving (rc=18446612133245141392 : -131940464410224 : ffff8800377cc190)
      00000080:00020000:2:1312351850.904170:0:6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version  0 1
      00000080:00000001:2:1312351850.904174:0:6705:0:(llite_nfs.c:132:ll_iget_for_nfs()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
      00000080:00000001:2:1312351850.904178:0:6705:0:(llite_nfs.c:190:ll_fh_to_dentry()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
      

      Maloo report: https://maloo.whamcloud.com/test_sets/0ec3ada0-bd9f-11e0-8bdf-52540025f9af

      The issue also existed on NFSv4 over Lustre client testing: https://maloo.whamcloud.com/test_sets/91dd2132-bda2-11e0-8bdf-52540025f9af

      It could be easily reproduced by just touching a file under the NFS client directory:

      [root@client-12-ib ~]# df -T /mnt/lustre
      Filesystem    Type   1K-blocks      Used Available Use% Mounted on
      fat-amd-3-ib:/
                    nfs4    11811840    615424  10596352   6% /mnt/lustre
      [root@client-12-ib ~]# touch /mnt/lustre/aaa
      touch: cannot touch `/mnt/lustre/aaa': Stale NFS file handle
      

      Attachments

        Issue Links

          Activity

            People

              green Oleg Drokin
              yujian Jian Yu
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: