Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.1.0, Lustre 1.8.6
-
None
-
NFS v3/v4 clients:
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Network: IB (inkernel OFED)
Nodes: client-12-ib, client-13-ib
Lustre Client and NFS v3/v4 Server:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)
Node: fat-amd-3-ib
Lustre Servers:
Tag: v2_0_66_0
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)
Nodes: fat-amd-1-ib (MDS), fat-amd-2-ib (OSS)
NFS v3/v4 clients: Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6) Network: IB (inkernel OFED) Nodes: client-12-ib, client-13-ib Lustre Client and NFS v3/v4 Server: Tag: 1.8.6-wc1 Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6) Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/ Network: IB (inkernel OFED) Node: fat-amd-3-ib Lustre Servers: Tag: v2_0_66_0 Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6_lustre) Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/ Network: IB (inkernel OFED) Nodes: fat-amd-1-ib (MDS), fat-amd-2-ib (OSS)
-
3
-
5186
Description
While running compilebench on NFSv3 client (NFSv3 server was Lustre 1.8.6-wc1 client, and the Lustre server version was v2_0_66_0), it failed as follows:
NFSCLIENT mode: setup, cleanup, check config skipped client-13-ib fat-amd-3-ib:/mnt/lustre /mnt/lustre nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.134,mountvers=3,mountport=59661,mountproto=udp,local_lock=none,addr=192.168.4.134 0 0 client-12-ib fat-amd-3-ib:/mnt/lustre /mnt/lustre nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.134,mountvers=3,mountport=59661,mountproto=udp,local_lock=none,addr=192.168.4.134 0 0 fat-amd-3-ib Filesystem Type 1K-blocks Used Available Use% Mounted on fat-amd-1-ib@o2ib:/lustre lustre 11811168 615432 10595352 6% /mnt/lustre == test compilebench: compilebench == 23:10:50 OPTIONS: cbench_DIR=/usr/bin cbench_IDIRS=10 cbench_RUNS=10 client-12-ib client-13-ib ./compilebench -D /mnt/lustre/d0.compilebench -i 10 -r 10 --makej using working directory /mnt/lustre/d0.compilebench, 10 intial dirs 10 runs Traceback (most recent call last): File "./compilebench", line 567, in <module> dset = dataset(options.sources, rnd) File "./compilebench", line 319, in __init__ self.unpatched = native_order(self.unpatched, "unpatched") File "./compilebench", line 97, in native_order run_directory(tmplist, dirname, "native %s" % tag) File "./compilebench", line 225, in run_directory fp = file(fname, 'a+') IOError: [Errno 116] Stale NFS file handle: '/mnt/lustre/d0.compilebench/native-0/COPYING' parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1 Dumping lctl log to /home/yujian/test_logs/2011-08-02/231046/parallel-scale.test_compilebench.*.1312351852.log
Syslog on the Lustre client (NFSv3 server) fat-amd-3-ib showed that:
Aug 2 23:10:50 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 1 or version 0 1 Aug 2 23:10:50 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 1 or version 0 1 Aug 2 23:10:52 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version 0 1 Aug 2 23:10:52 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Skipped 1 previous similar message Aug 2 23:11:06 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version 0 1 Aug 2 23:11:06 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Skipped 5 previous similar messages
Debug log on the Lustre client (NFSv3 server) fat-amd-3-ib showed that:
00000080:00000001:2:1312351850.904050:0:6705:0:(llite_lib.c:2286:ll_prep_inode()) Process entered 00000002:00000001:2:1312351850.904052:0:6705:0:(mdc_request.c:513:mdc_req2lustre_md()) Process entered 00000002:00000001:2:1312351850.904055:0:6705:0:(mdc_request.c:581:mdc_req2lustre_md()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904058:0:6705:0:(namei.c:88:fid_flatten()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904061:0:6705:0:(namei.c:124:ll_fid_build_ino()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904065:0:6705:0:(namei.c:165:ll_iget()) Process entered 00000080:00000001:2:1312351850.904089:0:6705:0:(llite_lib.c:1900:ll_update_inode()) Process entered 00000080:00000002:2:1312351850.904091:0:6705:0:(llite_lib.c:1902:ll_update_inode()) body->valid = 0x16fbf 00000080:00000001:2:1312351850.904093:0:6705:0:(namei.c:88:fid_flatten()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904098:0:6705:0:(namei.c:124:ll_fid_build_ino()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904101:0:6705:0:(namei.c:130:ll_fid_build_gen()) Process entered 00000080:00000001:2:1312351850.904103:0:6705:0:(namei.c:135:ll_fid_build_gen()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904106:0:6705:0:(llite_lib.c:2028:ll_update_inode()) Process leaving 00000080:00200000:2:1312351850.904108:0:6705:0:(namei.c:180:ll_iget()) got inode: 144115205255725057/0(ffff8800377cc190) for [0x200000400:0x1:0x0] 00000080:00000001:2:1312351850.904112:0:6705:0:(namei.c:183:ll_iget()) Process leaving (rc=18446612133245141392 : -131940464410224 : ffff8800377cc190) 00000080:00000001:2:1312351850.904116:0:6705:0:(obd_class.h:539:obd_checkmd()) Process entered 00020000:00000001:2:1312351850.904118:0:6705:0:(lov_obd.c:3139:lov_checkmd()) Process entered 00020000:00000001:2:1312351850.904119:0:6705:0:(lov_obd.c:3142:lov_checkmd()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904122:0:6705:0:(obd_class.h:545:obd_checkmd()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904124:0:6705:0:(llite_lib.c:2314:ll_prep_inode()) Process leaving (rc=0 : 0 : 0) 00000100:00000001:2:1312351850.904127:0:6705:0:(client.c:1806:__ptlrpc_req_finished()) Process entered 00000100:00000040:2:1312351850.904129:0:6705:0:(client.c:1818:__ptlrpc_req_finished()) @@@ refcount now 0 req@ffff880118ebcc00 x1376100389224929/t0 o33->lustre-MDT0000_UUID@192.168.4.132@o2ib:12/10 lens 408/808 e 0 to 1 dl 1312351857 ref 1 fl Interpret:R/0/0 rc 0/0 00000100:00000001:2:1312351850.904141:0:6705:0:(client.c:1739:__ptlrpc_free_req()) Process entered 00000100:00000010:2:1312351850.904143:0:6705:0:(client.c:1770:__ptlrpc_free_req()) kfreed 'request->rq_repbuf': 808 at ffff880118f43400. 00000100:00000040:2:1312351850.904147:0:6705:0:(client.c:1775:__ptlrpc_free_req()) PUTting export ffff8801006dc200 : new refcount 4 00000020:00000001:2:1312351850.904150:0:6705:0:(genops.c:816:class_import_put()) Process entered 00000020:00000040:2:1312351850.904152:0:6705:0:(genops.c:824:class_import_put()) import ffff88021a3b5800 refcount=5 obd=lustre-MDT0000-mdc-ffff880106982400 00000020:00000001:2:1312351850.904155:0:6705:0:(genops.c:835:class_import_put()) Process leaving 00000100:00000010:2:1312351850.904156:0:6705:0:(client.c:1789:__ptlrpc_free_req()) kfreed 'request->rq_reqmsg': 408 at ffff88011853e200. 00000100:00000010:2:1312351850.904159:0:6705:0:(client.c:1792:__ptlrpc_free_req()) kfreed 'request': 584 at ffff880118ebcc00. 00000100:00000001:2:1312351850.904162:0:6705:0:(client.c:1794:__ptlrpc_free_req()) Process leaving 00000100:00000001:2:1312351850.904164:0:6705:0:(client.c:1822:__ptlrpc_req_finished()) Process leaving (rc=1 : 1 : 1) 00000080:00000001:2:1312351850.904167:0:6705:0:(llite_nfs.c:107:search_inode_for_lustre()) Process leaving (rc=18446612133245141392 : -131940464410224 : ffff8800377cc190) 00000080:00020000:2:1312351850.904170:0:6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version 0 1 00000080:00000001:2:1312351850.904174:0:6705:0:(llite_nfs.c:132:ll_iget_for_nfs()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000080:00000001:2:1312351850.904178:0:6705:0:(llite_nfs.c:190:ll_fh_to_dentry()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
Maloo report: https://maloo.whamcloud.com/test_sets/0ec3ada0-bd9f-11e0-8bdf-52540025f9af
The issue also existed on NFSv4 over Lustre client testing: https://maloo.whamcloud.com/test_sets/91dd2132-bda2-11e0-8bdf-52540025f9af
It could be easily reproduced by just touching a file under the NFS client directory:
[root@client-12-ib ~]# df -T /mnt/lustre Filesystem Type 1K-blocks Used Available Use% Mounted on fat-amd-3-ib:/ nfs4 11811840 615424 10596352 6% /mnt/lustre [root@client-12-ib ~]# touch /mnt/lustre/aaa touch: cannot touch `/mnt/lustre/aaa': Stale NFS file handle
Attachments
Issue Links
- Trackbacks
-
Changelog 1.8 {}version 1.8.7wc1{} {}Support for networks: socklnd \any kernel supported by Lustre, qswlnd Qsnet kernel modules 5.20 and later, openiblnd IbGold 1.8.2, o2iblnd OFED 1.3, 1.4.1, 1.4.2, 1.5.1, 1.5.2, 1.5.3.1 and 1.5.3.2 gmlnd GM 2.1....