Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.1.0, Lustre 1.8.6
-
None
-
NFS v3/v4 clients:
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Network: IB (inkernel OFED)
Nodes: client-12-ib, client-13-ib
Lustre Client and NFS v3/v4 Server:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)
Node: fat-amd-3-ib
Lustre Servers:
Tag: v2_0_66_0
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/
Network: IB (inkernel OFED)
Nodes: fat-amd-1-ib (MDS), fat-amd-2-ib (OSS)
NFS v3/v4 clients: Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6) Network: IB (inkernel OFED) Nodes: client-12-ib, client-13-ib Lustre Client and NFS v3/v4 Server: Tag: 1.8.6-wc1 Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6) Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/ Network: IB (inkernel OFED) Node: fat-amd-3-ib Lustre Servers: Tag: v2_0_66_0 Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.2.1.el6_lustre) Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/ Network: IB (inkernel OFED) Nodes: fat-amd-1-ib (MDS), fat-amd-2-ib (OSS)
-
3
-
5186
Description
While running compilebench on NFSv3 client (NFSv3 server was Lustre 1.8.6-wc1 client, and the Lustre server version was v2_0_66_0), it failed as follows:
NFSCLIENT mode: setup, cleanup, check config skipped client-13-ib fat-amd-3-ib:/mnt/lustre /mnt/lustre nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.134,mountvers=3,mountport=59661,mountproto=udp,local_lock=none,addr=192.168.4.134 0 0 client-12-ib fat-amd-3-ib:/mnt/lustre /mnt/lustre nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.134,mountvers=3,mountport=59661,mountproto=udp,local_lock=none,addr=192.168.4.134 0 0 fat-amd-3-ib Filesystem Type 1K-blocks Used Available Use% Mounted on fat-amd-1-ib@o2ib:/lustre lustre 11811168 615432 10595352 6% /mnt/lustre == test compilebench: compilebench == 23:10:50 OPTIONS: cbench_DIR=/usr/bin cbench_IDIRS=10 cbench_RUNS=10 client-12-ib client-13-ib ./compilebench -D /mnt/lustre/d0.compilebench -i 10 -r 10 --makej using working directory /mnt/lustre/d0.compilebench, 10 intial dirs 10 runs Traceback (most recent call last): File "./compilebench", line 567, in <module> dset = dataset(options.sources, rnd) File "./compilebench", line 319, in __init__ self.unpatched = native_order(self.unpatched, "unpatched") File "./compilebench", line 97, in native_order run_directory(tmplist, dirname, "native %s" % tag) File "./compilebench", line 225, in run_directory fp = file(fname, 'a+') IOError: [Errno 116] Stale NFS file handle: '/mnt/lustre/d0.compilebench/native-0/COPYING' parallel-scale test_compilebench: @@@@@@ FAIL: compilebench failed: 1 Dumping lctl log to /home/yujian/test_logs/2011-08-02/231046/parallel-scale.test_compilebench.*.1312351852.log
Syslog on the Lustre client (NFSv3 server) fat-amd-3-ib showed that:
Aug 2 23:10:50 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 1 or version 0 1 Aug 2 23:10:50 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 1 or version 0 1 Aug 2 23:10:52 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version 0 1 Aug 2 23:10:52 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Skipped 1 previous similar message Aug 2 23:11:06 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version 0 1 Aug 2 23:11:06 fat-amd-3 kernel: LustreError: 6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Skipped 5 previous similar messages
Debug log on the Lustre client (NFSv3 server) fat-amd-3-ib showed that:
00000080:00000001:2:1312351850.904050:0:6705:0:(llite_lib.c:2286:ll_prep_inode()) Process entered 00000002:00000001:2:1312351850.904052:0:6705:0:(mdc_request.c:513:mdc_req2lustre_md()) Process entered 00000002:00000001:2:1312351850.904055:0:6705:0:(mdc_request.c:581:mdc_req2lustre_md()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904058:0:6705:0:(namei.c:88:fid_flatten()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904061:0:6705:0:(namei.c:124:ll_fid_build_ino()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904065:0:6705:0:(namei.c:165:ll_iget()) Process entered 00000080:00000001:2:1312351850.904089:0:6705:0:(llite_lib.c:1900:ll_update_inode()) Process entered 00000080:00000002:2:1312351850.904091:0:6705:0:(llite_lib.c:1902:ll_update_inode()) body->valid = 0x16fbf 00000080:00000001:2:1312351850.904093:0:6705:0:(namei.c:88:fid_flatten()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904098:0:6705:0:(namei.c:124:ll_fid_build_ino()) Process leaving (rc=144115205255725057 : 144115205255725057 : 200000400000001) 00000080:00000001:2:1312351850.904101:0:6705:0:(namei.c:130:ll_fid_build_gen()) Process entered 00000080:00000001:2:1312351850.904103:0:6705:0:(namei.c:135:ll_fid_build_gen()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904106:0:6705:0:(llite_lib.c:2028:ll_update_inode()) Process leaving 00000080:00200000:2:1312351850.904108:0:6705:0:(namei.c:180:ll_iget()) got inode: 144115205255725057/0(ffff8800377cc190) for [0x200000400:0x1:0x0] 00000080:00000001:2:1312351850.904112:0:6705:0:(namei.c:183:ll_iget()) Process leaving (rc=18446612133245141392 : -131940464410224 : ffff8800377cc190) 00000080:00000001:2:1312351850.904116:0:6705:0:(obd_class.h:539:obd_checkmd()) Process entered 00020000:00000001:2:1312351850.904118:0:6705:0:(lov_obd.c:3139:lov_checkmd()) Process entered 00020000:00000001:2:1312351850.904119:0:6705:0:(lov_obd.c:3142:lov_checkmd()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904122:0:6705:0:(obd_class.h:545:obd_checkmd()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:2:1312351850.904124:0:6705:0:(llite_lib.c:2314:ll_prep_inode()) Process leaving (rc=0 : 0 : 0) 00000100:00000001:2:1312351850.904127:0:6705:0:(client.c:1806:__ptlrpc_req_finished()) Process entered 00000100:00000040:2:1312351850.904129:0:6705:0:(client.c:1818:__ptlrpc_req_finished()) @@@ refcount now 0 req@ffff880118ebcc00 x1376100389224929/t0 o33->lustre-MDT0000_UUID@192.168.4.132@o2ib:12/10 lens 408/808 e 0 to 1 dl 1312351857 ref 1 fl Interpret:R/0/0 rc 0/0 00000100:00000001:2:1312351850.904141:0:6705:0:(client.c:1739:__ptlrpc_free_req()) Process entered 00000100:00000010:2:1312351850.904143:0:6705:0:(client.c:1770:__ptlrpc_free_req()) kfreed 'request->rq_repbuf': 808 at ffff880118f43400. 00000100:00000040:2:1312351850.904147:0:6705:0:(client.c:1775:__ptlrpc_free_req()) PUTting export ffff8801006dc200 : new refcount 4 00000020:00000001:2:1312351850.904150:0:6705:0:(genops.c:816:class_import_put()) Process entered 00000020:00000040:2:1312351850.904152:0:6705:0:(genops.c:824:class_import_put()) import ffff88021a3b5800 refcount=5 obd=lustre-MDT0000-mdc-ffff880106982400 00000020:00000001:2:1312351850.904155:0:6705:0:(genops.c:835:class_import_put()) Process leaving 00000100:00000010:2:1312351850.904156:0:6705:0:(client.c:1789:__ptlrpc_free_req()) kfreed 'request->rq_reqmsg': 408 at ffff88011853e200. 00000100:00000010:2:1312351850.904159:0:6705:0:(client.c:1792:__ptlrpc_free_req()) kfreed 'request': 584 at ffff880118ebcc00. 00000100:00000001:2:1312351850.904162:0:6705:0:(client.c:1794:__ptlrpc_free_req()) Process leaving 00000100:00000001:2:1312351850.904164:0:6705:0:(client.c:1822:__ptlrpc_req_finished()) Process leaving (rc=1 : 1 : 1) 00000080:00000001:2:1312351850.904167:0:6705:0:(llite_nfs.c:107:search_inode_for_lustre()) Process leaving (rc=18446612133245141392 : -131940464410224 : ffff8800377cc190) 00000080:00020000:2:1312351850.904170:0:6705:0:(llite_nfs.c:130:ll_iget_for_nfs()) Inode 144115205255725057, Bad count: 2 2 or version 0 1 00000080:00000001:2:1312351850.904174:0:6705:0:(llite_nfs.c:132:ll_iget_for_nfs()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c) 00000080:00000001:2:1312351850.904178:0:6705:0:(llite_nfs.c:190:ll_fh_to_dentry()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
Maloo report: https://maloo.whamcloud.com/test_sets/0ec3ada0-bd9f-11e0-8bdf-52540025f9af
The issue also existed on NFSv4 over Lustre client testing: https://maloo.whamcloud.com/test_sets/91dd2132-bda2-11e0-8bdf-52540025f9af
It could be easily reproduced by just touching a file under the NFS client directory:
[root@client-12-ib ~]# df -T /mnt/lustre Filesystem Type 1K-blocks Used Available Use% Mounted on fat-amd-3-ib:/ nfs4 11811840 615424 10596352 6% /mnt/lustre [root@client-12-ib ~]# touch /mnt/lustre/aaa touch: cannot touch `/mnt/lustre/aaa': Stale NFS file handle
Attachments
Issue Links
- Trackbacks
-
Changelog 1.8 {}version 1.8.7wc1{} {}Support for networks: socklnd \any kernel supported by Lustre, qswlnd Qsnet kernel modules 5.20 and later, openiblnd IbGold 1.8.2, o2iblnd OFED 1.3, 1.4.1, 1.4.2, 1.5.1, 1.5.2, 1.5.3.1 and 1.5.3.2 gmlnd GM 2.1....
Integrated in
lustre-b1_8 » i686,server,el5,ofa #121
LU-565Update llite_nfc.c to 2.x interop fid APIOleg Drokin : 999530eeff5849183d8849688fd0f54dbd62c0f3
Files :