[LU-2770] Interop 2.3.0<->2.4 failure on test suite parallel-scale-nfsv3,test_compilebench
Created: 06/Feb/13 Updated: 22/Dec/17 Resolved: 05/Mar/13
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB |
| Environment: | 2.3.0 server; 2.4 client |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 6712 |
| Description |
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: https://maloo.whamcloud.com/test_sets/6744bb64-6df6-11e2-927e-52540035b04c. The sub-test test_compilebench failed with the following error:
client console: == parallel-scale-nfsv3 test compilebench: compilebench == 00:22:05 (1359879725)
OPTIONS:
cbench_DIR=/usr/bin
cbench_IDIRS=2
cbench_RUNS=2
client-30vm5
client-30vm6.lab.whamcloud.com
./compilebench -D /mnt/lustre/d0.compilebench -i 2 -r 2 --makej
using working directory /mnt/lustre/d0.compilebench, 2 intial dirs 2 runs
Traceback (most recent call last):
File "./compilebench", line 567, in <module>
dset = dataset(options.sources, rnd)
File "./compilebench", line 319, in __init__
self.unpatched = native_order(self.unpatched, "unpatched")
File "./compilebench", line 97, in native_order
run_directory(tmplist, dirname, "native %s" % tag)
File "./compilebench", line 225, in run_directory
fp = file(fname, 'a+')
IOError: [Errno 521] Unknown error 521: '/mnt/lustre/d0.compilebench/native-0/COPYING'
parallel-scale-nfsv3 test_compilebench: @@@@@@ FAIL: compilebench failed: 1
Trace dump:
MDS dmesg:
00:22:11:Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench == 00:22:05 (1359879725)
00:22:11:Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.compilebench -i 2 -r 2 --makej
00:22:11:Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench -i 2 -r 2 --makej
00:22:11:LustreError: 28607:0:(mdd_object.c:634:mdd_big_lmm_get()) No buffer to hold trusted.lov xattr of object [0x8000c:0x40dde24f:0x0]
00:22:11:LustreError: 28607:0:(mdt_handler.c:575:mdt_getattr_internal()) getattr error for [0x8000c:0x40dde24f:0x0]: -22
00:22:11:LustreError: 29082:0:(llite_nfs.c:343:ll_get_parent()) failure -22 inode 144118661362221057 get parent
00:22:11:LustreError: 28142:0:(mdd_object.c:634:mdd_big_lmm_get()) No buffer to hold trusted.lov xattr of object [0x8000c:0x40dde24f:0x0]
00:22:11:LustreError: 28142:0:(mdt_handler.c:575:mdt_getattr_internal()) getattr error for [0x8000c:0x40dde24f:0x0]: -22
00:22:11:LustreError: 29082:0:(llite_nfs.c:343:ll_get_parent()) failure -22 inode 144118661362221057 get parent
00:22:11:Lustre: DEBUG MARKER: /usr/sbin/lctl mark parallel-scale-nfsv3 test_compilebench: @@@@@@ FAIL: compilebench failed: 1
00:22:11:Lustre: DEBUG MARKER: parallel-scale-nfsv3 test_compilebench: @@@@@@ FAIL: compilebench failed: 1
00:22:11:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-02-02/lustre-b2_3-el6-x86_64-vs-lustre-master-el6-x86_64--full--2_6_1__1221__-70344793388480-024131/parallel-scale-nfsv3.test_compilebench.debug_log.$(hostname -s).1359879727.log;
00:22:11: dmesg > /logdir/test_lo
00:22:22:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test metabench: metabench == 00:22:11 \(1359879731\)
| Comments |
| Comment by Sarah Liu [ 06/Feb/13 ] |
Also seen in parallel-scale-nfsv4: https://maloo.whamcloud.com/test_sets/053cdd9c-6df7-11e2-927e-52540035b04c
| Comment by nasf (Inactive) [ 17/Feb/13 ] |
The failure is not related to interoperability. It occurred when the nfsd thread (going through the Lustre-2.3 client mounted on the Lustre-2.3 MDS node) sent an MDS_GETATTR_NAME RPC for ll_get_parent(). The current dentry/inode is "d0.compilebench", and its parent is the Lustre "ROOT". I am not yet sure what happened to the "ROOT" object; perhaps its data is corrupted. I need a more detailed log (debug level "-1" preferred) to tell. Sarah, could you please try to reproduce the bug on pure Lustre-2.3 with "-1" debug enabled? Thanks!
| Comment by nasf (Inactive) [ 18/Feb/13 ] |
The failure was caused by the client not setting ea_size when calling md_getattr_name() for ll_get_parent().
The patch for b2_3:
The patch for master:
| Comment by Sarah Liu [ 20/Feb/13 ] |
Verified with the above two patches:
| Comment by nasf (Inactive) [ 20/Feb/13 ] |
Oleg, do we plan to land the b2_3 patch, or only the master patch? If we do not want to fix the issue on b2_3, I would prefer to mark this ticket as resolved and close it.
| Comment by Jodi Levi (Inactive) [ 05/Mar/13 ] |
Landed to master.