[LU-2770] Interop 2.3.0<->2.4 failure on test suite parallel-scale-nfsv3,test_compilebench Created: 06/Feb/13  Updated: 22/Dec/17  Resolved: 05/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: MB
Environment:

2.3.0 server; 2.4 client


Issue Links:
Duplicate
is duplicated by LU-3370 Interop 2.3.0<->2.4 failure on test s... Resolved
Severity: 3
Rank (Obsolete): 6712

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/6744bb64-6df6-11e2-927e-52540035b04c.

The sub-test test_compilebench failed with the following error:

compilebench failed: 1

client console:

== parallel-scale-nfsv3 test compilebench: compilebench == 00:22:05 (1359879725)
OPTIONS:
cbench_DIR=/usr/bin
cbench_IDIRS=2
cbench_RUNS=2
client-30vm5
client-30vm6.lab.whamcloud.com
./compilebench -D /mnt/lustre/d0.compilebench -i 2         -r 2 --makej
using working directory /mnt/lustre/d0.compilebench, 2 intial dirs 2 runs
Traceback (most recent call last):
  File "./compilebench", line 567, in <module>
    dset = dataset(options.sources, rnd)
  File "./compilebench", line 319, in __init__
    self.unpatched = native_order(self.unpatched, "unpatched")
  File "./compilebench", line 97, in native_order
    run_directory(tmplist, dirname, "native %s" % tag)
  File "./compilebench", line 225, in run_directory
    fp = file(fname, 'a+')
IOError: [Errno 521] Unknown error 521: '/mnt/lustre/d0.compilebench/native-0/COPYING'
 parallel-scale-nfsv3 test_compilebench: @@@@@@ FAIL: compilebench failed: 1 
  Trace dump:

MDS dmesg:

00:22:11:Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench == 00:22:05 (1359879725)
00:22:11:Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.compilebench -i 2         -r 2 --makej
00:22:11:Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench -i 2 -r 2 --makej
00:22:11:LustreError: 28607:0:(mdd_object.c:634:mdd_big_lmm_get()) No buffer to hold trusted.lov xattr of object [0x8000c:0x40dde24f:0x0]
00:22:11:LustreError: 28607:0:(mdt_handler.c:575:mdt_getattr_internal()) getattr error for [0x8000c:0x40dde24f:0x0]: -22
00:22:11:LustreError: 29082:0:(llite_nfs.c:343:ll_get_parent()) failure -22 inode 144118661362221057 get parent
00:22:11:LustreError: 28142:0:(mdd_object.c:634:mdd_big_lmm_get()) No buffer to hold trusted.lov xattr of object [0x8000c:0x40dde24f:0x0]
00:22:11:LustreError: 28142:0:(mdt_handler.c:575:mdt_getattr_internal()) getattr error for [0x8000c:0x40dde24f:0x0]: -22
00:22:11:LustreError: 29082:0:(llite_nfs.c:343:ll_get_parent()) failure -22 inode 144118661362221057 get parent
00:22:11:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv3 test_compilebench: @@@@@@ FAIL: compilebench failed: 1 
00:22:11:Lustre: DEBUG MARKER: parallel-scale-nfsv3 test_compilebench: @@@@@@ FAIL: compilebench failed: 1
00:22:11:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-02-02/lustre-b2_3-el6-x86_64-vs-lustre-master-el6-x86_64--full--2_6_1__1221__-70344793388480-024131/parallel-scale-nfsv3.test_compilebench.debug_log.$(hostname -s).1359879727.log;
00:22:11:         dmesg > /logdir/test_lo
00:22:22:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test metabench: metabench == 00:22:11 \(1359879731\)
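
A note on the numeric codes in the logs above: -22 on the MDS is EINVAL, and 521 on the client is the Linux kernel's internal EBADHANDLE (illegal NFS file handle). The 5xx codes are defined in the kernel's include/linux/errno.h for NFSv3 and are not supposed to reach userspace, which is why Python prints "Unknown error 521". It seems plausible, though the logs do not show it directly, that the ll_get_parent() failure on the NFS server is what surfaced to the NFS client as a bad file handle. The helper below is a minimal sketch for decoding such codes; `errno_name` and `KERNEL_NFS_ERRNOS` are hypothetical names introduced only for this illustration.

```python
import errno

# Kernel-internal NFSv3 error codes (Linux include/linux/errno.h).
# libc has no strerror() text for these, hence "Unknown error 521".
# Hypothetical lookup table for illustration; only a few codes are listed.
KERNEL_NFS_ERRNOS = {
    521: "EBADHANDLE",  # illegal NFS file handle
    522: "ENOTSYNC",    # update synchronization mismatch
    523: "EBADCOOKIE",  # cookie is stale
    524: "ENOTSUPP",    # operation is not supported
}


def errno_name(code):
    """Return the symbolic name for an errno, accepting kernel-style negatives."""
    code = abs(code)
    # Standard userspace errnos first, then the kernel-internal NFS range.
    name = errno.errorcode.get(code)
    if name is None:
        name = KERNEL_NFS_ERRNOS.get(code, "unknown")
    return name


if __name__ == "__main__":
    for value in (-22, 521):
        print(value, errno_name(value))
```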


Comments
Comment by Sarah Liu [ 06/Feb/13 ]

Also seen in parallel-scale-nfsv4: https://maloo.whamcloud.com/test_sets/053cdd9c-6df7-11e2-927e-52540035b04c

Comment by nasf (Inactive) [ 17/Feb/13 ]

The failure is not related to interoperability.

The failure occurred when an nfsd thread (a Lustre-2.3 client on the Lustre-2.3 MDS node) sent an MDS_GETATTR_NAME RPC for ll_get_parent(). The current dentry/inode is "d0.compilebench", and its parent is the Lustre "ROOT". I am not yet sure what happened to the "ROOT" object; its data may be corrupted. I need a more detailed log (debug level "-1" is best) to tell. Sarah, would you please verify the bug on pure Lustre-2.3 with "-1" debug? Thanks!

Comment by nasf (Inactive) [ 18/Feb/13 ]

The failure is caused by the client not setting ea_size when it calls md_getattr_name() for ll_get_parent().

The patch for b2_3:
http://review.whamcloud.com/#change,5454

The patch for master:
http://review.whamcloud.com/#change,5455
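
The root cause named in this comment can be modeled with a toy sketch. This is not the Lustre code and not the content of the two patches: `server_getattr`, `get_parent_buggy`, `get_parent_fixed`, and the 56-byte EA size are all hypothetical, chosen only to show why a request with no ea_size leads to "No buffer to hold trusted.lov xattr" and -22 on the MDS.

```python
import errno

# Hypothetical size of the trusted.lov xattr in bytes, for illustration only.
LOV_EA_SIZE = 56


def server_getattr(ea_size, lov_ea_size=LOV_EA_SIZE):
    """Toy MDS-side getattr: the reply buffer is sized from the client's ea_size."""
    if ea_size < lov_ea_size:
        # Models "No buffer to hold trusted.lov xattr" followed by -22 (EINVAL).
        return -errno.EINVAL
    return 0


def get_parent_buggy():
    """Toy client path that never sets ea_size, so the request carries 0."""
    return server_getattr(ea_size=0)


def get_parent_fixed(max_ea_size=LOV_EA_SIZE):
    """Toy client path that reserves room for the EA before sending the request."""
    return server_getattr(ea_size=max_ea_size)


if __name__ == "__main__":
    print("buggy:", get_parent_buggy())
    print("fixed:", get_parent_fixed())
```

In this model the only difference between the failing and the working path is whether the client tells the server how much room to reserve for the EA, which matches the one-line diagnosis above.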

Comment by Sarah Liu [ 20/Feb/13 ]

Verified with the above two patches:
https://maloo.whamcloud.com/test_sessions/a2509c5c-7b19-11e2-8242-52540035b04c

Comment by nasf (Inactive) [ 20/Feb/13 ]

Oleg, do we plan to land the b2_3 patch, or only the master patch?

If we do not want to fix the issue on b2_3, then I would prefer to mark this ticket as resolved and close it.

Comment by Jodi Levi (Inactive) [ 05/Mar/13 ]

Landed to master.
