[LU-5889] sanity test 102k failed on sparc Created: 10/Nov/14  Updated: 11/Dec/14  Resolved: 11/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Minor
Reporter: uemura yoshifumi Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: master
Environment:

Version of Lustre 2.6.0 client and server. sparc t2000 client. RHEL6.5 servers


Severity: 3
Rank (Obsolete): 16463

 Description   

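Test 102k of the Lustre sanity test suite fails on the SPARC machine because setfattr returns EINVAL.
The error occurs because the LOV magic number received by the MDS does not match the expected value.

The cause is that struct lov_user_md is accessed directly in mdc_setattr_pack, in the byte order of the client CPU.
The fields of struct lov_user_md should be accessed in little-endian byte order, which matters on a big-endian client such as SPARC.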
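To illustrate the point about little-endian access, here is a minimal, portable sketch. It is not the merged patch: the struct `demo_lov_user_md`, the constant `DEMO_LOV_USER_MAGIC_V1` and the `demo_*` helpers are hypothetical stand-ins introduced only for this example, and the struct holds just two of the real structure's fields. Kernel code would normally use helpers such as cpu_to_le32() rather than hand-written shifts.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct lov_user_md. */
struct demo_lov_user_md {
	uint32_t lmm_magic;
	uint32_t lmm_stripe_size;
};

/* Assumed to match the value of LOV_USER_MAGIC_V1 in the Lustre headers. */
#define DEMO_LOV_USER_MAGIC_V1 0x0BD10BD0u

/* Store a host-order value as little-endian bytes on any host. */
void demo_put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read little-endian bytes back into a host-order value on any host. */
uint32_t demo_get_le32(const uint8_t *src)
{
	return (uint32_t)src[0] |
	       ((uint32_t)src[1] << 8) |
	       ((uint32_t)src[2] << 16) |
	       ((uint32_t)src[3] << 24);
}

/* Pack the two fields into an 8-byte buffer in little-endian order. */
void demo_pack_lum(uint8_t wire[8], const struct demo_lov_user_md *lum)
{
	demo_put_le32(wire, lum->lmm_magic);
	demo_put_le32(wire + 4, lum->lmm_stripe_size);
}
```

Because the helpers work on individual bytes, the packed buffer is identical on little-endian and big-endian hosts. Copying the struct as raw memory does not have that property: a big-endian client would emit the bytes of each field in reverse order.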

The client log is as follows:

== sanity test 102k: setfattr without parameter of value shouldn't cause a crash ===================== 16:34:46 (1396251286)
setfattr: /mnt/test/d102k: Invalid argument
 sanity test_102k: @@@@@@ FAIL: stripe size 65536 != 1048576 
  Trace dump:
  = /root/lustre/tests/test-framework.sh:4466:error_noexit()
  = /root/lustre/tests/test-framework.sh:4497:error()
  = /root/lustre/tests/sanity.sh:6452:test_102k()
  = /root/lustre/tests/test-framework.sh:4743:run_one()
  = /root/lustre/tests/test-framework.sh:4778:run_one_logged()
  = /root/lustre/tests/test-framework.sh:4598:run_test()
  = /root/lustre/tests/sanity.sh:6459:main()
Dumping lctl log to /tmp/test_logs/2014-03-31/163438/sanity.test_102k.*.1396251287.log
Dumping logs only on local client.
FAIL 102k (3s)

The Lustre debug log on the MDS is as follows:

00000004:00000001:3.0:1400646965.139436:0:11616:0:(lod_object.c:528:lod_xattr_set()) Process entered
00000004:00000001:3.0:1400646965.139437:0:11616:0:(lod_object.c:477:lod_xattr_set_lov_on_dir()) Process entered
00000004:00000001:3.0:1400646965.139439:0:11616:0:(lod_lov.c:870:lod_verify_striping()) Process entered
00000004:00000080:3.0:1400646965.139441:0:11616:0:(lod_lov.c:887:lod_verify_striping()) bad userland LOV MAGIC: 0xd00bd10b
00000004:00000001:3.0:1400646965.139442:0:11616:0:(lod_lov.c:888:lod_verify_striping()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
00000004:00000001:3.0:1400646965.139445:0:11616:0:(lod_lov.c:976:lod_verify_striping()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
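The two numbers in the MDS log above can be checked by hand. The rejected magic 0xd00bd10b is the byte-reversed form of 0x0BD10BD0, which we take to be the value of LOV_USER_MAGIC_V1, and the rc value 18446744073709551594 is -22 (-EINVAL on Linux) printed as an unsigned 64-bit integer. The helpers below are a hypothetical sketch for verifying this and are not part of Lustre.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) |
	       ((v & 0xff000000u) >> 24);
}

/*
 * Interpret an rc that the log printed as unsigned 64-bit as a signed
 * two's-complement value, without relying on implementation-defined
 * conversion behaviour.
 */
int64_t demo_rc_as_signed(uint64_t rc)
{
	if (rc > (uint64_t)INT64_MAX)
		return -(int64_t)(~rc) - 1;
	return (int64_t)rc;
}
```

Calling demo_swab32(0xd00bd10b) yields 0x0bd10bd0, and demo_rc_as_signed(18446744073709551594ULL) yields -22, which is consistent with a magic number that reached the MDS in the wrong byte order.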

I'll upload the patch soon, so could you please review it?
Thank you



 Comments   
Comment by Jinshan Xiong (Inactive) [ 10/Nov/14 ]

Yes, please upload the patch.

Comment by Oleg Drokin [ 10/Nov/14 ]

We also need to make sure that this is not present in the 2.5 release and, if it is not, find out which patch introduced the regression (so that we pick up this fix if that patch is ever backported).

Comment by uemura yoshifumi [ 12/Nov/14 ]

The patch for master
http://review.whamcloud.com/#/c/12683/

Comment by Gerrit Updater [ 04/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12683/
Subject: LU-5889 mdc: Proper accessing struct lov_user_md
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 84e925a5611028f49c8ad07996352c2f062f598e

Comment by Jodi Levi (Inactive) [ 11/Dec/14 ]

Patch landed to Master.
