[LU-1436] Test failure on test suite sanity, subtest test_24w Created: 24/May/12  Updated: 13/Feb/14  Resolved: 02/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-1437 Test failure on test suite sanity, su... Resolved
Related
is related to LU-969 2.1 client stack overruns Resolved
is related to LU-4357 page allocation failure. mode:0x40 ca... Resolved
Severity: 3
Rank (Obsolete): 4518

 Description   

This issue was created by maloo for yujian <yujian@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/78395658-a4b3-11e1-adce-52540035b04c.

The sub-test test_24w failed with the following error:

== sanity test 24w: Reading a file larger than 4Gb =================================================== 14:31:53 (1337635913)
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0721251 seconds, 14.5 MB/s
1+0 records in
1+0 records out
234852 bytes (235 kB) copied, 0.0710902 seconds, 3.3 MB/s
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00325134 seconds, 0.0 kB/s
sanity test_24w: @@@@@@ FAIL: Error reading at the end of the file f24w

Info required for matching: sanity 24w



 Comments   
Comment by Jian Yu [ 24/May/12 ]

This is a regression on b2_1 branch.

Comment by Andreas Dilger [ 24/May/12 ]

I would say that this is a blocker for the 2.1.2 release, and is likely caused by the same problem as LU-1437.

Can you check some of the earlier builds to see when it was introduced? Does it fail repeatedly, or is it intermittent?

Comment by Oleg Drokin [ 24/May/12 ]

The problem here was introduced by LU-969 http://review.whamcloud.com/2668

+long libcfs_log_return(struct libcfs_debug_msg_data *, long rc);

This truncates all return values passed through the RETURN() macro to long, which is only 32 bits wide on i686, so any 64-bit value (such as a file offset beyond 4 GB) loses its upper 32 bits.
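
The truncation described above can be sketched in a few lines of C. This is an illustration only, not the Lustre code: the names log_return_32() and offset_via_narrow_helper() are hypothetical, and int32_t stands in for long as it would be on i686 so that the effect is reproducible on a 64-bit machine too.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a logging helper whose parameter and return
 * type are a 32-bit long, as on i686. */
static int32_t log_return_32(int32_t rc)
{
    return rc;
}

/* Hypothetical function that returns a 64-bit offset through the helper.
 * The narrowing conversion keeps only the low 32 bits of the offset. */
static int64_t offset_via_narrow_helper(int64_t offset)
{
    return (int64_t)log_return_32((int32_t)(uint32_t)offset);
}

/* Small self-check: an offset just past 4 GiB comes back as 1 MiB. */
static void check_truncation(void)
{
    const int64_t four_gib = 4294967296LL;
    const int64_t one_mib = 1048576LL;

    assert(offset_via_narrow_helper(one_mib) == one_mib);
    assert(offset_via_narrow_helper(four_gib + one_mib) == one_mib);
}
```

Values that fit in 32 bits pass through unchanged, which may explain why only tests touching offsets beyond 4 GB noticed the problem.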

Comment by Jian Yu [ 24/May/12 ]

The problem here was introduced by LU-969 http://review.whamcloud.com/2668

Yes. After it was committed on 11 May, the sanity tests 24w,34{c,d,g} started failing on i686 client. Here are the search results of the historical Maloo reports:

test 24w: http://tinyurl.com/cf9uvj2
test 34c: http://tinyurl.com/clzdw2k

Comment by Andreas Dilger [ 25/May/12 ]

Unfortunately, it was my proposal that caused this problem. I'm surprised that it only caused so few issues during testing.

The patch is fairly straightforward - change libcfs_log_return() to return long long. Even with sign extension, the original value should be OK after casting the value down to the original type again.
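
A minimal sketch of why widening the helper to long long is safe, assuming a pass-through helper of that shape. The names log_return_wide(), int_via_wide_helper() and i64_via_wide_helper() are hypothetical and are not taken from the actual patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical widened helper: long long is at least 64 bits wide on every
 * conforming C implementation, including i686. */
static long long log_return_wide(long long rc)
{
    return rc;
}

/* A caller returning int: the value is sign-extended on the way in and
 * cast back to int on the way out, so negative error codes survive. */
static int int_via_wide_helper(int rc)
{
    return (int)log_return_wide(rc);
}

/* A caller returning a 64-bit value: nothing is lost in either direction. */
static int64_t i64_via_wide_helper(int64_t rc)
{
    return (int64_t)log_return_wide(rc);
}

/* Small self-check covering a negative int and an offset past 4 GiB. */
static void check_round_trip(void)
{
    const int64_t big = 4294967296LL + 1048576LL;

    assert(int_via_wide_helper(-5) == -5);
    assert(i64_via_wide_helper(big) == big);
    assert(i64_via_wide_helper(-1) == -1);
}
```

Both the narrow and the wide caller get their original value back, which is the property the comment above relies on.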

That must mean the same problem also exists for 32-bit clients on master. Are we no longer doing any 32-bit testing on master? I thought we would drop i686 server and RHEL5 server build/test, but I thought we would keep RHEL5/i686 client testing going to ensure we have at least some 32-bit test coverage.

Comment by Peter Jones [ 31/May/12 ]

Hongchao

Could you please look into this one?

Thanks

Peter

Comment by Andreas Dilger [ 31/May/12 ]

Marking as a blocker for 2.1.2 as well.

Comment by Hongchao Zhang [ 11/Jun/12 ]

the patch is tracked at http://review.whamcloud.com/#change,3072

Comment by Alex Zhuravlev [ 03/Jul/12 ]

Could someone check Lustre builds with --disable-liblustre, please?

Comment by Hongchao Zhang [ 27/Jul/12 ]

status update:
the tests against the patch have passed, and the patch is still under inspection (currently one +)

Comment by Peter Jones [ 02/Aug/12 ]

Landed for 2.3

Generated at Sat Feb 10 01:16:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.