[LU-9661] sanity test_57b: MDC before 382704 != after 382680 Created: 14/Jun/17  Updated: 14/Jul/21  Resolved: 26/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Jian Yu
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Servers - 2.9.58, 3591, master
Client - 2.9.0, build 22, b2_9


Issue Links:
Duplicate
duplicates LU-9677 Checking OST counts in test suites wh... Open
Related
is related to LU-4109 sanity test_57b failure: 'MDC before ... Resolved
is related to LU-9677 Checking OST counts in test suites wh... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sessions/f37f95be-4af5-11e7-b400-5254006e85c2.

The sub-test test_57b failed with the following error:

MDC before 382704 != after 382680

test log:

== sanity test 57b: default LOV EAs are stored inside large inodes =================================== 01:33:33 (1496712813)
mcreating 100 files
total: 100 creates in 0.07 seconds: 1478.94 creates/second
Filesystem            1K-blocks   Used Available Use% Mounted on
10.2.4.49@tcp:/lustre    991128 493192    429096  54% /mnt/lustre
opening files to create objects/EAs
Filesystem            1K-blocks   Used Available Use% Mounted on
10.2.4.49@tcp:/lustre    991128 493192    429096  54% /mnt/lustre
 sanity test_57b: @@@@@@ FAIL: MDC before 382704 != after 382680 
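
For triage, the size of the mismatch can be pulled straight out of the failure line. The following is a minimal sketch, not part of the test suite; the helper name `mdc_delta` and the regular expression are assumptions made for illustration, and the result is in whatever unit the test itself reports.

```python
import re

# Matches the failure text shown in the log above.
FAIL_RE = re.compile(r"MDC before (\d+) != after (\d+)")

def mdc_delta(log_line):
    """Return before - after from a test_57b failure line, or None if absent."""
    m = FAIL_RE.search(log_line)
    if m is None:
        return None
    before, after = int(m.group(1)), int(m.group(2))
    return before - after

line = " sanity test_57b: @@@@@@ FAIL: MDC before 382704 != after 382680 "
print(mdc_delta(line))  # 24
```

For this run the free count reported for the MDC dropped by 24 between the two measurements.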

This issue is seen only during rolling upgrade/downgrade testing. I checked the results for interop testing, but the issue did not occur there.

Other results with the same issue:
https://testing.hpdd.intel.com/test_sessions/6729ebe2-4af9-11e7-91f4-5254006e85c2
https://testing.hpdd.intel.com/sub_tests/ebfa335c-4b05-11e7-bc6c-5254006e85c2



 Comments   
Comment by Peter Jones [ 15/Jun/17 ]

Jian

can you please investigate?

Thanks

Peter

Comment by Jian Yu [ 15/Jun/17 ]

Hi Saurabh,
I saw the configuration was 1 MGT/MDT, 1 OST and 1 Client. Did the same test pass before during rolling upgrade testing with the same configuration?
Many tests require at least 2 OSTs. Some of them have code to check the OST count; others do not, and they fail with 1 OST (e.g., sanity test 27F failed with 1 OST) instead of being skipped.

Comment by Saurabh Tandan (Inactive) [ 15/Jun/17 ]

Yes, the same test did pass before for rolling upgrade/downgrade testing with the same config. Here are the results for the last successful run:
https://testing.hpdd.intel.com/test_sessions/977f2f34-1e29-11e7-9de9-5254006e85c2

Comment by Jian Yu [ 15/Jun/17 ]

Thank you, Saurabh, for the info. The passing session above used physical server nodes instead of VM nodes and had a much larger server target device size. Is there any successful test session that also used VM server nodes?
If yes, I'm going to reproduce the failure and find out the root cause.

Comment by Jian Yu [ 15/Jun/17 ]

BTW, could you please run the tests with at least 2 OSTs in the future so as to avoid some configuration test failures?

Comment by Saurabh Tandan (Inactive) [ 15/Jun/17 ]

I don't have any on VMs, and yes, you are right, that may be the issue. I will make sure to use 2 OSTs.

Comment by Jian Yu [ 15/Jun/17 ]

Thank you Saurabh.
Could you please re-run the tests with 2 OSTs and a larger device size? (Many tests in the test session failed with out-of-space issues.)

Comment by Saurabh Tandan (Inactive) [ 15/Jun/17 ]

Yeah sure, I will do that.
Thanks for looking into the issues.

Comment by Sarah Liu [ 16/Jun/17 ]

We should improve the test scripts to check the OST count, to avoid this kind of failure. I will open a ticket: LU-9677.
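
The kind of guard being proposed can be sketched as follows. This is an illustration only, written in Python rather than in the shell used by the actual test scripts; `should_skip`, `run_test`, and their parameters are hypothetical names, not part of the Lustre test framework.

```python
def should_skip(required_osts, available_osts):
    """Return a skip reason if there are too few OSTs, else None."""
    if available_osts < required_osts:
        return f"needs >= {required_osts} OSTs, have {available_osts}"
    return None

def run_test(name, test_fn, required_osts, available_osts):
    """Skip the test instead of failing it when the OST count is too low."""
    reason = should_skip(required_osts, available_osts)
    if reason is not None:
        return f"SKIP {name}: {reason}"
    test_fn()  # hypothetical test body; assumed to raise on failure
    return f"PASS {name}"

# A test that needs 2 OSTs, run on the 1-OST configuration from this ticket.
print(run_test("test_27F", lambda: None, required_osts=2, available_osts=1))
```

With a check like this, a test that needs 2 OSTs is reported as skipped on a 1-OST setup rather than as a failure.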

Comment by Jian Yu [ 16/Jun/17 ]

Thank you, Sarah.

Comment by Jian Yu [ 26/Mar/18 ]

The test script will be improved in LU-9677.

Generated at Sat Feb 10 02:28:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.