[LU-8601] sanity test_230d: Timeout on ZFS backed MDSs
Created: 12/Sep/16  Updated: 03/Oct/18  Resolved: 03/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Alex Zhuravlev
Resolution: Duplicate
Votes: 0
Labels: zfs

Issue Links:
Duplicate
is duplicated by LU-11235 sanity: test_230d timeout on ZFS back... Open
Related
is related to LU-9247 replay-ost-single test_5: test failed... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
https://testing.hpdd.intel.com/test_sets/7a155540-7613-11e6-b08e-5254006e85c2
https://testing.hpdd.intel.com/test_sets/d94ac9c8-75ff-11e6-8a8c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/f347090a-7630-11e6-8a8c-5254006e85c2

The sub-test test_230d failed with the following error:

test failed to respond and timed out


Info required for matching: sanity 230d



 Comments   
Comment by Nathaniel Clark [ 09/Dec/16 ]

master zfs
https://testing.hpdd.intel.com/test_sets/05f5e338-bcf8-11e6-89a8-5254006e85c2

Comment by Nathaniel Clark [ 09/Dec/16 ]

Even when this test passes on ZFS, it is remarkably slow: about 1400 seconds (2 MDSs with 2 MDTs each).

With a single MDS and 4 MDTs, it times out.

Comment by Joseph Gmitter (Inactive) [ 05/Apr/17 ]

Hi Alex,

Can you look into this issue? It also may be the same as LU-9247.

Thanks.
Joe

Comment by Alex Zhuravlev [ 05/Apr/17 ]

The migration workload on ZFS should be kept very small, because migration involves synchronous I/O, which is extremely slow on ZFS without ZIL.
I'd suggest disabling this test on ZFS.
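
A minimal sketch of what "disable this test on ZFS" could look like as a guard in a shell test script. It is not the actual sanity.sh change: `mds_backend_fstype`, `should_run_subtest`, and the `MDS_FSTYPE` variable are hypothetical names used only for illustration, and the real Lustre test framework has its own helpers for this.

```shell
#!/bin/bash
# Sketch only: the helpers below are hypothetical stand-ins, not the
# real Lustre test-framework functions.

# Hypothetical: report the backing filesystem type of the MDS.
# For illustration it is read from an environment variable and
# defaults to ldiskfs.
mds_backend_fstype() {
	echo "${MDS_FSTYPE:-ldiskfs}"
}

# Decide whether a named subtest should run on the current backend.
# Returns 0 (run) or 1 (skip), printing a reason when skipping.
should_run_subtest() {
	local subtest="$1"
	local fstype
	fstype="$(mds_backend_fstype)"

	if [ "$subtest" = "230d" ] && [ "$fstype" = "zfs" ]; then
		echo "SKIP: test_$subtest disabled on ZFS MDTs (LU-8601)"
		return 1
	fi
	return 0
}

# Example use: only run the subtest when the guard allows it.
if should_run_subtest 230d; then
	echo "RUN: test_230d"
fi
```

Keying the skip on the backend type rather than removing the test keeps it running on ldiskfs, where the timeout has not been reported in this ticket.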

Comment by Joseph Gmitter (Inactive) [ 05/Apr/17 ]

Thanks Alex, that direction is very helpful. Do you think the same is true for LU-9247?

Comment by James Nunez (Inactive) [ 18/Apr/17 ]

sanity test_230d has timed out three times this year, and only when running with ZFS and DNE. This testing is done for tagged builds of master.

This is only happening with DNE. Do we still think that this is just ZFS slowness?

For 2017, here are the only cases where 230d times out:
2.9.52 - https://testing.hpdd.intel.com/test_sets/f4f9d750-efe8-11e6-8c0d-5254006e85c2
2.9.54 - https://testing.hpdd.intel.com/test_sets/6e01cc88-0c1e-11e7-8c9f-5254006e85c2
2.9.55 - https://testing.hpdd.intel.com/test_sets/c4a941a4-1e09-11e7-b742-5254006e85c2

Comment by Sarah Liu [ 07/Jun/17 ]

Another one on DNE+ZFS, 2.9.58:

https://testing.hpdd.intel.com/test_sets/3b4e3b94-44b5-11e7-b3fe-5254006e85c2

Comment by Patrick Farrell (Inactive) [ 07/Feb/18 ]

Another:
https://testing.hpdd.intel.com/test_sessions/004b644b-af97-429d-954c-65316b8f7a96

Comment by Mikhail Pershin [ 12/Feb/18 ]

+1
https://testing.hpdd.intel.com/test_sets/8d897e16-0f33-11e8-a6ad-52540065bddc

Comment by Bob Glossman (Inactive) [ 24/May/18 ]

Another on b2_10:
https://testing.hpdd.intel.com/test_sets/0db2bb16-5f93-11e8-b303-52540065bddc

Comment by Andreas Dilger [ 03/Oct/18 ]

Close as a duplicate of LU-11235.

Generated at Sat Feb 10 02:18:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.