[LU-4072] sanity, subtest test_24v takes a VERY LONG TIME on ZFS Created: 07/Oct/13  Updated: 11/Jan/14  Resolved: 04/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: performance, revzfs, zfs

Issue Links:
Related
is related to LU-2887 sanity-quota test_12a: slow due to ZF... Resolved
Severity: 3
Rank (Obsolete): 10926

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
http://maloo.whamcloud.com/test_sets/7592b12c-2ccc-11e3-bda0-52540035b04c

The sub-test test_24v failed with the following error:

test failed to respond and timed out

Info required for matching: sanity 24v

This test will eventually complete but can take an inordinate amount of time.



 Comments   
Comment by Nathaniel Clark [ 07/Oct/13 ]

Running this test on my local VM setup, the MDS sits in iowait for 50% of its CPU time. I think this issue may be similar to the ones seen in LU-2887 during the parallel-scale tests.
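
A minimal sketch of how the iowait observation can be confirmed on the MDS (this assumes the sysstat tools are installed on the node; the 5-second interval is arbitrary):

# Per-CPU breakdown; the %iowait column should sit near 50 during the test
mpstat -P ALL 5
# Per-device utilization, to see which disk behind the MDT is saturated
iostat -x 5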

Comment by Nathaniel Clark [ 07/Oct/13 ]

Reduce the number of files created in the test by an order of magnitude for ZFS:
http://review.whamcloud.com/7870
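
The actual change is in the patch above; as a hypothetical sketch of the approach (the variable name and exact counts here are assumptions, not the real diff), the test can scale its create count down when the MDT backend is ZFS, using the facet_fstype helper from test-framework.sh:

# Hypothetical sketch, not the code from change 7870:
NRFILES=100000
if [ "$(facet_fstype mds1)" = "zfs" ]; then
	NRFILES=10000	# an order of magnitude fewer creates on ZFS
fi
createmany -o $DIR/$tdir/$tfile $NRFILES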

Comment by Nathaniel Clark [ 08/Oct/13 ]

Running sanity/24v on my local VM setup: on the MDS, the CPU was 50% iowait for the entire run of the test; system time was only 3%.

== sanity test 24v: list directory with large files (handle hash collision, bug: 17560) == 12:24:25 (1381163065)
 - created 10000 (time 1381163076.08 total 10.52 last 10.52)
 - created 20000 (time 1381163092.21 total 26.65 last 16.13)
 - created 30000 (time 1381163957.78 total 892.22 last 865.57)
 - created 40000 (time 1381164943.50 total 1877.94 last 985.72)
 - created 50000 (time 1381166086.74 total 3021.18 last 1143.24)
 - created 60000 (time 1381167417.27 total 4351.71 last 1330.53)
 - created 70000 (time 1381168715.83 total 5650.27 last 1298.56)
 - created 80000 (time 1381169919.18 total 6853.63 last 1203.35)
 - created 90000 (time 1381171269.40 total 8203.85 last 1350.22)
total: 100000 creates in 9826.93 seconds: 10.18 creates/second
mdc.nlczfs1-MDT0000-mdc-ffff88003b3e3000.stats=clear
Waiting for local destroys to complete
Resetting fail_loc on all nodes...done.
PASS 24v (32245s)
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
resend_count is set to 4 4
== sanity test complete, duration 32252 sec == 21:21:51 (1381195311)
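
The slowdown is easier to see as per-batch rates. A quick awk pass over the "created" lines above (assuming the console output was saved to a file, here hypothetically sanity-24v.log) shows the rate falling from roughly 950 creates/second for the first 10000 files to under 8 creates/second by the end of the run:

awk '/ - created/ { printf "%6d files: %7.1f creates/sec\n", $3, 10000/($NF+0) }' sanity-24v.log
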
Comment by Nathaniel Clark [ 04/Nov/13 ]

Patch landed to master.

Comment by Jian Yu [ 08/Jan/14 ]

While validating patches for the Lustre b2_5 branch, I saw this failure occur frequently:
https://maloo.whamcloud.com/test_sets/34bbbc22-76c1-11e3-a9ab-52540035b04c
https://maloo.whamcloud.com/test_sets/040bb508-7754-11e3-a6a3-52540035b04c

Here is the back-ported patch for the Lustre b2_5 branch: http://review.whamcloud.com/8769

Comment by Jian Yu [ 11/Jan/14 ]

Landed for Lustre 2.5.1.
