
[LU-2887] sanity-quota test_12a: slow due to ZFS VMs sharing single disk


    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/c3a0b364-812d-11e2-b609-52540035b04c.

      The sub-test test_12a failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-quota 12a

      Looking through test 12a, the test appears to have hung on the runas dd (with oflag=sync) at the end of the test.
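
      For reference, a minimal sketch of the kind of command that hangs here (the paths, size, and variables are illustrative, not the literal test_12a invocation). With oflag=sync, every write must be committed on the OST before dd returns, so a stalled txg_sync thread on the OSS leaves the client-side dd waiting indefinitely:

      # Illustrative only -- not the exact sanity-quota test_12a command.
      # oflag=sync forces each block to be committed on the OST before dd
      # continues, so a stuck ZFS txg_sync on the OSS blocks this dd forever.
      $RUNAS dd if=/dev/zero of=$DIR/$tdir/$tfile bs=1M count=100 oflag=sync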

      The OST has threads that are blocked on disk I/O (OSS dmesg):

      txg_sync      D 0000000000000000     0 24236      2 0x00000080
       ffff88005027dbc0 0000000000000046 ffff88004b906ec0 0000000000000086
       ffff88005027db70 ffff88007c7d4408 0000000000000001 ffff88007c7d4420
       ffff880052101058 ffff88005027dfd8 000000000000fb88 ffff880052101058
      Call Trace:
       [<ffffffff81090b9e>] ? prepare_to_wait_exclusive+0x4e/0x80
       [<ffffffffa016b5ac>] cv_wait_common+0x9c/0x1a0 [spl]
       [<ffffffffa02d5160>] ? zio_execute+0x0/0xf0 [zfs]
       [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa016b6e3>] __cv_wait+0x13/0x20 [spl]
       [<ffffffffa02d533b>] zio_wait+0xeb/0x160 [zfs]
       [<ffffffffa026b807>] dsl_pool_sync+0x2a7/0x480 [zfs]
       [<ffffffffa027e147>] spa_sync+0x397/0x9a0 [zfs]
       [<ffffffffa028fd41>] txg_sync_thread+0x2c1/0x490 [zfs]
       [<ffffffff810527f9>] ? set_user_nice+0xc9/0x130
       [<ffffffffa028fa80>] ? txg_sync_thread+0x0/0x490 [zfs]
       [<ffffffffa0164668>] thread_generic_wrapper+0x68/0x80 [spl]
       [<ffffffffa0164600>] ? thread_generic_wrapper+0x0/0x80 [spl]
       [<ffffffff81090626>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff81090590>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      ll_ost_io00_0 D 0000000000000000     0 18170      2 0x00000080
       ffff8800427b9820 0000000000000046 0000000000000046 0000000000000001
       ffff8800427b98b0 0000000000000086 ffff8800427b97e0 ffff88005027dd60
       ffff8800427b7ab8 ffff8800427b9fd8 000000000000fb88 ffff8800427b7ab8
      Call Trace:
       [<ffffffff81090b9e>] ? prepare_to_wait_exclusive+0x4e/0x80
       [<ffffffffa016b5ac>] cv_wait_common+0x9c/0x1a0 [spl]
       [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa016b6e3>] __cv_wait+0x13/0x20 [spl]
       [<ffffffffa028f573>] txg_wait_synced+0xb3/0x190 [zfs]
       [<ffffffffa0c71015>] osd_trans_stop+0x365/0x420 [osd_zfs]
       [<ffffffffa0cb9062>] ofd_trans_stop+0x22/0x60 [ofd]
       [<ffffffffa0cbdf06>] ofd_commitrw_write+0x406/0x11b0 [ofd]
       [<ffffffffa0cbf13d>] ofd_commitrw+0x48d/0x920 [ofd]
       [<ffffffffa085b708>] obd_commitrw+0x128/0x3d0 [ost]
       [<ffffffffa0862599>] ost_brw_write+0xe49/0x14d0 [ost]
       [<ffffffff812739b6>] ? vsnprintf+0x2b6/0x5f0
       [<ffffffffa088c1f0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
       [<ffffffffa08680e3>] ost_handle+0x31e3/0x46f0 [ost]
       [<ffffffffa05ca154>] ? libcfs_id2str+0x74/0xb0 [libcfs]
       [<ffffffffa08dc02c>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
       [<ffffffffa05be5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa08d3759>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81052223>] ? __wake_up+0x53/0x70
       [<ffffffffa08dd576>] ptlrpc_main+0xb76/0x1870 [ptlrpc]
       [<ffffffffa08dca00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffffa08dca00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffffa08dca00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

      Attachments

        1. txgs.1380193340.log
          55 kB
        2. txgs.1380193506.log
          55 kB
        3. txgs.1380194292.log
          55 kB


          Activity

            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/58/
            Distro/Arch: RHEL6.4/x86_64

            FSTYPE=zfs
            MDSCOUNT=1
            MDSSIZE=2097152
            OSTCOUNT=2
            OSTSIZE=2097152

            replay-ost-single test 5 still timed out: https://maloo.whamcloud.com/test_sets/a76a6b78-5606-11e3-8e94-52540035b04c

            txg_sync      D 0000000000000000     0 32309      2 0x00000080
             ffff88004925dba0 0000000000000046 ffff8800ffffffff 00001e6701abaaf4
             000000005cd72ac8 ffff88007126d2f0 00000000001ec6e4 ffffffffad6d9703
             ffff880072f07ab8 ffff88004925dfd8 000000000000fb88 ffff880072f07ab8
            Call Trace:
             [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
             [<ffffffff8150ed93>] io_schedule+0x73/0xc0
             [<ffffffffa03e6d4c>] cv_wait_common+0x8c/0x100 [spl]
             [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
             [<ffffffffa03e3717>] ? taskq_dispatch_ent+0x57/0x110 [spl]
             [<ffffffffa03e6dd8>] __cv_wait_io+0x18/0x20 [spl]
             [<ffffffffa052939b>] zio_wait+0xfb/0x190 [zfs]
             [<ffffffffa04c1f2c>] dsl_pool_sync+0xec/0x540 [zfs]
             [<ffffffffa04da82e>] spa_sync+0x39e/0x970 [zfs]
             [<ffffffff8103b8d9>] ? kvm_clock_get_cycles+0x9/0x10
             [<ffffffffa04e582a>] txg_sync_thread+0x27a/0x4b0 [zfs]
             [<ffffffff810560a9>] ? set_user_nice+0xc9/0x130
             [<ffffffffa04e55b0>] ? txg_sync_thread+0x0/0x4b0 [zfs]
             [<ffffffffa03e2a3f>] thread_generic_wrapper+0x5f/0x70 [spl]
             [<ffffffffa03e29e0>] ? thread_generic_wrapper+0x0/0x70 [spl]
             [<ffffffff81096a36>] kthread+0x96/0xa0
             [<ffffffff8100c0ca>] child_rip+0xa/0x20
             [<ffffffff810969a0>] ? kthread+0x0/0xa0
             [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
            
            yujian Jian Yu added a comment -

            Patch http://review.whamcloud.com/8284 landed on the Lustre b2_4 branch. I'll trigger a ZFS full-group test session on Lustre b2_4 build #57 to check the test results.

            yujian Jian Yu added a comment - - edited

            http://review.whamcloud.com/7778

            The above patch and http://review.whamcloud.com/8234 have landed on the master branch. They are also needed on the Lustre b2_4 and b2_5 branches.

            The two patches were combined and back-ported to the Lustre b2_4 branch: http://review.whamcloud.com/8284

            yujian Jian Yu added a comment -

            Here is the test result on Lustre b2_4 build #47 with FSTYPE=zfs, OSTCOUNT=2 and SLOW=yes:
            https://maloo.whamcloud.com/test_sessions/05b82736-444d-11e3-8472-52540035b04c

            Timeout failures that occurred with OSTCOUNT=7 on the following sub-tests disappeared with OSTCOUNT=2:

            sanity-benchmark     test bonnie        LU-1960
            replay-ost-single    test 5             LU-2887
            sanity-quota         test 7a and 12a    LU-2887
            large-scale          test 3a            LU-2887
            obdfilter-survey     test 1a            LU-2124
            

            However, the timeout failure on sanityn test 33a (LU-2829) still occurred.

            Because of LU-3906 (an out-of-space issue), the parallel-scale, parallel-scale-nfsv3 and parallel-scale-nfsv4 tests did not really run, so we do not know whether their timeout failures also disappear with OSTCOUNT=2.

            yujian Jian Yu added a comment -

            Since TEI-790 was fixed, I've triggered a full-group test session on ZFS against Lustre b2_4 build #47. I'll vet the test results to see whether the timed-out tests pass with OSTCOUNT=2 and SLOW=yes.

            yujian Jian Yu added a comment - - edited

            I would prefer the approach of setting OSTCOUNT=2 for ZFS-backed test filesystems.

            I just created TEI-790 to ask the TEI team for help with this change.

            Another complementary approach would be to format a single ZFS pool across a few LVs and then have the different OST/MDT targets in their own datasets in the shared pool.

            This needs changes to mdsdevname(), ostdevname(), and the ZFS failover testing support code (http://review.whamcloud.com/6429) in test-framework.sh.
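
            A rough sketch of what the shared-pool layout could look like (the pool, dataset, and device names and $MGSNID are placeholders, not from test-framework.sh; the real change would go through mdsdevname()/ostdevname()):

            # Illustrative only: one pool spanning several LVs, one dataset per target.
            zpool create lustre-test /dev/vg_test/lv0 /dev/vg_test/lv1 /dev/vg_test/lv2
            zfs create lustre-test/mdt0
            zfs create lustre-test/ost0
            zfs create lustre-test/ost1
            # Each Lustre target is then formatted on its own dataset, e.g.:
            mkfs.lustre --ost --backfstype=zfs --fsname=lustre --index=0 \
                --mgsnode=$MGSNID lustre-test/ost0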


            adilger Andreas Dilger added a comment -

            I would prefer the approach of setting OSTCOUNT=2 for ZFS-backed test filesystems. This is more in line with real systems, since we will typically only have a single OST per OSS with ZFS instead of 4 or more OSTs per OSS with ldiskfs. I think a lot of tests depend on having at least two OSTs, so OSTCOUNT=1 will probably cause some tests to be skipped.

            Another complementary approach would be to format a single ZFS pool across a few LVs and then have the different OST/MDT targets in their own datasets in the shared pool. That would avoid the extra commits caused by having separate pools. The drawback is that all of the datasets would store their files in the same space, so some of the Lustre tests would break if we don't add ZFS reservations for the minimum size of the datasets (e.g. tests that fill one OST and then allocate objects on another OST would break).
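
            A minimal sketch of the reservation idea under a shared-pool layout like the one above (dataset names and sizes are assumptions, not an actual test-framework change):

            # Reserve each target's nominal size so filling one OST cannot
            # consume the space the other datasets in the shared pool rely on.
            zfs set reservation=2G lustre-test/ost0
            zfs set reservation=2G lustre-test/ost1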


            utopiabound Nathaniel Clark added a comment -

            Reduce the performance expectation for ZFS in sanity-quota/0; the lowest rate observed over the last 4 weeks is ~150.

            http://review.whamcloud.com/7848
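
            A sketch of how a backend-dependent expectation might be expressed in the test script (the variable names $rate/$min_rate and the non-ZFS value are illustrative only; the actual change is in the review above):

            # Illustrative only: accept a lower write rate on ZFS-backed OSTs.
            if [ "$(facet_fstype ost1)" = "zfs" ]; then
                min_rate=150    # lowest rate seen on ZFS over the last 4 weeks
            else
                min_rate=600    # hypothetical ldiskfs expectation
            fi
            [ "$rate" -ge "$min_rate" ] ||
                error "write rate $rate is below expected minimum $min_rate"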


            utopiabound Nathaniel Clark added a comment -

            Looking at the test run from patch 7778 (https://maloo.whamcloud.com/test_sets/487e3fe6-29c3-11e3-b5ea-52540035b04c), the metabench result seems very strange.

            metabench normally runs fairly quickly (~200-500) judging by the results in maloo for ZFS runs.

            It looks like the whole system (client-30) just went out to lunch for 4 hours.

            yujian Jian Yu added a comment -

            Let's wait for the autotest test result in http://review.whamcloud.com/7778 to do a comparison.

            In the autotest run with SLOW=no and OSTCOUNT=7, parallel-scale timed out: https://maloo.whamcloud.com/test_sets/487e3fe6-29c3-11e3-b5ea-52540035b04c

            compilebench    6005s
            metabench       14400s (TIMEOUT)
            

            We still need to reduce the values of cbench_IDIRS, cbench_RUNS, mbench_NFILES, etc. for ZFS.
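
            A sketch of how those could be scaled down for ZFS runs (the variable names come from the parallel-scale test; the values below are illustrative, not tested recommendations):

            # Illustrative only -- smaller parallel-scale workloads when the
            # backing filesystem is ZFS.
            if [ "$FSTYPE" = "zfs" ]; then
                cbench_IDIRS=2        # compilebench initial directories
                cbench_RUNS=2         # compilebench runs
                mbench_NFILES=10000   # metabench file count
            fi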

            yujian Jian Yu added a comment -

            In a manual test run with SLOW=no and OSTCOUNT=7, the parallel-scale sub-test simul passed in 112s: https://maloo.whamcloud.com/test_sets/4b220e1e-28c7-11e3-8951-52540035b04c
            However, the following sub-tests still took a very long time:

            compilebench    7760s
            iorssf          5061s
            iorfpp          5748s
            

            Let's wait for the autotest test result in http://review.whamcloud.com/7778 to do a comparison.


            People

              bzzz Alex Zhuravlev
              maloo Maloo
              Votes: 0
              Watchers: 14
