
[LU-2887] sanity-quota test_12a: slow due to ZFS VMs sharing single disk


    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/c3a0b364-812d-11e2-b609-52540035b04c.

      The sub-test test_12a failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-quota 12a

      Looking through test 12a, the test appears to have hung on the runas dd (with oflag=sync) at the end of the test.
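
      For reference, a minimal sketch of the kind of command that hangs here (the paths, size, and variables are illustrative, not the literal test_12a invocation). With oflag=sync, every write must be committed on the OST before dd returns, so a stalled txg_sync thread on the OSS leaves the client-side dd waiting indefinitely:

      # Illustrative only -- not the exact sanity-quota test_12a command.
      # oflag=sync forces each block to be committed on the OST before dd
      # continues, so a stuck ZFS txg_sync on the OSS blocks this dd forever.
      $RUNAS dd if=/dev/zero of=$DIR/$tdir/$tfile bs=1M count=100 oflag=sync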

      The OST has threads that are blocked on disk I/O (OSS dmesg):

      txg_sync      D 0000000000000000     0 24236      2 0x00000080
       ffff88005027dbc0 0000000000000046 ffff88004b906ec0 0000000000000086
       ffff88005027db70 ffff88007c7d4408 0000000000000001 ffff88007c7d4420
       ffff880052101058 ffff88005027dfd8 000000000000fb88 ffff880052101058
      Call Trace:
       [<ffffffff81090b9e>] ? prepare_to_wait_exclusive+0x4e/0x80
       [<ffffffffa016b5ac>] cv_wait_common+0x9c/0x1a0 [spl]
       [<ffffffffa02d5160>] ? zio_execute+0x0/0xf0 [zfs]
       [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa016b6e3>] __cv_wait+0x13/0x20 [spl]
       [<ffffffffa02d533b>] zio_wait+0xeb/0x160 [zfs]
       [<ffffffffa026b807>] dsl_pool_sync+0x2a7/0x480 [zfs]
       [<ffffffffa027e147>] spa_sync+0x397/0x9a0 [zfs]
       [<ffffffffa028fd41>] txg_sync_thread+0x2c1/0x490 [zfs]
       [<ffffffff810527f9>] ? set_user_nice+0xc9/0x130
       [<ffffffffa028fa80>] ? txg_sync_thread+0x0/0x490 [zfs]
       [<ffffffffa0164668>] thread_generic_wrapper+0x68/0x80 [spl]
       [<ffffffffa0164600>] ? thread_generic_wrapper+0x0/0x80 [spl]
       [<ffffffff81090626>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff81090590>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      ll_ost_io00_0 D 0000000000000000     0 18170      2 0x00000080
       ffff8800427b9820 0000000000000046 0000000000000046 0000000000000001
       ffff8800427b98b0 0000000000000086 ffff8800427b97e0 ffff88005027dd60
       ffff8800427b7ab8 ffff8800427b9fd8 000000000000fb88 ffff8800427b7ab8
      Call Trace:
       [<ffffffff81090b9e>] ? prepare_to_wait_exclusive+0x4e/0x80
       [<ffffffffa016b5ac>] cv_wait_common+0x9c/0x1a0 [spl]
       [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa016b6e3>] __cv_wait+0x13/0x20 [spl]
       [<ffffffffa028f573>] txg_wait_synced+0xb3/0x190 [zfs]
       [<ffffffffa0c71015>] osd_trans_stop+0x365/0x420 [osd_zfs]
       [<ffffffffa0cb9062>] ofd_trans_stop+0x22/0x60 [ofd]
       [<ffffffffa0cbdf06>] ofd_commitrw_write+0x406/0x11b0 [ofd]
       [<ffffffffa0cbf13d>] ofd_commitrw+0x48d/0x920 [ofd]
       [<ffffffffa085b708>] obd_commitrw+0x128/0x3d0 [ost]
       [<ffffffffa0862599>] ost_brw_write+0xe49/0x14d0 [ost]
       [<ffffffff812739b6>] ? vsnprintf+0x2b6/0x5f0
       [<ffffffffa088c1f0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
       [<ffffffffa08680e3>] ost_handle+0x31e3/0x46f0 [ost]
       [<ffffffffa05ca154>] ? libcfs_id2str+0x74/0xb0 [libcfs]
       [<ffffffffa08dc02c>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
       [<ffffffffa05be5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa08d3759>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81052223>] ? __wake_up+0x53/0x70
       [<ffffffffa08dd576>] ptlrpc_main+0xb76/0x1870 [ptlrpc]
       [<ffffffffa08dca00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffffa08dca00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffffa08dca00>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

      Attachments

        1. txgs.1380193340.log
          55 kB
        2. txgs.1380193506.log
          55 kB
        3. txgs.1380194292.log
          55 kB


          Activity

            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/58/
            Distro/Arch: RHEL6.4/x86_64

            FSTYPE=zfs
            MDSCOUNT=1
            MDSSIZE=2097152
            OSTCOUNT=2
            OSTSIZE=2097152

            replay-ost-single test 5 still timed out: https://maloo.whamcloud.com/test_sets/a76a6b78-5606-11e3-8e94-52540035b04c

            txg_sync      D 0000000000000000     0 32309      2 0x00000080
             ffff88004925dba0 0000000000000046 ffff8800ffffffff 00001e6701abaaf4
             000000005cd72ac8 ffff88007126d2f0 00000000001ec6e4 ffffffffad6d9703
             ffff880072f07ab8 ffff88004925dfd8 000000000000fb88 ffff880072f07ab8
            Call Trace:
             [<ffffffff810a2431>] ? ktime_get_ts+0xb1/0xf0
             [<ffffffff8150ed93>] io_schedule+0x73/0xc0
             [<ffffffffa03e6d4c>] cv_wait_common+0x8c/0x100 [spl]
             [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
             [<ffffffffa03e3717>] ? taskq_dispatch_ent+0x57/0x110 [spl]
             [<ffffffffa03e6dd8>] __cv_wait_io+0x18/0x20 [spl]
             [<ffffffffa052939b>] zio_wait+0xfb/0x190 [zfs]
             [<ffffffffa04c1f2c>] dsl_pool_sync+0xec/0x540 [zfs]
             [<ffffffffa04da82e>] spa_sync+0x39e/0x970 [zfs]
             [<ffffffff8103b8d9>] ? kvm_clock_get_cycles+0x9/0x10
             [<ffffffffa04e582a>] txg_sync_thread+0x27a/0x4b0 [zfs]
             [<ffffffff810560a9>] ? set_user_nice+0xc9/0x130
             [<ffffffffa04e55b0>] ? txg_sync_thread+0x0/0x4b0 [zfs]
             [<ffffffffa03e2a3f>] thread_generic_wrapper+0x5f/0x70 [spl]
             [<ffffffffa03e29e0>] ? thread_generic_wrapper+0x0/0x70 [spl]
             [<ffffffff81096a36>] kthread+0x96/0xa0
             [<ffffffff8100c0ca>] child_rip+0xa/0x20
             [<ffffffff810969a0>] ? kthread+0x0/0xa0
             [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
            
            yujian Jian Yu added a comment -

            Patch http://review.whamcloud.com/8284 landed on the Lustre b2_4 branch. I'll trigger a ZFS full-group test session on Lustre b2_4 build #57 to check the test results.

            yujian Jian Yu added a comment - - edited

            http://review.whamcloud.com/7778

            The above patch and http://review.whamcloud.com/8234 have landed on the master branch. They are also needed on the Lustre b2_4 and b2_5 branches.

            The two patches were combined and back-ported to the Lustre b2_4 branch: http://review.whamcloud.com/8284

            yujian Jian Yu added a comment -

            Here is the test result on Lustre b2_4 build #47 with FSTYPE=zfs, OSTCOUNT=2 and SLOW=yes:
            https://maloo.whamcloud.com/test_sessions/05b82736-444d-11e3-8472-52540035b04c

            Timeout failures that occurred with OSTCOUNT=7 on the following sub-tests disappeared with OSTCOUNT=2:

            sanity-benchmark     test bonnie        LU-1960
            replay-ost-single    test 5             LU-2887
            sanity-quota         test 7a and 12a    LU-2887
            large-scale          test 3a            LU-2887
            obdfilter-survey     test 1a            LU-2124
            

            However, the timeout failure on sanityn test 33a (LU-2829) still occurred.

            Because of LU-3906 (an out-of-space issue), the parallel-scale, parallel-scale-nfsv3 and parallel-scale-nfsv4 tests did not really run, so we do not know whether their timeout failures also disappear with OSTCOUNT=2.

            yujian Jian Yu added a comment -

            Since TEI-790 was fixed, I've triggered a full-group test session on ZFS against Lustre b2_4 build #47. I'll vet the test results to see whether the timed-out tests pass with OSTCOUNT=2 and SLOW=yes.

            yujian Jian Yu added a comment - - edited

            I would prefer the approach of setting OSTCOUNT=2 for ZFS-backed test filesystems.

            I just created TEI-790 to ask the TEI team for help with this change.

            Another complementary approach would be to format a single ZFS pool across a few LVs and then have the different OST/MDT targets in their own datasets in the shared pool.

            This needs changes to mdsdevname(), ostdevname(), and the ZFS failover testing support code (http://review.whamcloud.com/6429) in test-framework.sh.
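
            A rough sketch of what the shared-pool layout could look like (the pool, dataset, and device names and $MGSNID are placeholders, not from test-framework.sh; the real change would go through mdsdevname()/ostdevname()):

            # Illustrative only: one pool spanning several LVs, one dataset per target.
            zpool create lustre-test /dev/vg_test/lv0 /dev/vg_test/lv1 /dev/vg_test/lv2
            zfs create lustre-test/mdt0
            zfs create lustre-test/ost0
            zfs create lustre-test/ost1
            # Each Lustre target is then formatted on its own dataset, e.g.:
            mkfs.lustre --ost --backfstype=zfs --fsname=lustre --index=0 \
                --mgsnode=$MGSNID lustre-test/ost0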


            adilger Andreas Dilger added a comment -

            I would prefer the approach of setting OSTCOUNT=2 for ZFS-backed test filesystems. This is more in line with real systems, since we will typically only have a single OST per OSS with ZFS instead of 4 or more OSTs per OSS with ldiskfs. I think a lot of tests depend on having at least two OSTs, so OSTCOUNT=1 will probably cause some tests to be skipped.

            Another complementary approach would be to format a single ZFS pool across a few LVs and then have the different OST/MDT targets in their own datasets in the shared pool. That would avoid the extra commits caused by having separate pools. The drawback is that all of the datasets would store their files in the same space, so some of the Lustre tests would break if we don't add ZFS reservations for the minimum size of the datasets (e.g. tests that fill one OST and then allocate objects on another OST would break).
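
            A minimal sketch of the reservation idea under a shared-pool layout like the one above (dataset names and sizes are assumptions, not an actual test-framework change):

            # Reserve each target's nominal size so filling one OST cannot
            # consume the space the other datasets in the shared pool rely on.
            zfs set reservation=2G lustre-test/ost0
            zfs set reservation=2G lustre-test/ost1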


            utopiabound Nathaniel Clark added a comment -

            Reduce the performance expectation for ZFS in sanity-quota/0; the lowest rate observed over the last 4 weeks is ~150.

            http://review.whamcloud.com/7848
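
            A sketch of how a backend-dependent expectation might be expressed in the test script (the variable names $rate/$min_rate and the non-ZFS value are illustrative only; the actual change is in the review above):

            # Illustrative only: accept a lower write rate on ZFS-backed OSTs.
            if [ "$(facet_fstype ost1)" = "zfs" ]; then
                min_rate=150    # lowest rate seen on ZFS over the last 4 weeks
            else
                min_rate=600    # hypothetical ldiskfs expectation
            fi
            [ "$rate" -ge "$min_rate" ] ||
                error "write rate $rate is below expected minimum $min_rate"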


            utopiabound Nathaniel Clark added a comment -

            Looking at the test run from patch 7778 (https://maloo.whamcloud.com/test_sets/487e3fe6-29c3-11e3-b5ea-52540035b04c), the metabench result seems very strange.

            metabench normally runs fairly quickly (~200-500) judging by the results in maloo for ZFS runs.

            It looks like the whole system (client-30) just went out to lunch for 4 hours.

            yujian Jian Yu added a comment -

            Let's wait for the autotest test result in http://review.whamcloud.com/7778 to do a comparison.

            In the autotest run with SLOW=no and OSTCOUNT=7, parallel-scale timed out: https://maloo.whamcloud.com/test_sets/487e3fe6-29c3-11e3-b5ea-52540035b04c

            compilebench    6005s
            metabench       14400s (TIMEOUT)
            

            We still need to reduce the values of cbench_IDIRS, cbench_RUNS, mbench_NFILES, etc. for ZFS.
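
            A sketch of how those could be scaled down for ZFS runs (the variable names come from the parallel-scale test; the values below are illustrative, not tested recommendations):

            # Illustrative only -- smaller parallel-scale workloads when the
            # backing filesystem is ZFS.
            if [ "$FSTYPE" = "zfs" ]; then
                cbench_IDIRS=2        # compilebench initial directories
                cbench_RUNS=2         # compilebench runs
                mbench_NFILES=10000   # metabench file count
            fi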

            yujian Jian Yu added a comment -

            In a manual test run with SLOW=no and OSTCOUNT=7, the parallel-scale sub-test simul passed in 112s: https://maloo.whamcloud.com/test_sets/4b220e1e-28c7-11e3-8951-52540035b04c
            However, the following sub-tests still took a very long time:

            compilebench    7760s
            iorssf          5061s
            iorfpp          5748s
            

            Let's wait for the autotest test result in http://review.whamcloud.com/7778 to do a comparison.


            People

              bzzz Alex Zhuravlev
              maloo Maloo
              Votes: 0
              Watchers: 14
