    Description

      A few simple changes could reduce individual subtest time and unmount/mount/format times, which would speed up every test session.

      Individual sanity subtests that do no more than "touch file; check if file exists" currently take 6-7 seconds, because run_one() does a lot of different things in the background using multiple "do_nodes" commands:

      • reset fail_loc
      • check if the network is working on every node (kind of pointless given that other commands are being run on the nodes before and after this check)
      • check grant correctness
      • check dmesg for VFS inodes busy
      • check for LBUG
      • check for multiop still running

      When sanity was first written, these subtests took a fraction of a second each (i.e. they would scroll quickly up the screen). While I think the above checks are useful, the overhead could be reduced.

      I think a large part of this slowness is that each of these checks runs as a separate ssh/mcmd command, to each remote VM in series, and each ssh invocation is relatively slow.
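
      As rough arithmetic behind that estimate: if each of the six checks above is issued as its own remote command to each node in turn, the number of remote invocations per subtest grows as checks × nodes, whereas one combined command per node grows only with the node count. The sketch below only counts invocations; `count_invocations` is a hypothetical helper and the four-node cluster is an assumed example, not a measured configuration.

```shell
# Count remote invocations per subtest: serial (one per check per node)
# versus batched (one combined command per node).
count_invocations() {
	local checks=$1 nodes=$2
	echo "serial=$((checks * nodes)) batched=$nodes"
}

# 6 checks, 4 nodes (assumed example cluster)
count_invocations 6 4
```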

      Speeding up the ssh invocation itself (via do_facet()/do_node()) would of course be desirable, but is not something I can control directly.

      Running the per-node checks in parallel would be a win (e.g. use real "pdsh" or "clush"), as would combining all of the checks into a single command that is run with a single ssh invocation to each node. The latter is something that can be done directly in test-framework, and is the main target of this ticket.
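
      A minimal sketch of the combined approach, not the actual test-framework change. All function names here (`run_on_node`, `combined_checks`, `run_checks_all_nodes`) are hypothetical. `run_on_node` stands in for the real remote shell and simply runs the script locally so the sketch works without a cluster, and each check command is a `true` placeholder for the real check named in the comment next to it.

```shell
#!/bin/bash
# Hypothetical sketch: batch all per-node checks into ONE remote
# invocation per node, and run the nodes in parallel.

# Stand-in for the remote shell. On a real cluster this would be
# something like an ssh or pdsh call to $node; here it runs locally.
run_on_node() {
	local node=$1 script=$2
	# pipefail so the script's exit status survives the sed prefixing
	( set -o pipefail; bash -c "$script" 2>&1 | sed "s/^/$node: /" )
}

# Emit one script that performs every check and prints one status
# line per check. Each "true" is a placeholder for the real command.
combined_checks() {
	cat <<'EOF'
rc=0
check() {
	local name=$1; shift
	if "$@" >/dev/null 2>&1; then echo "$name ok"
	else echo "$name FAIL"; rc=1; fi
}
check fail_loc true   # placeholder: reset fail_loc
check lbug true       # placeholder: look for LBUG
check multiop true    # placeholder: multiop still running?
exit $rc
EOF
}

# Send the combined script to every node at once, then collect the
# exit status of each background job.
run_checks_all_nodes() {
	local script node pid rc=0
	local pids=()
	script=$(combined_checks)
	for node in "$@"; do
		run_on_node "$node" "$script" &
		pids+=($!)
	done
	for pid in "${pids[@]}"; do
		wait "$pid" || rc=1
	done
	return $rc
}

# Example with assumed node names:
run_checks_all_nodes mds1 oss1 client1
```

      With this shape, adding another check costs one more line in the combined script rather than another round of remote commands to every node.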

          Activity

            [LU-14773] reduce run_one() overhead

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44034/
            Subject: LU-14773 tests: quiet down some verbose messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 86f16910645d9d9cad17c0f53ca1a375121e3f4c

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44033/
            Subject: LU-14773 tests: skip check_network() on working node
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 67752f6db2c1a7062a73bd6674ee53ad670b392e

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44034
            Subject: LU-14773 tests: quiet down some verbose messages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f23adfd71ee46dfbbf18b8b544ad311c96468fd3

            Note that the 44033 patch is NOT the only thing that should be fixed, but is a simple patch that may produce immediate benefits (at a minimum it will avoid a lot of useless visual clutter in the subtest logs from the check_network() output).

            Comment added by adilger Andreas Dilger.

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44033
            Subject: LU-14773 tests: skip check_network() on working node
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d085468b6bc06c4bbdfcfbc26afa23d4b752aa64

            People

              wc-triage WC Triage
              adilger Andreas Dilger