  1. Lustre
  2. LU-14773

reduce run_one() overhead


    Description

      A few simple changes could reduce individual subtest time and unmount/mount/format times, which would speed up every test session.

      Individual sanity subtests that do no more than "touch file; check if file exists" currently take 6-7 seconds, because run_one() does a lot of extra work in the background through multiple "do_nodes" commands:

      • reset fail_loc
      • check if the network is working on every node (kind of pointless given that other commands are being run on the nodes before and after this check)
      • check grant correctness
      • check dmesg for VFS inodes busy
      • check for LBUG
      • check for multiop still running

      When sanity was first written, these subtests took a fraction of a second each (i.e. they would scroll quickly up the screen). While I think the above checks are useful, the overhead could be reduced.

      I think a large part of this slowness is that each of these checks runs as a separate ssh/mcmd command, to each remote VM in series, and each ssh invocation is relatively slow.
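
      To make the scaling concrete, here is a small counting sketch. It assumes that each of the six checks listed above is issued as its own remote command to every node; the node count is a made-up example, and both function names are hypothetical, not part of test-framework.

```shell
#!/bin/bash
# Count remote invocations per subtest under two strategies.
# The inputs are illustrative, not measurements.

# One remote command per check per node, issued one after another.
serial_invocations() {
    local checks=$1 nodes=$2
    echo $((checks * nodes))
}

# All checks folded into one command: one remote command per node,
# regardless of how many checks there are.
combined_invocations() {
    local nodes=$2
    echo "$nodes"
}

checks=6   # the six checks listed in this ticket
nodes=4    # hypothetical number of remote nodes

echo "serial:   $(serial_invocations "$checks" "$nodes")"
echo "combined: $(combined_invocations "$checks" "$nodes")"
```

      With these inputs the serial strategy issues 24 remote commands per subtest and the combined one issues 4, so the fixed cost of each ssh invocation is paid far fewer times.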

      Speeding up the ssh invocation itself (via do_facet()/do_node()) would of course be desirable, but is not something I can control directly.

      Running the per-node checks in parallel would be a win (e.g. use real "pdsh" or "clush"), as would combining all of the checks into a single command that is run with a single ssh invocation to each node. The latter is something that can be done directly in test-framework, and is the main target of this ticket.
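
      A minimal sketch of both ideas follows, under stated assumptions. combine_checks, run_on_nodes and REMOTE_RUNNER are hypothetical names, not existing test-framework functions, and plain background jobs with "wait" stand in for pdsh or clush. The check commands in the closing comment are placeholders for illustration, not the commands run_one() actually uses.

```shell
#!/bin/bash
# Sketch: fold several checks into one command string and run it once per
# node, with all nodes handled in parallel.

# Assumption: plain ssh is the transport. REMOTE_RUNNER can be overridden,
# e.g. with a local stub, to try the sketch without a cluster.
REMOTE_RUNNER=${REMOTE_RUNNER:-ssh}

# Build one command from all arguments. Each check runs in its own
# subshell, so a failure is recorded in rc without stopping later checks.
combine_checks() {
    local script="rc=0" check
    for check in "$@"; do
        script+="; ($check) || rc=1"
    done
    printf '%s\n' "$script; exit \$rc"
}

# Run one command on every node concurrently, one invocation per node.
# Usage: run_on_nodes "node1 node2" "command"
# Returns non-zero if the command failed on any node.
run_on_nodes() {
    local nodes=$1 cmd=$2 node pid rc=0
    local -a pids=()
    for node in $nodes; do
        $REMOTE_RUNNER "$node" "$cmd" &
        pids+=($!)
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1
    done
    return $rc
}

# Example with placeholder node names and check commands:
#   run_on_nodes "mds1 oss1 client1" \
#       "$(combine_checks 'lctl set_param fail_loc=0' '! pgrep -x multiop')"
```

      Running every check even after one fails keeps the behaviour close to separate invocations, where one failing check does not prevent the others from being evaluated. The trade-off is that the combined exit status no longer says which check failed, so a real implementation would likely need each check to print an identifying message.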

            People

              wc-triage WC Triage
              adilger Andreas Dilger