LU-10710 (project: Lustre)

parallel-scale test write_disjoint hung

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.11.0
    • Labels: None
    • Severity: 3

    Description

      The parallel-scale test write_disjoint hung on the master branch with the following output:

      /usr/lib64/lustre/tests/write_disjoint: option requires an argument -- 'm'
      random seed: 1519386210
      loop 0: chunk_size 13798672
      loop 1000: chunk_size 12937343
      

      Maloo reports:
      https://testing.hpdd.intel.com/test_sets/83f68bf6-18dd-11e8-bd00-52540065bddc
      https://testing.hpdd.intel.com/test_sets/91293d5e-18d4-11e8-bd00-52540065bddc

      The console and syslog logs show no obvious error messages, and the stack trace logs were not gathered (see ATM-828).

          Activity

            James Nunez (Inactive) added a comment:

            Patch landed to master (future 2.12)

            Gerrit Updater added a comment:

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31645/
            Subject: LU-10710 tests: fix run_write_disjoint line continuation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a033bd837329d1eb98d1dd71f4491f1af56a27f0

            Gerrit Updater added a comment:

            James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/31645
            Subject: LU-10710 tests: fix run_write_disjoint line continuation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6ff2c1d3b134be39c1c63511bc3a7b06bc41214e

            James Nunez (Inactive) added a comment (edited):

            This issue was introduced by the patch for LU-9409 (https://review.whamcloud.com/27903), which adds the -m flag and its parameter to the write_disjoint command built by run_write_disjoint() in functions.sh:

            826
            827         local cmd="$WRITE_DISJOINT -f $testdir/file -n $wdisjoint_REP -m "\
            828                           "$chunk_size_limit"
            829
            830         echo "+ $cmd"


            From the suite_log, we can see that the value of the -m flag never makes it to the write_disjoint command. Instead, it is passed to local as a separate argument, which bash rejects as an invalid variable name:

            == parallel-scale test write_disjoint: write_disjoint ================================================ 11:43:29 (1519386209)
            OPTIONS:
            WRITE_DISJOINT=/usr/lib64/lustre/tests/write_disjoint
            clients=trevis-3vm1.trevis.hpdd.intel.com,trevis-3vm2
            wdisjoint_THREADS=4
            wdisjoint_REP=10000
            MACHINEFILE=/tmp/parallel-scale.machines
            trevis-3vm1.trevis.hpdd.intel.com
            trevis-3vm2
            /usr/lib64/lustre/tests/functions.sh: line 828: local: `123456': not a valid identifier
            + /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000 -m
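The failure in the log above can be reproduced outside the test framework. This is a minimal sketch only: the function name repro_broken, the path /tmp/file and the shortened command are hypothetical, 123456 is the value seen in the suite_log, and only the quoting mirrors lines 827-828.

```bash
#!/bin/bash
# Hypothetical reproduction of the line-continuation bug. Names, path and
# values are made up; only the quoting mirrors lines 827-828.
repro_broken() {
    local chunk_size_limit=123456
    # The closing quote sits before the backslash. The backslash-newline is
    # removed, but the indentation on the next line is outside any quotes,
    # so it splits the text into two words and local gets two arguments:
    # a valid assignment to cmd and an invalid variable name "123456".
    local cmd="write_disjoint -f /tmp/file -n 10 -m "\
        "$chunk_size_limit"
    echo "+ $cmd"
}

# stdout: "+ write_disjoint -f /tmp/file -n 10 -m " (the value is missing)
# stderr: "...: local: `123456': not a valid identifier"
repro_broken
```

Had line 828 started with no whitespace, the two quoted strings would have been concatenated into a single word and the original code would have worked, which is why the indentation is the trigger.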
            
            


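            The problem is the line continuation at the end of line 827. The backslash-newline pair is removed, but because the closing quotation mark comes before the backslash, the tabs/spaces that indent line 828 fall outside any quotes and split the text into two words. local therefore receives "$chunk_size_limit" as a second argument and treats it as a variable name. We just need to remove the closing quotation mark on line 827 and the opening quotation mark on line 828 so that the whole value is a single quoted string.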
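A sketch of that fix, using the same hypothetical names and values as a standalone reproduction (repro_fixed, /tmp/file and the shortened command are illustrative, not the actual functions.sh code). It assumes, as is usual for such command strings, that $cmd is later expanded unquoted so that word splitting applies.

```bash
#!/bin/bash
# Hypothetical sketch of the fix: drop the closing quote on the first line
# and the opening quote on the second, so one double-quoted string spans
# both lines. Inside double quotes, backslash-newline is still removed,
# but the next line's indentation becomes part of the value.
repro_fixed() {
    local chunk_size_limit=123456
    local cmd="write_disjoint -f /tmp/file -n 10 -m \
        $chunk_size_limit"
    echo "+ $cmd"
    # Assuming the command string is later expanded unquoted, word
    # splitting on the default IFS discards the extra blanks.
    set -- $cmd
    echo "words: $# last: ${@: -1}"
}

# second line printed: "words: 7 last: 123456"
repro_fixed
```

The value now reaches the command as the argument to -m, and local only ever sees one argument, so no "not a valid identifier" error is printed.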


            I’ll upload a patch for this.

            People

              Assignee: James Nunez (Inactive)
              Reporter: Jian Yu
              Votes: 0
              Watchers: 3
