Lustre / LU-9367

parallel-scale test_cascading_rw: cascading_rw failed! 1

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.10.0
    • Fix Version/s: Lustre 2.10.0
    • Labels: None
    • Severity: 3

      Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/45d48942-2507-11e7-9de9-5254006e85c2.

      The sub-test test_cascading_rw failed with the following error:

      cascading_rw failed! 1
      

      Server/client: lustre-master build #3558, ldiskfs, el7

      Test log:

      + su mpiuser sh -c "/usr/lib64/compat-openmpi16/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh -machinefile /tmp/parallel-scale.machines -np 4 /usr/lib64/lustre/tests/cascading_rw -g -d /mnt/lustre/d0.cascading_rw -n 300 "
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      /usr/lib64/lustre/tests/cascading_rw is running with 4 process(es) in DEBUG mode
      23:47:45: Running test #/usr/lib64/lustre/tests/cascading_rw(iter 0)
      [trevis-55vm1:21694] *** Process received signal ***
      [trevis-55vm1:21694] Signal: Floating point exception (8)
      [trevis-55vm1:21694] Signal code: Integer divide-by-zero (1)
      [trevis-55vm1:21694] Failing at address: 0x4024c8
      [trevis-55vm1:21694] [ 0] /lib64/libpthread.so.0(+0xf370) [0x7fdf9fad6370]
      [trevis-55vm1:21694] [ 1] /usr/lib64/lustre/tests/cascading_rw() [0x4024c8]
      [trevis-55vm1:21694] [ 2] /usr/lib64/lustre/tests/cascading_rw() [0x402be0]
      [trevis-55vm1:21694] [ 3] /usr/lib64/lustre/tests/cascading_rw() [0x40158e]
      [trevis-55vm1:21694] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fdf9f727b35]
      [trevis-55vm1:21694] [ 5] /usr/lib64/lustre/tests/cascading_rw() [0x40169d]
      [trevis-55vm1:21694] *** End of error message ***
      [trevis-55vm1.trevis.hpdd.intel.com][[36239,1],2][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
      [trevis-55vm2.trevis.hpdd.intel.com][[36239,1],1][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
      --------------------------------------------------------------------------
      mpirun noticed that process rank 0 with PID 21694 on node trevis-55vm1.trevis.hpdd.intel.com exited on signal 8 (Floating point exception).
      --------------------------------------------------------------------------
       parallel-scale test_cascading_rw: @@@@@@ FAIL: cascading_rw failed! 1 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4905:error()
        = /usr/lib64/lustre/tests/functions.sh:734:run_cascading_rw()
        = /usr/lib64/lustre/tests/parallel-scale.sh:130:test_cascading_rw()
        = /usr/lib64/lustre/tests/test-framework.sh:5181:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5220:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5067:run_test()
        = /usr/lib64/lustre/tests/parallel-scale.sh:132:main()
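
      The backtrace above gives only raw addresses for the frames inside cascading_rw, and frame 1 (0x4024c8) matches the reported failing address. A minimal sketch of how those addresses could be pulled out of the log for symbol lookup follows. The helper names (parse_frames, addresses_in) are hypothetical, and the printed addr2line command is only a suggestion: it assumes a copy of the same binary is available and will only be useful if that binary carries symbol or debug information.

```python
import re

# Backtrace lines copied from the test log in this issue.
LOG = """\
[trevis-55vm1:21694] [ 0] /lib64/libpthread.so.0(+0xf370) [0x7fdf9fad6370]
[trevis-55vm1:21694] [ 1] /usr/lib64/lustre/tests/cascading_rw() [0x4024c8]
[trevis-55vm1:21694] [ 2] /usr/lib64/lustre/tests/cascading_rw() [0x402be0]
[trevis-55vm1:21694] [ 3] /usr/lib64/lustre/tests/cascading_rw() [0x40158e]
[trevis-55vm1:21694] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fdf9f727b35]
[trevis-55vm1:21694] [ 5] /usr/lib64/lustre/tests/cascading_rw() [0x40169d]
"""

# Matches "[ N] module(symbol+offset) [0xADDRESS]" as printed by Open MPI.
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\((.*?)\)\s+\[(0x[0-9a-f]+)\]")


def parse_frames(log):
    """Return (index, module, address) tuples for each backtrace frame."""
    frames = []
    for line in log.splitlines():
        m = FRAME_RE.search(line)
        if m:
            index, module, _symbol, address = m.groups()
            frames.append((int(index), module, address))
    return frames


def addresses_in(module_suffix, frames):
    """Return the addresses of frames whose module path ends with module_suffix."""
    return [addr for _, module, addr in frames if module.endswith(module_suffix)]


frames = parse_frames(LOG)
test_addrs = addresses_in("/cascading_rw", frames)

# Suggested follow-up command (an assumption, not taken from the issue).
print("addr2line -f -e /usr/lib64/lustre/tests/cascading_rw " + " ".join(test_addrs))
```

      Running the printed command on a node with a matching build would, under those assumptions, map each address to a function and source line, which is one way to locate the division that raised the integer divide-by-zero.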
      

              People

              • Assignee: bobijam Zhenyu Xu
              • Reporter: maloo Maloo