Lustre / LU-9367

parallel-scale test_cascading_rw: cascading_rw failed! 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/45d48942-2507-11e7-9de9-5254006e85c2.

      The sub-test test_cascading_rw failed with the following error:

      cascading_rw failed! 1
      

      server/client: lustre-master #3558 ldiskfs el7

      test log

      + su mpiuser sh -c "/usr/lib64/compat-openmpi16/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh -machinefile /tmp/parallel-scale.machines -np 4 /usr/lib64/lustre/tests/cascading_rw -g -d /mnt/lustre/d0.cascading_rw -n 300 "
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      --------------------------------------------------------------------------
      A deprecated MCA parameter value was specified in an MCA parameter
      file.  Deprecated MCA parameters should be avoided; they may disappear
      in future releases.
      
        Deprecated parameter: plm_rsh_agent
      --------------------------------------------------------------------------
      /usr/lib64/lustre/tests/cascading_rw is running with 4 process(es) in DEBUG mode
      23:47:45: Running test #/usr/lib64/lustre/tests/cascading_rw(iter 0)
      [trevis-55vm1:21694] *** Process received signal ***
      [trevis-55vm1:21694] Signal: Floating point exception (8)
      [trevis-55vm1:21694] Signal code: Integer divide-by-zero (1)
      [trevis-55vm1:21694] Failing at address: 0x4024c8
      [trevis-55vm1:21694] [ 0] /lib64/libpthread.so.0(+0xf370) [0x7fdf9fad6370]
      [trevis-55vm1:21694] [ 1] /usr/lib64/lustre/tests/cascading_rw() [0x4024c8]
      [trevis-55vm1:21694] [ 2] /usr/lib64/lustre/tests/cascading_rw() [0x402be0]
      [trevis-55vm1:21694] [ 3] /usr/lib64/lustre/tests/cascading_rw() [0x40158e]
      [trevis-55vm1:21694] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fdf9f727b35]
      [trevis-55vm1:21694] [ 5] /usr/lib64/lustre/tests/cascading_rw() [0x40169d]
      [trevis-55vm1:21694] *** End of error message ***
      [trevis-55vm1.trevis.hpdd.intel.com][[36239,1],2][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
      [trevis-55vm2.trevis.hpdd.intel.com][[36239,1],1][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
      --------------------------------------------------------------------------
      mpirun noticed that process rank 0 with PID 21694 on node trevis-55vm1.trevis.hpdd.intel.com exited on signal 8 (Floating point exception).
      --------------------------------------------------------------------------
       parallel-scale test_cascading_rw: @@@@@@ FAIL: cascading_rw failed! 1 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4905:error()
        = /usr/lib64/lustre/tests/functions.sh:734:run_cascading_rw()
        = /usr/lib64/lustre/tests/parallel-scale.sh:130:test_cascading_rw()
        = /usr/lib64/lustre/tests/test-framework.sh:5181:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5220:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5067:run_test()
        = /usr/lib64/lustre/tests/parallel-scale.sh:132:main()
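
      Note on the signal: on Linux, signal 8 (SIGFPE) with signal code 1 is FPE_INTDIV, so frame [1] at 0x4024c8 executed an integer division or modulo with a zero divisor. The log does not show which value was zero. The following is a minimal sketch of the usual defensive pattern, not code taken from cascading_rw; the helper name checked_div and its parameters are hypothetical and only illustrate validating a divisor before using it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the cascading_rw source): divide total
 * by count, refusing a zero divisor instead of letting the CPU raise
 * SIGFPE. Unsigned operands are used so that the divisor being zero is
 * the only case that can trap.
 *
 * Returns 0 and stores the quotient in *out on success.
 * Returns -EINVAL and leaves *out untouched if count is zero.
 */
static int checked_div(unsigned long total, unsigned long count,
                       unsigned long *out)
{
    assert(out != NULL);

    if (count == 0) {
        fprintf(stderr, "checked_div: divisor is zero, refusing to divide\n");
        return -EINVAL;
    }

    *out = total / count;
    return 0;
}
```

      A caller would check the return value and report an ordinary test error when the divisor is zero, which gives a readable failure message instead of a process killed by signal 8.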
      

          Activity

            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            jcasper James Casper (Inactive) made changes -
            Remote Link New: This issue links to "Page (HPDD Community Wiki)" [ 20621 ]
            eberglan Eric Bergland (Inactive) made changes -
            Link New: This issue is related to LU-9349 [ LU-9349 ]
            jcasper James Casper (Inactive) made changes -
            Remote Link New: This issue links to "Page (HPDD Community Wiki)" [ 20413 ]
            jcasper James Casper (Inactive) made changes -
            Remote Link New: This issue links to "Page (HPDD Community Wiki)" [ 20372 ]
            jamesanunez James Nunez (Inactive) made changes -
            Link New: This issue is related to LU-9429 [ LU-9429 ]
            pjones Peter Jones made changes -
            Priority Original: Minor [ 4 ] New: Critical [ 2 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.10.0 [ 12204 ]
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Zhenyu Xu [ bobijam ]
            sarah Sarah Liu made changes -
            Description: added the line "server/client: lustre-master #3558 ldiskfs el7" before the test log; the error message and test log are otherwise unchanged from the description above.

            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Maloo (maloo)