Test failure on test suite parallel-scale-nfsv3, subtest test_metabench

Details

    • Bug
    • Resolution: Won't Fix
    • Blocker
    • None
    • Lustre 2.1.2, Lustre 2.1.3, Lustre 2.1.4, Lustre 2.1.5, Lustre 2.1.6
    • 3
    • 4036

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b019eb0a-929d-11e1-9e8b-525400d2bfa6.

      The sub-test test_metabench failed with the following error:

      metabench failed! 1

      == parallel-scale-nfsv3 test metabench: metabench ==================================================== 18:11:14 (1335748274)
      OPTIONS:
      METABENCH=/usr/bin/metabench
      clients=iu-3vm1.lab.whamcloud.com,iu-3vm2
      mbench_NFILES=30400
      mbench_THREADS=4
      iu-3vm1.lab.whamcloud.com
      iu-3vm2
      + /usr/bin/metabench -w /mnt/lustre/d0.metabench -c 30400 -C -S -k
      + chmod 0777 /mnt/lustre
      drwxrwxrwx 4 root root 4096 Apr 29 18:11 /mnt/lustre
      + su mpiuser sh -c "/usr/lib/openmpi/1.4-gcc/bin/mpirun -mca boot ssh -mca btl tcp,self -np 8 -machinefile /tmp/parallel-scale-nfsv3.machines /usr/bin/metabench -w /mnt/lustre/d0.metabench -c 30400 -C -S -k "
      Metadata Test <no-name> on 04/29/2012 at 18:11:19

      Rank 0 process on node iu-3vm1.lab.whamcloud.com
      Rank 1 process on node iu-3vm2.lab.whamcloud.com
      Rank 2 process on node iu-3vm1.lab.whamcloud.com
      Rank 3 process on node iu-3vm2.lab.whamcloud.com
      Rank 4 process on node iu-3vm1.lab.whamcloud.com
      Rank 5 process on node iu-3vm2.lab.whamcloud.com
      Rank 6 process on node iu-3vm1.lab.whamcloud.com
      Rank 7 process on node iu-3vm2.lab.whamcloud.com

      [04/29/2012 18:11:19] FATAL error on process 0
      Proc 0: Cant stat [d0.metabench]: Value too large for defined data type
      --------------------------------------------------------------------------
      mpirun has exited due to process rank 0 with PID 2161 on
      node iu-3vm1.lab.whamcloud.com exiting without calling "finalize". This may
      have caused other processes in the application to be
      terminated by signals sent by mpirun (as reported here).
      --------------------------------------------------------------------------
      [iu-3vm2.lab.whamcloud.com][[3254,1],1][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] [iu-3vm1.lab.whamcloud.com][[3254,1],2][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
      [iu-3vm1.lab.whamcloud.com][[3254,1],4][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
      mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
      parallel-scale-nfsv3 test_metabench: @@@@@@ FAIL: metabench failed! 1
      Dumping lctl log to /logdir/test_logs/2012-04-28/lustre-b2_1-el5-x86_64-el5-i686_51_-7ff324267018/parallel-scale-nfsv3.test_metabench.*.1335748279.log
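The message from rank 0, "Value too large for defined data type", is the standard glibc text for errno EOVERFLOW. POSIX stat() returns it when a file's size, inode number, or block count cannot be represented in the caller's struct stat, which typically affects 32-bit binaries built without large-file support. The ticket does not record a root cause, so the sketch below only illustrates that general mechanism and is not a diagnosis of this failure. The helper names and the 32-bit limits are assumptions made for the example.

```python
import errno
import os

# Assumed limits for a legacy 32-bit struct stat (no large-file support):
# unsigned 32-bit inode number, signed 32-bit file size.
INO32_MAX = 2**32 - 1
SIZE32_MAX = 2**31 - 1


def overflowing_fields(ino, size):
    """Return the stat fields that would not fit the assumed 32-bit limits."""
    fields = []
    if ino > INO32_MAX:
        fields.append("st_ino")
    if size > SIZE32_MAX:
        fields.append("st_size")
    return fields


def check_path(path):
    """stat() a path and report which fields a 32-bit caller could not hold."""
    st = os.stat(path)
    return overflowing_fields(st.st_ino, st.st_size)


if __name__ == "__main__":
    # The message text is platform-dependent; glibc uses the wording in the log.
    print(errno.errorcode[errno.EOVERFLOW], "->", os.strerror(errno.EOVERFLOW))
    print(check_path("."))
```

Run from a 64-bit client against the failing directory (for example the d0.metabench path in the log), a non-empty result would suggest that a 32-bit caller could hit EOVERFLOW on the same entry. An empty result would point elsewhere.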

          Activity

            [LU-1360] Test failure on test suite parallel-scale-nfsv3, subtest test_metabench

            simmonsja James A Simmons added a comment - Really old blocker for unsupported version

            adilger Andreas Dilger added a comment -

            This test is still failing on a regular basis in full testing on b2_4 and master:

            https://maloo.whamcloud.com/test_sets/bd013d78-543e-11e3-9029-52540035b04c
            https://maloo.whamcloud.com/test_sets/5751a248-5168-11e3-8300-52540035b04c
            https://maloo.whamcloud.com/test_sets/795032bc-5148-11e3-9ca9-52540035b04c

            Bob, can you please at least make an initial investigation of what the problem is? It does appear that the test passes about half of the time, so if the failure can be isolated to a specific Lustre version or interop config, perhaps we can fix the problem or skip testing it.
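One way to approach the isolation step suggested above is to tally pass/fail outcomes per configuration and look for configurations that never pass. The following is a minimal sketch under stated assumptions: the record shape `(config, passed)` and the function names are hypothetical, and the sample records are placeholders, not results taken from this ticket or from Maloo.

```python
from collections import defaultdict


def pass_rate_by_config(runs):
    """Group (config, passed) records and return {config: (passes, total)}."""
    tally = defaultdict(lambda: [0, 0])
    for config, passed in runs:
        tally[config][1] += 1          # count every run
        if passed:
            tally[config][0] += 1      # count only passing runs
    return {config: tuple(counts) for config, counts in tally.items()}


def always_failing(runs):
    """Return the configs that have at least one run and zero passes."""
    rates = pass_rate_by_config(runs)
    return sorted(c for c, (ok, _total) in rates.items() if ok == 0)


# Placeholder records only; these are NOT results from the ticket.
example_runs = [
    ("config-A", True),
    ("config-A", True),
    ("config-B", False),
    ("config-B", False),
]

if __name__ == "__main__":
    print(pass_rate_by_config(example_runs))
    print(always_failing(example_runs))
```

If an overall pass rate near one half splits into configurations that always pass and configurations that always fail, the failing ones are candidates for a targeted fix or a skip rule.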
            yujian Jian Yu added a comment - The same issue occurred on Lustre 2.1.6 RC2: https://maloo.whamcloud.com/test_sets/217ee754-dd7b-11e2-85a3-52540035b04c https://maloo.whamcloud.com/test_sets/d54dfb12-dd7b-11e2-85a3-52540035b04c
            yujian Jian Yu added a comment - The same issue occurred on Lustre 2.1.6 RC1: https://maloo.whamcloud.com/test_sets/bae7a9ee-cd4a-11e2-a1e0-52540035b04c
            yujian Jian Yu added a comment - Lustre b2_1 build: http://build.whamcloud.com/job/lustre-b2_1/204 https://maloo.whamcloud.com/test_sets/776715d0-c63b-11e2-ad5d-52540035b04c
            yujian Jian Yu added a comment - Another instance on Lustre 2.1.5 RC1: https://maloo.whamcloud.com/test_sets/2d8c4fee-95bb-11e2-bc9e-52540035b04c
            yujian Jian Yu added a comment - The same issue occurred on Lustre 2.1.5 RC1: https://maloo.whamcloud.com/test_sets/01308e2e-93a9-11e2-89cc-52540035b04c
            yujian Jian Yu added a comment - The same issue occurred on Lustre 2.1.4 RC1: https://maloo.whamcloud.com/test_sets/9e85f1a2-4ad7-11e2-b87e-52540035b04c
            yujian Jian Yu added a comment - The same issue occurred on Lustre 2.1.3 RC2: https://maloo.whamcloud.com/test_sets/a5d75d36-eb34-11e1-ba73-52540035b04c
            yujian Jian Yu added a comment -

            Lustre Tag: v2_1_2_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/86/
            Distro/Arch: RHEL6.2/x86_64(server), RHEL6.2/i686(client)
            Network: TCP (1GigE)
            ENABLE_QUOTA=yes

            The same failure occurred: https://maloo.whamcloud.com/test_sets/f13b8f5a-aac9-11e1-bd84-52540035b04c


            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 5
