Lustre / LU-2453

parallel-scale test_write_disjoint: invalid file size 723793 instead of 827192 = 103399 * 8

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version: Lustre 2.1.4
    • Fix Version: Lustre 2.1.4
    • Components: None
    • Environment:
      Lustre Branch: b2_1
      Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/148
      Distro/Arch: RHEL5.8/x86_64 (kernel version: 2.6.18-308.20.1.el5)
      Network: TCP (1GigE)

    Description

The parallel-scale test write_disjoint failed as follows:

      == parallel-scale test write_disjoint: write_disjoint ================================================ 14:32:34 (1355005954)
      OPTIONS:
      WRITE_DISJOINT=/usr/lib64/lustre/tests/write_disjoint
      clients=fat-intel-3vm5,fat-intel-3vm6.lab.whamcloud.com 
      wdisjoint_THREADS=4
      wdisjoint_REP=10000
      MACHINEFILE=/tmp/parallel-scale.machines
      fat-intel-3vm5
      fat-intel-3vm6.lab.whamcloud.com
      + /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000
      + chmod 0777 /mnt/lustre
      drwxrwxrwx 5 root root 4096 Dec  8 14:32 /mnt/lustre
      + su mpiuser sh -c "/usr/lib64/openmpi/1.4-gcc/bin/mpirun -mca boot ssh -np 8 -machinefile /tmp/parallel-scale.machines /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000 "
      --------------------------------------------------------------------------
      [[22376,1],3]: A high-performance Open MPI point-to-point messaging module
      was unable to find any relevant network interfaces:
      
      Module: OpenFabrics (openib)
        Host: fat-intel-3vm6.lab.whamcloud.com
      
      Another transport will be used instead, although this may result in
      lower performance.
      --------------------------------------------------------------------------
      loop 0: chunk_size 103399
      rank 3, loop 0: invalid file size 723793 instead of 827192 = 103399 * 8
      --------------------------------------------------------------------------
      MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD 
      with errorcode -1.
      
      NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
      You may or may not see output from other processes, depending on
      exactly when Open MPI kills them.
      --------------------------------------------------------------------------
      rank 4, loop 0: invalid file size 723793 instead of 827192 = 103399 * 8
      rank 7, loop 0: invalid file size 723793 instead of 827192 = 103399 * 8
      --------------------------------------------------------------------------
      

      Maloo report: https://maloo.whamcloud.com/test_sets/bfc081dc-41bf-11e2-a653-52540035b04c


People

    Assignee: Lai Siyao (laisiyao)
    Reporter: Jian Yu (yujian)
