Lustre / LU-3027

Failure on test suite parallel-scale test_write_disjoint: invalid file size 140329 instead of 160376 = 20047 * 8

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.2
    • Affects Version/s: Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0
    • Labels: None
    • Severity: 3
    • Rank: 7390

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/2ed1fef2-94bd-11e2-93c6-52540035b04c.

      The sub-test test_write_disjoint failed with the following error:

      write_disjoint failed! 1

      The test log shows:

      librdmacm: Fatal: no RDMA devices found
      librdmacm: Fatal: no RDMA devices found
      librdmacm: Fatal: no RDMA devices found
      librdmacm: Fatal: no RDMA devices found
      librdmacm: Fatal: no RDMA devices found
      librdmacm: Fatal: no RDMA devices found
      librdmacm: Fatal: no RDMA devices found
      loop 0: chunk_size 103399
      [client-27vm6.lab.whamcloud.com:00935] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
      [client-27vm6.lab.whamcloud.com:00935] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
      loop 79: chunk_size 71702, file size was 573616
      rank 2, loop 80: invalid file size 140329 instead of 160376 = 20047 * 8
      loop 79: chunk_size 71702, file size was 573616
      rank 4, loop 80: invalid file size 140329 instead of 160376 = 20047 * 8
      loop 79: chunk_size 71702, file size was 573616
      rank 6, loop 80: invalid file size 140329 instead of 160376 = 20047 * 8
      loop 79: chunk_size 71702, file size was 573616
      rank 0, loop 80: invalid file size 140329 instead of 160376 = 20047 * 8
      --------------------------------------------------------------------------
      MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD 
      with errorcode -1.
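
The size arithmetic in the failure lines can be checked mechanically. The following is a minimal Python sketch, not part of the test suite: the regular expressions, the `analyze` helper, and its field names are hypothetical, and the two log lines are copied from the output above. It assumes the trailing `* 8` is the number of MPI ranks and that the expected size is `chunk_size * ranks`, as the message itself states.

```python
import re

# Two lines copied verbatim from the test log above.
LOG = """\
loop 79: chunk_size 71702, file size was 573616
rank 2, loop 80: invalid file size 140329 instead of 160376 = 20047 * 8
"""

# Hypothetical patterns for the two message shapes seen in the log.
OK_RE = re.compile(r"loop (\d+): chunk_size (\d+), file size was (\d+)")
BAD_RE = re.compile(
    r"rank (\d+), loop (\d+): invalid file size (\d+) "
    r"instead of (\d+) = (\d+) \* (\d+)"
)


def analyze(log):
    """Return one dict per failure line, with the size broken into chunks."""
    findings = []
    for line in log.splitlines():
        m = BAD_RE.search(line)
        if not m:
            continue
        rank, loop, actual, expected, chunk, nprocs = map(int, m.groups())
        # Sanity check: the message's own arithmetic must hold.
        assert expected == chunk * nprocs
        whole_chunks, remainder = divmod(actual, chunk)
        findings.append({
            "rank": rank,
            "loop": loop,
            "missing_bytes": expected - actual,
            "whole_chunks": whole_chunks,
            "remainder": remainder,
        })
    return findings


def passing_loops_consistent(log, nprocs):
    """Check that every passing loop reports size == chunk_size * nprocs."""
    return all(
        int(m.group(3)) == int(m.group(2)) * nprocs
        for m in OK_RE.finditer(log)
    )
```

For the lines above, the observed size 140329 is exactly 7 * 20047, so the file is short by one whole chunk (20047 bytes) rather than by an arbitrary amount, while loop 79 is consistent with 8 * 71702 = 573616. The sketch only restates the numbers in the log; it does not show which rank's chunk was missing from the reported size or why.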
      

      LU-2453 looks like a similar issue, seen on the b2_1 branch.

            People

              Assignee: Oleg Drokin (green)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 11
