Lustre / LU-14733

brw_bulk_ready() BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Lustre 2.12.8, Lustre 2.15.0
    • None
    • lustre-2.12.6_9.llnl client
      kernel-4.18.0-305.0.0.1toss.t4.x86_64
      RHEL84
    • 3
    • 9223372036854775807

      lnet_selftest fails between two nodes over Omnipath

      dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103 
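      To make the failure easier to read, the log line above can be picked apart and its return code decoded. The following is a minimal sketch, not part of the original report: the helper name `parse_brw_failure`, the regular expression, and the small `LINUX_ERRNO` table are all illustrative assumptions. It assumes the peer is printed as `<LNet PID>-<NID>` and that the trailing number is a negated Linux errno, under which -103 corresponds to ECONNABORTED.

```python
import re

# Assumption: the return code is a negated Linux errno. Only a few
# network-related values (from asm-generic/errno.h) are listed here.
LINUX_ERRNO = {103: "ECONNABORTED", 110: "ETIMEDOUT", 113: "EHOSTUNREACH"}

# The line from the report, copied verbatim (without trailing whitespace).
LINE = (
    "dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:"
    "(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from "
    "12345-192.168.128.126@o2ib18: -103"
)

# Hypothetical pattern: matches the "(file:line:func())" source location
# and the message text that follows it.
PATTERN = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"BRW bulk (?P<op>READ|WRITE) failed for RPC from "
    r"(?P<pid>\d+)-(?P<nid>\S+): (?P<rc>-?\d+)"
)


def parse_brw_failure(text):
    """Return a dict describing a BRW bulk failure line, or None if no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    info = m.groupdict()
    info["line"] = int(info["line"])
    info["pid"] = int(info["pid"])
    info["rc"] = int(info["rc"])
    # Map the negated return code to a symbolic errno name where known.
    info["errno_name"] = LINUX_ERRNO.get(-info["rc"], "unknown")
    return info


if __name__ == "__main__":
    result = parse_brw_failure(LINE)
    print(result["func"], result["op"], result["nid"], result["rc"], result["errno_name"])
```

      Read this way, the line says that a bulk READ issued for an RPC from NID 192.168.128.126@o2ib18 was aborted, as reported by brw_bulk_ready() in brw_test.c.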

      Bulk transfers work over InfiniBand (although in that test one of the nodes ran RHEL 7.9 and an earlier Lustre patch stack). Bulk transfers also work over TCP using ksocklnd.

      lctl pings work fine between the same two nodes.

      mpibench and other MPI applications also work fine over Omnipath between two nodes.

      See https://github.com/LLNL/lustre/releases/tag/2.12.6_9.llnl for the patch stack

        1. 01-move_null.patch (1 kB)
        2. 02-post_state.patch (4 kB)
        3. build.txt (268 kB)
        4. diff.txt (1 kB)
        5. dk.opal188.llnl.gov.7.txt (1.03 MB)
        6. dk.opal63.llnl.gov.7.txt (757 kB)
        7. dmesg.opal188.txt (147 kB)
        8. dmesg.opal63.txt (139 kB)
        9. kprobes.sh (5 kB)
        10. kprobes-off.sh (2 kB)
        11. linux-kernel-test.patch (2 kB)
        12. move_null.patch (0.8 kB)
        13. post_state.patch (3 kB)
        14. trace1.txt (36 kB)
        15. trace2.txt (51 kB)

            ssmirnov Serguei Smirnov
            ofaaland Olaf Faaland