Lustre / LU-14733

brw_bulk_ready() BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.12.8, Lustre 2.15.0
    • None
    • lustre-2.12.6_9.llnl client
      kernel-4.18.0-305.0.0.1toss.t4.x86_64
      RHEL84
    • 3

    Description

      lnet_selftest fails between two nodes over Omnipath

      dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103 
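
The debug log line above packs several fields into one colon-separated record. The following is a minimal Python sketch for pulling it apart, not an official Lustre tool. The field names (`subsys`, `mask`, `cpu`, `stack`, `pid`, `extpid`) are assumptions based on the usual Lustre debug log layout, `parse_dk_line` is a hypothetical helper, and the errno table is hard-coded with the Linux value for 103 so that the result does not depend on the platform running the script.

```python
import re

# The log line quoted in the description, including the file-name prefix.
LOG_LINE = (
    "dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:"
    "(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from "
    "12345-192.168.128.126@o2ib18: -103 "
)

# Assumed layout: subsys:mask:cpu.ctx:time:stack:pid:extpid:(file:line:func()) msg
RECORD = re.compile(
    r"(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):"
    r"(?P<cpu>\d+)\.(?P<ctx>\w+):(?P<time>\d+\.\d+):"
    r"(?P<stack>\d+):(?P<pid>\d+):(?P<extpid>\d+):"
    r"\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)

# Trailing "<lnet pid>-<address>@<network>: <return code>" part of the message.
PEER = re.compile(r"(?P<lnet_pid>\d+)-(?P<addr>[\d.]+)@(?P<net>\w+): (?P<rc>-?\d+)$")

# Linux errno value relevant to this report (assumption: Linux numbering).
ERRNO_NAMES = {103: "ECONNABORTED"}


def parse_dk_line(line):
    """Split one debug log record into a dict of fields (hypothetical helper)."""
    record = RECORD.search(line)
    if record is None:
        raise ValueError("line does not look like a debug log record")
    info = record.groupdict()
    info["msg"] = info["msg"].strip()
    peer = PEER.search(info["msg"])
    if peer is not None:
        info.update(peer.groupdict())
        info["rc"] = int(info["rc"])
        # Kernel code reports failures as negative errno values.
        info["errno_name"] = ERRNO_NAMES.get(-info["rc"], "unknown")
    return info


if __name__ == "__main__":
    fields = parse_dk_line(LOG_LINE)
    for key in ("file", "line", "func", "addr", "net", "rc", "errno_name"):
        print(f"{key:>10}: {fields[key]}")
```

Under these assumptions the return code -103 corresponds to ECONNABORTED on Linux, reported by `brw_bulk_ready()` for the peer at 192.168.128.126 on network o2ib18.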

      Bulk transfers work over InfiniBand (although in that test one of the nodes was running RHEL 7.9 and an earlier Lustre patch stack). Bulk transfers also work over TCP using ksocklnd.

      lctl pings work fine between the same two nodes.

      mpibench and other MPI applications also work fine over Omnipath between two nodes.

      See https://github.com/LLNL/lustre/releases/tag/2.12.6_9.llnl for the patch stack

      Attachments

        1. dk.opal188.llnl.gov.7.txt
          1.03 MB
        2. dk.opal63.llnl.gov.7.txt
          757 kB
        3. dmesg.opal188.txt
          147 kB
        4. dmesg.opal63.txt
          139 kB
        5. build.txt
          268 kB
        6. diff.txt
          1 kB
        7. trace1.txt
          36 kB
        8. kprobes-off.sh
          2 kB
        9. kprobes.sh
          5 kB
        10. trace2.txt
          51 kB
        11. move_null.patch
          0.8 kB
        12. post_state.patch
          3 kB
        13. linux-kernel-test.patch
          2 kB
        14. 01-move_null.patch
          1 kB
        15. 02-post_state.patch
          4 kB

            People

              Assignee: Serguei Smirnov (ssmirnov)
              Reporter: Olaf Faaland (ofaaland)