brw_bulk_ready() BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.8, Lustre 2.15.0
    • None
    • Environment: lustre-2.12.6_9.llnl client
      kernel-4.18.0-305.0.0.1toss.t4.x86_64
      RHEL84
    • Severity: 3

    Description

      lnet_selftest fails between two nodes over Omnipath

      dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103 

      Bulk transfers work over InfiniBand (although in that test one of the nodes was running RHEL 7.9 and an earlier Lustre patch stack). Bulk transfers also work over TCP using ksocklnd.

      lctl pings work fine between the same two nodes.

      mpibench and other MPI applications also work fine over Omnipath between two nodes.

      See https://github.com/LLNL/lustre/releases/tag/2.12.6_9.llnl for the patch stack
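
      For readers triaging similar failures, the debug-log line quoted above can be split into its fields with a short script. This is only a sketch: the field names (subsystem mask, debug mask, CPU, timestamp, stack size, PID, external PID) are an assumed reading of the Lustre debug log layout, the helper name parse_dk_line is hypothetical, and the errno table hard-codes the Linux value 103 = ECONNABORTED rather than querying the running platform.

```python
import re

# Linux errno values of interest; 103 is ECONNABORTED on Linux.
# Hard-coded on purpose so the result does not depend on the local OS.
LINUX_ERRNO = {103: "ECONNABORTED"}

# Assumed layout of a Lustre debug-log ("dk") record:
# subsys:mask:cpu:timestamp:stack:pid:extpid:(file:line:func()) message
LINE_RE = re.compile(
    r"(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>[\d.]+):"
    r"(?P<time>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<extpid>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)"
)

# Trailing "<lnet pid>-<address>@<network>: <return code>" in the message.
PEER_RE = re.compile(
    r"(?P<lnet_pid>\d+)-(?P<addr>[\d.]+)@(?P<net>\w+): (?P<rc>-?\d+)\s*$"
)


def parse_dk_line(text):
    """Return a dict of fields for one debug-log line, or None if no match."""
    m = LINE_RE.search(text)
    if m is None:
        return None
    rec = m.groupdict()
    peer = PEER_RE.search(rec["msg"])
    if peer:
        rc = int(peer["rc"])
        rec["lnet_pid"] = int(peer["lnet_pid"])
        rec["nid"] = f'{peer["addr"]}@{peer["net"]}'
        rec["rc"] = rc
        rec["errno_name"] = LINUX_ERRNO.get(-rc, "unknown")
    return rec


# The line from this ticket, including the file-name prefix added by grep.
SAMPLE = (
    "dk.opal63.llnl.gov.7:00000001:00020000:43.0:1622598261.714620:0:129525:0:"
    "(brw_test.c:415:brw_bulk_ready()) BRW bulk READ failed for RPC from "
    "12345-192.168.128.126@o2ib18: -103 "
)

if __name__ == "__main__":
    rec = parse_dk_line(SAMPLE)
    print(rec["func"], rec["nid"], rec["rc"], rec["errno_name"])
```

      The regular expression uses search rather than match, so a file-name prefix such as the one grep adds is skipped without special handling.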

      Attachments

        1. 01-move_null.patch
          1 kB
        2. 02-post_state.patch
          4 kB
        3. build.txt
          268 kB
        4. diff.txt
          1 kB
        5. dk.opal188.llnl.gov.7.txt
          1.03 MB
        6. dk.opal63.llnl.gov.7.txt
          757 kB
        7. dmesg.opal188.txt
          147 kB
        8. dmesg.opal63.txt
          139 kB
        9. kprobes.sh
          5 kB
        10. kprobes-off.sh
          2 kB
        11. linux-kernel-test.patch
          2 kB
        12. move_null.patch
          0.8 kB
        13. post_state.patch
          3 kB
        14. trace1.txt
          36 kB
        15. trace2.txt
          51 kB

          Activity

            [LU-14733] brw_bulk_ready() BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/44296/
            Subject: LU-14733 o2iblnd: Avoid double posting invalidate
            Project: fs/lustre-release
            Branch: b2_14
            Current Patch Set:
            Commit: 29da7cba3e7b3461d895010c7f7284b9649aba52

            gerrit Gerrit Updater added a comment -

            "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/44295/
            Subject: LU-14733 o2iblnd: Move racy NULL assignment
            Project: fs/lustre-release
            Branch: b2_14
            Current Patch Set:
            Commit: 380be07fcca1f76564d1f29e58f2d8d5f8f530c8

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44217/
            Subject: LU-14733 o2iblnd: Avoid double posting invalidate
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 96d7dcf4e773e6026a590e4596ef30ac8a4a5061

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44216/
            Subject: LU-14733 o2iblnd: Move racy NULL assignment
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 173d60a3274d19bc1d9811b6e1b09aac2b25f221

            pjones Peter Jones added a comment -

            Landed for 2.15

            ofaaland Olaf Faaland added a comment -

            Mike, Serguei, this works on our test system

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: ofaaland Olaf Faaland