[LU-12816] LBUG: (niobuf.c:350:ptlrpc_register_bulk()) ASSERTION( !(desc->bd_registered && req->rq_send_state != LUSTRE_IMP_REPLAY) || mbits != desc->bd_last_mbits )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Affects Version/s: Lustre 2.12.1
    • Labels: None
    • Severity: 3

    Description

      c0-0c0s14n3 LustreError: 7380:0:(niobuf.c:350:ptlrpc_register_bulk()) LBUG
      c0-0c0s14n3 Pid: 7380, comm: ptlrpcd_01_49
      c0-0c0s14n3 Call Trace:
      c0-0c0s14n3 [<ffffffff81008efc>] try_stack_unwind+0x17c/0x190
      c0-0c0s14n3 [<ffffffff81007e84>] dump_trace+0x64/0x380
      c0-0c0s14n3 [<ffffffffa025476e>] libcfs_call_trace+0x4e/0x60 [libcfs]
      c0-0c0s14n3 [<ffffffffa0254e75>] lbug_with_loc+0x45/0xb0 [libcfs]
      c0-0c0s14n3 [<ffffffffa0a0ed32>] ptlrpc_register_bulk+0x822/0x950 [ptlrpc]
      c0-0c0s14n3 [<ffffffffa0a0f765>] ptl_send_rpc+0x215/0xd40 [ptlrpc]
      c0-0c0s14n3 [<ffffffffa0a0561d>] ptlrpc_send_new_req+0x42d/0x9d0 [ptlrpc]
      c0-0c0s14n3 [<ffffffffa0a077b8>] ptlrpc_check_set+0x8a8/0x2c70 [ptlrpc]
      c0-0c0s14n3 [<ffffffffa0a33f2a>] ptlrpcd_check+0x3aa/0x5b0 [ptlrpc]
      c0-0c0s14n3 [<ffffffffa0a342fc>] ptlrpcd+0x1cc/0x4c0 [ptlrpc]
      c0-0c0s14n3 [<ffffffff810775b6>] kthread+0xd6/0xf0
      c0-0c0s14n3 [<ffffffff8152690f>] ret_from_fork+0x3f/0x70
      

      This is the same fundamental problem as LU-10643. If LNetMEAttach fails with an ENOMEM error, ptl_send_rpc() fails mid-processing and must clean up the work it has done before the client tries to send the RPC again. For bulk reads and writes, the ptl_send_rpc() path makes two calls to LNetMEAttach. LU-10643 addresses an ENOMEM after the first call; this bug is the result of an ENOMEM after the second call.

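      The assertion fails because desc->bd_registered is still true while the request is not a replay (its send state is LUSTRE_IMP_FULL) and its mbits match bd_last_mbits.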

      LustreError: 7380:0:(niobuf.c:350:ptlrpc_register_bulk()) ASSERTION
      ( !(desc->bd_registered && req->rq_send_state != LUSTRE_IMP_REPLAY) || mbits != desc->bd_last_mbits ) failed: 
      registered: 1 rq_mbits: 1636629211272768 bd_last_mbits: 1636629211272768
      
      crash_x86_64> ptlrpc_request ffff88298086dc40 | grep send_state
            cr_send_state = LUSTRE_IMP_FULL,
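
The values in the log above can be checked against the assertion by hand. The following is a minimal sketch in plain C, not Lustre code: the enum `send_state` and the helper `register_bulk_assert_holds()` are hypothetical names introduced only to model the boolean condition quoted in the LBUG message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the import states named in this report.
 * Only the two states that matter for the assertion are modeled. */
enum send_state { IMP_FULL, IMP_REPLAY };

/* Returns true when the condition asserted at niobuf.c:350 holds:
 *   !(bd_registered && rq_send_state != LUSTRE_IMP_REPLAY) ||
 *   mbits != bd_last_mbits
 * A false result corresponds to the LBUG. */
static bool register_bulk_assert_holds(bool bd_registered,
                                       enum send_state rq_send_state,
                                       uint64_t mbits,
                                       uint64_t bd_last_mbits)
{
    return !(bd_registered && rq_send_state != IMP_REPLAY) ||
           mbits != bd_last_mbits;
}
```

With the logged values (registered: 1, send state FULL, both mbits equal to 1636629211272768) the helper returns false. Changing any one of the three inputs (descriptor not registered, request in replay, or different mbits) makes it return true.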
      

      Error scenario: An attempt is made to send a bulk RPC under low-memory conditions. ptl_send_rpc() successfully calls ptlrpc_register_bulk(), which attaches the request buffer and sets bd_registered. ptl_send_rpc() then tries to attach the reply buffer, but this fails with an ENOMEM error. The cleanup path does not reset bd_registered, so the next attempt to send the RPC triggers the assertion in ptlrpc_register_bulk().

      ptl_send_rpc:
      ....
              ptlrpc_register_bulk:
                     sets bd_registered
                     LNetMEAttach(request buffer)   <--- CAST-16472 fixes ENOMEM error  handling
      
              if reply expected:
                     LNetMEAttach(reply buffer)
                     if ENOMEM
                            goto cleanup_bulk
      ....
      cleanup_bulk:
               ptlrpc_unregister_bulk()      <--- doesn't reset bd_registered
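
The call flow above can be reproduced with a toy state machine. This is only a sketch under stated assumptions: `toy_bulk_desc`, `toy_register_bulk()`, `toy_send_rpc()`, and `TOY_LBUG` are hypothetical names, the request is assumed not to be a replay, and the `reset_on_cleanup` switch illustrates one possible remedy (clearing the registered flag in the cleanup path). The merged patch itself is not reproduced in this report, so this should not be read as its actual content.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker returned where the real code would hit the LBUG. */
#define TOY_LBUG (-1000)

/* Toy model of the two bulk descriptor fields involved in the assertion. */
struct toy_bulk_desc {
    bool     registered;  /* models desc->bd_registered */
    uint64_t last_mbits;  /* models desc->bd_last_mbits */
};

/* Models ptlrpc_register_bulk() for a non-replay request:
 * check the asserted condition, then mark the descriptor registered. */
static int toy_register_bulk(struct toy_bulk_desc *d, uint64_t mbits)
{
    if (d->registered && mbits == d->last_mbits)
        return TOY_LBUG;
    d->registered = true;
    d->last_mbits = mbits;
    return 0;
}

/* Models ptl_send_rpc(). reply_attach_rc simulates the result of the
 * second LNetMEAttach call (0 or -ENOMEM). reset_on_cleanup selects
 * whether the cleanup path clears the registered flag. */
static int toy_send_rpc(struct toy_bulk_desc *d, uint64_t mbits,
                        int reply_attach_rc, bool reset_on_cleanup)
{
    int rc = toy_register_bulk(d, mbits);

    if (rc != 0)
        return rc;
    if (reply_attach_rc != 0) {
        /* cleanup_bulk: without the reset, the stale flag survives. */
        if (reset_on_cleanup)
            d->registered = false;
        return reply_attach_rc;
    }
    return 0;
}
```

In this model, a first send that fails with -ENOMEM followed by a retry with the same mbits returns TOY_LBUG when `reset_on_cleanup` is false, and succeeds when it is true.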
      

          Activity

            Gerrit Updater added a comment:

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38266/
            Subject: LU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 1679e79cf103dabe25ffa88af720842011fbc628

            Gerrit Updater added a comment:

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38266
            Subject: LU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 691c8c42a4f9c0b0fb283e4095cd880f4cc4ecd6

            Cory Spitz added a comment:

            Thanks for landing this fix. Sadly, Ann has retired and she won't be able to close this bug herself. Best wishes, Ann!

            Gerrit Updater added a comment:

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36309/
            Subject: LU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e6225c07ce4c0037a127a41b2bc539364dfd1f4d

            Gerrit Updater added a comment:

            Ann Koehler (amk@cray.com) uploaded a new patch: https://review.whamcloud.com/36309
            Subject: LU-12816 ptlrpc: ptlrpc_register_bulk LBUG on ENOMEM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e75c8ae1f2ba20d96ff2b36b0c4c4e451feab8ea

            People

              Assignee: Ann Koehler (Inactive)
              Reporter: Ann Koehler (Inactive)
              Votes: 0
              Watchers: 4