Project: Lustre

[LU-9828] LBUG ASSERTION( desc->bd_nob_transferred == 0 ) failed:

Details


    Description

      One of our clients crashed with the following LBUG:

      LustreError: 11818:0:(events.c:201:client_bulk_callback()) event type 2, status -103, desc ffff880827971600
      LustreError: 11840:0:(niobuf.c:329:ptlrpc_register_bulk()) ASSERTION( desc->bd_nob_transferred == 0 ) failed:
      LustreError: 11818:0:(events.c:201:client_bulk_callback()) event type 2, status -103, desc ffff880d40623400
      Lustre: yshare1-OST0023-osc-ffff882049a1c800: Connection to yshare1-OST0023 (at 172.28.8.204@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: Skipped 21 previous similar messages
      LNet: 11818:0:(o2iblnd_cb.c:1364:kiblnd_reconnect_peer()) Abort reconnection of 172.28.8.204@o2ib1: connected
      LNet: 11818:0:(o2iblnd_cb.c:1364:kiblnd_reconnect_peer()) Skipped 1 previous similar message
      LustreError: 11840:0:(niobuf.c:329:ptlrpc_register_bulk()) LBUG
      Pid: 11840, comm: ptlrpcd_01_01
      
      Call Trace:
       [<ffffffffa0967895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0967e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0cae07c>] ptlrpc_register_bulk+0xfc/0x9c0 [ptlrpc]
       [<ffffffffa0985c74>] ? cfs_percpt_unlock+0x24/0xb0 [libcfs]
       [<ffffffffa0a1b7b4>] ? LNetMDUnlink+0xd4/0x160 [lnet]
       [<ffffffffa0cb5c64>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
       [<ffffffffa0caf5af>] ptl_send_rpc+0x1af/0xea0 [ptlrpc]
       [<ffffffffa0ce6804>] ? sptlrpc_req_refresh_ctx+0x154/0x910 [ptlrpc]
       [<ffffffffa0ca90b2>] ptlrpc_check_set+0x1462/0x1bf0 [ptlrpc]
       [<ffffffffa0cd6d83>] ptlrpcd_check+0x3d3/0x610 [ptlrpc]
       [<ffffffffa0cd7232>] ptlrpcd+0x272/0x4f0 [ptlrpc]
       [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
       [<ffffffffa0cd6fc0>] ? ptlrpcd+0x0/0x4f0 [ptlrpc]
       [<ffffffff810a640e>] kthread+0x9e/0xc0
       [<ffffffff8100c28a>] child_rip+0xa/0x20
       [<ffffffff810a6370>] ? kthread+0x0/0xc0
       [<ffffffff8100c280>] ? child_rip+0x0/0x20
      
      

       

          Activity

            pjones Peter Jones added a comment -

            Thanks, askulysh. For future reference, we can just update the commit message without losing positive test results and reviews, so making these corrections does not require abandoning patches.


            askulysh Andriy Skulysh added a comment -

            Opened LU-10799
            pjones Peter Jones added a comment -

            Cory

            A new ticket linked to this one, please. It causes no end of confusion when patches are tagged onto long-closed tickets.

            Peter

            spitzcor Cory Spitz added a comment -

            This issue is marked RESOLVED, yet https://review.whamcloud.com/#/c/30368 is still linked here. Should we get a new ticket, or should this issue be reopened?


            gerrit Gerrit Updater added a comment -

            Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/30368
            Subject: LU-9828 ptlrpc: ASSERTION(desc->bd_nob_transferred == 0)
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d4d5658f3cdad115c06a998e7fa91a2bd89e33dd

            askulysh Andriy Skulysh added a comment -

            The assertion can fail only when a resend races with a reply. It is better to skip the reply and restore the assertion. I'll commit the patch.
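The resend-versus-reply race described in Andriy's comment can be sketched with a toy, single-threaded C model. Everything in it is hypothetical: `struct toy_bulk_desc`, `toy_start_resend()`, `toy_register_bulk()` and the two callback variants are illustrative names, not Lustre's `struct ptlrpc_bulk_desc`, `ptlrpc_register_bulk()` or `client_bulk_callback()`. Tagging each completion event with an attempt number is also an assumed way of recognizing a stale reply, not necessarily what patch 30368 does. The model shows why a late event from a superseded attempt leaves `nob_transferred` non-zero when the buffer is re-registered, and how dropping such events keeps the `== 0` invariant intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a bulk descriptor.
 * This is NOT Lustre's struct ptlrpc_bulk_desc. */
struct toy_bulk_desc {
	unsigned int attempt;   /* current send attempt; bumped on resend */
	size_t nob_transferred; /* bytes accounted by completion events */
};

/* A completion event tagged with the attempt that produced it
 * (an assumption made for this model). */
struct toy_bulk_event {
	unsigned int attempt;
	size_t mlength;         /* bytes the event reports as transferred */
};

/* Decide to resend: from now on, the previous attempt is stale. */
static void toy_start_resend(struct toy_bulk_desc *desc)
{
	desc->attempt++;
}

/* Re-register the bulk buffer for the current attempt. Returns false in
 * the situation where the real code hits
 * ASSERTION( desc->bd_nob_transferred == 0 ). */
static bool toy_register_bulk(const struct toy_bulk_desc *desc)
{
	return desc->nob_transferred == 0;
}

/* Naive callback: accounts every event, including late ones that belong
 * to a superseded attempt. */
static void toy_bulk_callback_naive(struct toy_bulk_desc *desc,
				    const struct toy_bulk_event *ev)
{
	assert(desc != NULL && ev != NULL);
	desc->nob_transferred += ev->mlength;
}

/* "Skip the reply" variant: ignores events from an older attempt, so the
 * invariant checked at re-registration still holds. */
static void toy_bulk_callback_skip_stale(struct toy_bulk_desc *desc,
					 const struct toy_bulk_event *ev)
{
	assert(desc != NULL && ev != NULL);
	if (ev->attempt != desc->attempt)
		return; /* stale event from a previous attempt: drop it */
	desc->nob_transferred += ev->mlength;
}
```

To replay the race, start from `{ .attempt = 1, .nob_transferred = 0 }`, call `toy_start_resend()`, then deliver a late event tagged `attempt = 1`. With `toy_bulk_callback_naive()` the following `toy_register_bulk()` returns false (the model's analogue of the LBUG); with `toy_bulk_callback_skip_stale()` it returns true. Under this model's assumptions, keeping the check strict and filtering the stale event mirrors the "skip the reply and restore the assertion" approach.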

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28759/
            Subject: LU-9828 ptlrpc: Do not assert when bd_nob_transferred != 0
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 39a275578e5d77d14f5b50b3c2a3fc924081e03c

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28759
            Subject: LU-9828 ptlrpc: Do not assert when bd_nob_transferred != 0
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 7289ae9b0767ac65323bad97471b15f735154024

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28491/
            Subject: LU-9828 ptlrpc: Do not assert when bd_nob_transferred != 0
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e6490ea6cf0b793c0b47f17ac5a5fa3a2a136e0d

            People

              ashehata Amir Shehata (Inactive)
              mdiep Minh Diep