[LU-11647] niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed:


    Description

      Clients are crashing with the LBUG below.

      Is this a duplicate of LU-8573? If so, we will need a backport for 2.10.5.

      [1541688893.800175] Lustre: 84667:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1541688893/real 1541688893]  req@ffff88062aa230c0 x1616517040059696/t0(0) o37->nbp8-MDT0000-mdc-ffff88080b0df800@10.151.27.60@o2ib:23/10 lens 568/440 e 0 to 1 dl 1541689235 ref 2 fl Rpc:ReX/0/ffffffff rc -11/-1
      [1541689099.557409] LustreError: 3447:0:(niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed: 
      [1541689099.569408] LustreError: 3447:0:(niobuf.c:330:ptlrpc_register_bulk()) LBUG
      [1541689099.581408]  [<ffffffff8101bf34>] try_stack_unwind+0x194/0x1b0
      [1541689099.589408]  [<ffffffff8101ad54>] dump_trace+0x64/0x3b0
      [1541689099.593408]  [<ffffffff81027b02>] save_stack_trace_tsk+0x22/0x40
      [1541689099.601407]  [<ffffffffa095c70d>] libcfs_call_trace+0x7d/0xa0 [libcfs]
      [1541689099.609407]  [<ffffffffa095c7a5>] lbug_with_loc+0x45/0x90 [libcfs]
      [1541689099.613407]  [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
      [1541689099.621407]  [<ffffffffa0ac8fe5>] ptl_send_rpc+0x225/0xdf0 [ptlrpc]
      [1541689099.629406]  [<ffffffffa0ac328e>] ptlrpc_check_set.part.23+0x178e/0x1d60 [ptlrpc]
      [1541689099.637406]  [<ffffffffa0ac38af>] ptlrpc_check_set+0x4f/0xd0 [ptlrpc]
      [1541689099.645406]  [<ffffffffa0ac3b3a>] ptlrpc_set_wait+0x20a/0x890 [ptlrpc]
      [1541689099.649406]  [<ffffffffa0ac423d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [1541689099.657406]  [<ffffffffa084fdcc>] mdc_getpage+0x1bc/0x620 [mdc]
      [1541689099.665405]  [<ffffffffa085034b>] mdc_read_page_remote+0x11b/0x5e0 [mdc]
      [1541689099.673405]  [<ffffffff811a482f>] do_read_cache_page+0xff/0x1b0
      [1541689099.677405]  [<ffffffff811a48f9>] read_cache_page+0x19/0x20
      [1541689099.685405]  [<ffffffffa084daba>] mdc_read_page+0x1aa/0x9e0 [mdc]
      [1541689099.689404]  [<ffffffffa0a57e63>] lmv_read_page+0x1a3/0x510 [lmv]
      [1541689099.697404]  [<ffffffffa0ccf35b>] ll_get_dir_page+0xbb/0x330 [lustre]
      [1541689099.705404]  [<ffffffffa0ccf704>] ll_dir_read+0x94/0x2e0 [lustre]
      [1541689099.709404]  [<ffffffffa0ccfa58>] ll_iterate+0x108/0x520 [lustre]
      [1541689099.717404]  [<ffffffff812342b0>] iterate_dir+0xa0/0x120
      [1541689099.721403]  [<ffffffff812346f3>] SyS_getdents+0x83/0xf0
      [1541689099.729403]  [<ffffffff81651e43>] entry_SYSCALL_64_fastpath+0x1e/0xca
      [1541689099.733403]  [<ffffffffffffffff>] 0xffffffffffffffff
      [1541689099.741403] Kernel panic - not syncing: LBUG
      [1541689099.745403] CPU: 1 PID: 3447 Comm: csh Tainted: G           OE   NX 4.4.143-94.47.1.20180815-nasa #1
      [1541689099.753402] Hardware name: SGI.COM C1104-RP7/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 3.00 09/12/2013
      [1541689099.765402]  0000000000000000 ffff88036753f6d0 ffffffff8134907c ffffffffa0979e4b
      [1541689099.773402]  ffff8810440f1008 ffff88036753f748 ffffffff811a111a ffffffff00000008
      [1541689099.777402]  ffff88036753f758 ffff88036753f6f8 ffffffff810fcea5 0000000000000282
      [1541689099.785401] Call Trace:
      [1541689099.789401]  [<ffffffff8101bf34>] try_stack_unwind+0x194/0x1b0
      [1541689099.797401]  [<ffffffff8101ad54>] dump_trace+0x64/0x3b0
      [1541689099.801401]  [<ffffffff8101bf9d>] show_trace_log_lvl+0x4d/0x60
      [1541689099.809401]  [<ffffffff8101b18a>] show_stack_log_lvl+0xea/0x170
      [1541689099.813400]  [<ffffffff8101bff5>] show_stack+0x25/0x50
      [1541689099.821400]  [<ffffffff8134907c>] dump_stack+0x63/0x87
      [1541689099.825400]  [<ffffffff811a111a>] panic+0xd2/0x232
      [1541689099.829400]  [<ffffffffa095c7ee>] lbug_with_loc+0x8e/0x90 [libcfs]
      [1541689099.837400]  [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
      [1541689099.845399]  [<ffffffffa0ac8fe5>] ptl_send_rpc+0x225/0xdf0 [ptlrpc]
      [1541689099.853399]  [<ffffffffa0ac328e>] ptlrpc_check_set.part.23+0x178e/0x1d60 [ptlrpc]
      [1541689099.861399]  [<ffffffffa0ac38af>] ptlrpc_check_set+0x4f/0xd0 [ptlrpc]
      [1541689099.865399]  [<ffffffffa0ac3b3a>] ptlrpc_set_wait+0x20a/0x890 [ptlrpc]
      [1541689099.873398]  [<ffffffffa0ac423d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [1541689099.881398]  [<ffffffffa084fdcc>] mdc_getpage+0x1bc/0x620 [mdc]
      [1541689099.885398]  [<ffffffffa085034b>] mdc_read_page_remote+0x11b/0x5e0 [mdc]
      [1541689099.893398]  [<ffffffff811a482f>] do_read_cache_page+0xff/0x1b0
      [1541689099.901398]  [<ffffffff811a48f9>] read_cache_page+0x19/0x20
      [1541689099.905397]  [<ffffffffa084daba>] mdc_read_page+0x1aa/0x9e0 [mdc]
      [1541689099.913397]  [<ffffffffa0a57e63>] lmv_read_page+0x1a3/0x510 [lmv]
      [1541689099.917397]  [<ffffffffa0ccf35b>] ll_get_dir_page+0xbb/0x330 [lustre]
      [1541689099.925397]  [<ffffffffa0ccf704>] ll_dir_read+0x94/0x2e0 [lustre]
      [1541689099.933396]  [<ffffffffa0ccfa58>] ll_iterate+0x108/0x520 [lustre]
      [1541689099.937396]  [<ffffffff812342b0>] iterate_dir+0xa0/0x120
      [1541689099.945396]  [<ffffffff812346f3>] SyS_getdents+0x83/0xf0
      [1541689099.949396]  [<ffffffff81651e43>] entry_SYSCALL_64_fastpath+0x1e/0xca
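
      For readers unfamiliar with the assertion: the log shows a request failing with a network error roughly 206 seconds before ptlrpc_register_bulk() trips on desc->bd_md_count != 0. The sketch below is a toy model, not Lustre code. It assumes, as the subject of the eventual patch ("always unregister bulk") suggests, that the check fires when a descriptor is registered again while memory descriptors from an earlier attempt are still attached. Every name prefixed with toy_ is hypothetical, and the always_unregister flag exists only to contrast the two behaviours.

```c
#include <stdio.h>

/* Hypothetical stand-in for a bulk descriptor; only the counter is modeled. */
struct toy_bulk_desc {
	int bd_md_count;	/* memory descriptors still attached */
};

/*
 * Toy registration: attaches nmd descriptors.
 * Returns -1 where the real code would hit the assertion.
 */
int toy_register_bulk(struct toy_bulk_desc *desc, int nmd)
{
	if (desc->bd_md_count != 0) {
		fprintf(stderr, "would LBUG: bd_md_count == %d\n",
			desc->bd_md_count);
		return -1;
	}
	desc->bd_md_count = nmd;
	return 0;
}

/* Toy unregistration: detaches everything left over from an earlier send. */
void toy_unregister_bulk(struct toy_bulk_desc *desc)
{
	desc->bd_md_count = 0;
}

/*
 * Toy send path. With always_unregister set, stale descriptors are
 * dropped before registering again, so a resend cannot trip the check.
 */
int toy_send(struct toy_bulk_desc *desc, int nmd, int always_unregister)
{
	if (always_unregister)
		toy_unregister_bulk(desc);
	return toy_register_bulk(desc, nmd);
}
```

      Calling toy_send() twice on the same descriptor with always_unregister set to 0 fails on the second call; with it set to 1 both calls succeed. The actual patch may differ in where and how it unregisters.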
      


          Activity

            pjones Peter Jones added a comment -

            Jay

            Yes, I would expect that fix to land to b2_10 this coming week.

            Peter


            jaylan Jay Lan (Inactive) added a comment -

            Can I assume the work at #33798 is also done, since #22378 has been merged?
            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/22378/
            Subject: LU-11647 ptlrpc: always unregister bulk
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 21c53b18a1bc0e36d2ecd1fb731f0dc6403902ee

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33798
            Subject: LU-11647 ptlrpc: always unregister bulk
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: bd41c38752dda7e843c1bfb405f2214a31f74366

            adilger Andreas Dilger added a comment -

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/22378
            Subject: LU-11647 ptlrpc: always unregister bulk
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 5
            Commit: e34a4cf031a2b83259cee8e05c2f646b5652b6a9
            adilger Andreas Dilger added a comment - - edited

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: http://review.whamcloud.com/33167
            Subject: LU-11647 ptlrpc: race with reply_in_callback
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 10
            Commit: 29d2c7ad100098497631c2ce172dc0e03accde60


            adilger Andreas Dilger added a comment -

            Patch v9 is looking promising, though it is still undergoing review.

            jaylan Jay Lan (Inactive) added a comment -

            As of this morning, patch set #6 has not passed all autotests.
            pjones Peter Jones added a comment -

            Mahmoud

            Usually we would want to wait until the fix has been finalized (i.e., landed to master) before backporting. Is this issue disruptive enough that you would want to run the risk of the fix changing due to testing/review feedback?

            Peter


            mhanafi Mahmoud Hanafi added a comment -

            Can we get a 2.10.5 backport, please?

            People

              hongchao.zhang Hongchao Zhang
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 6
