Lustre / LU-11647

niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed:

    Description

      Clients are crashing with the LBUG below.

      Is this a duplicate of LU-8573? If so, we will need a backport for 2.10.5.

      [1541688893.800175] Lustre: 84667:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1541688893/real 1541688893]  req@ffff88062aa230c0 x1616517040059696/t0(0) o37->nbp8-MDT0000-mdc-ffff88080b0df800@10.151.27.60@o2ib:23/10 lens 568/440 e 0 to 1 dl 1541689235 ref 2 fl Rpc:ReX/0/ffffffff rc -11/-1
      [1541689099.557409] LustreError: 3447:0:(niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed: 
      [1541689099.569408] LustreError: 3447:0:(niobuf.c:330:ptlrpc_register_bulk()) LBUG
      [1541689099.581408]  [<ffffffff8101bf34>] try_stack_unwind+0x194/0x1b0
      [1541689099.589408]  [<ffffffff8101ad54>] dump_trace+0x64/0x3b0
      [1541689099.593408]  [<ffffffff81027b02>] save_stack_trace_tsk+0x22/0x40
      [1541689099.601407]  [<ffffffffa095c70d>] libcfs_call_trace+0x7d/0xa0 [libcfs]
      [1541689099.609407]  [<ffffffffa095c7a5>] lbug_with_loc+0x45/0x90 [libcfs]
      [1541689099.613407]  [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
      [1541689099.621407]  [<ffffffffa0ac8fe5>] ptl_send_rpc+0x225/0xdf0 [ptlrpc]
      [1541689099.629406]  [<ffffffffa0ac328e>] ptlrpc_check_set.part.23+0x178e/0x1d60 [ptlrpc]
      [1541689099.637406]  [<ffffffffa0ac38af>] ptlrpc_check_set+0x4f/0xd0 [ptlrpc]
      [1541689099.645406]  [<ffffffffa0ac3b3a>] ptlrpc_set_wait+0x20a/0x890 [ptlrpc]
      [1541689099.649406]  [<ffffffffa0ac423d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [1541689099.657406]  [<ffffffffa084fdcc>] mdc_getpage+0x1bc/0x620 [mdc]
      [1541689099.665405]  [<ffffffffa085034b>] mdc_read_page_remote+0x11b/0x5e0 [mdc]
      [1541689099.673405]  [<ffffffff811a482f>] do_read_cache_page+0xff/0x1b0
      [1541689099.677405]  [<ffffffff811a48f9>] read_cache_page+0x19/0x20
      [1541689099.685405]  [<ffffffffa084daba>] mdc_read_page+0x1aa/0x9e0 [mdc]
      [1541689099.689404]  [<ffffffffa0a57e63>] lmv_read_page+0x1a3/0x510 [lmv]
      [1541689099.697404]  [<ffffffffa0ccf35b>] ll_get_dir_page+0xbb/0x330 [lustre]
      [1541689099.705404]  [<ffffffffa0ccf704>] ll_dir_read+0x94/0x2e0 [lustre]
      [1541689099.709404]  [<ffffffffa0ccfa58>] ll_iterate+0x108/0x520 [lustre]
      [1541689099.717404]  [<ffffffff812342b0>] iterate_dir+0xa0/0x120
      [1541689099.721403]  [<ffffffff812346f3>] SyS_getdents+0x83/0xf0
      [1541689099.729403]  [<ffffffff81651e43>] entry_SYSCALL_64_fastpath+0x1e/0xca
      [1541689099.733403]  [<ffffffffffffffff>] 0xffffffffffffffff
      [1541689099.741403] Kernel panic - not syncing: LBUG
      [1541689099.745403] CPU: 1 PID: 3447 Comm: csh Tainted: G           OE   NX 4.4.143-94.47.1.20180815-nasa #1
      [1541689099.753402] Hardware name: SGI.COM C1104-RP7/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 3.00 09/12/2013
      [1541689099.765402]  0000000000000000 ffff88036753f6d0 ffffffff8134907c ffffffffa0979e4b
      [1541689099.773402]  ffff8810440f1008 ffff88036753f748 ffffffff811a111a ffffffff00000008
      [1541689099.777402]  ffff88036753f758 ffff88036753f6f8 ffffffff810fcea5 0000000000000282
      [1541689099.785401] Call Trace:
      [1541689099.789401]  [<ffffffff8101bf34>] try_stack_unwind+0x194/0x1b0
      [1541689099.797401]  [<ffffffff8101ad54>] dump_trace+0x64/0x3b0
      [1541689099.801401]  [<ffffffff8101bf9d>] show_trace_log_lvl+0x4d/0x60
      [1541689099.809401]  [<ffffffff8101b18a>] show_stack_log_lvl+0xea/0x170
      [1541689099.813400]  [<ffffffff8101bff5>] show_stack+0x25/0x50
      [1541689099.821400]  [<ffffffff8134907c>] dump_stack+0x63/0x87
      [1541689099.825400]  [<ffffffff811a111a>] panic+0xd2/0x232
      [1541689099.829400]  [<ffffffffa095c7ee>] lbug_with_loc+0x8e/0x90 [libcfs]
      [1541689099.837400]  [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
      [1541689099.845399]  [<ffffffffa0ac8fe5>] ptl_send_rpc+0x225/0xdf0 [ptlrpc]
      [1541689099.853399]  [<ffffffffa0ac328e>] ptlrpc_check_set.part.23+0x178e/0x1d60 [ptlrpc]
      [1541689099.861399]  [<ffffffffa0ac38af>] ptlrpc_check_set+0x4f/0xd0 [ptlrpc]
      [1541689099.865399]  [<ffffffffa0ac3b3a>] ptlrpc_set_wait+0x20a/0x890 [ptlrpc]
      [1541689099.873398]  [<ffffffffa0ac423d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [1541689099.881398]  [<ffffffffa084fdcc>] mdc_getpage+0x1bc/0x620 [mdc]
      [1541689099.885398]  [<ffffffffa085034b>] mdc_read_page_remote+0x11b/0x5e0 [mdc]
      [1541689099.893398]  [<ffffffff811a482f>] do_read_cache_page+0xff/0x1b0
      [1541689099.901398]  [<ffffffff811a48f9>] read_cache_page+0x19/0x20
      [1541689099.905397]  [<ffffffffa084daba>] mdc_read_page+0x1aa/0x9e0 [mdc]
      [1541689099.913397]  [<ffffffffa0a57e63>] lmv_read_page+0x1a3/0x510 [lmv]
      [1541689099.917397]  [<ffffffffa0ccf35b>] ll_get_dir_page+0xbb/0x330 [lustre]
      [1541689099.925397]  [<ffffffffa0ccf704>] ll_dir_read+0x94/0x2e0 [lustre]
      [1541689099.933396]  [<ffffffffa0ccfa58>] ll_iterate+0x108/0x520 [lustre]
      [1541689099.937396]  [<ffffffff812342b0>] iterate_dir+0xa0/0x120
      [1541689099.945396]  [<ffffffff812346f3>] SyS_getdents+0x83/0xf0
      [1541689099.949396]  [<ffffffff81651e43>] entry_SYSCALL_64_fastpath+0x1e/0xca
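
      To compare this trace against the one in LU-8573, it can help to reduce the log to its call chain. The following is a minimal sketch, assuming the log lines keep the `[timestamp]  [<address>] symbol+offset/size [module]` layout shown above. The helper name `call_chain` and the `kernel` label for frames without a module are hypothetical choices for illustration, not part of any Lustre tooling.

```python
import re

# Matches stack frames such as:
#   [1541689099.613407]  [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
# The trailing "[module]" is optional; core kernel frames do not carry one.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def call_chain(log_text):
    """Return (symbol, module) pairs in log order, innermost frame first.

    Frames without a module tag are labelled "kernel" (an assumption made
    here for readability). Lines that are not stack frames are skipped.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# A few lines copied from the trace above.
sample = """\
[1541689099.613407]  [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
[1541689099.621407]  [<ffffffffa0ac8fe5>] ptl_send_rpc+0x225/0xdf0 [ptlrpc]
[1541689099.629406]  [<ffffffffa0ac328e>] ptlrpc_check_set.part.23+0x178e/0x1d60 [ptlrpc]
[1541689099.717404]  [<ffffffff812342b0>] iterate_dir+0xa0/0x120
[1541689099.733403]  [<ffffffffffffffff>] 0xffffffffffffffff
"""

chain = call_chain(sample)
for symbol, module in chain:
    print(f"{module:8s} {symbol}")
```

      Running the same helper over the LU-8573 trace and comparing the two lists frame by frame gives a quick check of whether both crashes reach `ptlrpc_register_bulk` through the same resend path.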
      

          Activity

            adilger Andreas Dilger added a comment - Patch v9 is looking promising, though it is still undergoing review.

            jaylan Jay Lan (Inactive) added a comment - As of this morning, patchset #6 has not passed all autotests.
            pjones Peter Jones added a comment -

            Mahmoud

            Usually we would want to wait until the fix has been finalized (i.e., landed on master) before backporting. Is this issue disruptive enough that you would want to run the risk of the fix changing due to testing/review feedback?

            Peter


            mhanafi Mahmoud Hanafi added a comment - Can we get a 2.10.5 backport, please?

            hongchao.zhang Hongchao Zhang added a comment - Yes, it should be a duplicate of LU-8573.
            pjones Peter Jones added a comment -

            Hongchao

            Does this seem related to LU-8573 to you?

            Peter


            People

              hongchao.zhang Hongchao Zhang
              mhanafi Mahmoud Hanafi