[LU-11647] niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed: Created: 09/Nov/18 Updated: 19/Mar/19 Resolved: 15/Feb/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.10.5 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.10.7, Lustre 2.12.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Getting clients crashing with: Dup of
[1541688893.800175] Lustre: 84667:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1541688893/real 1541688893] req@ffff88062aa 230c0 x1616517040059696/t0(0) o37->nbp8-MDT0000-mdc-ffff88080b0df800@10.151.27.60@o2ib:23/10 lens 568/440 e 0 to 1 dl 1541689235 ref 2 fl Rpc:ReX/0/ffffffff rc -11/-1
[1541689099.557409] LustreError: 3447:0:(niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed:
[1541689099.569408] LustreError: 3447:0:(niobuf.c:330:ptlrpc_register_bulk()) LBUG
[1541689099.581408] [<ffffffff8101bf34>] try_stack_unwind+0x194/0x1b0
[1541689099.589408] [<ffffffff8101ad54>] dump_trace+0x64/0x3b0
[1541689099.593408] [<ffffffff81027b02>] save_stack_trace_tsk+0x22/0x40
[1541689099.601407] [<ffffffffa095c70d>] libcfs_call_trace+0x7d/0xa0 [libcfs]
[1541689099.609407] [<ffffffffa095c7a5>] lbug_with_loc+0x45/0x90 [libcfs]
[1541689099.613407] [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
[1541689099.621407] [<ffffffffa0ac8fe5>] ptl_send_rpc+0x225/0xdf0 [ptlrpc]
[1541689099.629406] [<ffffffffa0ac328e>] ptlrpc_check_set.part.23+0x178e/0x1d60 [ptlrpc]
[1541689099.637406] [<ffffffffa0ac38af>] ptlrpc_check_set+0x4f/0xd0 [ptlrpc]
[1541689099.645406] [<ffffffffa0ac3b3a>] ptlrpc_set_wait+0x20a/0x890 [ptlrpc]
[1541689099.649406] [<ffffffffa0ac423d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
[1541689099.657406] [<ffffffffa084fdcc>] mdc_getpage+0x1bc/0x620 [mdc]
[1541689099.665405] [<ffffffffa085034b>] mdc_read_page_remote+0x11b/0x5e0 [mdc]
[1541689099.673405] [<ffffffff811a482f>] do_read_cache_page+0xff/0x1b0
[1541689099.677405] [<ffffffff811a48f9>] read_cache_page+0x19/0x20
[1541689099.685405] [<ffffffffa084daba>] mdc_read_page+0x1aa/0x9e0 [mdc]
[1541689099.689404] [<ffffffffa0a57e63>] lmv_read_page+0x1a3/0x510 [lmv]
[1541689099.697404] [<ffffffffa0ccf35b>] ll_get_dir_page+0xbb/0x330 [lustre]
[1541689099.705404] [<ffffffffa0ccf704>] ll_dir_read+0x94/0x2e0 [lustre]
[1541689099.709404] [<ffffffffa0ccfa58>] ll_iterate+0x108/0x520 [lustre]
[1541689099.717404] [<ffffffff812342b0>] iterate_dir+0xa0/0x120
[1541689099.721403] [<ffffffff812346f3>] SyS_getdents+0x83/0xf0
[1541689099.729403] [<ffffffff81651e43>] entry_SYSCALL_64_fastpath+0x1e/0xca
[1541689099.733403] [<ffffffffffffffff>] 0xffffffffffffffff
[1541689099.741403] Kernel panic - not syncing: LBUG
[1541689099.745403] CPU: 1 PID: 3447 Comm: csh Tainted: G OE NX 4.4.143-94.47.1.20180815-nasa #1
[1541689099.753402] Hardware name: SGI.COM C1104-RP7/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 3.00 09/12/2013
[1541689099.765402] 0000000000000000 ffff88036753f6d0 ffffffff8134907c ffffffffa0979e4b
[1541689099.773402] ffff8810440f1008 ffff88036753f748 ffffffff811a111a ffffffff00000008
[1541689099.777402] ffff88036753f758 ffff88036753f6f8 ffffffff810fcea5 0000000000000282
[1541689099.785401] Call Trace:
[1541689099.789401] [<ffffffff8101bf34>] try_stack_unwind+0x194/0x1b0
[1541689099.797401] [<ffffffff8101ad54>] dump_trace+0x64/0x3b0
[1541689099.801401] [<ffffffff8101bf9d>] show_trace_log_lvl+0x4d/0x60
[1541689099.809401] [<ffffffff8101b18a>] show_stack_log_lvl+0xea/0x170
[1541689099.813400] [<ffffffff8101bff5>] show_stack+0x25/0x50
[1541689099.821400] [<ffffffff8134907c>] dump_stack+0x63/0x87
[1541689099.825400] [<ffffffff811a111a>] panic+0xd2/0x232
[1541689099.829400] [<ffffffffa095c7ee>] lbug_with_loc+0x8e/0x90 [libcfs]
[1541689099.837400] [<ffffffffa0ac84b9>] ptlrpc_register_bulk+0x7a9/0x970 [ptlrpc]
[1541689099.845399] [<ffffffffa0ac8fe5>] ptl_send_rpc+0x225/0xdf0 [ptlrpc]
[1541689099.853399] [<ffffffffa0ac328e>] ptlrpc_check_set.part.23+0x178e/0x1d60 [ptlrpc]
[1541689099.861399] [<ffffffffa0ac38af>] ptlrpc_check_set+0x4f/0xd0 [ptlrpc]
[1541689099.865399] [<ffffffffa0ac3b3a>] ptlrpc_set_wait+0x20a/0x890 [ptlrpc]
[1541689099.873398] [<ffffffffa0ac423d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
[1541689099.881398] [<ffffffffa084fdcc>] mdc_getpage+0x1bc/0x620 [mdc]
[1541689099.885398] [<ffffffffa085034b>] mdc_read_page_remote+0x11b/0x5e0 [mdc]
[1541689099.893398] [<ffffffff811a482f>] do_read_cache_page+0xff/0x1b0
[1541689099.901398] [<ffffffff811a48f9>] read_cache_page+0x19/0x20
[1541689099.905397] [<ffffffffa084daba>] mdc_read_page+0x1aa/0x9e0 [mdc]
[1541689099.913397] [<ffffffffa0a57e63>] lmv_read_page+0x1a3/0x510 [lmv]
[1541689099.917397] [<ffffffffa0ccf35b>] ll_get_dir_page+0xbb/0x330 [lustre]
[1541689099.925397] [<ffffffffa0ccf704>] ll_dir_read+0x94/0x2e0 [lustre]
[1541689099.933396] [<ffffffffa0ccfa58>] ll_iterate+0x108/0x520 [lustre]
[1541689099.937396] [<ffffffff812342b0>] iterate_dir+0xa0/0x120
[1541689099.945396] [<ffffffff812346f3>] SyS_getdents+0x83/0xf0
[1541689099.949396] [<ffffffff81651e43>] entry_SYSCALL_64_fastpath+0x1e/0xca |
| Comments |
| Comment by Peter Jones [ 09/Nov/18 ] |
|
Hongchao Does this seem related to Peter |
| Comment by Hongchao Zhang [ 09/Nov/18 ] |
|
Yes, It should be a duplicate of |
| Comment by Mahmoud Hanafi [ 09/Nov/18 ] |
|
Can we get a 2.10.5 backport, please? |
| Comment by Peter Jones [ 09/Nov/18 ] |
|
Mahmoud, usually we would want to wait until the fix has been finalized (i.e., landed to master) before backporting. Is this issue disruptive enough that you would want to run the risk of the fix changing due to testing/review feedback? Peter |
| Comment by Jay Lan (Inactive) [ 09/Nov/18 ] |
|
As of this morning, patchset #6 failed to pass all autotests. |
| Comment by Andreas Dilger [ 26/Nov/18 ] |
|
Patch v9 is looking promising, though it is still undergoing review. |
| Comment by Andreas Dilger [ 01/Dec/18 ] |
|
Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: http://review.whamcloud.com/33167 |
| Comment by Andreas Dilger [ 01/Dec/18 ] |
|
Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/22378 |
| Comment by Gerrit Updater [ 06/Dec/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33798 |
| Comment by Gerrit Updater [ 16/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/22378/ |
| Comment by Peter Jones [ 16/Jan/19 ] |
|
Landed for 2.13 |
| Comment by Jay Lan (Inactive) [ 25/Jan/19 ] |
|
Can I assume the work at #33798 is also done, since #22378 has been merged? |
| Comment by Peter Jones [ 27/Jan/19 ] |
|
Jay, yes, I would expect that fix to land to b2_10 in this coming week. Peter |
| Comment by Gerrit Updater [ 15/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33798/ |
| Comment by Gerrit Updater [ 25/Feb/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34305 |
| Comment by Gerrit Updater [ 19/Mar/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34305/ |