[LU-9828] LBUG ASSERTION( desc->bd_nob_transferred == 0 ) failed:
Created: 04/Aug/17 Updated: 09/Mar/18 Resolved: 28/Aug/17
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.1, Lustre 2.11.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Minh Diep | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Description |
|
One of the clients crashed due to the following LBUG:
LustreError: 11818:0:(events.c:201:client_bulk_callback()) event type 2, status -103, desc ffff880827971600
LustreError: 11840:0:(niobuf.c:329:ptlrpc_register_bulk()) ASSERTION( desc->bd_nob_transferred == 0 ) failed:
LustreError: 11818:0:(events.c:201:client_bulk_callback()) event type 2, status -103, desc ffff880d40623400
Lustre: yshare1-OST0023-osc-ffff882049a1c800: Connection to yshare1-OST0023 (at 172.28.8.204@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 21 previous similar messages
LNet: 11818:0:(o2iblnd_cb.c:1364:kiblnd_reconnect_peer()) Abort reconnection of 172.28.8.204@o2ib1: connected
LNet: 11818:0:(o2iblnd_cb.c:1364:kiblnd_reconnect_peer()) Skipped 1 previous similar message
LustreError: 11840:0:(niobuf.c:329:ptlrpc_register_bulk()) LBUG
Pid: 11840, comm: ptlrpcd_01_01
Call Trace:
 [<ffffffffa0967895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0967e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0cae07c>] ptlrpc_register_bulk+0xfc/0x9c0 [ptlrpc]
 [<ffffffffa0985c74>] ? cfs_percpt_unlock+0x24/0xb0 [libcfs]
 [<ffffffffa0a1b7b4>] ? LNetMDUnlink+0xd4/0x160 [lnet]
 [<ffffffffa0cb5c64>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
 [<ffffffffa0caf5af>] ptl_send_rpc+0x1af/0xea0 [ptlrpc]
 [<ffffffffa0ce6804>] ? sptlrpc_req_refresh_ctx+0x154/0x910 [ptlrpc]
 [<ffffffffa0ca90b2>] ptlrpc_check_set+0x1462/0x1bf0 [ptlrpc]
 [<ffffffffa0cd6d83>] ptlrpcd_check+0x3d3/0x610 [ptlrpc]
 [<ffffffffa0cd7232>] ptlrpcd+0x272/0x4f0 [ptlrpc]
 [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
 [<ffffffffa0cd6fc0>] ? ptlrpcd+0x0/0x4f0 [ptlrpc]
 [<ffffffff810a640e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff810a6370>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
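To make the failed invariant concrete, here is a small, hypothetical C model (not Lustre code). The assertion requires that a bulk descriptor carry no transferred bytes at the moment it is registered, so a descriptor reused for a resend must not still hold a count from an earlier attempt. The model assumes the bulk is split into fragments, that the completion callback adds bytes only for fragments that succeed, and that an error such as the logged status -103 (-ECONNABORTED on Linux) makes the client resend and re-register the same descriptor. The struct, the md_count and failed fields, and both functions are invented for illustration; only nob_transferred mirrors the real bd_nob_transferred field. The sequence is chosen only to exercise the check; a later comment attributes the real failure to a resend vs. reply race.

```c
#include <assert.h>
#include <errno.h>    /* ECONNABORTED: the -103 status in the log, on Linux */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ptlrpc bulk descriptor. Only
 * nob_transferred mirrors a real field (bd_nob_transferred). */
struct bulk_desc {
    size_t nob_transferred; /* bytes recorded as moved so far */
    int    md_count;        /* fragments still outstanding */
    bool   failed;          /* any fragment reported an error */
};

/* Hypothetical register step. Returns false where the real
 * ptlrpc_register_bulk() hits its ASSERTION and LBUGs. */
static bool bulk_register(struct bulk_desc *d, int nfrags)
{
    if (d->nob_transferred != 0)
        return false;
    d->md_count = nfrags;
    d->failed = false;
    return true;
}

/* Hypothetical per-fragment completion: bytes are counted only on
 * success; an error marks the descriptor so the caller resends. */
static void bulk_callback(struct bulk_desc *d, int status, size_t mlength)
{
    assert(d->md_count > 0);
    d->md_count--;
    if (status == 0)
        d->nob_transferred += mlength;
    else
        d->failed = true;
}
```

In the model, registering two fragments, completing one with 4096 bytes and aborting the other with -ECONNABORTED leaves nob_transferred at 4096, so a second bulk_register() on the same descriptor returns false. That is the point at which the real client LBUGs.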
|
| Comments |
| Comment by Gerrit Updater [ 11/Aug/17 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/28491
| Comment by Oleg Drokin [ 11/Aug/17 ] |
|
I just hit this on my testbed as well.
| Comment by Gerrit Updater [ 28/Aug/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28491/
| Comment by Peter Jones [ 28/Aug/17 ] |
|
Landed for 2.11.
| Comment by Gerrit Updater [ 28/Aug/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28759
| Comment by Gerrit Updater [ 14/Sep/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28759/
| Comment by Andriy Skulysh [ 05/Dec/17 ] |
|
The assertion failure can only happen during a resend vs. reply race. It is better to skip the reply and restore the assertion. I'll commit the patch.
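The suggestion to skip the reply and keep the assertion can be sketched as follows, under one possible reading of the comment: once a resend has been decided, a reply that races in for the abandoned attempt is dropped instead of processed, so the byte counter stays at zero and the next send can keep a hard assertion. This is a hypothetical model only. The struct, the attempt counter, the resend_pending flag and both functions are invented for illustration and are not taken from patch 30368 or from Lustre.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request state; none of these names are Lustre's. */
struct req_model {
    unsigned attempt;        /* bumped on every (re)send */
    bool     resend_pending; /* set once a resend has been decided */
    size_t   nob_transferred;
    bool     reply_done;
};

/* Hypothetical (re)send: the zero-bytes invariant stays a hard assert. */
static void req_send(struct req_model *r)
{
    assert(r->nob_transferred == 0);
    r->attempt++;
    r->resend_pending = false;
}

/* Hypothetical reply handling. A reply stamped with an older attempt,
 * or one arriving after a resend was decided, is skipped so it cannot
 * leave bytes behind for the next send. Returns true if processed. */
static bool req_reply(struct req_model *r, unsigned attempt, size_t nob)
{
    if (attempt != r->attempt || r->resend_pending)
        return false;
    r->nob_transferred += nob;
    r->reply_done = true;
    return true;
}
```

In the model, a reply for attempt 1 that arrives after resend_pending is set is ignored, the following req_send() passes its assertion, and only a reply matching attempt 2 is counted. Dropping the stale reply keeps the invariant intact in this model, whereas relaxing the assertion would let bytes from an abandoned attempt carry over into the next one.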
| Comment by Gerrit Updater [ 05/Dec/17 ] |
|
Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/30368
| Comment by Cory Spitz [ 09/Mar/18 ] |
|
This issue is marked RESOLVED, yet https://review.whamcloud.com/#/c/30368 is still linked here. Should we get a new ticket, or should this issue be reopened?
| Comment by Peter Jones [ 09/Mar/18 ] |
|
Cory, a new ticket linked to this one, please. It causes no end of confusion when patches are tagged onto long-closed tickets. Peter
| Comment by Andriy Skulysh [ 09/Mar/18 ] |
|
Opened LU-10799.
| Comment by Peter Jones [ 09/Mar/18 ] |
|
Thanks, askulysh. For future reference, we can just update the commit message without losing positive testing and reviews, so making these corrections does not require abandoning patches.