[LU-9296] ptlrpc_check_set()) @@@ bad phase ebc0de00 LBUG when OOM
Created: 05/Apr/17 | Updated: 16/May/17 | Resolved: 01/May/17
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ptlrpc |
| Severity: | 3 |
| Description |
|
To reproduce, inject a -ENOMEM return from LNetMEAttach() in ptl_send_rpc():

diff --git a/lustre/ptlrpc/niobuf.c b/lustre/ptlrpc/niobuf.c
index e80d5b0..a4864a2 100644
--- a/lustre/ptlrpc/niobuf.c
+++ b/lustre/ptlrpc/niobuf.c
@@ -822,9 +822,12 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 			request->rq_repmsg = NULL;
 		}
 
-		rc = LNetMEAttach(request->rq_reply_portal,/*XXX FIXME bug 249*/
-				  connection->c_peer, request->rq_xid, 0,
-				  LNET_UNLINK, LNET_INS_AFTER, &reply_me_h);
+		if (OBD_FAIL_CHECK(0x9000))
+			rc = -ENOMEM;
+		else
+			rc = LNetMEAttach(request->rq_reply_portal,/*XXX FIXME bug 249*/
+					  connection->c_peer, request->rq_xid, 0,
+					  LNET_UNLINK, LNET_INS_AFTER, &reply_me_h);
 		if (rc != 0) {
 			CERROR("LNetMEAttach failed: %d\n", rc);
 			LASSERT (rc == -ENOMEM);
[ 144.795157] LustreError: 2790:0:(niobuf.c:832:ptl_send_rpc()) LNetMEAttach failed: -12
[ 144.797953] LustreError: 2790:0:(client.c:1740:ptlrpc_check_set()) @@@ bad phase ebc0de00 req@ffff8801ee4dfc00 x1563854704784176/t0(0) o2->lustre-OST0001-osc-MDT0000@0@lo:28/4 lens 560/432 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
[ 144.805042] LustreError: 2790:0:(client.c:1741:ptlrpc_check_set()) LBUG
[ 144.807415] Pid: 2790, comm: ptlrpcd_00_06
[ 144.808852]
[ 144.808852] Call Trace:
[ 144.810310] [<ffffffffa07b57f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[ 144.812804] [<ffffffffa07b5861>] lbug_with_loc+0x41/0xb0 [libcfs]
[ 144.815261] [<ffffffffa0c6b725>] ptlrpc_check_set.part.22+0xa15/0x1dd0 [ptlrpc]
[ 144.818524] [<ffffffffa0c6cb3b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[ 144.820082] [<ffffffffa0c98d0b>] ptlrpcd_check+0x4db/0x5d0 [ptlrpc]
[ 144.822272] [<ffffffffa0c990bb>] ptlrpcd+0x2bb/0x560 [ptlrpc]
[ 144.824208] [<ffffffff810b8940>] ? default_wake_function+0x0/0x20
[ 144.826392] [<ffffffffa0c98e00>] ? ptlrpcd+0x0/0x560 [ptlrpc]
[ 144.829318] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
[ 144.831181] [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[ 144.832725] [<ffffffff81646b98>] ret_from_fork+0x58/0x90
[ 144.834325] [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[ 144.836045]
[ 144.836596] Kernel panic - not syncing: LBUG
[ 144.837031] CPU: 0 PID: 2790 Comm: ptlrpcd_00_06 Tainted: P OE ------------ 3.10.0-327.36.1.el7_lustre.x86_64 #1

In this case ptl_send_rpc() returns -ENOMEM without first setting rq_sent, but the logic in ptlrpc_check_set() expects rq_sent to be set whenever -ENOMEM is returned. |
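The mismatch described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `model_*` and `MODEL_*` name is hypothetical, the "New" phase value is taken from the `bad phase ebc0de00` log line, the other phase value is made up, and the check logic is a simplified guess at the relevant branch, not the actual ptlrpc_check_set() code.

```c
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Phase values for the model. NEW matches the value printed in the
 * "bad phase ebc0de00" log line; RPC is a hypothetical placeholder. */
#define MODEL_PHASE_NEW 0xebc0de00u
#define MODEL_PHASE_RPC 0xebc0de01u

/* Hypothetical stand-in for the two request fields that matter here. */
struct model_request {
	unsigned int phase;	/* models the request phase */
	time_t sent;		/* models rq_sent; 0 means "never stamped" */
};

enum model_verdict {
	MODEL_SKIP_DELAYED,	/* treated as a delayed send, retried later */
	MODEL_BAD_PHASE,	/* the case that ends in LBUG in the report */
	MODEL_PROCESS		/* request is in flight and can be processed */
};

/* Models a send attempt that always fails with -ENOMEM. The flag selects
 * whether the sent timestamp is stamped before returning (the behaviour
 * the checker is assumed to rely on) or left at zero (the reported bug). */
static int model_send(struct model_request *req, bool stamp_sent_on_enomem)
{
	req->phase = MODEL_PHASE_NEW;
	if (stamp_sent_on_enomem)
		req->sent = time(NULL);
	return -ENOMEM;
}

/* Models the checker: a NEW request with a timestamp is skipped as a
 * delayed send; any other request not in the RPC phase is a bad phase. */
static enum model_verdict model_check(const struct model_request *req)
{
	if (req->phase == MODEL_PHASE_NEW && req->sent != 0)
		return MODEL_SKIP_DELAYED;
	if (req->phase != MODEL_PHASE_RPC)
		return MODEL_BAD_PHASE;
	return MODEL_PROCESS;
}
```

In this model, calling `model_send(&req, false)` followed by `model_check(&req)` yields `MODEL_BAD_PHASE`, which corresponds to the LBUG path in the log, while `model_send(&req, true)` yields `MODEL_SKIP_DELAYED`.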
| Comments |
| Comment by Gerrit Updater [ 10/Apr/17 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/26470 |
| Comment by Joseph Gmitter (Inactive) [ 24/Apr/17 ] |
|
John already has a patch in flight. |
| Comment by Gerrit Updater [ 01/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26470/ |
| Comment by Peter Jones [ 01/May/17 ] |
|
Landed for 2.10 |