[LU-9296] ptlrpc_check_set()) @@@ bad phase ebc0de00 LBUG when OOM Created: 05/Apr/17  Updated: 16/May/17  Resolved: 01/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: ptlrpc

Issue Links:
Related
is related to LU-9414 LBUG and Hung on -ENOMEM in LNetMDAttach Open
Severity: 3

 Description   

To reproduce, inject an -ENOMEM return from LNetMEAttach() in ptl_send_rpc():

diff --git a/lustre/ptlrpc/niobuf.c b/lustre/ptlrpc/niobuf.c
index e80d5b0..a4864a2 100644
--- a/lustre/ptlrpc/niobuf.c
+++ b/lustre/ptlrpc/niobuf.c
@@ -822,9 +822,12 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
                         request->rq_repmsg = NULL;
                 }
 
-                rc = LNetMEAttach(request->rq_reply_portal,/*XXX FIXME bug 249*/
-                                  connection->c_peer, request->rq_xid, 0,
-                                  LNET_UNLINK, LNET_INS_AFTER, &reply_me_h);
+               if (OBD_FAIL_CHECK(0x9000))
+                       rc = -ENOMEM;
+               else
+                       rc = LNetMEAttach(request->rq_reply_portal,/*XXX FIXME bug 249*/
+                                         connection->c_peer, request->rq_xid, 0,
+                                         LNET_UNLINK, LNET_INS_AFTER, &reply_me_h);
                 if (rc != 0) {
                         CERROR("LNetMEAttach failed: %d\n", rc);
                         LASSERT (rc == -ENOMEM);

[  144.795157] LustreError: 2790:0:(niobuf.c:832:ptl_send_rpc()) LNetMEAttach failed: -12
[  144.797953] LustreError: 2790:0:(client.c:1740:ptlrpc_check_set()) @@@ bad phase ebc0de00  req@ffff8801ee4dfc00 x1563854704784176/t0(0) o2->lustre-OST0001-osc-MDT0000@0@lo:28/4 lens 560/432 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
[  144.805042] LustreError: 2790:0:(client.c:1741:ptlrpc_check_set()) LBUG
[  144.807415] Pid: 2790, comm: ptlrpcd_00_06
[  144.808852]
[  144.808852] Call Trace:
[  144.810310]  [<ffffffffa07b57f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[  144.812804]  [<ffffffffa07b5861>] lbug_with_loc+0x41/0xb0 [libcfs]
[  144.815261]  [<ffffffffa0c6b725>] ptlrpc_check_set.part.22+0xa15/0x1dd0 [ptlrpc]
[  144.818524]  [<ffffffffa0c6cb3b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
[  144.820082]  [<ffffffffa0c98d0b>] ptlrpcd_check+0x4db/0x5d0 [ptlrpc]
[  144.822272]  [<ffffffffa0c990bb>] ptlrpcd+0x2bb/0x560 [ptlrpc]
[  144.824208]  [<ffffffff810b8940>] ? default_wake_function+0x0/0x20
[  144.826392]  [<ffffffffa0c98e00>] ? ptlrpcd+0x0/0x560 [ptlrpc]
[  144.829318]  [<ffffffff810a5b8f>] kthread+0xcf/0xe0
[  144.831181]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[  144.832725]  [<ffffffff81646b98>] ret_from_fork+0x58/0x90
[  144.834325]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[  144.836045]
[  144.836596] Kernel panic - not syncing: LBUG
[  144.837031] CPU: 0 PID: 2790 Comm: ptlrpcd_00_06 Tainted: P           OE  ------------   3.10.0-327.36.1.el7_lustre.x86_64 #1

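In this case, ptl_send_rpc() returns -ENOMEM without first setting rq_sent. However, ptlrpc_check_set() expects rq_sent to be set whenever -ENOMEM is returned, so that the request, which is back in the New phase (ebc0de00, shown as "fl New:" in the log), is skipped as a delayed send. With rq_sent unset, the request instead falls through to the phase check and hits the "bad phase" LBUG.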
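The failure path can be illustrated with a small C model. This is a hypothetical, heavily simplified sketch, not Lustre code: struct model_req, model_send_rpc() and model_check_one() are invented for illustration, the two phase values are assumed to mirror RQ_PHASE_NEW and RQ_PHASE_RPC from Lustre's enum rq_phase, and the retry logic (moving the request back to New on -ENOMEM, then skipping New requests with rq_sent set as delayed sends) reflects our reading of the client code rather than a verbatim copy. With fixed set to false the model reaches the "bad phase" branch; with fixed set to true, the behaviour named in the patch subject, the request is skipped instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Values assumed to mirror RQ_PHASE_NEW and RQ_PHASE_RPC in Lustre's
 * enum rq_phase; only the two phases this model needs are defined. */
#define RQ_PHASE_NEW 0xebc0de00u
#define RQ_PHASE_RPC 0xebc0de01u

/* Hypothetical, heavily simplified stand-in for struct ptlrpc_request. */
struct model_req {
	unsigned int rq_phase;
	time_t rq_sent;		/* nonzero: request counts as (delayed) sent */
};

/* Hypothetical model of ptl_send_rpc() when LNetMEAttach() fails, as with
 * the injected OBD_FAIL_CHECK(0x9000). 'fixed' selects the post-LU-9296
 * behaviour of setting rq_sent before returning -ENOMEM. */
int model_send_rpc(struct model_req *req, bool fixed)
{
	assert(req->rq_phase == RQ_PHASE_RPC);
	if (fixed)
		req->rq_sent = time(NULL);	/* mark as a delayed send */
	return -ENOMEM;
}

/* Hypothetical model of one request's pass through ptlrpc_check_set().
 * Returns 0 if the request is skipped or proceeds, -1 where the real code
 * prints "bad phase" and calls LBUG(). */
int model_check_one(struct model_req *req, bool fixed)
{
	if (req->rq_phase == RQ_PHASE_NEW) {
		req->rq_phase = RQ_PHASE_RPC;
		if (model_send_rpc(req, fixed) == -ENOMEM)
			req->rq_phase = RQ_PHASE_NEW;	/* back to New for a retry */
	}

	/* delayed send: skip it on this pass */
	if (req->rq_phase == RQ_PHASE_NEW && req->rq_sent != 0)
		return 0;

	if (req->rq_phase != RQ_PHASE_RPC) {
		fprintf(stderr, "bad phase %x\n", req->rq_phase);
		return -1;	/* LBUG() in the real code */
	}
	return 0;
}
```

Calling model_check_one() on a fresh request with fixed set to false returns -1 (the LBUG path) and leaves rq_sent at 0; with fixed set to true it returns 0, leaving the request in the New phase with rq_sent set so that a later pass can retry it.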



 Comments   
Comment by Gerrit Updater [ 10/Apr/17 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/26470
Subject: LU-9296 ptlrcp: set rq_sent when send fails due to ENOMEM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 89bfa76651b5d37cfe1f8ddd32df52d38b2ba175

Comment by Joseph Gmitter (Inactive) [ 24/Apr/17 ]

John already has a patch in flight.

Comment by Gerrit Updater [ 01/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26470/
Subject: LU-9296 ptlrpc: set rq_sent when send fails due to -ENOMEM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 32c3775bab8902e533fd153a357b46da12076933

Comment by Peter Jones [ 01/May/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:24:55 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.