Lustre / LU-9296

ptlrpc_check_set()) @@@ bad phase ebc0de00 LBUG when OOM

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0
    • Severity: 3

    Description

      To reproduce, inject an -ENOMEM return from LNetMEAttach() in ptl_send_rpc():

      diff --git a/lustre/ptlrpc/niobuf.c b/lustre/ptlrpc/niobuf.c
      index e80d5b0..a4864a2 100644
      --- a/lustre/ptlrpc/niobuf.c
      +++ b/lustre/ptlrpc/niobuf.c
      @@ -822,9 +822,12 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
                               request->rq_repmsg = NULL;
                       }
       
      -                rc = LNetMEAttach(request->rq_reply_portal,/*XXX FIXME bug 249*/
      -                                  connection->c_peer, request->rq_xid, 0,
      -                                  LNET_UNLINK, LNET_INS_AFTER, &reply_me_h);
      +               if (OBD_FAIL_CHECK(0x9000))
      +                       rc = -ENOMEM;
      +               else
      +                       rc = LNetMEAttach(request->rq_reply_portal,/*XXX FIXME bug 249*/
      +                                         connection->c_peer, request->rq_xid, 0,
      +                                         LNET_UNLINK, LNET_INS_AFTER, &reply_me_h);
                       if (rc != 0) {
                               CERROR("LNetMEAttach failed: %d\n", rc);
                               LASSERT (rc == -ENOMEM);
      
      [  144.795157] LustreError: 2790:0:(niobuf.c:832:ptl_send_rpc()) LNetMEAttach failed: -12
      [  144.797953] LustreError: 2790:0:(client.c:1740:ptlrpc_check_set()) @@@ bad phase ebc0de00  req@ffff8801ee4dfc00 x1563854704784176/t0(0) o2->lustre-OST0001-osc-MDT0000@0@lo:28/4 lens 560/432 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
      [  144.805042] LustreError: 2790:0:(client.c:1741:ptlrpc_check_set()) LBUG
      [  144.807415] Pid: 2790, comm: ptlrpcd_00_06
      [  144.808852]
      [  144.808852] Call Trace:
      [  144.810310]  [<ffffffffa07b57f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [  144.812804]  [<ffffffffa07b5861>] lbug_with_loc+0x41/0xb0 [libcfs]
      [  144.815261]  [<ffffffffa0c6b725>] ptlrpc_check_set.part.22+0xa15/0x1dd0 [ptlrpc]
      [  144.818524]  [<ffffffffa0c6cb3b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      [  144.820082]  [<ffffffffa0c98d0b>] ptlrpcd_check+0x4db/0x5d0 [ptlrpc]
      [  144.822272]  [<ffffffffa0c990bb>] ptlrpcd+0x2bb/0x560 [ptlrpc]
      [  144.824208]  [<ffffffff810b8940>] ? default_wake_function+0x0/0x20
      [  144.826392]  [<ffffffffa0c98e00>] ? ptlrpcd+0x0/0x560 [ptlrpc]
      [  144.829318]  [<ffffffff810a5b8f>] kthread+0xcf/0xe0
      [  144.831181]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
      [  144.832725]  [<ffffffff81646b98>] ret_from_fork+0x58/0x90
      [  144.834325]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
      [  144.836045]
      [  144.836596] Kernel panic - not syncing: LBUG
      [  144.837031] CPU: 0 PID: 2790 Comm: ptlrpcd_00_06 Tainted: P           OE  ------------   3.10.0-327.36.1.el7_lustre.x86_64 #1
      

      In this case ptl_send_rpc() returns -ENOMEM without first setting rq_sent, but the logic in ptlrpc_check_set() expects rq_sent to be set whenever -ENOMEM is returned.
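The mismatch described above can be sketched with a small standalone model. This is not the Lustre code: every `toy_*` name, the `set_sent_on_enomem` switch, and the exact order of the checks are assumptions made for illustration. Only the phase value `0xebc0de00` ("New") comes from the log above; `TOY_PHASE_RPC` is an assumed neighbouring value. The model treats a request that is still in the new phase but has a nonzero send timestamp as a delayed send, and anything else left in the new phase as the "bad phase" case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* "New" phase value as printed in the log ("bad phase ebc0de00", "fl New"). */
#define TOY_PHASE_NEW 0xebc0de00u
/* Assumed value for the "request is on the wire" phase. */
#define TOY_PHASE_RPC 0xebc0de01u

/* Hypothetical stand-in for the two request fields that matter here. */
struct toy_request {
	unsigned int phase;
	long sent;	/* stands in for rq_sent; 0 means "never set" */
};

enum toy_verdict { TOY_DELAYED, TOY_IN_FLIGHT, TOY_BAD_PHASE };

/*
 * Stand-in for the send path. With inject_enomem set it fails the way the
 * reproducer patch does. set_sent_on_enomem selects whether the failure
 * path records a send time before returning.
 */
static int toy_send(struct toy_request *req, bool inject_enomem,
		    bool set_sent_on_enomem)
{
	int rc = inject_enomem ? -ENOMEM : 0;

	if (rc != 0) {
		assert(rc == -ENOMEM);	/* mirrors the LASSERT in the diff */
		if (set_sent_on_enomem)
			req->sent = 1;	/* any nonzero timestamp */
		return rc;		/* phase stays TOY_PHASE_NEW */
	}

	req->sent = 1;
	req->phase = TOY_PHASE_RPC;
	return 0;
}

/* Stand-in for the checking loop body, reduced to the relevant tests. */
static enum toy_verdict toy_check(struct toy_request *req, bool inject_enomem,
				  bool set_sent_on_enomem)
{
	if (req->phase == TOY_PHASE_NEW)
		(void)toy_send(req, inject_enomem, set_sent_on_enomem);

	/* A new request with a send time is treated as a delayed send. */
	if (req->phase == TOY_PHASE_NEW && req->sent != 0)
		return TOY_DELAYED;

	/* Still new with no send time: the case that hits the LBUG. */
	if (req->phase != TOY_PHASE_RPC)
		return TOY_BAD_PHASE;

	return TOY_IN_FLIGHT;
}
```

Under these assumptions, calling `toy_check()` on a fresh request with `inject_enomem` true and `set_sent_on_enomem` false yields `TOY_BAD_PHASE`, while setting `set_sent_on_enomem` to true yields `TOY_DELAYED`, so the request would be retried later rather than tripping the assertion.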

            People

              jhammond John Hammond