Details

    Description

      The backtrace looks like this:

      machine_kexec
      crash_kexec
      oops_end
      die
      do_general_protection
      general_protection
      [exception RIP: ptlrpc_send_reply+1136]
      ptlrpc_send_error
      target_send_reply_msg
      target_send_reply
      ost_handle
      ptlrpc_main
      kernel_thread

      That RIP resolves to lustre/ptlrpc/niobuf.c:436, which in our tree is:

      434         /* There may be no rq_export during failover */
      435 
      436         if (unlikely(req->rq_export && req->rq_export->exp_obd &&
      437                      req->rq_export->exp_obd->obd_fail)) { 
      438                 /* Failed obd's only send ENODEV */
      439                 req->rq_type = PTL_RPC_MSG_ERR;
      440                 req->rq_status = -ENODEV;
      441                 CDEBUG(D_HA, "sending ENODEV from failed obd %d\n",
      442                        req->rq_export->exp_obd->obd_minor);
      443         }
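
      The guard at line 436 can be sketched in isolation. This is a minimal illustration only: the structures below are hypothetical, heavily reduced stand-ins for the real Lustre types, MSG_ERR_STANDIN is a placeholder for PTL_RPC_MSG_ERR, and mark_enodev_if_failed() and self_check() are names invented for this example. It shows that the condition short-circuits on NULL pointers only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the Lustre structures. */
struct obd_device {
        int          obd_minor;
        unsigned int obd_fail:1;
};

struct obd_export {
        struct obd_device *exp_obd;
};

struct ptlrpc_request {
        struct obd_export *rq_export;
        int                rq_type;
        int                rq_status;
};

/* Placeholder for PTL_RPC_MSG_ERR; the value is arbitrary here. */
#define MSG_ERR_STANDIN 1

/*
 * Mirrors the shape of the check at niobuf.c:436.
 * Returns 1 if the request was rewritten to an ENODEV error, else 0.
 * Each pointer is tested for NULL before use, but nothing here can
 * detect a non-NULL pointer that refers to freed or corrupted memory.
 */
static int mark_enodev_if_failed(struct ptlrpc_request *req)
{
        if (req->rq_export && req->rq_export->exp_obd &&
            req->rq_export->exp_obd->obd_fail) {
                req->rq_type = MSG_ERR_STANDIN;
                req->rq_status = -ENODEV;
                return 1;
        }
        return 0;
}

/* Exercises the three paths through the guard. */
static void self_check(void)
{
        struct obd_device failed  = { .obd_minor = 3, .obd_fail = 1 };
        struct obd_device healthy = { .obd_minor = 4, .obd_fail = 0 };
        struct obd_export exp     = { .exp_obd = &healthy };
        struct ptlrpc_request req = { .rq_export = NULL };

        assert(mark_enodev_if_failed(&req) == 0);   /* no export */

        req.rq_export = &exp;
        assert(mark_enodev_if_failed(&req) == 0);   /* healthy obd */

        exp.exp_obd = &failed;
        assert(mark_enodev_if_failed(&req) == 1);   /* failed obd */
        assert(req.rq_status == -ENODEV);
}
```

      On x86-64, a NULL dereference normally shows up as a page fault, while a general protection fault typically points to a non-canonical address. That would be consistent with rq_export or exp_obd being non-NULL but invalid when the check ran, though this report does not establish the cause.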
      

      The server was handling many client reconnects, under conditions similar to those reported in LU-1085, LU-1092, LU-1093, and LU-1094.

          People

            bobijam Zhenyu Xu
            nedbass Ned Bass (Inactive)