Details
Bug | Resolution: Duplicate | Critical | None | Lustre 2.1.0 | None | 3 | 6411
Description
Back trace looks like this:
machine_kexec
crash_kexec
oops_end
die
do_general_protection
general_protection
[exception RIP: ptlrpc_send_reply+1136]
ptlrpc_send_error
target_send_reply_msg
target_send_reply
ost_handle
ptlrpc_main
kernel_thread
That RIP resolves to lustre/ptlrpc/niobuf.c:436, which in our tree is here:
434         /* There may be no rq_export during failover */
435
436         if (unlikely(req->rq_export && req->rq_export->exp_obd &&
437                      req->rq_export->exp_obd->obd_fail)) {
438                 /* Failed obd's only send ENODEV */
439                 req->rq_type = PTL_RPC_MSG_ERR;
440                 req->rq_status = -ENODEV;
441                 CDEBUG(D_HA, "sending ENODEV from failed obd %d\n",
442                        req->rq_export->exp_obd->obd_minor);
443         }
The server was handling many client reconnects, under conditions similar to those reported in LU-1085, LU-1092, LU-1093, and LU-1094.
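To reason about the check at niobuf.c:436 in isolation, here is a minimal sketch of the same guard using mock structures. Everything prefixed `mock_`/`MOCK_` and the helper `mark_failed_obd` are hypothetical stand-ins, not Lustre definitions, and the enum values are arbitrary. The sketch only shows what the guard covers: a NULL `rq_export` or NULL `exp_obd` is skipped safely. It does not protect against a non-NULL pointer that no longer refers to a valid object. A fault in this expression is at least consistent with that case, but this ticket does not establish the root cause.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Lustre structures; only the
 * fields touched by the check are modelled. */
struct mock_obd {
    int obd_fail;
    int obd_minor;
};

struct mock_export {
    struct mock_obd *exp_obd;
};

/* Arbitrary values for illustration, not the real wire constants. */
enum mock_msg_type { MOCK_MSG_REPLY = 1, MOCK_MSG_ERR = 2 };

struct mock_request {
    struct mock_export *rq_export;
    int rq_type;
    int rq_status;
};

/* Same shape as the guard quoted from niobuf.c:436.
 * Returns 1 if the reply was turned into an ENODEV error, else 0.
 * The && chain short-circuits on NULL, so a missing export or obd
 * is never dereferenced; a stale non-NULL pointer would be. */
static int mark_failed_obd(struct mock_request *req)
{
    if (req->rq_export && req->rq_export->exp_obd &&
        req->rq_export->exp_obd->obd_fail) {
        /* Failed obd's only send ENODEV */
        req->rq_type = MOCK_MSG_ERR;
        req->rq_status = -ENODEV;
        return 1;
    }
    return 0;
}
```

With a NULL export the request is left untouched; with an export whose obd has `obd_fail` set, the request is rewritten to an error carrying `-ENODEV`.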