Affects Version/s: Lustre 2.12.0
Environment: BG/Q I/O nodes
/bgsys/logs/BGQ.sn/R04-ID-J00.log (among many others)
LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) ASSERTION( (at_max == 0) || request->rq_import->imp_state != LUSTRE_IMP_FULL || (request->rq_import->imp_msghdr_flags & 0x1) || ! (request->rq_import->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed:
LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) LBUG
The LBUG occurred on many tens of I/O nodes, then on many tens more within the next 24 hours. It is continuing to occur.
We have not seen this issue before, and we do not know what triggered it now. The patch that introduced this assertion was in the patch stack for our tag 2.5.4-1chaos, which was rolled out in April:
LU-5528 ptlrpc: fix race between connect vs resend
There are no crash dumps for these nodes, nor much in the console logs.
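Because several conditions were ASSERTed in a single statement, the failure message does not report the value of any individual condition. The conditions are joined with ||, so all of them must have evaluated false, but the actual values of at_max, imp_state, imp_msghdr_flags and ocd_connect_flags at the time of the LBUG are unknown.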
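
One way to make a future failure self-describing would be to evaluate each term of the asserted expression separately and print all four. The following is only a sketch of that idea in standalone C, not Lustre code: struct assert_inputs, struct assert_terms, eval_terms(), assertion_holds() and print_terms() are hypothetical names invented here, and the struct is a stand-in for the fields named in the log line above. The two bit masks are copied from the log line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the values the assertion reads.
 * These are NOT the real Lustre structures. */
struct assert_inputs {
    unsigned int at_max;            /* at_max */
    bool         imp_state_is_full; /* imp_state == LUSTRE_IMP_FULL */
    uint32_t     imp_msghdr_flags;  /* rq_import->imp_msghdr_flags */
    uint64_t     ocd_connect_flags; /* imp_connect_data.ocd_connect_flags */
};

/* One boolean per term of the asserted OR expression. */
struct assert_terms {
    bool at_max_is_zero;     /* (at_max == 0) */
    bool import_not_full;    /* imp_state != LUSTRE_IMP_FULL */
    bool msghdr_bit_set;     /* (imp_msghdr_flags & 0x1) */
    bool connect_bit_clear;  /* !(ocd_connect_flags & 0x1000000ULL) */
};

/* Evaluate each term on its own so it can be logged individually. */
struct assert_terms eval_terms(const struct assert_inputs *in)
{
    struct assert_terms t;

    assert(in != NULL);
    t.at_max_is_zero    = (in->at_max == 0);
    t.import_not_full   = !in->imp_state_is_full;
    t.msghdr_bit_set    = (in->imp_msghdr_flags & 0x1) != 0;
    t.connect_bit_clear = (in->ocd_connect_flags & 0x1000000ULL) == 0;
    return t;
}

/* The combined condition: it holds if any single term is true. */
bool assertion_holds(const struct assert_terms *t)
{
    return t->at_max_is_zero || t->import_not_full ||
           t->msghdr_bit_set || t->connect_bit_clear;
}

/* Print every term, so a failure shows the full state. */
void print_terms(const struct assert_terms *t)
{
    printf("at_max==0:%d import!=FULL:%d msghdr&0x1:%d !(connect&0x1000000):%d\n",
           t->at_max_is_zero, t->import_not_full,
           t->msghdr_bit_set, t->connect_bit_clear);
}
```

With inputs such as a nonzero at_max, an import in the FULL state, the 0x1 header bit clear and the 0x1000000 connect bit set, assertion_holds() returns false and print_terms() shows all four terms as 0, which is the only combination that can produce the LBUG above.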