Details
Bug
Resolution: Fixed
Critical
Lustre 2.12.0
BG/Q I/O nodes
lustre-client-ion-2.5.4-4chaos_2.6.32_504.8.2.bgq.3blueos.V1R2M3.bl2.2_1.ppc64.ppc64
3
Description
From /bgsys/logs/BGQ.sn/R04-ID-J00.log (one of many affected node logs):
LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) ASSERTION( (at_max == 0) || request->rq_import->imp_state != LUSTRE_IMP_FULL || (request->rq_import->imp_msghdr_flags & 0x1) || ! (request->rq_import->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed:
LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) LBUG
Call Trace:
show_stack
libcfs_debug_dumpstack
lbug_with_loc
ptl_send_rpc
ptlrpc_send_new_req
ptlrpc_set_wait
ll_statfs_internal
ll_statfs
statfs_by_dentry
vfs_statfs
user_statfs
SyS_statfs
syscall_exit
This occurred on several dozen I/O nodes, and then on several dozen more within the following 24 hours. It is still occurring.
We have not seen this issue before. The patch that introduced this assertion has been in our patch stack since tag 2.5.4-1chaos, which we rolled out in April. We do not know what triggered the failures now.
c389652 LU-5528 ptlrpc: fix race between connect vs resend
There are no crash dumps for these nodes, and the console logs contain little else of use.
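Because several conditions were combined in a single ASSERTION, the message does not record the values involved. Since the clauses are ORed together, the failure does imply that all four were false at once: at_max was nonzero, the import was in LUSTRE_IMP_FULL, bit 0x1 of imp_msghdr_flags was clear, and bit 0x1000000 of ocd_connect_flags was set. What remains unknown is how the import reached that combination.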