Details
- Bug
- Resolution: Duplicate
- Critical
- None
- Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.3
- Lustre 2.4.2-14chaos (see github.com/chaos/lustre)
- 3
- 15744
Description
One of our MDS nodes crashed today with the following assertion:
client.c:304:ptlrpc_at_adj_net_latency()) Reported service time 548 > total measured time 165
osp_sync.c:355:osp_sync_interpret()) ASSERTION( rc || req->rq_transno ) failed
Note that the two messages above were printed in the same second (as reported by syslog) and by the same kernel thread. I don't know if the ptlrpc_at_adj_net_latency() message is actually related to the assertion or not, but the proximity makes it worth noting.
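To make the two conditions concrete, here is a simplified model of what each message is checking. It is not Lustre code: the `Reply` class and both helper functions are hypothetical names for illustration, and the real check in ptlrpc_at_adj_net_latency() may allow some slack in the comparison. The only inputs taken from this report are the two numbers in the warning and the asserted expression `rc || req->rq_transno`.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    """Hypothetical stand-in for the fields the two messages refer to."""
    rc: int            # completion status, 0 means success
    transno: int       # transaction number carried by the reply, 0 if none
    service_time: int  # service time reported by the server
    elapsed: int       # total time measured by the sender


def service_time_is_plausible(reply: Reply) -> bool:
    """Condition behind the warning: the reported service time
    should not exceed the total measured time."""
    return reply.service_time <= reply.elapsed


def satisfies_sync_assertion(reply: Reply) -> bool:
    """Same truth value as the asserted expression rc || req->rq_transno."""
    return bool(reply.rc or reply.transno)


if __name__ == "__main__":
    # Numbers from the warning above; rc and transno are both zero because
    # that is the only combination for which the assertion can fail.
    crashed = Reply(rc=0, transno=0, service_time=548, elapsed=165)
    print(service_time_is_plausible(crashed))
    print(satisfies_sync_assertion(crashed))
```

Read this way, the assertion failure means the request completed with rc == 0 but without a transaction number. Whether the implausible service time is connected to that is exactly the open question noted above.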
A couple of minutes earlier in the log, the MDS lost and reestablished its connection to a few OSTs.
The backtrace was:
panic
lbug_with_loc
osp_sync_interpret
ptlrpc_check_set
ptlrpcd_check
ptlrpcd
kernel_thread
The node was running Lustre version 2.4.2-14chaos (see github.com/chaos/lustre).
We cannot provide logs or crash dumps for this machine.
Issue Links
- is related to
  - LU-3892 osp_sync.c:356:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed (Resolved)
  - LU-7453 osp_sync_interpret assertion (Resolved)
  - LU-9135 sanity test_313: osp_sync.c:571:osp_sync_interpret()) LBUG (Resolved)
  - LU-5193 2.6 DNE stress testing: osp_sync_interpret()) ASSERTION( rc || req->rq_transno ) failed: (Closed)