Details
- Bug
- Resolution: Duplicate
- Critical
- None
- Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.3
- Lustre 2.4.2-14chaos (see github.com/chaos/lustre)
- 3
- 15744
Description
One of our MDS nodes crashed today with the following assertion:
client.c:304:ptlrpc_at_adj_net_latency()) Reported service time 548 > total measured time 165
osp_sync.c:355:osp_sync_interpret()) ASSERTION( rc || req->rq_transno ) failed
Note that the two messages above were printed in the same second (as reported by syslog) and by the same kernel thread. I don't know whether the ptlrpc_at_adj_net_latency() message is actually related to the assertion, but the proximity makes it worth noting.
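To make the two conditions in those messages concrete, here is a minimal sketch of what each one states, written only from the text of the log lines. It is not the Lustre source: `struct sketch_request`, `sync_reply_invariant_holds()` and `service_time_exceeds_total()` are hypothetical names introduced for illustration, and only `rc` and `rq_transno` come from the assertion itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one request field named in the assertion.
 * This is NOT the Lustre struct ptlrpc_request. */
struct sketch_request {
    uint64_t rq_transno; /* transaction number carried by the reply */
};

/* True when "rc || req->rq_transno" holds: either the RPC reported an
 * error (rc != 0) or the reply carries a nonzero transaction number.
 * The LBUG in this report corresponds to this returning false. */
static bool sync_reply_invariant_holds(int rc, const struct sketch_request *req)
{
    return rc != 0 || req->rq_transno != 0;
}

/* True when the reported service time exceeds the total measured time,
 * the condition described by the first log message. Units are left
 * unspecified because the log line does not state them. */
static bool service_time_exceeds_total(unsigned int service_time,
                                       unsigned int total_time)
{
    return service_time > total_time;
}
```

With the values from the log, `service_time_exceeds_total(548, 165)` is true. The assertion failure corresponds to the single case the first helper rejects: `rc == 0` together with `rq_transno == 0`.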
A couple of minutes earlier in the log, the MDS lost and then reestablished its connection to a few OSTs.
The backtrace was:
panic
lbug_with_loc
osp_sync_interpret
ptlrpc_check_set
ptlrpcd_check
ptlrpcd
kernel_thread
The node was running Lustre version 2.4.2-14chaos (see github.com/chaos/lustre).
We cannot provide logs or crash dumps for this machine.
Attachments
Issue Links
- is related to
- LU-3892 osp_sync.c:356:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed (Resolved)
- LU-7453 osp_sync_interpret assertion (Resolved)
- LU-9135 sanity test_313: osp_sync.c:571:osp_sync_interpret()) LBUG (Resolved)
- LU-5193 2.6 DNE stress testing: osp_sync_interpret()) ASSERTION( rc || req->rq_transno ) failed: (Closed)
Activity
Resolution | New: Duplicate [ 3 ]
Status | Original: Reopened [ 4 ] | New: Resolved [ 5 ]
Labels | Original: MB llnl | New: llnl
Attachment | New: lbugmay2.zip [ 26607 ]
Attachment | New: LU-5629-syslog.bz2 [ 22169 ]
End date | New: 23/May/16
Start date | New: 16/Sep/14
Assignee | Original: WC Triage [ wc-triage ] | New: Dmitry Eremin [ dmiter ]