Details
- Bug
- Resolution: Duplicate
- Critical
- None
- Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.3
- Lustre 2.4.2-14chaos (see github.com/chaos/lustre)
- 3
- 15744
Description
One of our MDS nodes crashed today with the following assertion:
client.c:304:ptlrpc_at_adj_net_latency()) Reported service time 548 > total measured time 165
osp_sync.c:355:osp_sync_interpret()) ASSERTION( rc || req->rq_transno ) failed
Note that the two messages above were printed in the same second (as reported by syslog) and by the same kernel thread. I don't know if the ptlrpc_at_adj_net_latency() message is actually related to the assertion or not, but the proximity makes it worth noting.
A couple of minutes earlier in the log, the MDS lost and then reestablished its connection to a few OSTs.
The backtrace was:
panic
lbug_with_loc
osp_sync_interpret
ptlrpc_check_set
ptlrpcd_check
ptlrpcd
kernel_thread
The node was running Lustre version 2.4.2-14chaos (see github.com/chaos/lustre).
We cannot provide logs or crash dumps for this machine.
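Read literally, the failed check quoted above, ASSERTION( rc || req->rq_transno ), trips only when the return code is zero and the request's transaction number is also zero. The short C sketch below just spells out that condition. It is not taken from the Lustre source: `struct sketch_request` and `assertion_would_fail()` are hypothetical names introduced for illustration, and only the field name `rq_transno` comes from the log message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in holding the one field the assertion reads.
 * This is NOT Lustre's struct ptlrpc_request. */
struct sketch_request {
    uint64_t rq_transno;   /* transaction number carried by the request */
};

/* Returns true when ASSERTION( rc || req->rq_transno ) would fail,
 * i.e. when rc == 0 and rq_transno == 0 at the same time. */
static bool assertion_would_fail(int rc, const struct sketch_request *req)
{
    return !(rc || req->rq_transno);
}
```

Under this reading, a request that completes with rc == 0 but carries no transaction number is the only combination that produces the LBUG; any nonzero rc, or any nonzero rq_transno, passes the check.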
Issue Links
- is related to
  - LU-3892 osp_sync.c:356:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 ) failed (Resolved)
  - LU-7453 osp_sync_interpret assertion (Resolved)
  - LU-9135 sanity test_313: osp_sync.c:571:osp_sync_interpret()) LBUG (Resolved)
  - LU-5193 2.6 DNE stress testing: osp_sync_interpret()) ASSERTION( rc || req->rq_transno ) failed: (Closed)
Server syslogs are attached. Unfortunately the stack dumps were not captured, but the location for collecting them is mounted now in case it happens again.
There had been network issues earlier in the day, reportedly resolved by 4pm.
FYI, the number of clients on the file system is currently 6395. The exact version of the software is:
lustre: 2.5.5
kernel: patchless_client
build: -6chaos-CHANGED-2.6.32-573.26.1.1chaos.ch5.4.x86_64