Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.7.0
-
OpenSFS cluster running lustre-master tag 2.6.90 build #2745 with one MDS/MDT, three OSSs with two OSTs each and three clients.
-
3
-
16583
Description
I've been running sanity-hsm test 90 several time on this cluster and nearly every time I run the test, I see the following in dmesg on the MDS:
Lustre: DEBUG MARKER: == sanity-hsm test 90: Archive/restore a file list == 15:39:24 (1416440364) Lustre: HSM agent bb8c2497-7403-4909-0e46-6614668e8ed7 already registered LustreError: 26047:0:(mdt_coordinator.c:957:mdt_hsm_cdt_start()) scratch-MDT0000: Coordinator already started LustreError: 19956:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818612, new: 25769818611 replay: 0. see LU-617. LustreError: 19956:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) Skipped 5 previous similar messages Lustre: DEBUG MARKER: == sanity-hsm test complete, duration 37 sec == 15:39:50 (1416440390)
From the kernel logs, I see:
... 00000001:00020000:9.0:1416440377.839622:0:19956:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818612, new: 25769818611 replay: 0. see LU-617. ... 00000001:00080000:8.0:1416440377.869378:0:30331:0:(tgt_lastrcvd.c:1231:tgt_txn_stop_cb()) More than one transaction 25769818612 ... 00000001:00080000:8.0:1416440377.869423:0:30331:0:(tgt_lastrcvd.c:1231:tgt_txn_stop_cb()) More than one transaction 25769818612 ... 00000001:00080000:8.0:1416440377.869508:0:30331:0:(tgt_lastrcvd.c:1231:tgt_txn_stop_cb()) More than one transaction 25769818612 ... 00000100:00100000:8.0:1416440377.869685:0:30331:0:(service.c:2116:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt00_002:bb8c2497-7403-4909-0e46-6614668e8ed7+713:21533:x1485210712561904:12345-192.168.2.111@o2ib:57 Request procesed in 30116us (30167us total) trans 25769818612 rc 0/0
Similarly for other transaction numbers:
00000001:00020000:0.0:1416440378.133498:0:19955:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818617, new: 25769818614 replay: 0. see LU-617.
and
00000001:00020000:1.0F:1416440378.133518:0:31313:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818619, new: 25769818618 replay: 0. see LU-617.
Before running sanity-hsm test 90, the copytool was started on the agent, c11.