[LU-5939] Error: trying to overwrite bigger transno

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Lustre 2.8.0
    • Lustre 2.7.0
    • Environment: OpenSFS cluster running lustre-master tag 2.6.90 build #2745 with one MDS/MDT, three OSSs with two OSTs each, and three clients
    • 3
    • 16583

    Description

      I've been running sanity-hsm test 90 several times on this cluster, and nearly every time I run the test, I see the following in dmesg on the MDS:

      Lustre: DEBUG MARKER: == sanity-hsm test 90: Archive/restore a file list == 15:39:24 (1416440364)
      Lustre: HSM agent bb8c2497-7403-4909-0e46-6614668e8ed7 already registered
      LustreError: 26047:0:(mdt_coordinator.c:957:mdt_hsm_cdt_start()) scratch-MDT0000: Coordinator already started
      LustreError: 19956:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818612, new: 25769818611 replay: 0. see LU-617.
      LustreError: 19956:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) Skipped 5 previous similar messages
      Lustre: DEBUG MARKER: == sanity-hsm test complete, duration 37 sec == 15:39:50 (1416440390)
      

      From the kernel logs, I see:

      ...
      00000001:00020000:9.0:1416440377.839622:0:19956:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818612, new: 25769818611 replay: 0. see LU-617.
      ...
      00000001:00080000:8.0:1416440377.869378:0:30331:0:(tgt_lastrcvd.c:1231:tgt_txn_stop_cb()) More than one transaction 25769818612
      ...
      00000001:00080000:8.0:1416440377.869423:0:30331:0:(tgt_lastrcvd.c:1231:tgt_txn_stop_cb()) More than one transaction 25769818612
      ...
      00000001:00080000:8.0:1416440377.869508:0:30331:0:(tgt_lastrcvd.c:1231:tgt_txn_stop_cb()) More than one transaction 25769818612
      ...
      00000100:00100000:8.0:1416440377.869685:0:30331:0:(service.c:2116:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt00_002:bb8c2497-7403-4909-0e46-6614668e8ed7+713:21533:x1485210712561904:12345-192.168.2.111@o2ib:57 Request procesed in 30116us (30167us total) trans 25769818612 rc 0/0
      

      Similarly for other transaction numbers:

      00000001:00020000:0.0:1416440378.133498:0:19955:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818617, new: 25769818614 replay: 0. see LU-617.
      

      and

      00000001:00020000:1.0F:1416440378.133518:0:31313:0:(tgt_lastrcvd.c:806:tgt_last_rcvd_update()) scratch-MDT0000: trying to overwrite bigger transno:on-disk: 25769818619, new: 25769818618 replay: 0. see LU-617.
      

      Before running sanity-hsm test 90, the copytool was started on the agent, c11.
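
      For context on where the message comes from: the MDT records each client's highest committed transaction number in its slot of the last_rcvd file, and the update refuses to go backwards unless the request is a replay. Below is a minimal userspace model of that check; the struct and function names are simplified stand-ins, not the actual tgt_last_rcvd_update() code.

      #include <stdio.h>
      #include <stdint.h>
      #include <stdbool.h>

      /* Simplified stand-in for a client's slot in the last_rcvd file. */
      struct client_slot {
              uint64_t last_transno;   /* highest transno seen from this client */
      };

      /* Model of the check that prints "trying to overwrite bigger transno". */
      static void last_rcvd_update(struct client_slot *slot, uint64_t transno,
                                   bool replay)
      {
              if (transno < slot->last_transno && !replay) {
                      printf("trying to overwrite bigger transno: on-disk: %llu, new: %llu\n",
                             (unsigned long long)slot->last_transno,
                             (unsigned long long)transno);
                      return;          /* keep the larger on-disk value */
              }
              slot->last_transno = transno;
      }

      int main(void)
      {
              struct client_slot slot = { .last_transno = 0 };

              /* Two transactions belonging to the same client complete out of
               * order, e.g. because one RPC started more than one transaction: */
              last_rcvd_update(&slot, 25769818612ULL, false);
              last_rcvd_update(&slot, 25769818611ULL, false);
              return 0;
      }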

    Activity

            tappro Mikhail Pershin added a comment:

            Yes, I agree, that would be better.

            jay Jinshan Xiong (Inactive) added a comment:

            Hmm, I'm not comfortable with multiple transactions being created by HSM requests, because I'm afraid it may cause problems down the road. OUT can have multiple transactions for one RPC because it was carefully designed for this, but that can't be applied to HSM. I'd prefer an alternative way to fix this problem: limiting HSM requests to one transaction per RPC.
            tappro Mikhail Pershin added a comment (edited):

            Note, the patch above only makes HSM requests replayable; it contains no tests because I found that HSM actions can't recover, for reasons not related to this particular patch. I've created LU-6223 for HSM recovery testing.

            So this particular patch should solve just the problem with the "overwrite bigger transno" message.

            gerrit Gerrit Updater added a comment:

            Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13684
            Subject: LU-5939 hsm: make HSM modification requests replayable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 726362d38cb95233acabacb3fe98ed484f0bff4e

            tappro Mikhail Pershin added a comment:

            Thanks for the help. I identified the first two as well, but not MDS_HSM_REQUEST; is it a specific action that updates data on disk, or does it do that upon any request?

            adegremont Aurelien Degremont (Inactive) added a comment:

            On my side, I've identified 3 RPCs which can update data on the server:

            • MDS_HSM_PROGRESS
            • MDS_HSM_STATE_SET
            • MDS_HSM_REQUEST

            Serializing those requests can slow the request ingestion rate, but I think it is acceptable.

            If you're working on a patch, please also add the MUTABOR flag to the MDS_HSM_REQUEST RPC.
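
            As an illustration of that last suggestion: the MUTABOR flag is set in the MDT's request-handler table. A hedged sketch of the change follows; the exact macro and the companion flag are assumptions based on the 2.7-era table layout, not a quote of the committed patch.

            /* Illustrative fragment of the MDT handler table in mdt_handler.c:
             * MUTABOR marks a handler as one that modifies on-disk state, so
             * the RPC is refused on read-only exports and treated as a
             * modification.  The HABEO_CLAVIS companion flag is an assumption
             * here, not taken from the actual patch. */
            TGT_MDT_HDL(HABEO_CLAVIS | MUTABOR, MDS_HSM_REQUEST, mdt_hsm_request),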

            tappro Mikhail Pershin added a comment:

            Yes, I am already working on a patch. As for HSM agent performance, I am trying to serialize only the requests that might update data on the server.

            adilger Andreas Dilger added a comment:

            If the MDC is serializing change requests, is this going to hurt HSM performance, or is there only a single HSM request active on any client/agent node at once?

            Adding the HSM opcodes to the recovery list seems like a simple and low-risk change, and serializing the HSM requests on the MDC is also limited to HSM usage, so it doesn't seem too high risk.
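
            As a sketch of the "recovery list" change: the filter in question is tgt_filter_recovery_requests(), named in the comment further down. Its exact structure in the unified target code may differ from this fragment, which only illustrates the idea of listing the HSM opcodes among requests processed while recovery is in progress.

            switch (lustre_msg_get_opc(req->rq_reqmsg)) {
            /* ... opcodes already accepted during recovery ... */
            case MDS_HSM_PROGRESS:
            case MDS_HSM_STATE_SET:
            case MDS_HSM_REQUEST:
                    /* let (replayed) HSM requests through instead of
                     * filtering them out */
                    *process = 1;
                    RETURN(0);
            }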

            adegremont Aurelien Degremont (Inactive) added a comment:

            I think this was forgotten. If somebody thinks this is a bad idea, say so, but IMO we should replay them, and therefore serialize them.
            tappro Mikhail Pershin added a comment (edited):

            Aurelien, I decided that HSM requests don't rely on Lustre recovery (replays) because HSM requests are filtered out during recovery: check tgt_filter_recovery_requests(), there is no MDS_HSM_* opcode listed. This must be fixed if they are supposed to be replayed, and that also means we should serialize HSM requests from the MDC via mdc_rpc_lock(). This should also solve the 'overwrite bigger transno' issue.
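
            For the client-side half, the serialization would follow the same pattern mdc_reint() uses: take the import's rpc lock around the synchronous send. A hedged sketch is below; the wrapper function name is hypothetical, and the real change would land in the mdc_ioc_hsm_* paths of mdc_request.c.

            /* Hypothetical wrapper sketching serialization of a modifying HSM
             * RPC through the MDC rpc lock, as mdc_reint() does for metadata
             * updates.  Request setup and error handling are omitted. */
            static int mdc_hsm_request_send(struct obd_export *exp,
                                            struct ptlrpc_request *req)
            {
                    struct obd_device *obd = class_exp2obd(exp);
                    int rc;

                    mdc_get_rpc_lock(obd->u.cli.cl_rpc_lock, NULL);
                    rc = ptlrpc_queue_wait(req);
                    mdc_put_rpc_lock(obd->u.cli.cl_rpc_lock, NULL);

                    return rc;
            }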

            adegremont Aurelien Degremont (Inactive) added a comment:

            Sorry, I did not see this issue.

            HSM code does two kinds of disk access:

            • It manages the coordinator llog. This stores all requests and especially their status. Each time a request is added, started, or completed, the corresponding llog records are updated.
            • The file's HSM status is also updated (the LMA EA is updated) when a request is started or finished.
              IIRC, those two modifications are not in the same transaction.

            Nothing in HSM code will resend the HSM request registrations. HSM (wrongly?) relies on the Lustre recovery mechanism. HSM actions (archive this file, restore this one, etc.) are supposed to be kept across MDT restarts.
            I think those requests should be replayed and considered like any other RPC.

            Another thing that could lead to a very big transaction is sending an archive request for a big list of files. The MDT will try to add them (one llog record per file) in one transaction. So far, this file list has been limited by other Lustre limitations (LNET message size, KUC, ...). I don't know if this limitation has changed, but the test still uses only 51 files.

            "2. Don't update last_rcvd and don't assign transaction to such requests if they don't need transaction and/or recovery by replaying them."

            AFAIK, replay and recovery are tightly coupled. I do not know how you can manage replay without recovery. Or is this done by hand? Is there code somewhere in Lustre which does that?
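
            To make the "two kinds of disk access" concrete, here is a tiny userspace model (hypothetical names, not Lustre code) of why one modifying HSM RPC can end up with more than one transaction number, which is what the "More than one transaction" lines in the description show.

            #include <stdio.h>

            /* Model only: each on-disk update runs in its own transaction and
             * consumes its own transno; one HSM request does two such updates
             * (coordinator llog record + file HSM state EA). */
            static unsigned long long next_transno = 25769818610ULL;

            static void commit_in_own_transaction(const char *what)
            {
                    printf("update \"%s\" committed as transaction %llu\n",
                           what, ++next_transno);
            }

            int main(void)
            {
                    /* One HSM request, two separate transactions: */
                    commit_in_own_transaction("coordinator llog record");
                    commit_in_own_transaction("file HSM state EA");
                    /* The reply carries only one of these transnos, while
                     * last_rcvd is updated once per transaction, so updates
                     * can land out of order (see the model in the description). */
                    return 0;
            }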

            People

              Assignee: tappro Mikhail Pershin
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 14
