[LU-15847] Resolve 'More than one transno' situations Created: 12/May/22  Updated: 10/Nov/22

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15776 2.15 RC3: lost writes during server f... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Message 'More than one transno' in lustre debug log means that during some RPC request handling we do several transaction start/stop cycles. This is potentially dangerous situation, because without proper handling it may cause missing transactions after all. E.g. if first transaction was applied and its transno is written in last_rcvd then second will not. Now consider that only first one was really committed, then its transaction number is considered as committed too and there will be no replay for that request leaving the second part uncommitted and not-replayed. So every message 'More that one transno' means potentia; data loss. At the same time this message is not masked as D_ERROR to be visible in console.

Meanwhile we have limited way to cover multi-transaction RPC handling, if we know that may happen we store in last_rcvd only the latest transno and all previous transactions must be reentrant, e.g. can be applied without errors being already applied before. In that case replay will not cause errors in case they were committed but the last one wasn't. For such cases flag tti_mult_trans is being used to mark RPC handler as legally multi-transactional.

I propose to change default behavior for multi-transation situation to always write the latest transno in last_rcvd file like we do with tti_mult_trans flag and issue warning message if that was not expected. So non-handled cases would be reviewed and resolved properly. Bad effect of that would be replay errors for non-handled cases but that is better than possible data loss.



 Comments   
Comment by Mikhail Pershin [ 12/May/22 ]

In general, let's consider two transactions per RPC case and that we store always the latest transno in the last_rcvd file. There are several way to handle  the first transaction:

  • consider if everything could still be done inside single transaction 
  • make the first part reentrant as possible, i.e. be applicable all the time no matter of previous state. An example could be setattr or write
  • tolerate errors for first transaction during replay, e.g. EEXIST if first transno was committed. E.g cases like setxattr might need flags to be reset to 0 during replay.

Also considering that first transaction transno is not being used anymore anywhere it is worth to make it local, i.e. transno assignment will be skipped for it, simplifying processing a bit

In all handled situations the flag tti_mult_trans to be used to mark request as multi-transaction legally and avoid error message in logs.

Comment by Mikhail Pershin [ 12/May/22 ]

the most frequent case at the moment - in the mdt_mfd_close():

  1. possible HSM updates:
    • mdt_hsm_release(). In theory it is reentrant, but in practice will fail always during replay with -ESTALE 
    • mdt_close_handle_layouts(). Non-reentrant, I don't think we can repeat layout swap, split or merge easily if it was done once already. Nowadays it doesn't support replay either failing with -ESTALE
    • mdt_close_resync_done(). This one uses th_sync flag to make transaction synced, so avoids replays at all. It looks reentrant in case if rest of close RPC will be replayed
  2. mdt_lsom_update(). This one is reentrant, can be updated many times
  3. mo_attr_set() for LA_(A|M|C)TIME update on close. It is also can be repeated
  4. mdt_add_dirty_flag() for HSM needs. It is reentrant.

So this is the most frequent and the most complicated case at the moment. Moreover all reentrant transactions are made at the end and most critical - at the beginning. Ideally that must be backward but it is not possible to make 2-3-4 steps first and 1 - last. Also I am not sure about all possible combinations they can produce.

Using tti_mult_trans will help here only for non-HSM close, without step 1 and that is good place to start with. As for step 1, considering it is not able to replay correctly and some of those operations use th_sync already, I would use th_sync for all of them to prevent replay completely and extent that th_sync to all transaction in request (can be applied to the last one I think). Though even in that case there can be situations when first transaction will be committed but rest will not, then request will be resent after recovery again and step 1 will be re-applied with changes already made. The most troublesome code here is mdt_close_handle_layouts(), it may need special handling for RESENT case

Comment by Mikhail Pershin [ 28/May/22 ]

One more thought on MDT multi-transno operations. All such RPC handlers which allows multiple transno assignment must skip reconstruction, i.e. XID is written in last_rcvd at the first transaction so in case of partial execution and resent the handler will find the same XID in lastrcvd and will assume that request is done already so it is enough to reconstruct it. But for partially executed request we need to re-execute it, so reconstruct must be disabled for a handler with mult_trans flag set.

Comment by Gerrit Updater [ 31/May/22 ]

"Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47491
Subject: LU-15847 tgt: move tti_ transaction params to tsi_
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a78b06bd140bb6545650d52cd4e4f068b3b98152

Comment by Gerrit Updater [ 31/May/22 ]

"Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47492
Subject: LU-15847 tgt: reply always with the latest assigned transno
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 308a91cb0c11a86970d60356af25b829a0119d14

Comment by Gerrit Updater [ 25/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47492/
Subject: LU-15847 tgt: reply always with the latest assigned transno
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4e2e8fd2fc0a9a30f47e70dc285a2101d2cbc4c2

Comment by Gerrit Updater [ 25/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47491/
Subject: LU-15847 tgt: move tti_ transaction params to tsi_
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0a317b171ebedcba8fc58e548991a884186c350c

Comment by Peter Jones [ 25/Oct/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 26/Oct/22 ]

"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48958
Subject: LU-15847 target: report multiple transno to debug log
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 95d804f3f742b879588e3e442f5b2f87b18c327e

Comment by Gerrit Updater [ 08/Nov/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48958/
Subject: LU-15847 target: report multiple transno to debug log
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 122644ae1916601566946e71152e17fdb540a15e

Generated at Sat Feb 10 03:21:48 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.