[LU-7450] call dcb commit callback in osd_trans_stop() Created: 19/Nov/15  Updated: 14/Dec/15  Resolved: 14/Dec/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6994 MDT recovery timer goes negative, rec... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In osd_trans_stop()

     } else {
                osd_trans_stop_cb(oh, th->th_result);
                OBD_FREE_PTR(oh);
     }

It should call dcb commit callback as well, because the dcb callback is registered before transaction start(). Otherwise once error happens before transaction start, local commit callback will never be called, which will stop the llog cancellation, and cumulate a lot of update logs, cause long time recovery. This issue might be related with endless recovery found in soak-test.



 Comments   
Comment by Alex Zhuravlev [ 19/Nov/15 ]

this is a change to the current api.. why do you need that?

Comment by Di Wang [ 19/Nov/15 ]

because dcb commit callback is registered before trans_start(), if it fails before trans_start(), then in osd_trans_stop will never call commit callback, i.e. we can not remove top thandle from the commit list. then the update log will be built up. cause long time recover.

Btw: what do you mean change API?

Comment by Alex Zhuravlev [ 19/Nov/15 ]

hm, but if a transaction wasn't started, then no llog regords were added? and thandle should be released in top_trans_stop() ? this is what the current implementation does.

an api isn't just a set of methods, but also some rules.

Comment by Gerrit Updater [ 19/Nov/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/17268
Subject: LU-7450 osd: add commit_callback in osd_trans_stop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 47a3ee68426dbe2dcb472b9758a1fa93bc1ade66

Comment by Di Wang [ 19/Nov/15 ]

No, the top_handle is added to commit list before transaction start. we need to remove this top_handle from commit list. which is the reason we need commit callback.

Comment by Di Wang [ 19/Nov/15 ]

Hmm, we may be able to do this in top_trans_stop, if you insist not changing osd_trans_stop().

Comment by Alex Zhuravlev [ 19/Nov/15 ]

yes, exactly. logically - the transaction wasn't started, no any changes were made to the filesystem. all we need is just to release resources? this is what OSDs do now and I tend to think it's correct. this way TOP will be following the API and it's rules.

Comment by Gerrit Updater [ 13/Dec/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17268/
Subject: LU-7450 osd: call commit_callback if no write updates
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c965fc8af8641cc810f6ff5c01cefceeaa79cb74

Comment by Peter Jones [ 14/Dec/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:09:00 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.