[LU-7450] call dcb commit callback in osd_trans_stop() Created: 19/Nov/15 Updated: 14/Dec/15 Resolved: 14/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Di Wang | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
In osd_trans_stop() } else {
osd_trans_stop_cb(oh, th->th_result);
OBD_FREE_PTR(oh);
}
It should call dcb commit callback as well, because the dcb callback is registered before transaction start(). Otherwise once error happens before transaction start, local commit callback will never be called, which will stop the llog cancellation, and cumulate a lot of update logs, cause long time recovery. This issue might be related with endless recovery found in soak-test. |
| Comments |
| Comment by Alex Zhuravlev [ 19/Nov/15 ] |
|
this is a change to the current api.. why do you need that? |
| Comment by Di Wang [ 19/Nov/15 ] |
|
because dcb commit callback is registered before trans_start(), if it fails before trans_start(), then in osd_trans_stop will never call commit callback, i.e. we can not remove top thandle from the commit list. then the update log will be built up. cause long time recover. Btw: what do you mean change API? |
| Comment by Alex Zhuravlev [ 19/Nov/15 ] |
|
hm, but if a transaction wasn't started, then no llog regords were added? and thandle should be released in top_trans_stop() ? this is what the current implementation does. an api isn't just a set of methods, but also some rules. |
| Comment by Gerrit Updater [ 19/Nov/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/17268 |
| Comment by Di Wang [ 19/Nov/15 ] |
|
No, the top_handle is added to commit list before transaction start. we need to remove this top_handle from commit list. which is the reason we need commit callback. |
| Comment by Di Wang [ 19/Nov/15 ] |
|
Hmm, we may be able to do this in top_trans_stop, if you insist not changing osd_trans_stop(). |
| Comment by Alex Zhuravlev [ 19/Nov/15 ] |
|
yes, exactly. logically - the transaction wasn't started, no any changes were made to the filesystem. all we need is just to release resources? this is what OSDs do now and I tend to think it's correct. this way TOP will be following the API and it's rules. |
| Comment by Gerrit Updater [ 13/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17268/ |
| Comment by Peter Jones [ 14/Dec/15 ] |
|
Landed for 2.8 |