LFSCK phase II technical debts (LU-4701)

[LU-3469] OSP dt_sync() operation should flush pending destroys and other updates Created: 13/Jun/13  Updated: 22/Jul/14  Resolved: 04/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Technical task Priority: Blocker
Reporter: Andreas Dilger Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
is blocking LU-3336 LFSCK II: MDT-OST OST orphan handling Resolved
Related
is related to LU-5387 Interop 2.5.2<->2.6 failure on test s... Resolved
Rank (Obsolete): 8679

 Description   

lod_sync() should be changed to allow flushing the pending OST object destroys so that LFSCK can ensure that the OST state is consistent after pass 1 is completed:

  • call the local/child OSD dt_sync() first, so that the commit callbacks for unlinked names run and schedule all of the OST object destroys
  • OSP dt_sync() will wait until all of the pending object destroys are at least sent to the OSTs (accessing their respective OST objects and removing them from the orphan list). They do not necessarily need to be committed.
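The order of the two steps above matters. If the OSP were flushed before the local OSD commit ran the unlink callbacks, the destroys that those callbacks schedule would still be pending afterwards. Below is a minimal sketch in C that assumes a toy in-memory model. `osd_model`, `osp_model` and the `*_model_sync()` functions are hypothetical stand-ins, not the real lod/osd/osp structures or the dt_sync() API, and "sending" a destroy is simulated by moving a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the proposed lod_sync() ordering.
 * None of these types or functions exist in Lustre. */
struct osp_model {
	int queued;   /* OST object destroys scheduled but not yet sent */
	int sent;     /* destroys sent to the OST (not necessarily committed) */
};

struct osd_model {
	int uncommitted_unlinks;  /* unlinks whose commit callbacks have not run */
	struct osp_model *osp;    /* OSP that receives the scheduled destroys */
};

/* Step 1: committing the local OSD runs the unlink commit callbacks,
 * each of which schedules one OST object destroy on the OSP. */
static int osd_model_sync(struct osd_model *osd)
{
	assert(osd->osp != NULL);
	while (osd->uncommitted_unlinks > 0) {
		osd->uncommitted_unlinks--;
		osd->osp->queued++;
	}
	return 0;
}

/* Step 2: the OSP sync returns only after every queued destroy was sent. */
static int osp_model_sync(struct osp_model *osp)
{
	while (osp->queued > 0) {
		osp->queued--;   /* simulate sending one destroy RPC */
		osp->sent++;
	}
	return 0;
}

/* Proposed order: local/child OSD first, then the OSP. */
static int lod_model_sync(struct osd_model *osd)
{
	int rc = osd_model_sync(osd);

	if (rc != 0)
		return rc;
	return osp_model_sync(osd->osp);
}
```

With three uncommitted unlinks, `lod_model_sync()` leaves `queued == 0` and `sent == 3`. Calling `osp_model_sync()` before `osd_model_sync()` instead leaves all three destroys queued, which is the window LFSCK must close before orphan handling.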


 Comments   
Comment by Andreas Dilger [ 10/Sep/13 ]

Alex, any comments on this issue?

Comment by Andreas Dilger [ 11/Sep/13 ]

One proposal is to have LFSCK only do a sync on a specific FID (e.g. the orphan object FID), which will cause the pending destroys to be flushed. There is a larger question of whether a sync on the LOD in general should cause all the destroys to be flushed, so that users can force file deletions to be flushed to the OSTs and release space.

This issue only affects the fairly rare case of files that are deleted after being traversed in the last 15 or so seconds of the phase II namespace scan. At worst, this would result in previously-deleted files being linked into lost+found.

Comment by Richard Henwood (Inactive) [ 05/Mar/14 ]

After discussion with Fan Yong: I would like this bug to be considered for Blocker status.

Comment by nasf (Inactive) [ 19/Mar/14 ]

Recently, I hit some test failures in test_18d/test_18e. The root cause is that the asynchronous repair operations triggered during the first-cycle scan are not executed until the orphan handling phase.

So we need dt_sync() to flush them before orphan handling.

Comment by Alex Zhuravlev [ 18/Apr/14 ]

Yong, could you point out which specific lines will be calling dt_sync(), please?

Comment by nasf (Inactive) [ 18/Apr/14 ]

Alex, the current LFSCK does not call it because osp_sync() is not implemented yet. But it is easy to add, as in the following patch:

diff --git a/lustre/lfsck/lfsck_layout.c b/lustre/lfsck/lfsck_layout.c
index de96726..7be2346 100644
--- a/lustre/lfsck/lfsck_layout.c
+++ b/lustre/lfsck/lfsck_layout.c
@@ -3435,6 +3435,9 @@ static int lfsck_layout_assistant(void *args)
                                com->lc_time_last_checkpoint +
                                cfs_time_seconds(LFSCK_CHECKPOINT_INTERVAL);
 
+                       /* flush all async updating before handling orphan. */
+                       dt_sync(env, lfsck->li_next);
+
                        while (llmd->llmd_in_double_scan) {
                                struct lfsck_tgt_descs  *ltds =
                                                        &lfsck->li_ost_descs;

Comment by Andreas Dilger [ 21/Apr/14 ]

Alex, Fan Yong, is there any reason not to just convert the previous comment into a patch? Is there more work to be done?

Comment by Alex Zhuravlev [ 21/Apr/14 ]

As I said before, osp_sync() is still empty; I've been working on it.

Comment by Alex Zhuravlev [ 22/Apr/14 ]

Also, at the moment I'm not sure a single dt_sync() is good enough. Isn't a failed osp_sync() just like a failed transaction? If we can't complete one phase on a specific OST, shouldn't we stop processing that OST in this LFSCK run? IOW, wouldn't we prefer to track per-OST status, rather than a combined status that doesn't tell us which OSTs are safe to proceed with?

Comment by nasf (Inactive) [ 22/Apr/14 ]

If osp_sync() fails to flush all pending destroys to the OSTs and the LFSCK goes ahead with orphan handling, then the LFSCK may find some OST-object(s) whose parent MDT-object(s) have already been destroyed on the MDT(s) by unlink. Because the LFSCK cannot detect that racing unlink, it will re-create the MDT-object(s), which is unexpected.

One possible solution: if osp_sync() fails for some OST(s), skip orphan handling on those OST(s).
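Alex's question and the fix proposed here amount to keeping one sync result per OST and using it to gate orphan handling. Below is a minimal sketch that uses hypothetical names throughout. `ost_sync_fn`, `lfsck_sketch_sync_all()`, `lfsck_sketch_handle_orphans()` and `fake_ost_sync()` are illustrations, not LFSCK or OSP functions. The fake sync callback stands in for an osp_sync() that fails on OST 2.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch: none of these names exist in the LFSCK code. */
#define NUM_OSTS 4

/* Stand-in for a per-OST sync: returns 0 or a negative errno. */
typedef int (*ost_sync_fn)(int ost_idx);

/* Fake sync that fails on OST 2, to exercise the error path. */
static int fake_ost_sync(int ost_idx)
{
	return ost_idx == 2 ? -EIO : 0;
}

/* Flush every OST and record each result separately instead of
 * folding them into one combined status. Returns the failure count. */
static int lfsck_sketch_sync_all(ost_sync_fn sync, bool synced[], int nr_osts)
{
	int failed = 0;

	for (int i = 0; i < nr_osts; i++) {
		synced[i] = (sync(i) == 0);
		if (!synced[i])
			failed++;
	}
	return failed;
}

/* Orphan handling runs only where pending destroys are known to have been
 * sent; otherwise a racing unlink could make LFSCK re-create an MDT-object.
 * Returns the number of OSTs handled. */
static int lfsck_sketch_handle_orphans(const bool synced[], int nr_osts,
				       bool handled[])
{
	int n = 0;

	for (int i = 0; i < nr_osts; i++) {
		/* A real implementation would scan OST i for orphans here. */
		handled[i] = synced[i];
		if (handled[i])
			n++;
	}
	return n;
}
```

With `fake_ost_sync()`, the sync pass reports one failure. Orphan handling then covers OSTs 0, 1 and 3, while OST 2 is skipped for this run.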

Comment by Alex Zhuravlev [ 22/Apr/14 ]

http://review.whamcloud.com/10046 - this patch implements osp_sync(), which is synchronous. Given the discussion above, I tend to think it'd be better to call into all OSPs/OSDs directly from LFSCK. In the long term, something like empty (or almost empty) async transactions with registered commit callbacks would serve better, since they can run concurrently.
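Calling a synchronous osp_sync() on each OST in turn serializes the flushes. The concurrent alternative issues every flush first and then waits once for all of the commit callbacks. Below is a minimal sketch of that pattern with POSIX threads. It assumes each OST's async (almost empty) transaction can be simulated by a thread that immediately invokes the commit callback with a preset result. `sync_waiter`, `sync_commit_cb()` and `sync_all_osts_concurrently()` are hypothetical, not Lustre's transaction or commit-callback API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch: NOT the Lustre transaction/commit-callback API. */
#define MAX_OSTS 8

struct sync_waiter {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	int             outstanding; /* flushes issued, callback not yet run */
	int             failed;      /* callbacks that reported an error */
};

/* Commit callback: runs once per OST when its async flush completes. */
static void sync_commit_cb(struct sync_waiter *w, int rc)
{
	pthread_mutex_lock(&w->lock);
	if (rc != 0)
		w->failed++;
	if (--w->outstanding == 0)
		pthread_cond_signal(&w->done);
	pthread_mutex_unlock(&w->lock);
}

struct ost_flush {
	struct sync_waiter *w;
	int rc;  /* simulated result of flushing this OST */
};

/* Stand-in for one OST committing an (almost) empty async transaction. */
static void *ost_flush_thread(void *arg)
{
	struct ost_flush *f = arg;

	sync_commit_cb(f->w, f->rc);
	return NULL;
}

/* Issue a flush to every OST without waiting, then wait once for all
 * commit callbacks. Returns the number of OSTs whose flush failed. */
static int sync_all_osts_concurrently(const int results[], int nr_osts)
{
	struct sync_waiter w = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.done = PTHREAD_COND_INITIALIZER,
		.outstanding = nr_osts,
		.failed = 0,
	};
	struct ost_flush flushes[MAX_OSTS];
	pthread_t tids[MAX_OSTS];
	int started[MAX_OSTS] = { 0 };
	int failed;

	assert(nr_osts >= 0 && nr_osts <= MAX_OSTS);
	for (int i = 0; i < nr_osts; i++) {
		flushes[i].w = &w;
		flushes[i].rc = results[i];
		if (pthread_create(&tids[i], NULL, ost_flush_thread,
				   &flushes[i]) == 0)
			started[i] = 1;
		else
			sync_commit_cb(&w, -1); /* could not issue: count as failed */
	}

	pthread_mutex_lock(&w.lock);
	while (w.outstanding > 0)
		pthread_cond_wait(&w.done, &w.lock);
	failed = w.failed;
	pthread_mutex_unlock(&w.lock);

	/* Join so no thread still touches the on-stack waiter after return. */
	for (int i = 0; i < nr_osts; i++)
		if (started[i])
			pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&w.lock);
	pthread_cond_destroy(&w.done);
	return failed;
}
```

For example, with results `{0, -5, 0, 0}` the function returns 1. Build with `cc -pthread`. Recording `rc` per OST inside the callback, rather than only counting failures, would give the per-OST status discussed earlier in this thread.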

Comment by Andreas Dilger [ 22/May/14 ]

Alex, Fan Yong, what is needed next for this patch? Are further changes needed in the LFSCK code to start using this new functionality?

Comment by nasf (Inactive) [ 23/May/14 ]

Generally, the patch looks good, but we still cannot land it because of a Maloo test failure.

Comment by Jodi Levi (Inactive) [ 03/Jun/14 ]

Assigned to Fan Yong as he has updated the patch

Comment by Peter Jones [ 04/Jun/14 ]

Landed for 2.6

Generated at Sat Feb 10 01:34:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.