[LU-3654] LFSCK 5: Handle OI mapping conflict when recover OST-objects from /lost+found Created: 29/Jul/13  Updated: 13/Feb/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Improvement Priority: Trivial
Reporter: nasf (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Rank (Obsolete): 9399

 Description   

When OI scrub tries to recover the OST-objects under /lost+found, it may find that the OI mapping slot in /O/<seq>/d<x> has been reused by another object. The new OST-object may or may not have already been modified by others. In the latter case (zero-sized and with SUID+SGID), we want to remove it and insert the original one back into /O/<seq>/d<x>. But there are some things to consider:

1) The OST-object under /lost+found has a crashed LMA, so it should not conflict with the current one.

2) There is a race condition: someone may be about to modify the current one. Even if the OI scrub takes the object lock when removing the current one, the modification is still lost, because the target has been removed while the RPC service thread was waiting for the lock.



 Comments   
Comment by Andreas Dilger [ 20/Sep/13 ]

Fan Yong, is this a defect in the OST OI Scrub code that was landed on master, or is this part of the LFSCK Phase 2 feature implementation that is not yet landed on master and will land for 2.6.0?

Comment by nasf (Inactive) [ 20/Sep/13 ]

It is a known race case in master, marked as "XXX:" in the file osd_compat.c. It was brought out during the LFSCK phase 2 implementation, but we did not have a clear plan for when to resolve it.

Comment by Andreas Dilger [ 24/Sep/13 ]

Any plan to submit a patch for this? The patch cutoff is at the end of this week for anything except critical bugs.

Comment by nasf (Inactive) [ 25/Sep/13 ]

This bug happens only in very rare cases, and the worst case is that we keep the conflicting files there without modification. That is equivalent to not recovering those conflicting files, which is the same as before the Lustre-2.5 solution was introduced. In other words, although it was brought out during LFSCK phase 2, it is not specific to Lustre-2.5; such a case has existed all along since Lustre-1.x, with ll_recover_lost_found_objs used to recover objects manually.

So it is not a critical bug. It is not urgent to resolve it. We can do that in Lustre-2.5.x or Lustre-2.6.

Comment by nasf (Inactive) [ 27/Jan/15 ]

The solution will be:
1) If the object under /lost+found is empty, then keep the conflicting one and remove the object under /lost+found; otherwise,
2) If the conflicting object is empty, then remove the conflicting one. It does not matter whether the client side has cached dirty data or not: when OI scrub handles the object under /lost+found, the client has not yet started to flush the dirty data back, so after the conflicting object is replaced by the object under /lost+found, the client-side dirty data will be written to the new object that was under /lost+found.
3) If both objects are non-empty, then report the conflict without repairing.
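
The three rules above can be read as a small decision function. The following is only an illustrative sketch in Python, not the osd code: the names `Action` and `resolve_oi_conflict` are hypothetical, and how "empty" is determined for each object is left to the caller.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the three outcomes listed above.
    KEEP_CONFLICT_DROP_LOST_FOUND = 1      # rule 1
    REPLACE_CONFLICT_WITH_LOST_FOUND = 2   # rule 2
    REPORT_CONFLICT_ONLY = 3               # rule 3


def resolve_oi_conflict(lost_found_empty: bool, conflict_empty: bool) -> Action:
    """Pick an action for one OI mapping conflict.

    lost_found_empty: the object under /lost+found is empty.
    conflict_empty:   the object currently occupying the OI slot is empty.
    """
    # Rule 1 is checked first, so it also covers the case where both are empty.
    if lost_found_empty:
        return Action.KEEP_CONFLICT_DROP_LOST_FOUND
    # Rule 2: only the conflicting object is empty, so it can be replaced.
    if conflict_empty:
        return Action.REPLACE_CONFLICT_WITH_LOST_FOUND
    # Rule 3: both objects hold data, so nothing is repaired automatically.
    return Action.REPORT_CONFLICT_ONLY
```

Because rule 1 ends with "otherwise", the checks are ordered: when both objects are empty, the conflicting object is kept and the one under /lost+found is removed.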

Generated at Sat Feb 10 01:35:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.