[LU-16978] don't update last_used_oid_file with lower id Created: 25/Jul/23  Updated: 25/Jul/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sergey Cheremencev Assignee: Sergey Cheremencev
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During recovery, the initial ID stored in opd_last_used_oid_file must not be changed to a lower value. Otherwise, a panic during recovery may cause objects that are already in use to be removed from the OSTs. In the case I've been investigating, a series of panics resulted in files without objects on the OSTs.

  1. vm3: start MDT0 recovery
  2. vm3: wrote last used fid 36996941 (during recovery)
  3. vm3: end of recovery. 1 evicted
  4. vm3: deleting orphan objects from 0x0:36996942
  5. failover MDT0 to vm1
  6. vm1: mount + start recovery MDT0
  7. vm1: wrote last used fid 36990688, 36991282, ..., 36978433, 36991006 (during recovery)
  8. vm1: kernel panic on vm1 -> failover to vm3
  9. vm3: start recovery MDT0
  10. vm3: end of recovery. 1 evicted
  11. vm3: deleting orphan objects from 0x0:36991007
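
The sequence above can be replayed in a small model to see how many already-used IDs end up inside the second orphan-cleanup range. This is a simplified sketch, not Lustre code: `model_mdt`, `model_write_last_used`, `model_orphan_start` and `model_replay_exposed_ids` are hypothetical names, and it assumes that orphan cleanup starts at the stored last-used ID plus one, as steps 4 and 11 suggest. Only the IDs listed explicitly in step 7 are replayed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the persisted last-used object ID (not a Lustre struct). */
struct model_mdt {
    uint64_t last_used_oid;
};

/* Unconditional overwrite, mirroring the behaviour seen in the timeline. */
static void model_write_last_used(struct model_mdt *m, uint64_t oid)
{
    m->last_used_oid = oid;
}

/* Assumption: orphan cleanup begins right after the stored last-used ID. */
static uint64_t model_orphan_start(const struct model_mdt *m)
{
    return m->last_used_oid + 1;
}

/*
 * Replay steps 2-11 and return the number of IDs that were below the first
 * cleanup start (step 4) but are at or above the second one (step 11).
 */
static uint64_t model_replay_exposed_ids(void)
{
    /* IDs written on vm1 in step 7; the elided ones are left out. */
    static const uint64_t vm1_writes[] = {
        36990688, 36991282, 36978433, 36991006
    };
    struct model_mdt m = { 0 };
    uint64_t first_start;
    size_t i;

    model_write_last_used(&m, 36996941);       /* step 2 */
    first_start = model_orphan_start(&m);      /* step 4: 36996942 */

    for (i = 0; i < sizeof(vm1_writes) / sizeof(vm1_writes[0]); i++)
        model_write_last_used(&m, vm1_writes[i]);  /* step 7 */

    /* step 11: cleanup now starts from 36991007 */
    return first_start - model_orphan_start(&m);
}
```

Under these assumptions, every ID from 36991007 up to 36996941 falls into the second cleanup range even though it was covered by the value written in step 2.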

What I could not explain here is why, in step 7, it receives requests with IDs lower than the previously written last_used 0x0:36996942. Normally this shouldn't happen, because opd_last_used_oid_file is updated with a new, higher value at the end of osp_create, before the reply is sent to the client. In any case, overwriting opd_last_used_oid_file with a lower value during recovery looks wrong.
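
One way to express the intended behaviour is a monotonic guard around the update. The following is a minimal sketch only, not the actual patch: `last_used_state` and `last_used_update` are hypothetical names standing in for the persisted value and whatever code path writes it, and the `in_recovery` flag is an assumption about how the recovery phase would be detected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the persisted last-used object ID. */
struct last_used_state {
    uint64_t oid;         /* value currently stored on disk */
    bool     in_recovery; /* assumption: set while recovery is running */
};

/*
 * Store new_oid unless we are in recovery and it would not raise the
 * stored value. Returns true if the stored value was changed.
 */
static bool last_used_update(struct last_used_state *s, uint64_t new_oid)
{
    if (s->in_recovery && new_oid <= s->oid)
        return false;     /* keep the higher value already on disk */

    s->oid = new_oid;
    return true;
}
```

With such a guard, the writes from step 7 would leave the value from step 2 in place, so a panic during recovery could not move the orphan-cleanup start backwards.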

