Details
Bug
Resolution: Unresolved
Minor
Description
During recovery, the initial ID stored in opd_last_used_oid_file must not be changed to a lower value. Otherwise, a panic during recovery may cause objects that are already in use to be removed from the OSTs. In the case I've been investigating, a series of panics resulted in files without objects on the OSTs.
1. vm3: start MDT0 recovery
2. vm3: wrote last used fid 36996941 (during recovery)
3. vm3: end of recovery. 1 evicted
4. vm3: deleting orphan objects from 0x0:36996942
5. failover MDT0 to vm1
6. vm1: mount + start recovery MDT0
7. vm1: wrote last used fid 36990688, 36991282, ..., 36978433, 36991006 (during recovery)
8. vm1: kernel panic on vm1 -> failover to vm3
9. vm3: start recovery MDT0
10. vm3: end of recovery. 1 evicted
11. vm3: deleting orphan objects from 0x0:36991007
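The effect of the timeline above can be illustrated with a toy model. This is only a sketch, not Lustre code: `struct toy_osp` and the `toy_*` helpers are hypothetical names, and the model assumes that orphan cleanup always starts at the persisted last-used ID plus one, as the two "deleting orphan objects" entries suggest.

```c
#include <stdint.h>

/* Toy model of the persisted last-used object ID (hypothetical, not Lustre code). */
struct toy_osp {
    uint64_t last_used_oid; /* stands in for the value kept in opd_last_used_oid_file */
};

/* Unconditional overwrite, as the timeline shows happening during recovery. */
static void toy_write_last_used(struct toy_osp *d, uint64_t oid)
{
    d->last_used_oid = oid;
}

/* Assumption: orphan cleanup starts right after the persisted last-used ID. */
static uint64_t toy_orphan_start(const struct toy_osp *d)
{
    return d->last_used_oid + 1;
}

/*
 * Number of IDs that a later cleanup treats as orphans although an
 * earlier cleanup considered them in use.
 */
static uint64_t toy_ids_at_risk(uint64_t earlier_start, uint64_t later_start)
{
    return later_start < earlier_start ? earlier_start - later_start : 0;
}
```

Feeding the model the values from the timeline (36996941 first, then 36991006 as the final write on vm1) gives cleanup start points of 36996942 and 36991007, so 5935 IDs that were previously treated as in use fall into the range of the second orphan cleanup.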
The odd thing I couldn't explain here is why, in item (7) (the vm1 writes during recovery), it gets requests with IDs lower than the previously written last_used 0x0:36996942. Normally this shouldn't happen, since opd_last_used_oid_file is updated with a new, higher value at the end of osp_create, before the reply is sent to the client. In any case, overwriting opd_last_used_oid_file with a lower value during recovery looks wrong.
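One way to express the rule that the stored value should not go backwards during recovery is a simple monotonic guard. The following is a minimal sketch under stated assumptions, not the actual OSP code: `struct sketch_osp`, its `in_recovery` flag, and `sketch_update_last_used` are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not the real OSP structure. */
struct sketch_osp {
    uint64_t last_used_oid; /* value that would be written to opd_last_used_oid_file */
    bool     in_recovery;   /* assumed flag: true while recovery is in progress */
};

/*
 * Update the last-used ID, refusing to lower it while in recovery.
 * Returns true if the stored value was changed.
 */
static bool sketch_update_last_used(struct sketch_osp *d, uint64_t oid)
{
    if (d->in_recovery && oid <= d->last_used_oid)
        return false; /* keep the higher value already recorded */

    d->last_used_oid = oid;
    return true;
}
```

With such a guard, the lower values written on vm1 in the timeline would be ignored, and a later orphan cleanup would still start from 0x0:36996942. Whether the guard should also apply outside recovery is left open here, since the report only covers the recovery case.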