Lustre / LU-16978

don't update last_used_oid_file with lower id


Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      During recovery, the initial ID stored in opd_last_used_oid_file must not be changed to a lower value. Otherwise, a panic during recovery may cause objects that are already in use to be removed from the OSTs. In the case I've been investigating, a series of panics resulted in files without objects on the OSTs.

      1. vm3: start MDT0 recovery
      2. vm3: wrote last used fid 36996941 (during recovery)
      3. vm3: end of recovery. 1 evicted
      4. vm3: deleting orphan objects from 0x0:36996942
      5. failover MDT0 to vm1
      6. vm1: mount + start recovery MDT0
      7. vm1: wrote last used fid 36990688, 36991282, ..., 36978433, 36991006 (during recovery)
      8. vm1: kernel panic on vm1 -> failover to vm3
      9. vm3: start recovery MDT0
      10. vm3: end of recovery. 1 evicted
      11. vm3: deleting orphan objects from 0x0:36991007

      The weird thing I couldn't explain here is why, in item (7), it gets requests with IDs lower than the last used value written earlier (item 2; orphan cleanup then started from 0x0:36996942). Normally this shouldn't happen, as opd_last_used_oid_file is updated with a new, higher value at the end of osp_create before the reply is sent to the client. In any case, overwriting opd_last_used_oid_file with a lower value during recovery looks wrong.

          People

            scherementsev Sergey Cheremencev