LU-17369: Missing OST objects after "lfs migrate"


Details

    • Type: Story
    • Resolution: Unresolved
    • Priority: Minor

    Description

      A race between "lfs migrate" and unlink can leave files without their OST objects. Below is the scenario:

      1. "lfs migrate" transfers files from MDT0 to MDT1 for directory "dir"
      2. client1 removes file "f1" from "dir". It removes object on MDT1 and appropriate objects on OSTs.
      3. client1 disonnected from MDT1
      4. MDT1 failover(probably kernel panic)
      5. MDT1 recovery started
      6. MDT0 resends "replay" request to create a new object for "f1" on MDT1(part of lfs migrate)

      As the client was evicted right before the MDT1 failover, it does not participate in recovery and does not replay the unlink of the new object on MDT1. Thus we end up with an object on MDT1 but without the corresponding objects on the OSTs.
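
      A minimal reproduction sketch of the race, assuming a two-MDT test setup; the mount point and the crash trigger are placeholders, and forcing the exact timing window (the eviction of client1 between the unlink and the failover) is not shown:

      # on client1: create a directory on MDT0 with a file in it
      lfs mkdir -i 0 /mnt/lustre/dir
      touch /mnt/lustre/dir/f1
      # step 1: start migrating the directory to MDT1
      lfs migrate -m 1 /mnt/lustre/dir &
      # step 2: remove the file while the migration is in flight
      rm /mnt/lustre/dir/f1
      # steps 3-5: crash the server node serving MDT1 before client1
      # reconnects, then remount it so recovery starts
      echo b > /proc/sysrq-trigger        # run on the MDT1 server node
      # step 6 happens during recovery: MDT0 replays the object creation for "f1"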

      Such files are usually displayed with "???" instead of attributes:

      vm1:~/lustre2$ ls -l | head -3
      ls: cannot access 'all_jobs_id': No such file or directory
      total 101100
      -????????? ? ? ? ? ? all_jobs_id 
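
      A quick way to enumerate such entries across a tree is to keep only the stat errors that "ls" emits (the mount point below is a placeholder):

      # attribute-read failures go to stderr; drop the normal listing output
      ls -lR /mnt/lustre 2>&1 >/dev/null | grep 'cannot access'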

      Below is an example of how to distinguish this issue from other cases where a file could lose its OST objects. As "lfs migrate" copies the file attributes, crtime will always be newer than ctime, atime, and mtime:

      [root@vm1 logs]# cat stat
      debugfs -c -R "stat REMOTE_PARENT_DIR/0x2400013a1:0x1:0x0/f3" /tmp/lustre-mdt2 > stat
      Inode: 162   Type: regular    Mode:  0644   Flags: 0x0
      Generation: 2069782550    Version: 0x00000000:00000000
      User:     0   Group:     0   Project:     0   Size: 0
      File ACL: 0
      Links: 1   Blockcount: 0
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x64807111:00000000 -- Wed Jun  7 15:59:13 2023
       atime: 0x64807111:00000000 -- Wed Jun  7 15:59:13 2023
       mtime: 0x64807111:00000000 -- Wed Jun  7 15:59:13 2023
      crtime: 0x64807120:b6e87414 -- Wed Jun  7 15:59:28 2023
      Size of extra inode fields: 32
      Extended attributes:
        lma: fid=[0x2400013a0:0x3:0x0] compat=0 incompat=0
        trusted.lov (56) = d0 0b d1 0b 01 00 00 00 52 00 00 00 00 00 00 00 02 04 00 00 02 00 00 00 00 00 10 00 01 00 00 00 02 04 00 c0 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 
        trusted.som (24) = 04 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 
        linkea: idx=0 parent=[0x2400013a1:0x1:0x0] name='f3'
      BLOCKS:
      
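      To make the comparison easier, the two relevant timestamps can be pulled out of the debugfs output directly (device and object paths taken from the example above):

      # with this issue crtime is the newer of the two timestamps,
      # because the migration recreated the inode after the last ctime update
      debugfs -c -R "stat REMOTE_PARENT_DIR/0x2400013a1:0x1:0x0/f3" /tmp/lustre-mdt2 2>/dev/null |
          grep -E '^ ?(ctime|crtime):'
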
      [root@vm1 logs]# lfs getstripe /mnt/lustre/dir/f3
      /mnt/lustre/dir/f3
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      	obdidx		 objid		 objid		 group
      	     1	             4	          0x4	   0x2c0000402
      
      [root@vm1 logs]# debugfs -c -R "stat O/2c0000402/d4/4" /tmp/lustre-ost2
      debugfs 1.46.2.wc5 (26-Mar-2022)
      /tmp/lustre-ost2: catastrophic mode - not reading inode or group bitmaps
      O/2c0000402/d4/4: File not found by ext2_lookup  
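
      The two checks above can be combined into a hedged helper script: read the stripe objects from the client with "lfs getstripe" and verify each one on the OST backend with debugfs. It assumes a plain (non-composite) layout and a test setup where OST index N is backed by /tmp/lustre-ost(N+1), as in the example above:

      #!/bin/sh
      # check-ost-objects.sh FILE: report stripe objects missing on the OSTs
      f=$1
      lfs getstripe "$f" | awk 'NF == 4 && $1 ~ /^[0-9]+$/ { print $1, $2, $4 }' |
      while read -r ost objid group; do
          dev=/tmp/lustre-ost$((ost + 1))    # placeholder device mapping
          d=$((objid % 32))                  # objects hash into d0..d31
          if ! debugfs -c -R "stat O/${group#0x}/d$d/$objid" "$dev" 2>&1 |
                  grep -q 'Inode:'; then
              echo "$f: object $objid (group $group) missing on OST$ost"
          fi
      done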


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Sergey Cheremencev (scherementsev)