[LU-18620] LFSCK does not fix broken agent entries

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major

    Description

      Here is my attempt to re-create a missing agent entry for a remote stripe of a striped directory:

      [root@rocky tests]# ../utils/lfs mkdir -H fnv_1a_64 -c 2 /mnt/lustre/dir-c2
      [root@rocky tests]# ../utils/lfs path2fid /mnt/lustre/dir-c2
      [0x200000402:0x1:0x0]
      [root@rocky tests]# ../utils/lfs getdirstripe /mnt/lustre/dir-c2
      lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
      mdtidx		 FID[seq:oid:ver]
           0		 [0x200000400:0x2:0x0]		
           1		 [0x240000401:0x2:0x0]		
      [root@rocky tests]# debugfs /dev/mapper/mds2_flakey  -R "ls -lD REMOTE_PARENT_DIR"
      debugfs 1.46.2.wc5 (26-Mar-2022)
        25001   40755 (2)      0      0    4096  9-Jan-2025 16:04 .
            2   40755 (2)      0      0    4096  9-Jan-2025 15:31 ..
        25047   40755 (2)      0      0    4096  9-Jan-2025 16:04 0x240000401:0x2:0x0
      
      [root@rocky tests]# debugfs /dev/mapper/mds1_flakey  -R "ls ROOT/dir-c2"
      debugfs 1.46.2.wc5 (26-Mar-2022)
       25049  (12) .    25043  (28) ..    25050  (52) [0x200000400:0x2:0x0]:0   
       25051  (4004) [0x240000401:0x2:0x0]:1   
      [root@rocky tests]# debugfs /dev/mapper/mds1_flakey  -R "ls -lD ROOT/dir-c2"
      debugfs 1.46.2.wc5 (26-Mar-2022)
        25049   40755 (2)      0      0    4096  9-Jan-2025 16:04 .
        25043   40755 (18)      0      0    4096  9-Jan-2025 16:04 fid:[0x200000007:0x1:0x0] ..
        25050   40755 (18)      0      0    4096  9-Jan-2025 16:04 fid:[0x200000400:0x2:0x0] [0x200000400:0x2:0x0]:0
        25051   40000 (18)      0      0    4096  1-Jan-1970 03:00 fid:[0x240000401:0x2:0x0] [0x240000401:0x2:0x0]:1
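
      In the listings above, the agent entry under REMOTE_PARENT_DIR is named after the stripe FID without the surrounding brackets. A minimal sketch of deriving that name from `lfs getdirstripe` output follows; `agent_entry_name` is a hypothetical helper, not part of Lustre, and it assumes the two-column `mdtidx FID` layout shown above.

```shell
# Hypothetical helper (not a Lustre utility).
# Reads `lfs getdirstripe` output on stdin and prints the expected
# REMOTE_PARENT_DIR entry name for the stripe on the given MDT index.
# $1: MDT index (e.g. 1)
agent_entry_name() {
    # Pick the row whose first column equals the MDT index and whose
    # second column looks like a bracketed FID, then strip the brackets.
    awk -v idx="$1" '$1 == idx && $2 ~ /^\[0x/ { print $2 }' | tr -d '[]'
}
```

      For example, `../utils/lfs getdirstripe /mnt/lustre/dir-c2 | agent_entry_name 1` should print 0x240000401:0x2:0x0 for the directory created above.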
      

      Removing the agent entry with debugfs while mds2 is unmounted:

      [root@rocky tests]# umount /mnt/lustre-mds2
      [root@rocky tests]# debugfs -w /dev/mapper/mds2_flakey  -R "unlink REMOTE_PARENT_DIR/0x240000401:0x2:0x0"
      debugfs 1.46.2.wc5 (26-Mar-2022)
      [root@rocky tests]# debugfs /dev/mapper/mds2_flakey  -R "ls -lD REMOTE_PARENT_DIR"
      debugfs 1.46.2.wc5 (26-Mar-2022)
        25001   40755 (2)      0      0    4096  9-Jan-2025 16:04 .
            2   40755 (2)      0      0    4096  9-Jan-2025 15:31 ..
      

      Starting namespace LFSCK on both MDTs after remounting mds2:

      [root@rocky tests]# mount -t lustre /dev/mapper/mds2_flakey /mnt/lustre-mds2
      [root@rocky tests]# ../utils/lctl lfsck_start -M lustre-MDT0000  -t namespace
      Started LFSCK on the device lustre-MDT0000: scrub namespace
      [root@rocky tests]# ../utils/lctl lfsck_start -M lustre-MDT0001  -t namespace
      Started LFSCK on the device lustre-MDT0001: scrub namespace
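
      `lctl lfsck_start` returns as soon as the scan is started, so the on-disk state should be inspected only after the scan has finished. A minimal sketch of waiting for completion follows, assuming the `mdd.<MDT>.lfsck_namespace` parameter reports a `status:` line; `lfsck_status` and `wait_lfsck_completed` are hypothetical helper names, and the 60-poll default is an arbitrary choice.

```shell
# Print the value of the "status:" line from lfsck_namespace output read on stdin.
lfsck_status() {
    awk '/^status:/ { print $2; exit }'
}

# Hypothetical helper: poll one MDT until namespace LFSCK reports "completed".
# $1: MDT name, e.g. lustre-MDT0001
# $2: maximum number of 1-second polls (default 60, an arbitrary choice)
# Assumes lctl is in PATH; returns non-zero if the limit is reached.
wait_lfsck_completed() {
    mdt="$1"
    tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        status=$(lctl get_param -n "mdd.${mdt}.lfsck_namespace" | lfsck_status)
        [ "$status" = "completed" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

      Usage would be `wait_lfsck_completed lustre-MDT0000 && wait_lfsck_completed lustre-MDT0001` before unmounting or running debugfs.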
      
      

      Checking the results of the LFSCK runs shows that an object with a new FID was created
      and a new agent entry was inserted into /REMOTE_PARENT_DIR:

      [root@rocky tests]# debugfs /dev/mapper/mds2_flakey  -R "ls -lD REMOTE_PARENT_DIR"
      debugfs 1.46.2.wc5 (26-Mar-2022)
        25001   40755 (2)      0      0    4096  9-Jan-2025 16:07 .
            2   40755 (2)      0      0    4096  9-Jan-2025 15:31 ..
        25048   40700 (2)      0      0    4096  9-Jan-2025 16:07 0x240000bd0:0x1:0x0
      

      This FID differs from the FID of the remote stripe, [0x240000401:0x2:0x0]:

      [root@rocky tests]# debugfs /dev/mapper/mds1_flakey  -R "ls -lD ROOT/dir-c2"
      debugfs 1.46.2.wc5 (26-Mar-2022)
        25049   40755 (2)      0      0    4096  9-Jan-2025 16:04 .
        25043   40755 (18)      0      0    4096  9-Jan-2025 16:04 fid:[0x200000007:0x1:0x0] ..
        25050   40755 (18)      0      0    4096  9-Jan-2025 16:04 fid:[0x200000400:0x2:0x0] [0x200000400:0x2:0x0]:0
        25051   40000 (18)      0      0    4096  1-Jan-1970 03:00 fid:[0x240000401:0x2:0x0] [0x240000401:0x2:0x0]:1
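
      The comparison above can be scripted. The following is a minimal sketch, assuming the `ls -lD` layout shown in these listings, where the entry name is the last column; `has_agent_entry` is a hypothetical helper name.

```shell
# Hypothetical helper (not a Lustre utility).
# Reads `debugfs -R "ls -lD REMOTE_PARENT_DIR"` output on stdin and
# succeeds only if an entry with the given name is present.
# $1: expected agent entry name, e.g. 0x240000401:0x2:0x0
has_agent_entry() {
    # The entry name is the last whitespace-separated field of each row.
    awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}
```

      With the post-LFSCK listing of REMOTE_PARENT_DIR on mds2, `has_agent_entry 0x240000401:0x2:0x0` fails, while `has_agent_entry 0x240000bd0:0x1:0x0` succeeds.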
      

      The directory inode of the remote stripe (25047) is still in use and still an orphan, not connected to any directory:

      [root@rocky tests]# debugfs /dev/mapper/mds2_flakey  
      debugfs 1.46.2.wc5 (26-Mar-2022)
      debugfs:  testi <25047>
      Inode 25047 is marked in use
      debugfs:  ncheck 25047
      Inode	Pathname
      debugfs:   
      


          People

            zam Alexander Zarochentsev