Details
-
Bug
-
Resolution: Unresolved
-
Major
Description
Here is my attempt to re-create a missing agent entry for a remote stripe of a striped directory:
[root@rocky tests]# ../utils/lfs mkdir -H fnv_1a_64 -c 2 /mnt/lustre/dir-c2
[root@rocky tests]# ../utils/lfs path2fid /mnt/lustre/dir-c2
[0x200000402:0x1:0x0]
[root@rocky tests]# ../utils/lfs getdirstripe /mnt/lustre/dir-c2
lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x200000400:0x2:0x0]
1 [0x240000401:0x2:0x0]
[root@rocky tests]# debugfs /dev/mapper/mds2_flakey -R "ls -lD REMOTE_PARENT_DIR"
debugfs 1.46.2.wc5 (26-Mar-2022)
25001 40755 (2) 0 0 4096 9-Jan-2025 16:04 .
2 40755 (2) 0 0 4096 9-Jan-2025 15:31 ..
25047 40755 (2) 0 0 4096 9-Jan-2025 16:04 0x240000401:0x2:0x0
[root@rocky tests]# debugfs /dev/mapper/mds1_flakey -R "ls ROOT/dir-c2"
debugfs 1.46.2.wc5 (26-Mar-2022)
25049 (12) . 25043 (28) .. 25050 (52) [0x200000400:0x2:0x0]:0 25051 (4004) [0x240000401:0x2:0x0]:1
[root@rocky tests]# debugfs /dev/mapper/mds1_flakey -R "ls -lD ROOT/dir-c2"
debugfs 1.46.2.wc5 (26-Mar-2022)
25049 40755 (2) 0 0 4096 9-Jan-2025 16:04 .
25043 40755 (18) 0 0 4096 9-Jan-2025 16:04 fid:[0x200000007:0x1:0x0] ..
25050 40755 (18) 0 0 4096 9-Jan-2025 16:04 fid:[0x200000400:0x2:0x0] [0x200000400:0x2:0x0]:0
25051 40000 (18) 0 0 4096 1-Jan-1970 03:00 fid:[0x240000401:0x2:0x0] [0x240000401:0x2:0x0]:1
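For scripting this check, the relationship between the stripe list and the agent entry can be sketched as below. It assumes, judging only from the listing above, that the agent entry name under REMOTE_PARENT_DIR is the stripe FID without its brackets, and that every stripe not on the master MDT (lmv_stripe_offset) is remote. The function names `parse_stripes` and `remote_agent_entries` are hypothetical helpers, not part of any Lustre tool.

```python
import re

# One stripe row of `lfs getdirstripe`: "<mdtidx> [<seq>:<oid>:<ver>]"
STRIPE_ROW = re.compile(
    r"^\s*(\d+)\s+\[(0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\]\s*$"
)

# Output copied from the report.
GETDIRSTRIPE_OUTPUT = """\
lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x200000400:0x2:0x0]
1 [0x240000401:0x2:0x0]
"""


def parse_stripes(output):
    """Return {mdt_index: fid_without_brackets} for every stripe row."""
    stripes = {}
    for line in output.splitlines():
        match = STRIPE_ROW.match(line)
        if match:
            stripes[int(match.group(1))] = match.group(2)
    return stripes


def remote_agent_entries(output, master_mdt=0):
    """Return the agent entry name expected on each non-master MDT.

    Assumption (inferred from the transcript, not from Lustre source):
    the entry name is the stripe FID with the brackets stripped.
    """
    return {
        mdt: fid
        for mdt, fid in parse_stripes(output).items()
        if mdt != master_mdt
    }


if __name__ == "__main__":
    for mdt, name in remote_agent_entries(GETDIRSTRIPE_OUTPUT).items():
        print(f"MDT{mdt:04x}: expect REMOTE_PARENT_DIR/{name}")
```

For the directory above this yields a single expected entry on MDT index 1, which matches the name seen in the REMOTE_PARENT_DIR listing before the unlink.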
Removing the agent entry with debugfs while mds2 is unmounted:
[root@rocky tests]# umount /mnt/lustre-mds2
[root@rocky tests]# debugfs -w /dev/mapper/mds2_flakey -R "unlink REMOTE_PARENT_DIR/0x240000401:0x2:0x0"
debugfs 1.46.2.wc5 (26-Mar-2022)
[root@rocky tests]# debugfs /dev/mapper/mds2_flakey -R "ls -lD REMOTE_PARENT_DIR"
debugfs 1.46.2.wc5 (26-Mar-2022)
25001 40755 (2) 0 0 4096 9-Jan-2025 16:04 .
2 40755 (2) 0 0 4096 9-Jan-2025 15:31 ..
Remounting mds2 and starting namespace LFSCK on both MDTs:
[root@rocky tests]# mount -t lustre /dev/mapper/mds2_flakey /mnt/lustre-mds2
[root@rocky tests]# ../utils/lctl lfsck_start -M lustre-MDT0000 -t namespace
Started LFSCK on the device lustre-MDT0000: scrub namespace
[root@rocky tests]# ../utils/lctl lfsck_start -M lustre-MDT0001 -t namespace
Started LFSCK on the device lustre-MDT0001: scrub namespace
Checking the results of the LFSCK runs shows that an object with a new FID was created and a new agent entry was inserted into /REMOTE_PARENT_DIR:
[root@rocky tests]# debugfs /dev/mapper/mds2_flakey -R "ls -lD REMOTE_PARENT_DIR"
debugfs 1.46.2.wc5 (26-Mar-2022)
25001 40755 (2) 0 0 4096 9-Jan-2025 16:07 .
2 40755 (2) 0 0 4096 9-Jan-2025 15:31 ..
25048 40700 (2) 0 0 4096 9-Jan-2025 16:07 0x240000bd0:0x1:0x0
This FID differs from the FID of the remote stripe, which is still [0x240000401:0x2:0x0] in the master directory:
[root@rocky tests]# debugfs /dev/mapper/mds1_flakey -R "ls -lD ROOT/dir-c2"
debugfs 1.46.2.wc5 (26-Mar-2022)
25049 40755 (2) 0 0 4096 9-Jan-2025 16:04 .
25043 40755 (18) 0 0 4096 9-Jan-2025 16:04 fid:[0x200000007:0x1:0x0] ..
25050 40755 (18) 0 0 4096 9-Jan-2025 16:04 fid:[0x200000400:0x2:0x0] [0x200000400:0x2:0x0]:0
25051 40000 (18) 0 0 4096 1-Jan-1970 03:00 fid:[0x240000401:0x2:0x0] [0x240000401:0x2:0x0]:1
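The mismatch can be checked mechanically. The sketch below is only an illustration of that comparison: it takes the text of a `debugfs ls -lD` listing, treats the last whitespace-separated field of each inode row as the entry name (an assumption that holds for the listings in this report, where names contain no spaces), and compares the result with the expected agent entry. `entry_names` is a hypothetical helper name.

```python
# REMOTE_PARENT_DIR listing on mds2 after the LFSCK runs, copied from the report.
LISTING_AFTER_LFSCK = """\
debugfs 1.46.2.wc5 (26-Mar-2022)
25001 40755 (2) 0 0 4096 9-Jan-2025 16:07 .
2 40755 (2) 0 0 4096 9-Jan-2025 15:31 ..
25048 40700 (2) 0 0 4096 9-Jan-2025 16:07 0x240000bd0:0x1:0x0
"""

# Agent entry that existed before the unlink (remote stripe FID, no brackets).
EXPECTED = {"0x240000401:0x2:0x0"}


def entry_names(listing):
    """Return the set of entry names in a `debugfs ls -lD` listing.

    Rows are recognised by a leading inode number; the banner line and
    the "." and ".." entries are skipped.
    """
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        name = fields[-1]
        if name not in (".", ".."):
            names.add(name)
    return names


if __name__ == "__main__":
    found = entry_names(LISTING_AFTER_LFSCK)
    print("missing agent entries:   ", sorted(EXPECTED - found))
    print("unexpected agent entries:", sorted(found - EXPECTED))
```

With the listing above, the original entry is reported as missing and the entry LFSCK created is reported as unexpected, which is the behaviour described in this ticket.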
The directory inode of the remote stripe (25047) is still in use and is still an orphan, not connected to any directory:
[root@rocky tests]# debugfs /dev/mapper/mds2_flakey
debugfs 1.46.2.wc5 (26-Mar-2022)
debugfs: testi <25047>
Inode 25047 is marked in use
debugfs: ncheck 25047
Inode Pathname
debugfs:
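The orphan condition shown above (inode in use, `ncheck` printing only its header) could be detected in a test script roughly as follows. This is a sketch over captured text, with `ncheck_paths` as a hypothetical helper; the "connected" sample is constructed for illustration from the pre-unlink state and was not captured from a real run.

```python
# debugfs session copied from the report: ncheck prints only the header.
ORPHAN_SESSION = """\
debugfs: testi <25047>
Inode 25047 is marked in use
debugfs: ncheck 25047
Inode Pathname
debugfs:
"""

# Illustrative only (not real output): what a row would look like if the
# inode were still linked under REMOTE_PARENT_DIR.
CONNECTED_SESSION = """\
debugfs: ncheck 25047
Inode Pathname
25047 /REMOTE_PARENT_DIR/0x240000401:0x2:0x0
"""


def ncheck_paths(output, inode):
    """Return the pathnames that `ncheck` reported for the given inode."""
    paths = []
    for line in output.splitlines():
        fields = line.split(None, 1)
        # A result row starts with the inode number itself.
        if len(fields) == 2 and fields[0] == str(inode):
            paths.append(fields[1].strip())
    return paths


def is_orphan(output, inode):
    """True if the inode is marked in use but ncheck found no pathname."""
    in_use = f"Inode {inode} is marked in use" in output
    return in_use and not ncheck_paths(output, inode)


if __name__ == "__main__":
    print("orphan:", is_orphan(ORPHAN_SESSION, 25047))
```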
Sometimes a modified version of sanity-lfsck.sh test_35 (with the fault injection replaced by a debugfs unlink command) fails this way: