[LU-4876] LFSCK remove entry from /REMOTE_PARENT_DIR if MDT-object name reside on the same MDT after migration Created: 09/Apr/14 Updated: 31/Jan/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | nasf (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | dne2, lfsck | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 13487 |
| Description |
|
When migrating directory MDT-object_A's metadata from MDT_x to MDT_y, all of its children's name entries need to be moved to MDT_y. Suppose a name entry originally referenced a remote MDT-object_B that was linked under "/REMOTE_PARENT_DIR" on MDT_y with a dummy name entry. If, after the migration, MDT-object_B and its new name entry reside on the same MDT_y, then the dummy entry for MDT-object_B should be removed from "/REMOTE_PARENT_DIR" iff there are no links remaining on that MDT. |
| Comments |
| Comment by Di Wang [ 15/Apr/14 ] |
|
Hmm, we can remove the object from "REMOTE_PARENT_DIR" when we update the linkEA, but this will add extra complication to osd_xattr_set. Even if we leave the dummy inode in REMOTE_PARENT_DIR, it does not affect the consistency of the FS, though it will leave some orphans there. I also wonder whether LFSCK will do something special for REMOTE_PARENT_DIR in the DNE check. |
| Comment by nasf (Inactive) [ 15/Apr/14 ] |
|
From the LFSCK point of view, if two name entries reference the same MDT-object but its nlink count is 1, that is an inconsistent state. |
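The check described in this comment can be modelled as a comparison between the number of name entries found for an object and the nlink value stored on it. The following is only a minimal sketch of that idea, not LFSCK's actual implementation: the function name, the tuple layout, and the placeholder FID and paths are all hypothetical, and it assumes a non-directory object whose nlink should equal its number of name entries.

```python
from collections import Counter


def find_nlink_mismatches(name_entries, nlink_by_fid):
    """Return {fid: (entries_found, nlink)} for objects whose counts disagree.

    name_entries: iterable of (parent_dir, name, fid) tuples (hypothetical layout).
    nlink_by_fid: mapping of fid -> nlink value recorded on the object.
    """
    # Count how many name entries reference each object.
    refs = Counter(fid for _parent, _name, fid in name_entries)
    return {
        fid: (refs[fid], nlink)
        for fid, nlink in nlink_by_fid.items()
        if refs[fid] != nlink
    }


# Hypothetical case: object B still has its dummy entry under
# /REMOTE_PARENT_DIR in addition to its migrated, now-local name entry.
entries = [
    ("/REMOTE_PARENT_DIR", "FID_B", "FID_B"),
    ("/ROOT/dir_A", "name_b", "FID_B"),
]
nlinks = {"FID_B": 1}

print(find_nlink_mismatches(entries, nlinks))
```

With the leftover dummy entry, the object is reported with two name entries against an nlink of 1, which is the inconsistent state described above. Once the dummy entry is dropped from the input, nothing is reported.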
| Comment by Andreas Dilger [ 04/Dec/14 ] |
|
It seems to me that there are two separate issues here:
|
| Comment by nasf (Inactive) [ 04/Dec/14 ] |
|
I have created another ticket for LFSCK: |
| Comment by Andreas Dilger [ 11/Apr/17 ] |
|
Lai, is this something you can look at while fixing the striped directory migration in |
| Comment by Artem Blagodarenko (Inactive) [ 20/Sep/18 ] |
|
Hello, I have a migration-related question. I see a possible LFSCK-related inconsistency: I caught an ldiskfs inconsistency after a directory migration. I ran this command
SLOW=yes RUNAS_ID=1000 CLEANUP=cleanupall SETUP=setupall OSTCOUNT=4 MDSCOUNT=4 OSTSIZE=600000 MDTSIZE=300000 ONLY=230h sh /usr/lib/lustre/tests/sanity.sh
and ran fsck just after the test finished. Fsck shows:
Inode 25043 ref count is 5, should be 4. Fix? no
Pass 5: Checking group summary information
Free blocks count wrong (32862, counted=32853). Fix? no
Free inodes count wrong (99721, counted=99718). Fix? no
lustre-MDT0000: ********** WARNING: Filesystem still has errors **********
The inode with the broken count is "/ROOT":
# debugfs lustre-mdt1
debugfs 1.42.13.wc6 (05-Feb-2017)
debugfs: ncheck 25043
Inode Pathname
25043 //ROOT
The inode has 5 links:
# debugfs -R "stat <25043>" lustre-mdt1
debugfs 1.42.13.wc6 (05-Feb-2017)
Inode: 25043 Type: directory Mode: 0755 Flags: 0x0
Generation: 3666376863 Version: 0x00000001:00000008
User: 0 Group: 0 Project: 0 Size: 4096
File ACL: 0 Directory ACL: 0
Links: 5 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5ba2dcf7:00000000 -- Wed Sep 19 19:34:15 2018
atime: 0x5ba2dcbc:00000000 -- Wed Sep 19 19:33:16 2018
mtime: 0x5ba2dcf7:00000000 -- Wed Sep 19 19:34:15 2018
crtime: 0x5ba2dcbc:b0d6b4b8 -- Wed Sep 19 19:33:16 2018
Size of extra inode fields: 32
Extended attributes stored in inode body:
lma = "00 00 00 00 00 00 00 00 07 00 00 00 02 00 00 00 01 00 00 00 00 00 00 00 " (24)
lma: fid=[0x200000007:0x1:0x0] compat=0 incompat=0
BLOCKS:
(0):12623
TOTAL: 1
But it should be 4:
debugfs(mdt1): ls -l /ROOT
25043 40755 (2) 0 0 4096 19-Sep-2018 19:34 .
2 40755 (2) 0 0 4096 19-Sep-2018 19:33 ..
25044 40755 (18) 0 0 4096 19-Sep-2018 19:33 .lustre
25049 40000 (18) 0 0 4096 19-Sep-2018 19:34 d230h.sanity
The sanity 230h test is about migrating root's subdirectory to mdt2. The main operation is
$LFS migrate -m1 $DIR/$tdir/migrate_dir/.. ||
error "migrating $tdir fail"
After this operation, $DIR/$tdir and $DIR/$tdir/migrate_dir/ are moved to mdt2:
debugfs(mdt2): ls -l /REMOTE_PARENT_DIR
25001 40755 (2) 0 0 4096 19-Sep-2018 19:34 .
2 40755 (2) 0 0 4096 19-Sep-2018 19:33 ..
25046 40755 (2) 0 0 4096 19-Sep-2018 19:34 0x240000404:0x1:0x0
# lfs fid2path /mnt/lustre 0x240000404:0x1:0x0
/mnt/lustre/d230h.sanity
Do you have any idea why the inode has a reference count of 5 when it should be 4? Thanks. |
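For reference, the "should be 4" figure is consistent with the conventional ext-style rule that a directory's link count is 2 (its own "." plus its name in the parent) plus one per subdirectory. The sketch below applies that rule to the `ls -l /ROOT` listing quoted in this comment. It is an illustration only, not how e2fsck computes the count: the helper name is hypothetical, and treating the `40000` entry as a subdirectory is an assumption.

```python
import stat

# Entries copied from the `ls -l /ROOT` listing: (inode, octal mode, name).
ROOT_ENTRIES = [
    (25043, "40755", "."),
    (2, "40755", ".."),
    (25044, "40755", ".lustre"),
    (25049, "40000", "d230h.sanity"),
]


def expected_dir_links(entries):
    """Hypothetical helper: 2 + number of subdirectory entries."""
    subdirs = [
        name
        for _ino, mode, name in entries
        # Skip the self and parent entries; keep only directory-type modes.
        if name not in (".", "..") and stat.S_ISDIR(int(mode, 8))
    ]
    return 2 + len(subdirs)


print(expected_dir_links(ROOT_ENTRIES))
```

Under those assumptions the listing yields 4, matching the value e2fsck expects rather than the 5 stored in the inode.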
| Comment by Lai Siyao [ 21/Sep/18 ] |
|
Hi Artem, can you test with latest master code? I think patches for |
| Comment by Andreas Dilger [ 18/Oct/18 ] |
|
Artem, I also noticed this, but it was fixed (at least for 2.12) by unmounting the MDT and running e2fsck on the unmounted device. While it was mounted, the change was only in the journal and did not show up in the filesystem itself. |
| Comment by Artem Blagodarenko (Inactive) [ 19/Oct/18 ] |
|
adilger, thanks for the idea. I believe the target was not mounted during fsck; I used the CLEANUP=cleanupall option. |
| Comment by Andreas Dilger [ 19/Oct/18 ] |
|
Artem, it just looks that way because of the e2fsck messages:
Pass 5: Checking group summary information
Free blocks count wrong (32862, counted=32853).
Free inodes count wrong (99721, counted=99718).
which usually means that the filesystem is mounted, or at least was not unmounted cleanly. |
| Comment by Andreas Dilger [ 31/Jan/22 ] |
|
I've also seen a few cases where e2fsck complains about hard-linked directories in REMOTE_PARENT_DIR:
Entry '0x200007168:0x4af:0x0' in /REMOTE_PARENT_DIR (1119354881) is a link to directory /ROOT/.lustre/lost+found/MDT0000/[0x200005233:0x89:0x0]-O-0 (8938935).
Clear? yes
It looks like e2fsck fixes this by removing the entry from REMOTE_PARENT_DIR, but LFSCK shouldn't get into this situation in the first place. |