Details
Story
Resolution: Unresolved
Minor
Description
A race between "lfs migrate" and unlink can leave files without their corresponding OST objects. One scenario:
1. "lfs migrate" moves the files of directory "dir" from MDT0 to MDT1.
2. client1 removes file "f1" from "dir". This removes the object on MDT1 and the corresponding objects on the OSTs.
3. client1 is disconnected (evicted) from MDT1.
4. MDT1 fails over (probably because of a kernel panic).
5. MDT1 recovery starts.
6. MDT0 resends the "replay" request that creates a new object for "f1" on MDT1 (part of "lfs migrate").
Because the client was evicted right before the MDT1 failover, it does not participate in recovery and does not replay the unlink of the new object on MDT1. As a result, the object exists on MDT1 but its corresponding OST objects do not.
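The sequence above can be sketched as a toy model. This is not Lustre code: the class, its methods and the replay queue are hypothetical names invented for illustration, and the model assumes that neither the migrate create nor the unlink had been committed on MDT1 before the failover, while the OST object destruction had already taken effect.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical, heavily simplified model of the scenario (not Lustre code)."""
    ost_objects: set = field(default_factory=set)     # survives the MDT1 failover
    mdt1_objects: set = field(default_factory=set)    # uncommitted, lost on failover
    replay_queue: list = field(default_factory=list)  # (sender, operation, name)
    evicted: set = field(default_factory=set)

    def migrate_create(self, name):
        # MDT0 creates the new object on MDT1 as part of "lfs migrate".
        self.mdt1_objects.add(name)
        self.replay_queue.append(("MDT0", "create", name))

    def unlink(self, client, name):
        # The client removes the MDT1 object and the OST objects.
        self.mdt1_objects.discard(name)
        self.ost_objects.discard(name)
        self.replay_queue.append((client, "unlink", name))

    def evict(self, client):
        self.evicted.add(client)

    def failover_and_recover(self):
        # Assumption: uncommitted MDT1 state is lost, then requests are replayed.
        self.mdt1_objects.clear()
        for sender, operation, name in self.replay_queue:
            if sender in self.evicted:
                continue  # evicted senders do not take part in recovery
            if operation == "create":
                self.mdt1_objects.add(name)
            elif operation == "unlink":
                self.mdt1_objects.discard(name)
                self.ost_objects.discard(name)
        self.replay_queue.clear()


def run_scenario(evict_client):
    cluster = ToyCluster(ost_objects={"f1"})
    cluster.migrate_create("f1")
    cluster.unlink("client1", "f1")
    if evict_client:
        cluster.evict("client1")
    cluster.failover_and_recover()
    return sorted(cluster.mdt1_objects), sorted(cluster.ost_objects)
```

In this model, `run_scenario(True)` ends with "f1" present on MDT1 and no OST object, whereas `run_scenario(False)` replays the unlink and leaves nothing behind.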
Such files are usually displayed with "???" instead of attributes:
vm1:~/lustre2$ ls -l | head -3
ls: cannot access 'all_jobs_id': No such file or directory
total 101100
-????????? ? ? ? ? ? all_jobs_id
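Entries of this kind can be located by listing a directory and checking which names fail to stat with "No such file or directory". The following is a minimal sketch under that assumption; the function name and the injectable `stat_fn` parameter are hypothetical and exist only to make the check testable. An entry removed between the listing and the stat call would also be reported, so any hit should be confirmed by hand.

```python
import os


def find_unstatable_entries(directory, stat_fn=os.lstat):
    """Return names that the directory lists but that fail to stat with ENOENT."""
    unstatable = []
    for name in sorted(os.listdir(directory)):
        try:
            stat_fn(os.path.join(directory, name))
        except FileNotFoundError:
            # The name is in the directory, but its attributes are unavailable,
            # which is what "ls -l" reports as "?????????".
            unstatable.append(name)
    return unstatable
```

For example, `find_unstatable_entries("/mnt/lustre/dir")` would return the candidate file names in that directory.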
The example below shows how to distinguish this issue from other cases in which a file can lose its OST objects. Because "lfs migrate" copies the file attributes, crtime is always newer than ctime, atime and mtime:
[root@vm1 logs]# cat stat
debugfs -c -R "stat REMOTE_PARENT_DIR/0x2400013a1:0x1:0x0/f3" /tmp/lustre-mdt2 > stat
Inode: 162 Type: regular Mode: 0644 Flags: 0x0
Generation: 2069782550 Version: 0x00000000:00000000
User: 0 Group: 0 Project: 0 Size: 0
File ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x64807111:00000000 -- Wed Jun 7 15:59:13 2023
atime: 0x64807111:00000000 -- Wed Jun 7 15:59:13 2023
mtime: 0x64807111:00000000 -- Wed Jun 7 15:59:13 2023
crtime: 0x64807120:b6e87414 -- Wed Jun 7 15:59:28 2023
Size of extra inode fields: 32
Extended attributes:
  lma: fid=[0x2400013a0:0x3:0x0] compat=0 incompat=0
  trusted.lov (56) = d0 0b d1 0b 01 00 00 00 52 00 00 00 00 00 00 00 02 04 00 00 02 00 00 00 00 00 10 00 01 00 00 00 02 04 00 c0 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
  trusted.som (24) = 04 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00
  linkea: idx=0 parent=[0x2400013a1:0x1:0x0] name='f3'
BLOCKS:

[root@vm1 logs]# lfs getstripe /mnt/lustre/dir/f3
/mnt/lustre/dir/f3
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
    obdidx    objid    objid    group
    1         4        0x4      0x2c0000402

[root@vm1 logs]# debugfs -c -R "stat O/2c0000402/d4/4" /tmp/lustre-ost2
debugfs 1.46.2.wc5 (26-Mar-2022)
/tmp/lustre-ost2: catastrophic mode - not reading inode or group bitmaps
O/2c0000402/d4/4: File not found by ext2_lookup
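The crtime comparison described above can be automated against saved debugfs output. The sketch below assumes the timestamp lines look like the ones in this example ("name: 0xSECONDS:EXTRA -- date"); the function names are hypothetical, and only the seconds field is compared, so timestamps that differ only in the sub-second part are treated as equal.

```python
import re

# Matches lines such as "crtime: 0x64807120:b6e87414 -- Wed Jun 7 15:59:28 2023".
TIMESTAMP_RE = re.compile(
    r"^\s*(crtime|ctime|atime|mtime):\s*0x([0-9a-fA-F]+):[0-9a-fA-F]+",
    re.MULTILINE,
)


def parse_timestamps(stat_text):
    """Return {name: seconds since the epoch} for the four inode timestamps."""
    return {name: int(seconds, 16) for name, seconds in TIMESTAMP_RE.findall(stat_text)}


def crtime_is_newest(stat_text):
    """True if crtime is strictly newer than ctime, atime and mtime."""
    stamps = parse_timestamps(stat_text)
    missing = {"crtime", "ctime", "atime", "mtime"} - stamps.keys()
    if missing:
        raise ValueError(f"missing timestamps: {sorted(missing)}")
    return stamps["crtime"] > max(stamps["ctime"], stamps["atime"], stamps["mtime"])
```

Calling `crtime_is_newest(open("stat").read())` on the file shown above returns `True`, since crtime (0x64807120) is 15 seconds later than the other three timestamps (0x64807111). A `True` result only marks the file as a candidate for this issue; it is not proof on its own.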