[LU-15180] OST object missing parent fid Created: 29/Oct/21 Updated: 02/Nov/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mahmoud Hanafi | Assignee: | Hongchao Zhang |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We have files whose objects are missing the parent FID. For example, this file:

    # lfs getstripe img.0291328.ppm
    img.0291328.ppm
    lmm_stripe_count:  4
    lmm_stripe_size:   1048576
    lmm_pattern:       raid0
    lmm_layout_gen:    0
    lmm_stripe_offset: 68
        obdidx       objid       objid       group
            68     3671546    0x3805fa           0
            80     5285906    0x50a812           0
            97     5277923    0x5088e3           0
           122     6631005    0x652e5d           0

When we look at the first object on ost68, we see that the parent FID is missing:

    # debugfs -c -R "stat /O/0/d26/3671546" /dev/mapper/nbp1_6-OST68
    debugfs 1.45.2.wc1 (27-May-2019)
    /dev/mapper/nbp1_6-OST68: catastrophic mode - not reading inode or group bitmaps
    Inode: 378232   Type: regular   Mode: 01666   Flags: 0x80000
    Generation: 1064600594   Version: 0x0000001b:00041e9e
    User: 4806   Group: 1128   Project: 0   Size: 2097152
    File ACL: 0
    Links: 1   Blockcount: 4096
    Fragment: Address: 0   Number: 0   Size: 0
    ctime: 0x5d4f4613:00000000 -- Sat Aug 10 15:32:51 2019
    atime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
    mtime: 0x5d4f4613:00000000 -- Sat Aug 10 15:32:51 2019
    crtime: 0x5d4f4187:7a7cf08c -- Sat Aug 10 15:13:27 2019
    Size of extra inode fields: 32
    Extended attributes:
      trusted.lma (64)
      lma: fid=[0x100440000:0x3805fa:0x0] compat=8 incompat=0
      fid: parent=[0:0x0:0x0] stripe=0 stripe_size=0 stripe_count=0
    EXTENTS:
    (0-511):419245056-419245567

How can we fix an object that is missing its parent FID? |
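The path given to debugfs in the description can be derived from the objid column of the lfs getstripe output. The following is a minimal sketch, assuming the ldiskfs OST layout /O/<group>/d<objid mod 32>/<objid> with 32 hash subdirectories. That layout is consistent with the d26 directory used above, but the subdirectory count and the helper name `ost_object_path` are assumptions made for illustration, not something stated in this ticket.

```python
def ost_object_path(objid: int, group: int = 0, subdirs: int = 32) -> str:
    """Build the object path relative to the ldiskfs root of the OST.

    Assumes objects are hashed into `subdirs` directories by objid modulo
    `subdirs` (an assumption, see the text above).
    """
    return f"/O/{group}/d{objid % subdirs}/{objid}"


# objid values taken from the lfs getstripe output in the description
for objid in (3671546, 5285906, 5277923, 6631005):
    print(hex(objid), ost_object_path(objid))
```

The resulting path can then be passed to debugfs as in the description, for example debugfs -c -R "stat <path>" <device>.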
| Comments |
| Comment by Andreas Dilger [ 31/Oct/21 ] |
|
Objects that have never been modified do not have their parent FID updated, as they depend on receiving this information from the client on the first write. An unused OST object would normally have a size of zero, timestamps at 1970, and a mode with the SUID and SGID bits set (06000). However, in this case none of these conditions is true, so it isn't clear why the parent FID is missing.

There are non-zero UID and GID values set, and the file has data, so the only thing I can think of is that the client somehow sent an RPC that did not contain the parent FID when the object was first modified. Based on the object creation time, it looks like this happened with an older version of Lustre, so it is possible that the root cause (some RPC sent without the parent FID) has since been fixed, but because the SUID/SGID bits are cleared the FID will not be updated.

As for fixing this issue, I think a few things can be done:
|
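The three conditions described in the comment above (zero size, timestamps at the epoch, SUID and SGID bits set) can be expressed as a small check. This is a sketch only: the `OstObject` container and the function names are hypothetical, the values are copied by hand from the debugfs output in the description, and treating all of atime, mtime and ctime as "the timestamps" is an assumption.

```python
from dataclasses import dataclass

SUID_SGID = 0o6000  # S_ISUID | S_ISGID, the 06000 bits named in the comment


@dataclass
class OstObject:
    """Fields copied from a debugfs `stat` dump (hypothetical container)."""
    mode: int
    size: int
    atime: int
    mtime: int
    ctime: int
    parent_fid: tuple  # (seq, oid, ver)


def looks_unused(obj: OstObject) -> bool:
    """True if the object matches the 'never modified' pattern."""
    return (obj.size == 0
            and obj.atime == obj.mtime == obj.ctime == 0
            and (obj.mode & SUID_SGID) == SUID_SGID)


def unexplained_missing_parent(obj: OstObject) -> bool:
    """True if the parent FID is all zeros although the object was used."""
    return obj.parent_fid == (0, 0, 0) and not looks_unused(obj)


# Object 3671546 on ost68, as reported in the description
reported = OstObject(mode=0o1666, size=2097152, atime=0,
                     mtime=0x5d4f4613, ctime=0x5d4f4613,
                     parent_fid=(0, 0, 0))
print(looks_unused(reported), unexplained_missing_parent(reported))
```

For the reported object the check flags a used object with no parent FID, which is the unexpected case the comment describes.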
| Comment by Peter Jones [ 01/Nov/21 ] |
|
Hongchao, do you have any suggestions for how we could deal with this kind of scenario automatically in the future? Peter |
| Comment by Hongchao Zhang [ 02/Nov/21 ] |
|
Hi,

    debugfs 1.45.6.xa7 (20-Apr-2021)
    /tmp/lustre-ost1: catastrophic mode - not reading inode or group bitmaps
    Inode: 168   Type: regular   Mode: 0666   Flags: 0x80000
    Generation: 1033363972   Version: 0x00000001:00000003
    User: 0   Group: 0   Project: 0   Size: 10485760
    File ACL: 0
    Links: 1   Blockcount: 20480
    Fragment: Address: 0   Number: 0   Size: 0
    ctime: 0x6180f40c:00000000 -- Tue Nov 2 16:17:16 2021
    atime: 0x00000000:00000000 -- Thu Jan 1 08:00:00 1970
    mtime: 0x6180f40c:00000000 -- Tue Nov 2 16:17:16 2021
    crtime: 0x6180ef1e:0838fe80 -- Tue Nov 2 15:56:14 2021
    Size of extra inode fields: 32
    Extended attributes:
      trusted.lma (24) = 08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00
      lma: fid=[0x100000000:0x2:0x0] compat=8 incompat=0
      trusted.fid (52)
      fid: parent=[0x200000404:0x1:0x0] stripe=1 stripe_size=1048576 stripe_count=2
    EXTENTS:
    (0-2559):67584-70143

It seems the size of "trusted.lma" is different. Could you please try again with the latest e2fsprogs (https://downloads.whamcloud.com/public/e2fsprogs/)? |
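The 24 raw bytes of trusted.lma printed above can be decoded by hand to cross-check the "lma:" line that debugfs prints. The sketch below assumes a little-endian layout of two 32-bit fields (compat, incompat) followed by a FID made of a 64-bit sequence, a 32-bit object ID and a 32-bit version. That layout is an assumption inferred from the fact that it reproduces the decoded values shown by debugfs; `decode_lma` is a hypothetical helper, and it only looks at the first 24 bytes, so it says nothing about what the extra bytes of a 64-byte xattr contain.

```python
import struct

LMA_FORMAT = "<IIQII"  # compat, incompat, fid.seq, fid.oid, fid.ver (assumed layout)


def decode_lma(raw: bytes) -> dict:
    """Decode the first 24 bytes of a trusted.lma xattr value."""
    compat, incompat, seq, oid, ver = struct.unpack_from(LMA_FORMAT, raw)
    return {
        "compat": compat,
        "incompat": incompat,
        "fid": f"[{seq:#x}:{oid:#x}:{ver:#x}]",
    }


# Hex dump copied from the debugfs output above
raw = bytes.fromhex(
    "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 "
    "02 00 00 00 00 00 00 00"
)
print(len(raw), decode_lma(raw))
```

Under that assumed layout, the decoded values agree with the fid, compat and incompat fields that debugfs reports for the test object.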