Details
Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
None
3
Description
It appears that FIEMAP doesn't work correctly for PFL or FLR files. I'd expect filefrag to dump the FIEMAP information for all copies of a mirrored file, but it stops after the first object is printed. I verified with GDB that the fiemap structure returned by the kernel contains only a single extent.
$ lfs getstripe /mnt/testfs/flr2
/mnt/testfs/flr2
  lcm_layout_gen:    5
  lcm_mirror_count:  2
  lcm_entry_count:   2
    lcme_id:             65537
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 3
      lmm_objects:
      - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x4:0x0] }

    lcme_id:             131074
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }

$ filefrag -v /mnt/testfs/flr2
Filesystem type is: bd00bd0
File size of /mnt/testfs/flr2 is 8388608 (8192 blocks of 1024 bytes)
 ext:     device_logical:        physical_offset: length:  dev: flags:
   0:        0..    8191:     140000..    148191:   8192: 0003: last,net,eof
/mnt/testfs/flr2: 1 extent found
For FLR files this is at least somewhat usable, since a full copy of the file is handled. However, for PFL files, the kernel returns an -EFBIG error after the first component is read:
$ lfs getstripe /mnt/testfs/pfl3x8
  lcm_entry_count:   3
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x2:0x0] }

    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   4194304
      lmm_stripe_count:  3
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_objects:
      - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x2:0x0] }
      - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x2:0x0] }
      - 2: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }

    lcme_extent.e_start: 4194304
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  4
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_objects:
      - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3:0x0] }
      - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x3:0x0] }
      - 2: { l_ost_idx: 1, l_fid: [0x100010000:0x3:0x0] }
      - 3: { l_ost_idx: 2, l_fid: [0x100020000:0x3:0x0] }

$ filefrag -v /mnt/testfs/pfl3x8
Filesystem type is: bd00bd0
File size of /mnt/testfs/pfl3x8 is 8388608 (8192 blocks of 1024 bytes)
 ext:     device_logical:        physical_offset: length:  dev: flags:
   0:        0..    1023:     134880..    135903:   1024: 0002: net
   1:        0..    1023:     134880..    135903:   1024: 0000: net
   2:     1024..    2047:     136928..    137951:   1024: 0003: net
It starts almost correctly: an extent on OST0002 for [0,1MB], then something that looks like a duplicate of the first extent with only the device number changed (to 0000), then an extent on OST0003 for [1MB,2MB]. The remaining 6 extents are not returned by the first call, even though there is plenty of room in the fiemap array (fm_extent_count is 292).
Running strace shows:
ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=0x40000000 /* FIEMAP_FLAG_??? */, fm_extent_count=292} => {fm_flags=0x40000000 /* FIEMAP_FLAG_??? */, fm_mapped_extents=3, fm_extents=...}) = 0
ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=0x40000000 /* FIEMAP_FLAG_??? */, fm_extent_count=292}) = -1 EFBIG (File too large)
Running under GDB and dumping the returned extents confirms that 3 separate extents are returned (so the strangeness is not in filefrag), and that the second ioctl(FIEMAP) call, which tries to continue from the next extent, fails with -EFBIG.
247             rc = ioctl(fd, FS_IOC_FIEMAP, (unsigned long) fiemap);
248             if (rc < 0) {
(gdb) p *fiemap
$1 = {fm_start = 0, fm_length = 18446744073709551615, fm_flags = 1073741824, fm_mapped_extents = 3, fm_extent_count = 292, fm_reserved = 0, fm_extents = 0x7fffffff9ca0}
(gdb) p fiemap->fm_extents[0]
$2 = {fe_logical = 0, fe_physical = 138117120, fe_length = 1048576, fe_reserved64 = {0, 0}, fe_flags = 2147483648, fe_device = 2, fe_reserved = {0, 0}}
(gdb) p fiemap->fm_extents[1]
$3 = {fe_logical = 0, fe_physical = 138117120, fe_length = 1048576, fe_reserved64 = {0, 0}, fe_flags = 2147483648, fe_device = 0, fe_reserved = {0, 0}}
(gdb) p fiemap->fm_extents[2]
$4 = {fe_logical = 1048576, fe_physical = 140214272, fe_length = 1048576, fe_reserved64 = {0, 0}, fe_flags = 2147483648, fe_device = 3, fe_reserved = {0, 0}}
:
:
305                     if (flags & FIEMAP_FLAG_DEVICE_ORDER) {
306                             fm_ext[0].fe_logical = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
308                             fm_ext[0].fe_device = fm_ext[i - 1].fe_device;
309                             fiemap->fm_start = 0;
:
:
244             fiemap->fm_length = ~0ULL;
245             fiemap->fm_flags = flags;
246             fiemap->fm_extent_count = count;
247             rc = ioctl(fd, FS_IOC_FIEMAP, (unsigned long) fiemap);
248             if (rc < 0) {
251                     rc = -errno;
(gdb) p *fiemap
$7 = {fm_start = 0, fm_length = 18446744073709551615, fm_flags = 1073741824, fm_mapped_extents = 0, fm_extent_count = 292, fm_reserved = 0, fm_extents = 0x7fffffff9ca0}
(gdb) p rc
$8 = -27
It looks like the LOV code needs to be taught how to iterate over a composite layout to populate the FIEMAP structure, and how to find the right component to restart from. We may also need to change filefrag itself, since on restart it currently passes back only the file logical offset of the end of the last printed extent and the device number that extent was on. This could be ambiguous in rare cases, if two mirrors had objects on the same OST for the same file offset and FIEMAP was interrupted exactly there, but that seems unlikely and is not a primary concern.