[LU-3219] FIEMAP does not sync data or return cached pages Created: 24/Apr/13 Updated: 15/Jul/15 Resolved: 04/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0, Lustre 2.4.0, Lustre 2.1.6, Lustre 2.4.1 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.1.6, Lustre 2.5.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Artem Blagodarenko (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch, yuc2 |
| Severity: | 3 |
| Rank (Obsolete): | 7860 |
| Description |
|
Artem Blagodarenko added a comment - 27/Mar/13 7:16 AM
|
| Comments |
| Comment by Peter Jones [ 24/Apr/13 ] |
| Comment by Andreas Dilger [ 24/Apr/13 ] |
|
Artem, I moved your comments into this new bug. The cp/FIEMAP problem was worked on in

Your patch forces the OST to always take the LCK_PR lock on FIEMAP_FLAG_SYNC, which will cause the client to drop its whole file cache (held under LCK_PR) in the common case of "write a file then copy it from the same client". FIEMAP_FLAG_SYNC will already cause the client to flush its own cache via ioctl_fiemap->filemap_write_and_wait(), with the patches 4477 and 4659 from

The other related issue that the patch doesn't fix is that Lustre FIEMAP does not return the extents cached in memory on the client if FIEMAP_FLAG_SYNC is not set. That isn't strictly needed to make "cp" correct, since it always uses FIEMAP_FLAG_SYNC, but it might affect other FIEMAP users. It is more difficult to know what the correct action is in this case, but at a minimum it should return the unwritten pages from the local OSC extent cache. Returning unwritten pages from remote client caches is considerably more difficult, and would be racy in any case, so I'm not sure that is worthwhile.

It would be great if you could look at this issue as well, but it should be fixed in a separate patch from your current one. |
| Comment by Bernd Schubert [ 02/May/13 ] |
|
A q-leap customer runs into this with lustre-1.8 as well. My personal recommendation is to disable FIEMAP ioctls entirely for now (or even forever). Is there anything other than filefrag and coreutils that uses FIEMAP at all? I haven't followed Lustre development for some time: are there plans to support SEEK_HOLE/SEEK_DATA? Thanks, |
| Comment by Andreas Dilger [ 03/May/13 ] |
|
FIEMAP is also used by tar to handle sparse files. The original reason for adding it was for filefrag to be able to report file fragmentation on large files efficiently, instead of having to issue billions of RPCs (one for each block). SEEK_HOLE/SEEK_DATA are not available in the vendor kernels that we support. If you are interested in this, we'd be happy to accept patches. |
| Comment by Bob Glossman (Inactive) [ 17/May/13 ] |
|
back port to b2_1 |
| Comment by Cory Spitz [ 21/May/13 ] |
|
Doesn't this bug affect 1.8.9-wc1 too? |
| Comment by Andreas Dilger [ 21/May/13 ] |
|
In theory this problem could affect Lustre 1.8, but the supported client configurations would not have fileutils that use FIEMAP. So far it has only been reported for SLES11 SP2, which doesn't work with 1.8 clients. |
| Comment by Jian Yu [ 29/May/13 ] |
Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205

sanityn test 71 failed as follows:

== sanityn test 71: correct file map just after write operation is finished == 14:00:32 (1369688432)
1+0 records in
1+0 records out
40960 bytes (41 kB) copied, 0.00997164 s, 4.1 MB/s
/usr/lib64/lustre/tests/sanityn.sh: line 1903: facet_fstype: command not found
1+0 records in
1+0 records out
40960 bytes (41 kB) copied, 0.00642454 s, 6.4 MB/s
File: `/mnt/lustre2/f71'
Size: 163840 Blocks: 1 IO Block: 2097152 regular file
Device: 2c54f966h/743766374d Inode: 144115373078226305 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-05-27 14:00:32.000000000 -0700
Modify: 2013-05-27 14:00:32.000000000 -0700
Change: 2013-05-27 14:00:32.000000000 -0700
34409 fd: 3
No unwritten extents, extents number 0, file size 0, original size 81920
sanityn test_71: @@@@@@ FAIL: data is not flushed from client

Maloo report: https://maloo.whamcloud.com/test_sets/09fcc1bc-c7ed-11e2-9f90-52540035b04c

Lustre version check code needs to be added into sanityn test_71() on Lustre b2_1 branch to resolve the above interop issue. |
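The numbers in the log above are consistent with two 40960-byte writes landing at block offsets 1 and 3 of the same file, which would give an apparent size of 163840 bytes with 81920 bytes of data. That layout is an inference from the log, not something taken from sanityn.sh, and the sketch below only reproduces the arithmetic on a local temporary file. It does not exercise Lustre, two mount points, or FIEMAP.

```shell
#!/bin/sh
# Hypothetical local reproduction of the write pattern; NOT the sanityn.sh code.
# The seek offsets (1 and 3) are assumed from the sizes reported in the log.
bs=40960
f=$(mktemp)

# Two one-block writes, leaving holes at blocks 0 and 2.
dd if=/dev/zero of="$f" bs=$bs seek=1 count=1 conv=notrunc 2>/dev/null
dd if=/dev/zero of="$f" bs=$bs seek=3 count=1 conv=notrunc 2>/dev/null

size=$(($(wc -c < "$f")))   # apparent file size: 4 blocks
written=$((2 * bs))         # bytes actually written: 2 blocks

echo "file size $size, bytes written $written"
rm -f "$f"
```

In the failing run, the extent map reported no extents at all even though 81920 bytes had been written, which is what the "data is not flushed from client" message flags.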
| Comment by Jian Yu [ 04/Jun/13 ] |
|
sanityn test 71 failed in the following interop combinations:
Lustre 2.1.6 client + 2.1.5 server:
Lustre 2.1.6 client + 2.2.0 server:
Lustre 2.1.6 client + 2.3.0 server: |
| Comment by Jian Yu [ 07/Jun/13 ] |
Patch for Lustre b2_1 branch: http://review.whamcloud.com/6584 |
| Comment by Artem Blagodarenko (Inactive) [ 19/Jun/13 ] |
|
Patch for b1_8 http://review.whamcloud.com/#/c/6631/ |
| Comment by Jian Yu [ 04/Sep/13 ] |
|
Lustre client: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1) sanityn test 71 failed: |
| Comment by Artem Blagodarenko (Inactive) [ 05/Sep/13 ] |
|
It looks like 2.3 server should not support this test.

+ [[ $server_version -lt $(version_code 1.8.10) ]] &&
+     skip "Need MDS version at least 1.8.10" && return
+
+ # Patch not applied to 2.2 and 2.3 branches
+ [[ $server_version -ge $(version_code 2.2.0) ]] &&
+ [[ $server_version -lt $(version_code 2.4.0) ]] &&
+     skip "Need MDS version at least 2.4.0" && return |
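The version gate in the diff above can be tried outside the test framework with a small standalone sketch. The `version_code` below is an assumed encoding (major, minor and patch packed into one integer) written for illustration; it is not copied from Lustre's test framework, and `server_supports_test_71` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: an ASSUMED version encoding, not Lustre's own version_code.
version_code() {
    # Split "major.minor.patch" on dots into $1 $2 $3.
    old_ifs=$IFS; IFS=.; set -- $1; IFS=$old_ifs
    echo $(( ($1 << 16) | ($2 << 8) | $3 ))
}

# Hypothetical helper: returns 0 if a server of this version should run test 71.
server_supports_test_71() {
    _v=$(version_code "$1")
    # Older than 1.8.10: skip.
    [ "$_v" -lt "$(version_code 1.8.10)" ] && return 1
    # Patch not applied to 2.2 and 2.3 branches: skip.
    if [ "$_v" -ge "$(version_code 2.2.0)" ] &&
       [ "$_v" -lt "$(version_code 2.4.0)" ]; then
        return 1
    fi
    return 0
}

for v in 1.8.9 2.2.0 2.3.0 2.4.1; do
    if server_supports_test_71 "$v"; then
        echo "$v: run"
    else
        echo "$v: skip"
    fi
done
```

Packing the fields into one integer lets the range check for the 2.2 and 2.3 branches be written as two plain numeric comparisons, mirroring the structure of the proposed change.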