[LU-3219] FIEMAP does not sync data or return cached pages Created: 24/Apr/13  Updated: 15/Jul/15  Resolved: 04/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.4.0, Lustre 2.1.6, Lustre 2.4.1
Fix Version/s: Lustre 2.4.0, Lustre 2.1.6, Lustre 2.5.0

Type: Bug Priority: Critical
Reporter: Artem Blagodarenko (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: patch, yuc2

Issue Links:
Duplicate
is duplicated by LU-4380 data corruption when copy a file to a... Resolved
Related
is related to LU-2580 cp with FIEMAP support creates comple... Resolved
Severity: 3
Rank (Obsolete): 7860

 Description   

Artem Blagodarenko added a comment - 27/Mar/13 7:16 AM

We have a bug that looks like this.

After copying a file with 'cp' to another location in the same filesystem, the copy contained empty gaps.

Comparing the good file to the bad file, the bad file has a 4096-byte segment of zeros at block offset 6799, and the last portion of the file (following the last even 1MB multiple) is also zeros. Looking at the bad file's OST object (stripe count is 1), the zeroed regions of the file are holes: no blocks are allocated for those portions. The block allocations for the bad file go from 0-6798, block 6799 is missing, then 6800-8703, and blocks 8704-8745 are missing. (8704 blocks is 34*1048576 bytes; the bad file ends at the last even 1MB boundary.)

The bad file is missing block 6799, and blocks 8704-8745

Extents for the good file:

EXTENTS:
(ETB0):554244449, (0-127):554255616-554255743, (128-767):554257024-554257663,
(768-1535):554257920-554258687, (1536-4607):554258944-554262015,
(4608-5247):554262528-554263167, (5248-5375):554255744-554255871,
(5376-5887):554263296-554263807, (5888-6015):554255872-554255999,
(6016-6143):554258688-554258815, (6144-6271):554262144-554262271,
(6272-6527):554256000-554256255, (6528-6655):554262016-554262143,
(6656-7039):554256256-554256639, (7040-7167):554262272-554262399,
(7168-7295):554263168-554263295, (7296-7423):554263936-554264063,
(7424-7679):554256640-554256895, (7680-7807):554262400-554262527,
(7808-7935):554263808-554263935, (7936-8063):554256896-554257023,
(8064-8319):554257664-554257919, (8320-8447):554258816-554258943,
(8448-8745):554264064-554264361

Extents for the broken file:

EXTENTS:
(0-6798):538994432-539001230, (6800-8703):539001232-539003135
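To make the comparison above reproducible, here is a minimal sketch (not part of the original report) that scans a file in 4096-byte blocks and reports which block offsets are entirely zero; run against the good and bad copies, it would flag block 6799 and the trailing blocks 8704-8745 of the bad file:

```python
import os

BLOCK_SIZE = 4096  # matches the 4096-byte zeroed segment described above

def zero_blocks(path, block_size=BLOCK_SIZE):
    """Return the list of block offsets whose contents are all zeros."""
    zeros = []
    zero_block = b"\0" * block_size
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            # A short final chunk is compared against zeros of its own length.
            if chunk == zero_block[:len(chunk)]:
                zeros.append(index)
            index += 1
    return zeros
```

Note this only finds zeroed content, not holes as such; whether a zero block is actually unallocated is what the filefrag/debugfs extent listings above show.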

Lustre client version on this system is 2.2.

http://review.whamcloud.com/#change,6127



 Comments   
Comment by Peter Jones [ 24/Apr/13 ]

http://review.whamcloud.com/#change,6127

Comment by Andreas Dilger [ 24/Apr/13 ]

Artem, I moved your comments into this new bug. The cp/FIEMAP problem was worked on in LU-2580, and that at least solved the problem of "cp" missing the cached extents on the client. It doesn't fix the problem with cached extents on other clients, as your change does, so I'd still like to get your patch landed. Here are some issues that came up when Oleg and I looked at the patch:

Your patch forces the OST to always take the LCK_PR lock on FIEMAP_FLAG_SYNC, which will cause the client to drop its whole file cache (held under LCK_PR) in the common case of "write a file, then copy it from the same client". FIEMAP_FLAG_SYNC will already cause the client to flush its own cache via ioctl_fiemap->filemap_write_and_wait(), with patches 4477 and 4659 from LU-2367 and LU-2286. It would be better to only set the OBD_FL_SRVLOCK flag if the client cannot match a [0,EOF] LCK_PR lock locally. That would avoid flushing the client cache in the common case (LCK_PR will match the local LCK_PW on the client, and the VFS filemap_write_and_wait() will ensure data is written to the OST), but still ensure correctness if the file was written by multiple clients.

The other related issue that the patch doesn't fix is that Lustre FIEMAP does not return the cached extents in memory on the client if FIEMAP_FLAG_SYNC is not set. That isn't strictly needed to make "cp" correct, since it always uses FIEMAP_FLAG_SYNC, but it might affect other FIEMAP users. It is more difficult to know what the correct action is in this case, but at a minimum it should return the unwritten pages from the local OSC extent cache. Returning unwritten pages from remote client caches is considerably more difficult, and would be racy in any case so I'm not sure that is worthwhile. It would be great if you could look at this issue as well, but it should be fixed in a separate patch from your current one.
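For reference, the FIEMAP_FLAG_SYNC flag discussed above is set by the caller of the FIEMAP ioctl. Below is a sketch of issuing FIEMAP from userspace with that flag, using the standard struct layouts from the Linux <linux/fiemap.h> UAPI header; the ioctl number shown is the x86_64 value of FS_IOC_FIEMAP, and whether dirty data on other clients is reflected in the result is exactly the Lustre-specific question this ticket addresses:

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B      # _IOWR('f', 11, struct fiemap) on x86_64
FIEMAP_FLAG_SYNC = 0x00000001   # sync the file before mapping (what "cp" uses)

# struct fiemap header: fm_start, fm_length (u64); fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved (u32)
FIEMAP_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length (u64),
# fe_reserved64[2] (u64), fe_flags (u32), fe_reserved[3] (u32)
FIEMAP_EXTENT = struct.Struct("=QQQQQIIII")

def fiemap(path, max_extents=64, flags=FIEMAP_FLAG_SYNC):
    """Return (fe_logical, fe_physical, fe_length, fe_flags) per mapped extent."""
    hdr = FIEMAP_HDR.pack(0, 0xFFFFFFFFFFFFFFFF, flags, 0, max_extents, 0)
    buf = bytearray(hdr + b"\0" * (FIEMAP_EXTENT.size * max_extents))
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
    _, _, _, mapped, _, _ = FIEMAP_HDR.unpack_from(buf, 0)
    extents = []
    for i in range(mapped):
        e = FIEMAP_EXTENT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXTENT.size)
        extents.append((e[0], e[1], e[2], e[5]))
    return extents
```

A copy tool skips ranges not covered by any returned extent, which is why a server that drops cached-but-unmapped extents from this reply produces holes in the copy.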

Comment by Bernd Schubert [ 02/May/13 ]

A q-leap customer runs into this with lustre-1.8 as well. My personal recommendation is to disable fiemap ioctls entirely for now (or even forever). Is anything other than filefrag and coreutils using fiemap at all? For various reasons I haven't followed Lustre development for some time; are there plans to support SEEK_HOLE/SEEK_DATA?

Thanks,
Венедикт

Comment by Andreas Dilger [ 03/May/13 ]

FIEMAP is also used by tar to handle sparse files. The original reason for adding it was for filefrag to be able to report file fragmentation on large files efficiently, instead of having to issue billions of RPCs (one for each block).

SEEK_HOLE/SEEK_DATA are not available in the vendor kernels that we support. If you are interested in this, we'd be happy to accept patches.
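(For later readers: SEEK_HOLE/SEEK_DATA landed in mainline Linux 2.6.39 and is exposed through Python's os module. A minimal sketch of hole detection via lseek, as a contrast to the FIEMAP approach; note that on filesystems without sparse-file support the kernel's generic fallback reports the whole file as data, so the "first hole" is simply EOF.)

```python
import os

def first_hole(path):
    """Offset of the first hole in the file (EOF if fully allocated)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_HOLE)
    finally:
        os.close(fd)
```

Unlike FIEMAP, this interface has no equivalent of FIEMAP_FLAG_SYNC; the filesystem is expected to account for cached dirty data itself.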

Comment by Bob Glossman (Inactive) [ 17/May/13 ]

back port to b2_1
http://review.whamcloud.com/6377

Comment by Cory Spitz [ 21/May/13 ]

Doesn't this bug affect 1.8.9-wc1 too?

Comment by Andreas Dilger [ 21/May/13 ]

In theory this problem could affect Lustre 1.8, but the supported client configurations would not have fileutils that use FIEMAP. So far it has only been reported for SLES11 SP2, which doesn't work with 1.8 clients.

Comment by Jian Yu [ 29/May/13 ]

back port to b2_1
http://review.whamcloud.com/6377

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
Lustre b2_1 server build: http://build.whamcloud.com/job/lustre-b2_1/191 (2.1.5)
Distro/Arch: RHEL6.4/x86_64

sanityn test 71 failed as follows:

== sanityn test 71: correct file map just after write operation is finished == 14:00:32 (1369688432)
1+0 records in
1+0 records out
40960 bytes (41 kB) copied, 0.00997164 s, 4.1 MB/s
/usr/lib64/lustre/tests/sanityn.sh: line 1903: facet_fstype: command not found
1+0 records in
1+0 records out
40960 bytes (41 kB) copied, 0.00642454 s, 6.4 MB/s
  File: `/mnt/lustre2/f71'
  Size: 163840    	Blocks: 1          IO Block: 2097152 regular file
Device: 2c54f966h/743766374d	Inode: 144115373078226305  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-05-27 14:00:32.000000000 -0700
Modify: 2013-05-27 14:00:32.000000000 -0700
Change: 2013-05-27 14:00:32.000000000 -0700
34409
fd: 3
No unwritten extents, extents number 0, file size 0, original size 81920
 sanityn test_71: @@@@@@ FAIL: data is not flushed from client

Maloo report: https://maloo.whamcloud.com/test_sets/09fcc1bc-c7ed-11e2-9f90-52540035b04c

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
Lustre b2_2 server build: http://build.whamcloud.com/job/lustre-b2_2/17 (2.2.0)
sanityn test 71 also failed:
https://maloo.whamcloud.com/test_sets/5ebb0a36-c7c2-11e2-9f90-52540035b04c

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
Lustre b2_3 server build: http://build.whamcloud.com/job/lustre-b2_3/41 (2.3.0)
sanityn test 71 also failed:
https://maloo.whamcloud.com/test_sets/f9fe3b92-c7f6-11e2-ae6b-52540035b04c

Lustre version check code needs to be added into sanityn test_71() on Lustre b2_1 branch to resolve the above interop issue.

Comment by Jian Yu [ 04/Jun/13 ]

sanityn test 71 failed in the following interop combinations:

Lustre 2.1.6 client + 2.1.5 server:
https://maloo.whamcloud.com/test_sets/d8783b88-cc0d-11e2-9cc0-52540035b04c

Lustre 2.1.6 client + 2.2.0 server:
https://maloo.whamcloud.com/test_sets/e1c7d58e-cc23-11e2-9cc0-52540035b04c

Lustre 2.1.6 client + 2.3.0 server:
https://maloo.whamcloud.com/test_sets/3aca7922-cc09-11e2-9cc0-52540035b04c

Comment by Jian Yu [ 07/Jun/13 ]

Lustre version check code needs to be added into sanityn test_71() on Lustre b2_1 branch to resolve the above interop issue.

Patch for Lustre b2_1 branch: http://review.whamcloud.com/6584
Patch for Lustre master branch: http://review.whamcloud.com/6585 (which is also needed on Lustre b2_4 branch)

Comment by Artem Blagodarenko (Inactive) [ 19/Jun/13 ]

Patch for b1_8 http://review.whamcloud.com/#/c/6631/

Comment by Jian Yu [ 04/Sep/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
Lustre server: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)

sanityn test 71 failed:
https://maloo.whamcloud.com/test_sets/08bb9120-14ff-11e3-ba63-52540035b04c

Comment by Artem Blagodarenko (Inactive) [ 05/Sep/13 ]

It looks like a 2.3 server should not support this test.
Should we add checks to master like the check added to b1_8?

+       [[ $server_version -lt $(version_code 1.8.10) ]] &&
+               skip "Need MDS version at least 1.8.10" && return
+
+       # Patch not applied to 2.2 and 2.3 branches
+       [[ $server_version -ge $(version_code 2.2.0) ]] &&
+       [[ $server_version -lt $(version_code 2.4.0) ]] &&
+               skip "Need MDS version at least 2.4.0" && return
Generated at Sat Feb 10 01:31:59 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.