[LU-17124] fiemap FIEMAP_FLAG_SYNC flag expects filemap_write_and_wait() or similar Created: 18/Sep/23 Updated: 15/Nov/23 Resolved: 28/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Shaun Tancheff | Assignee: | Shaun Tancheff |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
fiemap FIEMAP_FLAG_SYNC can race while client is writing data to disk In such a case fiemap() call returns that the data is not on disk (no data for the range) and cp can just truncates (or sparse fill) based on the size of the file/extent. strace shows that FIEMAP_FLAG_SYNC was sent. Further and the user reports that a 'sync; cp <blah>' does not fail and a newer cp that uses copy_file_range() also does not fail. Looking further FIEMAP_FLAG_SYNC expects the data to be on disk aka filemap_write_and_wait() not just filemap_fdatawrite() |
| Comments |
| Comment by Gerrit Updater [ 18/Sep/23 ] |
|
"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52402 |
| Comment by Gerrit Updater [ 28/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52402/ |
| Comment by Peter Jones [ 28/Sep/23 ] |
|
Landed for 2.16 |
| Comment by Lukasz Flis [ 14/Nov/23 ] |
|
Cyfronet here, Server: 2.15.3 We have observed the same issue prior applying the patch. When copying 4096 files from one directory to another (on the same fs), in parallel using 32 processes, some copies where corrupted. Some files were truncated more or less, around 10-30 destination files corrupted in total. Having applied the patch we observe less corrupted files and less bytes are missing. It seems that solution doesn't fix the problem completly, or maybe we miss some other patches between 2.15.3 and 2.16.0 |
| Comment by Lukasz Flis [ 14/Nov/23 ] |
|
This is how we copy the files:
find ./source_dir -name *.exr | xargs -P32 -I'{}' cp -v {} ./dest_dir
Parameter -P32 keeps 32 processes running in parallel |
| Comment by Shaun Tancheff [ 14/Nov/23 ] |
|
If you are copying from Lustre to Lustre on the same file system then I do not see this patch would have any obvious effect on your use case. I do not see fiemap() being involved on el9.2 when copy_file_range() can be used. So I think your find | xargs -P32 cp <args> case is may be exposing a separate bug. If you could capture strace(s) of the cp -v and provide / attach a 'good' sample and a 'bad' sample we may be able to see what is happening. |
| Comment by Lukasz Flis [ 14/Nov/23 ] |
|
Two traces in the attachment, both captured for the same file #good
...
uname({sysname="Linux", nodename="t0006", ...}) = 0
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 62924467
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 0
...
#bad
...
uname({sysname="Linux", nodename="t0006", ...}) = 0
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 19005440
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 1441792
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 0 <= short read?
...
|
| Comment by Lukasz Flis [ 14/Nov/23 ] |
|
Quick update.
/* * gcc -c -Wall -Werror -fpic ./this.c -o cfr.o * gcc -ldl -shared -o ./cfr.so ./cfr.o * LD_PRELOAD=/full_path_to/cfr.so cp A B * * */ #define _GNU_SOURCE 1 #include <dlfcn.h> #include <unistd.h> #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> typedef ssize_t *(*cfr_t)(int, loff_t*, int, loff_t*, size_t, unsigned int); static cfr_t p = NULL; ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags) { fsync(fd_out); ssize_t r=-1; if (p == NULL) { p = dlsym(RTLD_NEXT, "copy_file_range"); if (p == NULL) { /* Error handling */ return r; } } r = (ssize_t) p(fd_in, off_in, fd_out, off_out, len,flags); return r; }
|
| Comment by Gerrit Updater [ 15/Nov/23 ] |
|
"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53140 |
| Comment by Shaun Tancheff [ 15/Nov/23 ] |
|
Please try the patch and see if the sync() occurs early enough to resolve the corruption you are finding. If not we may need to implement a file-system specific copy_file_range() Thanks! |
| Comment by Lukasz Flis [ 15/Nov/23 ] |
|
Thank you for the patch. We tested the changes and unfortunately we still see corrupted destnation files.
|
| Comment by James A Simmons [ 15/Nov/23 ] |
|
A ticket exist for copy_file_range(). I just never got the cycles to implement for Lustre. Also RHEL7 doesn't support a proper hook for copy_file_range. |