[LU-4380] data corruption when copy a file to a new directory (sles11sp2 only) - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.4.1
Labels:
None
Environment:
server: centos 2.1.5 server OR centos 2.4.1 server
client: sles11sp2 2.4.1 client

Source can be found at github.com/jlan/lustre-nas. The tag for the client is 2.4.1-1nasC.

Severity:
3
Rank (Obsolete):
12006

Description

Users reported a data corruption problem. We have a test script to reproduce the problem.

When run in a Lustre file system with a sles11sp2 host as the remote host, the script fails (sum reports 00000). It works if the remote host is running sles11sp1 or CentOS.

— cut here for test5.sh —
#!/bin/sh

host=${1:-endeavour2}
rm -fr zz hosts
cp /etc/hosts hosts
#fsync hosts
ssh $host "cd $PWD && mkdir -p zz && cp hosts zz/"
sum hosts zz/hosts
— cut here —

Good result:
./test5.sh r301i0n0
61609 41 hosts
61609 41 zz/hosts

Bad result:
./test5.sh r401i0n2
61609 41 hosts
00000 41 zz/hosts
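
The bad result above shows a checksum of 00000 for a non-empty file, which is consistent with a copy made up entirely of zero bytes. A minimal sketch of a helper that flags such a copy, assuming GNU cmp (for the -n option) and a readable /dev/zero. The function name is_all_zero is hypothetical and is not part of the original test script:

```sh
#!/bin/sh
# Hypothetical helper: succeed only if FILE is non-empty and every byte is zero.
is_all_zero() {
    [ -r "$1" ] || return 2
    # Strip any padding that some wc implementations add around the count.
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt 0 ] || return 1
    # Compare the first $size bytes of FILE with the same number of zero bytes.
    cmp -s -n "$size" "$1" /dev/zero
}
```

In test5.sh this could be called as `is_all_zero zz/hosts && echo "zz/hosts is zero-filled"` right after the sum line.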

Notes:

If the copied file is small enough (e.g., /etc/motd), the script succeeds.
If you uncomment the fsync, the script succeeds.
When it fails, stat reports no blocks have been allocated to the zz/hosts file:

$ stat zz/hosts
File: `zz/hosts'
Size: 41820 Blocks: 0 IO Block: 2097152 regular file
Device: 914ef3a8h/2437870504d Inode: 163153538715835056 Links: 1
Access: (0644/-rw-r--r--) Uid: (10491/dtalcott) Gid: ( 1179/ cstaff)
Access: 2013-12-12 09:24:46.000000000 -0800
Modify: 2013-12-12 09:24:46.000000000 -0800
Change: 2013-12-12 09:24:46.000000000 -0800

If you run in an NFS file system, the script usually succeeds, but sometimes reports a no such file error on the sum of zz/hosts. After a few seconds, though, the file appears, with the correct sum. (Typical NFS behavior.)
Acts the same on nbp7 and nbp8.

Attachments

LU4380.dbg.20121230.resend.tgz
2.17 MB
31/Dec/13 6:33 PM
LU4380.dbg.20121230.tgz
2.17 MB
30/Dec/13 8:33 PM
LU4380.dbg.20131224
2.76 MB
24/Dec/13 9:03 PM
LU-4380-debug.patch
0.5 kB
24/Dec/13 2:32 AM

Issue Links

duplicates

LU-3219 FIEMAP does not sync data or return cached pages

Resolved

Activity

Jay Lan (Inactive) added a comment - 24/Dec/13 9:03 PM

Attached is the debug output Niu requested. I did not run the test with Niu's patch, though, since I need authorization to put a new binary into a production system.

Niu Yawei (Inactive) added a comment - 24/Dec/13 2:32 AM

It's better to have this patch applied when collecting debug logs.

Niu Yawei (Inactive) added a comment - 24/Dec/13 2:10 AM

Jay, could you try to reproduce with the D_TRACE log enabled? Let's see from the Lustre log whether the sync flag is specified in the fiemap call.

echo +trace > /proc/sys/lnet/debug
lctl debug_daemon start $tmpfile 300
lctl mark "=== cp test ==="
cp test
lctl mark "=== cp test end ==="
lctl debug_daemon stop
lctl debug_file $tmpfile $logfile
Then attach $logfile to this ticket.
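
The steps above leave $tmpfile, $logfile, and the cp test itself undefined. A minimal sketch of a wrapper that runs the same lctl sequence around an arbitrary command follows. The function name collect_trace and the LCTL and DEBUG_MASK override variables are hypothetical, added only so the sequence can be dry-run on a machine without Lustre:

```sh
#!/bin/sh
# Hypothetical overrides: point these elsewhere to dry-run the sequence.
LCTL=${LCTL:-lctl}
DEBUG_MASK=${DEBUG_MASK:-/proc/sys/lnet/debug}

# Usage: collect_trace TMPFILE LOGFILE COMMAND [ARGS...]
collect_trace() {
    tmpfile=$1; logfile=$2; shift 2
    # Enable trace messages, then start the debug daemon (300 MB limit, as in the comment).
    echo +trace > "$DEBUG_MASK" || return 1
    $LCTL debug_daemon start "$tmpfile" 300 || return 1
    $LCTL mark "=== cp test ==="
    "$@"; rc=$?
    $LCTL mark "=== cp test end ==="
    $LCTL debug_daemon stop
    # Convert the binary debug file into a readable log.
    $LCTL debug_file "$tmpfile" "$logfile"
    # Report the exit status of the wrapped command.
    return $rc
}
```

On a real client this would need root and could be invoked as, for example, `collect_trace /tmp/lu4380.raw /tmp/lu4380.log cp hosts zz/` (both paths are placeholders).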

Jay Lan (Inactive) added a comment - 19/Dec/13 6:52 PM

Niu, LU-2580 referred to the fixes for LU-2267 and LU-2286. We have both patches in our 2.4.1 branch.

Bob Glossman (Inactive) added a comment - 19/Dec/13 4:06 PM

Tried backing down to the -6.23.1 coreutils version. Still couldn't make the problem happen. Looks like the binary cp is identical between the 2 versions anyway, I checked.
Package diffs must be elsewhere.

Bob Glossman (Inactive) added a comment - 19/Dec/13 3:47 PM

I have coreutils-8.12-6.25.29.1 on sles11sp2.

Niu Yawei (Inactive) added a comment - 19/Dec/13 7:07 AM

I checked the source code of coreutils-8.12 from gnu.org. It looks like FIEMAP_FLAG_SYNC is always set when reading extents, but I am not sure whether the copy on sles11sp2 differs. (I am not sure where to get the coreutils source code for sles11sp2.)

Bob, I guess your coreutils version isn't the same as Jay's, which is why you can't reproduce the problem. Could you try coreutils-8.12-6.23.1?

Niu Yawei (Inactive) added a comment - 19/Dec/13 5:21 AM

This looks like the same problem as LU-2580.

Some data of the source file 'hosts' is still cached on the client and has not been flushed back to the OST, so the st_blocks reported by stat is smaller than the file size implies. 'cp' then thinks the file is sparse and tries to copy only the extents returned by the fiemap ioctl.

So what we need to figure out is whether the 'cp' in sles11sp2 calls fiemap with the FIEMAP_FLAG_SYNC flag, which makes sure all cached data is flushed back before the extents are fetched.
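
To make the st_blocks reasoning concrete, here is a minimal sketch of the kind of size-versus-allocated-blocks comparison that can make a file look sparse. It is an illustration only: the function names are hypothetical, the 512-byte unit is an assumption matching what stat reports on Linux, and the exact check in the sles11sp2 cp has not been confirmed in this ticket.

```sh
#!/bin/sh
# Succeed if a file of SIZE bytes with BLOCKS allocated 512-byte blocks
# would look sparse under a simple st_blocks-versus-st_size comparison.
looks_sparse() {
    size=$1; blocks=$2
    [ "$blocks" -lt $(( size / 512 )) ]
}

# Same check against a real file, assuming GNU stat
# (%s = size in bytes, %b = allocated blocks, %B = bytes per block for %b).
file_looks_sparse() {
    info=$(stat -c '%s %b %B' "$1") || return 2
    set -- $info
    # Normalize the block count to 512-byte units before comparing.
    looks_sparse "$1" $(( $2 * $3 / 512 ))
}
```

With the values from the stat output in the description (Size: 41820, Blocks: 0), `looks_sparse 41820 0` succeeds, so a fully written but not yet flushed file is indistinguishable from a sparse one under this check.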

Jay Lan (Inactive) added a comment - 19/Dec/13 1:23 AM

Hi Bob,

'/bin/cp' command is packaged in coreutils in sles11sp2.
My version is coreutils-8.12-6.23.1. What version is yours?

Jay Lan (Inactive) added a comment - 18/Dec/13 9:37 PM

Here is the second part of Dale's reply in response to Andreas' strace request. I did not include the second part in my first attempt. He actually did try with --sparse=never.

== quote on ==
So, there are two bugs here. First, Lustre did not update st_blocks for the source file soon enough. Second, sles11sp2's cp is too "smart" for its own good.

FWIW:

I used the sles11sp1 version of cp under sles11sp2 and it produced a correct copy, in spite of the bad st_blocks value.

I tried adding the --sparse=never option to cp to see if I could get it to ignore st_blocks. That made it even stupider: It copied the 7 MiB as before, then explicitly filled the rest of st_size with zeros.
== quote off ==

Bob Glossman (Inactive) added a comment - 18/Dec/13 9:26 PM

Just to collect some additional data, could you please add the --sparse=never option to your cp commands, see whether that avoids the failures, and again get straces of the cp?

People

Assignee: Bob Glossman (Inactive)

Reporter: Jay Lan (Inactive)

Votes: 0

Watchers: 6

Dates

Created: 12/Dec/13 7:24 PM

Updated: 13/Feb/14 6:35 PM

Resolved: 14/Jan/14 11:12 PM