[LU-2580] cp with FIEMAP support creates completely sparse file Created: 07/Jan/13  Updated: 24/Apr/13  Resolved: 05/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Kit Westneat (Inactive) Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: LB
Environment:

SLES 11 SP2 (client), Lustre 2.1.2 RHEL6 (server)


Attachments: Text File stats_and_md5sums.txt    
Issue Links:
Related
is related to LU-417 block usage is reported as zero by st... Resolved
is related to LU-3219 FIEMAP does not sync data or return c... Resolved
Severity: 2
Rank (Obsolete): 6020

 Description   

We are seeing an issue at KIT where cp will occasionally use the FIEMAP extension to create a completely sparse file instead of actually copying the file. It seems to occur under a workload that creates and deletes many files at once. It involves only a single client, though; it is not a parallel workload.

Relevant strace from 'bad' cp:
ioctl(3, 0xc020660b, 0x7fff392c0950) = 0
ftruncate(4, 12853) = 0

strace from 'good' cp:
read(3, "#!/bin/bash -u\n\n#localisation\nex"..., 2097152) = 12853
write(4, "#!/bin/bash -u\n\n#localisation\nex"..., 12853) = 12853
read(3, "", 2097152) = 0

The strace didn't print the stat block information, but I'm assuming st_blocks == 0 in the bad case. I will ask the customer to get a full strace -v to confirm, but this appears to be similar to LU-417.



 Comments   
Comment by Kit Westneat (Inactive) [ 07/Jan/13 ]

There is a 41MB (800MB uncompressed) client debug log from the event. Would it be useful to upload it somewhere?

Comment by Peter Jones [ 07/Jan/13 ]

Thanks for the ticket Kit.

Comment by Andreas Dilger [ 09/Jan/13 ]

Kit, is this only happening when trying to copy a 1-stripe file that was recently created? Is the file created on the same node or a remote node? What does "stat" report on the file before it is copied?

Comment by Kit Westneat (Inactive) [ 09/Jan/13 ]

Here is the stat, stracing the cp reports the same:
File: `/work/ws/jl95/simulation/OSC/OSC_traps_temperatur2_kopie/calc_series_OSC_multi_neu'
Size: 12899 Blocks: 1 IO Block: 2097152 regular file
Device: 24ce0ea6h/617483942d Inode: 144115253070874816 Links: 1
Access: (0755/-rwxr-xr-x) Uid: (28181/ jl95) Gid: (11700/ jl00)
Access: 2013-01-09 13:28:34.000000000 +0100
Modify: 2013-01-09 13:28:34.000000000 +0100
Change: 2013-01-09 13:28:34.000000000 +0100
Birth: -

Response from customer:

> is this only happening when trying to copy a 1-stripe file that was recently created?
Yes, it seems so. The file was recently created and has stripe count 1. (The file system
has a default stripe count of 4, but since the user creates so many files I advised him to
use stripe count 1.)

> Is the file created on the same node or a remote node?
The file is created on a local disk of the same node.

> What does "stat" report on the file before it is copied?
I just asked the user to run his reproducer again and recreate that data. (It may be that
running stat keeps the problem from appearing. With a "sleep 5" the problem does not appear most
of the time. After the copy we check the source and the copied file with md5sum, and when the
problem appears the md5sums differ.)
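The md5sum check described above can be sketched in a few lines (a minimal illustration; the paths and the reporting step are hypothetical, not the customer's actual script):

```python
import hashlib

def md5_of(path):
    # Hash the file in 1 MiB chunks so large files don't need to fit in memory.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

# A "bad" cp produces a fully sparse copy of the same size as the source,
# so the sizes match but the digests differ:
#   if md5_of(src) != md5_of(dst): report a corrupted copy
```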

Comment by Kit Westneat (Inactive) [ 17/Jan/13 ]

Attached are stats and md5sums of both good and bad cps. In the bad cp, the file reports only 1 used block; in the good one, it reports 32 blocks. The straces confirm that is what cp sees.

Comment by Andreas Dilger [ 17/Jan/13 ]

What version of fileutils is in use here? Was it part of the distro, or upgraded afterward?

Comment by Kit Westneat (Inactive) [ 22/Jan/13 ]

From KIT: They are using normal fileutils (which are part of coreutils) of the SLES11 SP2 distribution. coreutils version is 8.12 and release is 6.23.1. (Source RPM is coreutils-8.12-6.23.1.src.rpm)

Comment by Kalpak Shah (Inactive) [ 20/Feb/13 ]

I don't think this issue is related to FIEMAP. stat reported st_blocks=1 for the file and a size of 12899 bytes. So cp correctly called the FIEMAP ioctl.

The problem seems to be Lustre reporting the wrong number of blocks for a recently created/written file. This fix causes stat to report st_blocks=1 instead of 0: http://git.whamcloud.com/?p=fs/lustre-release.git;a=commitdiff;h=829845ac9ddbdfd170de215742c033ea1102db3e;hp=fc4b46df111bbf9d2207265d18b3f0d72f49502c

Comment by Kalpak Shah (Inactive) [ 21/Feb/13 ]

Further, regarding the ftruncate we see in the strace (instead of the lseek I was expecting): even though Lustre reports st_blocks=1, the FIEMAP ioctl says that no blocks are allocated, leading cp to make the ftruncate call with the size of the file.

On SLES11 SP2 with coreutils-8.12-6.19.1, looks like cp is always setting the FIEMAP_FLAG_SYNC flag as well.

Comment by Andreas Dilger [ 21/Feb/13 ]

Kalpak, AFAIK the st_blocks value is only used to determine whether the file is sparse (st_blocks < st_size / 512) or dense (st_blocks >= st_size / 512). For dense files they are copied via "while (read() > 0) write()", and for sparse files newer "cp" copies only the list of extents returned by FIEMAP. In both cases, my understanding is that st_blocks is not used for determining how much data is copied.
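The heuristic Andreas describes can be sketched as follows. This is an illustration of the idea, not coreutils' actual code (the exact test varies slightly between coreutils versions):

```python
import os

def looks_sparse(path):
    # Treat the file as sparse when stat reports fewer 512-byte blocks
    # than its apparent size would require. For such files newer cp
    # copies only the extents returned by FIEMAP; for dense files it
    # falls back to a plain "while (read() > 0) write()" loop.
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size
```

With the bad stat output above (Size: 12899, Blocks: 1), this predicate is true, which is why cp takes the FIEMAP path for a file that is not actually sparse.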

The problem, as I see it, is that Lustre FIEMAP (which only returns something useful to "cp" for single-striped files) does not return FIEMAP_EXTENT_DELALLOC extents for pages that are only in the client cache and not on the OST yet. "cp" should be using FIEMAP_FLAG_SYNC and causing all of the cached extents to be flushed, but somehow this isn't happening.
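The FIEMAP call in question can be probed directly with a raw FS_IOC_FIEMAP ioctl. Below is a minimal sketch (my own illustration, not cp's code): the ioctl number matches the 0xc020660b seen in the strace above, and the header layout follows struct fiemap from the kernel's fiemap.h. Passing fm_extent_count=0 asks the kernel only for the number of mapped extents.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B       # same ioctl number as in the strace
FIEMAP_FLAG_SYNC = 0x00000001    # flush dirty pages before mapping
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

def fiemap_extent_count(path):
    # struct fiemap header: fm_start, fm_length, fm_flags,
    # fm_mapped_extents, fm_extent_count, fm_reserved
    hdr = struct.pack('=QQLLLL', 0, FIEMAP_MAX_OFFSET,
                      FIEMAP_FLAG_SYNC, 0, 0, 0)
    with open(path, 'rb') as f:
        try:
            res = fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, hdr)
        except OSError:
            return None  # filesystem does not support FIEMAP
    return struct.unpack('=QQLLLL', res)[3]  # fm_mapped_extents
```

On a healthy filesystem a freshly written non-empty file should report at least one extent once SYNC has flushed it; the bug here is that Lustre returned no extents, so cp wrote nothing and just truncated the destination to the source size.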

Comment by Peter Jones [ 27/Feb/13 ]

Could you please clarify which versions of Lustre (and any patches) are being used here? You mention Lustre 2.1.2 servers, but what version of Lustre is being used on the client?

Comment by Kalpak Shah (Inactive) [ 04/Mar/13 ]

Peter, the clients are running Lustre 2.3 on SLES11 SP2.

Comment by Peter Jones [ 04/Mar/13 ]

Thanks Kalpak. With any patches applied?

Comment by Kalpak Shah (Inactive) [ 05/Mar/13 ]

Update from KIT: With Lustre 2.3.0 on the client and patches 4477 and 4659 from LU-2367 and LU-2286, the issue cannot be reproduced.

LU-2367 fixes a race in unplugging the IO queue which can affect flush and fsync, and "cp" always calls FIEMAP with the SYNC flag set, causing the cached extents to be flushed. LU-2286 fixes a bug where an extent does not get flushed to disk until the next write to the file occurs. So it seems logical that the issue cannot be reproduced with these two patches applied.

Comment by Peter Jones [ 05/Mar/13 ]

ok thanks for the update. That explains why we have been unable to reproduce this issue on the latest 2.4 code. I will close out this ticket.

Comment by Cory Spitz [ 28/Mar/13 ]

This bug is still applicable to 2.1 when using cp built from coreutils 8.12, right? [I can't confirm that, but I think we're seeing this on 2.2.] If so, and since b2_1 is still the current maintenance branch, do we want to land a fix there?

Generated at Sat Feb 10 01:26:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.