
cp with FIEMAP support creates completely sparse file

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.3.0, Lustre 2.4.0
    • SLES 11 SP2 (client), Lustre 2.1.2 RHEL6 (server)
    • 2
    • 6020

    Description

      We are seeing an issue at KIT where cp will occasionally use the FIEMAP extension to create a completely sparse file instead of actually copying the file. It seems to occur under a workload involving creating and deleting many files at once. It only involves a single client, though; it is not a parallel workload.

      Relevant strace from 'bad' cp:
      ioctl(3, 0xc020660b, 0x7fff392c0950) = 0
      ftruncate(4, 12853) = 0

      strace from 'good' cp:
      read(3, "#!/bin/bash -u\n\n#localisation\nex"..., 2097152) = 12853
      write(4, "#!/bin/bash -u\n\n#localisation\nex"..., 12853) = 12853
      read(3, "", 2097152) = 0

      The strace didn't print the stat block information, but I'm assuming the st_blocks == 0 in the bad one. I will ask the customer to get a full strace -v to confirm, but it appears to be something similar to LU-417?
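The request number in the 'bad' strace can be decoded to confirm that the ioctl is the FIEMAP call. The following is a minimal sketch, assuming the generic Linux ioctl bit layout (as used on x86_64; the client architecture is not stated in this ticket). The helper name `decode_ioctl` is hypothetical.

```python
# Decode a Linux ioctl request number using the generic layout from
# asm-generic/ioctl.h: nr (8 bits), type (8 bits), size (14 bits), dir (2 bits).
# Assumption: the client uses this generic layout (true for x86_64).

IOC_WRITE = 1
IOC_READ = 2


def decode_ioctl(request):
    """Split an ioctl request number into its four encoded fields."""
    return {
        "nr": request & 0xFF,                  # command number
        "type": chr((request >> 8) & 0xFF),    # "magic" character
        "size": (request >> 16) & 0x3FFF,      # sizeof(argument struct)
        "direction": (request >> 30) & 0x3,    # IOC_READ | IOC_WRITE bits
    }


decoded = decode_ioctl(0xC020660B)  # value taken from the strace above
print(decoded)
# {'nr': 11, 'type': 'f', 'size': 32, 'direction': 3}
```

Type 'f', number 11, read/write, with a 32-byte argument matches FS_IOC_FIEMAP, which the kernel headers define as _IOWR('f', 11, struct fiemap).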

          Activity

            [LU-2580] cp with FIEMAP support creates completely sparse file
            spitzcor Cory Spitz added a comment -

            This bug is still applicable to 2.1 when using cp built from coreutils 8.12, right? [I can't confirm that, but I think we're seeing this on 2.2] If so and since b2_1 is still the current maintenance branch, do we want to land a fix there?

            pjones Peter Jones added a comment -

            ok thanks for the update. That explains why we have been unable to reproduce this issue on the latest 2.4 code. I will close out this ticket.


            kalpak Kalpak Shah (Inactive) added a comment -

            Update from KIT: With Lustre 2.3.0 on the client and patches 4477 and 4659 from LU-2367 and LU-2286, the issue cannot be reproduced.

            LU-2367 fixes a race in unplugging the IO queue which can affect flush and fsync - and "cp" always calls FIEMAP with the SYNC flag set causing the cached extents to be flushed. LU-2286 fixes a bug where an extent does not get flushed to disk until the next write to the file occurs. So it does seem logical that the issue is not being reproduced with these 2 patches applied.
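For reference, a FIEMAP request with the SYNC flag set can be sketched as follows. This is a minimal illustration and not the code cp uses: the struct layouts and constants follow the Linux headers linux/fiemap.h and linux/fs.h, while the helper names (`build_request`, `parse_response`, `fiemap`) are hypothetical.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B          # same request number as in the strace
FIEMAP_FLAG_SYNC = 0x1              # ask the filesystem to flush before mapping
FIEMAP_EXTENT_DELALLOC = 0x4        # extent exists only in cache, not yet on disk
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

# struct fiemap header: fm_start, fm_length, fm_flags, fm_mapped_extents,
# fm_extent_count, fm_reserved (32 bytes)
HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length, fe_reserved64[2],
# fe_flags, fe_reserved[3] (56 bytes)
EXTENT = struct.Struct("=QQQQQIIII")


def build_request(max_extents, flags=FIEMAP_FLAG_SYNC):
    """Build a request buffer covering the whole file, with room for extents."""
    header = HEADER.pack(0, FIEMAP_MAX_OFFSET, flags, 0, max_extents, 0)
    return bytearray(header + bytes(EXTENT.size * max_extents))


def parse_response(buf):
    """Return (logical, physical, length, flags) for each mapped extent."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    return extents


def fiemap(fd, max_extents=32):
    """Issue the ioctl on an open file descriptor; raises OSError if unsupported."""
    buf = build_request(max_extents)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # the kernel fills buf in place
    return parse_response(buf)
```

If the flush requested by FIEMAP_FLAG_SYNC does not actually push the cached pages to the OST, a call like `fiemap(fd)` on a freshly written file would be expected to return an empty list, which is the condition discussed in this ticket.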
            pjones Peter Jones added a comment -

            Thanks Kalpak. With any patches applied?


            kalpak Kalpak Shah (Inactive) added a comment -

            Peter, the clients are running Lustre 2.3 on SLES11 SP2.
            pjones Peter Jones added a comment -

            Could you please clarify which versions of Lustre (and any patches) are being used here? You mention Lustre 2.1.2 servers, but which version of Lustre is running on the client?

            adilger Andreas Dilger added a comment -

            Kalpak, AFAIK the st_blocks value is only used to determine whether the file is sparse (st_blocks < st_size / 512) or dense (st_blocks >= st_size / 512). For dense files they are copied via "while (read() > 0) write()", and for sparse files newer "cp" copies only the list of extents returned by FIEMAP. In both cases, my understanding is that st_blocks is not used for determining how much data is copied.

            The problem, as I see it, is that Lustre FIEMAP (which only returns something useful to "cp" for single-striped files) does not return FIEMAP_EXTENT_DELALLOC extents for pages that are only in the client cache and not on the OST yet. "cp" should be using FIEMAP_FLAG_SYNC and causing all of the cached extents to be flushed, but somehow this isn't happening.
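The sparse/dense test described above can be written out as a short sketch. It applies the comparison quoted in the comment (st_blocks < st_size / 512) using integer division; the function names are hypothetical, and the block counts are the values discussed in this ticket plus one made-up dense value (32) for contrast.

```python
import os

BLOCK_UNIT = 512  # st_blocks is counted in 512-byte units


def looks_sparse(st_size, st_blocks):
    """Apply the heuristic from the comment: fewer blocks than the size implies."""
    return st_blocks < st_size // BLOCK_UNIT


def looks_sparse_path(path):
    """Same check, using the values stat() reports for a real file."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)


# 12853 bytes is the file size seen in the strace; 12853 // 512 == 25.
print(looks_sparse(12853, 0))    # True: st_blocks == 0 looks sparse
print(looks_sparse(12853, 1))    # True: st_blocks == 1 still looks sparse
print(looks_sparse(12853, 32))   # False: enough blocks, treated as dense
```

Under this check, a 12853-byte file reporting either 0 or 1 blocks is classified as sparse, so in both cases cp would take the extent-based path rather than the plain read/write loop.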

            kalpak Kalpak Shah (Inactive) added a comment -

            Further regarding the ftruncate that we see in the strace (instead of the fseek that I was expecting) - even though Lustre says st_blocks=1, fiemap ioctl says that no blocks are allocated leading to the ftruncate call with the size of the file.

            On SLES11 SP2 with coreutils-8.12-6.19.1, looks like cp is always setting the FIEMAP_FLAG_SYNC flag as well.
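The effect of an empty extent list can be illustrated with a simplified model of an extent-driven copy. This is a sketch for illustration only, not the coreutils implementation; `plan_extent_copy` and its operation tuples are hypothetical.

```python
def plan_extent_copy(file_size, extents):
    """Return the operations a simplified extent-driven copy would issue.

    extents is a list of (logical_offset, length) tuples sorted by offset,
    as a FIEMAP call might report them.
    """
    ops = []
    end_of_data = 0
    for offset, length in extents:
        # Never copy past the end of the file as reported by stat().
        length = min(length, file_size - offset)
        if length <= 0:
            continue
        ops.append(("copy", offset, length))
        end_of_data = offset + length
    if end_of_data < file_size:
        # Nothing (more) to copy: extend the destination with a trailing hole.
        ops.append(("ftruncate", file_size))
    return ops


# No extents reported: the only operation is the ftruncate seen in the strace.
print(plan_extent_copy(12853, []))
# One extent covering the file: the data is copied and no ftruncate is needed.
print(plan_extent_copy(12853, [(0, 16384)]))
```

With no extents, the model produces a single ftruncate to the file size and never reads any data, which corresponds to the 'bad' strace where ftruncate(4, 12853) is the only call on the destination.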

            kalpak Kalpak Shah (Inactive) added a comment -

            I don't think this issue is related to FIEMAP. stat reported st_blocks=1 for the file and a size of 12899 bytes. So cp correctly called the FIEMAP ioctl.

            The problem seems to be Lustre reporting the wrong number of blocks on a recently created/written file. This fix leads stat to report st_blocks=1 instead of 0 - http://git.whamcloud.com/?p=fs/lustre-release.git;a=commitdiff;h=829845ac9ddbdfd170de215742c033ea1102db3e;hp=fc4b46df111bbf9d2207265d18b3f0d72f49502c

            kitwestneat Kit Westneat (Inactive) added a comment -

            From KIT: They are using normal fileutils (which are part of coreutils) of the SLES11 SP2 distribution. coreutils version is 8.12 and release is 6.23.1. (Source RPM is coreutils-8.12-6.23.1.src.rpm)

            People

              pjones Peter Jones
              kitwestneat Kit Westneat (Inactive)