Lustre / LU-3219

FIEMAP does not sync data or return cached pages

Details


    Description

      Artem Blagodarenko added a comment - 27/Mar/13 7:16 AM

      We have a bug that looks like this.

      After copying a file with 'cp' to another location on the same filesystem, the copy ends up with empty gaps.

      Comparing the good file to the bad file, the bad file has a 4096-byte segment of zeros at block offset 6799, and the last portion of the file (following the last even 1MB multiple) is also zeros. Looking at the bad file's OST object (stripe count is 1), the zeroed regions of the file are holes - no blocks allocated for those portions. The block allocations for the bad file go from 0-6798, block 6799 is missing, then 6800-8703, and blocks 8704-8745 are missing. (8704 blocks of 4096 bytes is 34*1048576 bytes; the bad file ends at the last even 1MB boundary.)

      The bad file is missing block 6799 and blocks 8704-8745.

      Extents for good file

      EXTENTS:
      (ETB0):554244449, (0-127):554255616-554255743, (128-767):554257024-554257663,
      (768-1535):554257920-554258687, (1536-4607):554258944-554262015,
      (4608-5247):554262528-554263167, (5248-5375):554255744-554255871,
      (5376-5887):554263296-554263807, (5888-6015):554255872-554255999,
      (6016-6143):554258688-554258815, (6144-6271):554262144-554262271,
      (6272-6527):554256000-554256255, (6528-6655):554262016-554262143,
      (6656-7039):554256256-554256639, (7040-7167):554262272-554262399,
      (7168-7295):554263168-554263295, (7296-7423):554263936-554264063,
      (7424-7679):554256640-554256895, (7680-7807):554262400-554262527,
      (7808-7935):554263808-554263935, (7936-8063):554256896-554257023,
      (8064-8319):554257664-554257919, (8320-8447):554258816-554258943,
      (8448-8745):554264064-554264361

      Extents for broken file:

      EXTENTS:
      (0-6798):538994432-539001230, (6800-8703):539001232-539003135

      Lustre client version on this system is 2.2.

      http://review.whamcloud.com/#change,6127

          Activity

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment (edited) - Patch for b1_8: http://review.whamcloud.com/#/c/6631/
            yujian Jian Yu added a comment (edited) -

            Lustre version check code needs to be added into sanityn test_71() on Lustre b2_1 branch to resolve the above interop issue.

            Patch for Lustre b2_1 branch: http://review.whamcloud.com/6584
            Patch for Lustre master branch: http://review.whamcloud.com/6585 (which is also needed on Lustre b2_4 branch)

            yujian Jian Yu added a comment -

            sanityn test 71 failed in the following interop combinations:

            Lustre 2.1.6 client + 2.1.5 server:
            https://maloo.whamcloud.com/test_sets/d8783b88-cc0d-11e2-9cc0-52540035b04c

            Lustre 2.1.6 client + 2.2.0 server:
            https://maloo.whamcloud.com/test_sets/e1c7d58e-cc23-11e2-9cc0-52540035b04c

            Lustre 2.1.6 client + 2.3.0 server:
            https://maloo.whamcloud.com/test_sets/3aca7922-cc09-11e2-9cc0-52540035b04c

            yujian Jian Yu added a comment -

            back port to b2_1
            http://review.whamcloud.com/6377

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
            Lustre b2_1 server build: http://build.whamcloud.com/job/lustre-b2_1/191 (2.1.5)
            Distro/Arch: RHEL6.4/x86_64

            sanityn test 71 failed as follows:

            == sanityn test 71: correct file map just after write operation is finished == 14:00:32 (1369688432)
            1+0 records in
            1+0 records out
            40960 bytes (41 kB) copied, 0.00997164 s, 4.1 MB/s
            /usr/lib64/lustre/tests/sanityn.sh: line 1903: facet_fstype: command not found
            1+0 records in
            1+0 records out
            40960 bytes (41 kB) copied, 0.00642454 s, 6.4 MB/s
              File: `/mnt/lustre2/f71'
              Size: 163840    	Blocks: 1          IO Block: 2097152 regular file
            Device: 2c54f966h/743766374d	Inode: 144115373078226305  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-05-27 14:00:32.000000000 -0700
            Modify: 2013-05-27 14:00:32.000000000 -0700
            Change: 2013-05-27 14:00:32.000000000 -0700
            34409
            fd: 3
            No unwritten extents, extents number 0, file size 0, original size 81920
             sanityn test_71: @@@@@@ FAIL: data is not flushed from client
            

            Maloo report: https://maloo.whamcloud.com/test_sets/09fcc1bc-c7ed-11e2-9f90-52540035b04c

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
            Lustre b2_2 server build: http://build.whamcloud.com/job/lustre-b2_2/17 (2.2.0)
            sanityn test 71 also failed:
            https://maloo.whamcloud.com/test_sets/5ebb0a36-c7c2-11e2-9f90-52540035b04c

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
            Lustre b2_3 server build: http://build.whamcloud.com/job/lustre-b2_3/41 (2.3.0)
            sanityn test 71 also failed:
            https://maloo.whamcloud.com/test_sets/f9fe3b92-c7f6-11e2-ae6b-52540035b04c

            Lustre version check code needs to be added into sanityn test_71() on Lustre b2_1 branch to resolve the above interop issue.


            adilger Andreas Dilger added a comment -

            In theory this problem could affect Lustre 1.8, but the supported client configurations would not have fileutils that use FIEMAP. So far it has only been reported for SLES11 SP2, which doesn't work with 1.8 clients.
            spitzcor Cory Spitz added a comment -

            Doesn't this bug affect 1.8.9-wc1 too?

            bogl Bob Glossman (Inactive) added a comment - back port to b2_1 http://review.whamcloud.com/6377

            adilger Andreas Dilger added a comment -

            FIEMAP is also used by tar to handle sparse files. The original reason for adding it was so that filefrag could report file fragmentation on large files efficiently, instead of having to issue billions of RPCs (one per block).

            SEEK_HOLE/SEEK_DATA are not available in the vendor kernels that we support. If you are interested in this, we'd be happy to accept patches.

            aakef Bernd Schubert added a comment -

            A Q-Leap customer runs into this with Lustre 1.8 as well. My personal recommendation is to disable the FIEMAP ioctl entirely for now (or even forever). Is anything other than filefrag and coreutils using FIEMAP at all? For various reasons I haven't looked into Lustre development for some time - are there plans to support SEEK_HOLE/SEEK_DATA?

            Thanks,
            Венедикт

            adilger Andreas Dilger added a comment -

            Artem, I moved your comments into this new bug. The cp/FIEMAP problem was worked on in LU-2580, and that at least solved the problem of "cp" missing the cached extents on the client. It doesn't fix the problem with cached extents on other clients, as your change does, so I'd still like to get your patch landed. Some issues came up when Oleg and I looked at the patch.

            Your patch forces the OST to always take the LCK_PR lock on FIEMAP_FLAG_SYNC, which will cause the client to drop its whole file cache (held under LCK_PR) in the common case of "write a file, then copy it from the same client". FIEMAP_FLAG_SYNC will already cause the client to flush its own cache via ioctl_fiemap->filemap_write_and_wait(), with patches 4477 and 4659 from LU-2367 and LU-2286. It would be better to set the OBD_FL_SRVLOCK flag only if the client cannot match a [0,EOF] LCK_PR lock locally. That would avoid flushing the client cache in the common case (LCK_PR will match the local LCK_PW on the client, and the VFS filemap_write_and_wait() will ensure data is written to the OST), but still ensure correctness if the file was written by multiple clients.

            The other related issue that the patch doesn't fix is that Lustre FIEMAP does not return the extents cached in memory on the client if FIEMAP_FLAG_SYNC is not set. That isn't strictly needed to make "cp" correct, since it always uses FIEMAP_FLAG_SYNC, but it might affect other FIEMAP users. It is harder to know what the correct action is in this case, but at a minimum it should return the unwritten pages from the local OSC extent cache. Returning unwritten pages from remote client caches is considerably more difficult, and would be racy in any case, so I'm not sure that is worthwhile. It would be great if you could look at this issue as well, but it should be fixed in a separate patch from your current one.
            pjones Peter Jones added a comment - http://review.whamcloud.com/#change,6127

            People

              niu Niu Yawei (Inactive)
              artem_blagodarenko Artem Blagodarenko (Inactive)