
FIEMAP does not sync data or return cached pages


    Description

      Artem Blagodarenko added a comment - 27/Mar/13 7:16 AM

      We have a bug that looks like this.

      After copying a file with 'cp' to another location in the same filesystem, some empty gaps appeared in the copy.

      Comparing the good file to the bad file, the bad file has a 4096-byte segment of zeros at block offset 6799, and the last portion of the file (everything after the last whole 1MB multiple) is also zeros. Looking at the bad file's OST object (stripe count is 1), the zeroed regions are holes: no blocks are allocated for those portions. The block allocations for the bad file run from 0-6798, block 6799 is missing, then 6800-8703, and blocks 8704-8745 are missing. (8704 blocks of 4096 bytes is 34*1048576 bytes, so the allocated part of the bad file ends at the last whole 1MB boundary.)

      The bad file is missing block 6799, and blocks 8704-8745
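For anyone trying to spot the same symptom from the client side, a minimal sketch follows. It assumes the 4096-byte block size used in this report; the function name `first_bad_block` is made up for illustration. It relies only on POSIX `cmp -l`, which lists the 1-based offset of each differing byte.

```shell
# first_bad_block GOOD BAD
# Prints the 0-based index of the first 4096-byte block in which the two
# files differ. Prints nothing if no differing byte is found.
first_bad_block() {
    # cmp -l prints "<byte offset> <octal> <octal>" for every differing byte;
    # only the first line is needed to find the first damaged block.
    cmp -l "$1" "$2" 2>/dev/null | head -n 1 |
        awk '{ print int(($1 - 1) / 4096) }'
}
```

For the files described above, a call such as `first_bad_block good_file bad_file` would be expected to report block 6799, since that is the first zero-filled region.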

      Extents for good file

      EXTENTS:
      (ETB0):554244449, (0-127):554255616-554255743, (128-767):554257024-554257663,
      (768-1535):554257920-554258687, (1536-4607):554258944-554262015,
      (4608-5247):554262528-554263167, (5248-5375):554255744-554255871,
      (5376-5887):554263296-554263807, (5888-6015):554255872-554255999,
      (6016-6143):554258688-554258815, (6144-6271):554262144-554262271,
      (6272-6527):554256000-554256255, (6528-6655):554262016-554262143,
      (6656-7039):554256256-554256639, (7040-7167):554262272-554262399,
      (7168-7295):554263168-554263295, (7296-7423):554263936-554264063,
      (7424-7679):554256640-554256895, (7680-7807):554262400-554262527,
      (7808-7935):554263808-554263935, (7936-8063):554256896-554257023,
      (8064-8319):554257664-554257919, (8320-8447):554258816-554258943,
      (8448-8745):554264064-554264361

      Extents for broken file:

      EXTENTS:
      (0-6798):538994432-539001230, (6800-8703):539001232-539003135
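The missing ranges can be derived mechanically from the logical extents of the broken file. The sketch below is only an illustration: `find_holes` is a hypothetical helper, and it assumes the extents are given in sorted, non-overlapping order and that the file should span 8746 blocks (the good file maps blocks 0-8745).

```shell
# find_holes TOTAL_BLOCKS START-END [START-END ...]
# Prints every logical block range that no extent covers.
find_holes() {
    total=$1; shift
    next=0                          # first logical block not yet accounted for
    for ext in "$@"; do
        start=${ext%-*}             # text before the dash
        end=${ext#*-}               # text after the dash
        if [ "$start" -gt "$next" ]; then
            echo "$next-$((start - 1))"
        fi
        next=$((end + 1))
    done
    if [ "$next" -lt "$total" ]; then
        echo "$next-$((total - 1))" # trailing hole up to the expected size
    fi
}

# Logical extents of the broken file, taken from the listing above.
find_holes 8746 0-6798 6800-8703

# Cross-check of the 1MB boundary: both expressions give 35651584 bytes.
echo "$((8704 * 4096)) $((34 * 1048576))"
```

With the extents above this reports `6799-6799` and `8704-8745`, matching the holes described in the report.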

      Lustre client version on this system is 2.2.

      http://review.whamcloud.com/#change,6127


          Activity


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment -

            It looks like a 2.3 server should not be expected to support this test.
            Should we add checks to master like the one added to b1_8?

            +       [[ $server_version -lt $(version_code 1.8.10) ]] &&
            +               skip "Need MDS version at least 1.8.10" && return
            +
            +       # Patch not applied to 2.2 and 2.3 branches
            +       [[ $server_version -ge $(version_code 2.2.0) ]] &&
            +       [[ $server_version -lt $(version_code 2.4.0) ]] &&
            +               skip "Need MDS version at least 2.4.0" && return

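To make the effect of the proposed check easier to see outside the test framework, here is a standalone sketch. It is not the framework code: `version_code` below is a simplified stand-in, its encoding (major << 16 | minor << 8 | patch) is an assumption made for illustration, and `needs_skip` is a hypothetical wrapper that mirrors the two conditions quoted above.

```shell
# Simplified stand-in for the test framework's version_code helper.
# Assumed encoding: (major << 16) | (minor << 8) | patch.
version_code() {
    vc_old_ifs=$IFS
    IFS=.
    set -- $1                       # split "2.3.0" into 2 3 0
    IFS=$vc_old_ifs
    echo $(( ($1 << 16) | ($2 << 8) | ${3:-0} ))
}

# needs_skip SERVER_VERSION
# Returns 0 (true) when the quoted conditions would skip the test.
needs_skip() {
    sv=$(version_code "$1")
    # Servers older than 1.8.10 do not have the fix.
    [ "$sv" -lt "$(version_code 1.8.10)" ] && return 0
    # The patch was not applied to the 2.2 and 2.3 branches.
    [ "$sv" -ge "$(version_code 2.2.0)" ] &&
        [ "$sv" -lt "$(version_code 2.4.0)" ] && return 0
    return 1
}

for v in 1.8.9 2.1.5 2.2.0 2.3.0 2.4.0; do
    if needs_skip "$v"; then echo "$v: skip"; else echo "$v: run"; fi
done
```

Under these assumptions the test would be skipped for 1.8.9, 2.2.0 and 2.3.0 servers, and would still run against 2.1.5 and 2.4.0.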
            yujian Jian Yu added a comment -

            Lustre client: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
            Lustre server: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)

            sanityn test 71 failed:
            https://maloo.whamcloud.com/test_sets/08bb9120-14ff-11e3-ba63-52540035b04c

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - edited

            Patch for b1_8
            http://review.whamcloud.com/#/c/6631/

            yujian Jian Yu added a comment - edited

            Lustre version check code needs to be added into sanityn test_71() on Lustre b2_1 branch to resolve the above interop issue.

            Patch for Lustre b2_1 branch: http://review.whamcloud.com/6584
            Patch for Lustre master branch: http://review.whamcloud.com/6585 (which is also needed on Lustre b2_4 branch)

            yujian Jian Yu added a comment -

            sanityn test 71 failed in the following interop combinations:

            Lustre 2.1.6 client + 2.1.5 server:
            https://maloo.whamcloud.com/test_sets/d8783b88-cc0d-11e2-9cc0-52540035b04c

            Lustre 2.1.6 client + 2.2.0 server:
            https://maloo.whamcloud.com/test_sets/e1c7d58e-cc23-11e2-9cc0-52540035b04c

            Lustre 2.1.6 client + 2.3.0 server:
            https://maloo.whamcloud.com/test_sets/3aca7922-cc09-11e2-9cc0-52540035b04c

            yujian Jian Yu added a comment -

            back port to b2_1
            http://review.whamcloud.com/6377

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
            Lustre b2_1 server build: http://build.whamcloud.com/job/lustre-b2_1/191 (2.1.5)
            Distro/Arch: RHEL6.4/x86_64

            sanityn test 71 failed as follows:

            == sanityn test 71: correct file map just after write operation is finished == 14:00:32 (1369688432)
            1+0 records in
            1+0 records out
            40960 bytes (41 kB) copied, 0.00997164 s, 4.1 MB/s
            /usr/lib64/lustre/tests/sanityn.sh: line 1903: facet_fstype: command not found
            1+0 records in
            1+0 records out
            40960 bytes (41 kB) copied, 0.00642454 s, 6.4 MB/s
              File: `/mnt/lustre2/f71'
              Size: 163840    	Blocks: 1          IO Block: 2097152 regular file
            Device: 2c54f966h/743766374d	Inode: 144115373078226305  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-05-27 14:00:32.000000000 -0700
            Modify: 2013-05-27 14:00:32.000000000 -0700
            Change: 2013-05-27 14:00:32.000000000 -0700
            34409
            fd: 3
            No unwritten extents, extents number 0, file size 0, original size 81920
             sanityn test_71: @@@@@@ FAIL: data is not flushed from client
            

            Maloo report: https://maloo.whamcloud.com/test_sets/09fcc1bc-c7ed-11e2-9f90-52540035b04c

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
            Lustre b2_2 server build: http://build.whamcloud.com/job/lustre-b2_2/17 (2.2.0)
            sanityn test 71 also failed:
            https://maloo.whamcloud.com/test_sets/5ebb0a36-c7c2-11e2-9f90-52540035b04c

            Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/205
            Lustre b2_3 server build: http://build.whamcloud.com/job/lustre-b2_3/41 (2.3.0)
            sanityn test 71 also failed:
            https://maloo.whamcloud.com/test_sets/f9fe3b92-c7f6-11e2-ae6b-52540035b04c

            Lustre version check code needs to be added into sanityn test_71() on Lustre b2_1 branch to resolve the above interop issue.


            adilger Andreas Dilger added a comment -

            In theory this problem could affect Lustre 1.8, but the supported client configurations would not have fileutils that use FIEMAP. So far it has only been reported for SLES11 SP2, which doesn't work with 1.8 clients.

            spitzcor Cory Spitz added a comment -

            Doesn't this bug affect 1.8.9-wc1 too?

            bogl Bob Glossman (Inactive) added a comment - back port to b2_1 http://review.whamcloud.com/6377

            adilger Andreas Dilger added a comment -

            FIEMAP is also used by tar to handle sparse files. The original reason for adding it was for filefrag to be able to report file fragmentation on large files efficiently, instead of having to issue billions of RPCs (one for each block).

            SEEK_HOLE/SEEK_DATA are not available in the vendor kernels that we support. If you are interested in this, we'd be happy to accept patches.

            People

              niu Niu Yawei (Inactive)
              artem_blagodarenko Artem Blagodarenko (Inactive)