[LU-6768] Data corruption when writing and truncating in parallel on an almost-full file system

Details

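    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.6.0
    • Labels: None
    • Environment: Reproduced in a virtual machine using a loop device as the OSD-ldiskfs disk.
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807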

    Description

      To test the stability of the Lustre file system under continuous workload and extreme resource usage, I wrote a tool that writes data to files continuously until 'ENOSPC' occurs; it then truncates and deletes some old files to free space and continues writing. Before a file is truncated, its content is verified. The problem is that after the tool has run for a while, the content of some files is wrong.

      When the data is corrupted, the tool fails with:
      yaft: main.cpp:81: void check_file_content(const std::string&): Assertion `rbuf.checkAt(pos)' failed.
      Aborted

      You can get the tool at:
      https://github.com/zhang-jingwang/yaft.git
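
The write, verify, truncate, and delete cycle described above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the actual yaft code: every name here (`expected_block`, `write_file`, `verify_file`, `run_cycle`) is hypothetical, and the `max_live` limit is an assumed stand-in for a full file system so that the sketch can run without actually filling a disk. It does not reproduce the parallel write/truncate race itself.

```python
import errno
import os

BLOCK = 4096  # bytes per write; an arbitrary choice for this sketch


def expected_block(file_id: int, block_no: int) -> bytes:
    """Deterministic content, so any block can be re-derived and checked later."""
    tag = f"{file_id}:{block_no};".encode()
    return (tag * (BLOCK // len(tag) + 1))[:BLOCK]


def write_file(path: str, file_id: int, blocks: int) -> int:
    """Write up to `blocks` blocks; stop early on ENOSPC. Returns full blocks written."""
    written = 0
    with open(path, "wb", buffering=0) as f:  # unbuffered: errors surface on write
        for block_no in range(blocks):
            data = expected_block(file_id, block_no)
            try:
                n_bytes = f.write(data)
            except OSError as e:
                if e.errno != errno.ENOSPC:
                    raise
                break
            if n_bytes != len(data):
                break  # short write: treat it like running out of space
            written += 1
    return written


def verify_file(path: str, file_id: int) -> bool:
    """Compare the file against the expected pattern, including a partial tail."""
    with open(path, "rb") as f:
        block_no = 0
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                return True
            if chunk != expected_block(file_id, block_no)[:len(chunk)]:
                return False
            block_no += 1


def run_cycle(directory: str, total_files: int, blocks_per_file: int, max_live: int) -> int:
    """Write files; when space runs out (or max_live is exceeded), verify the
    oldest file, then truncate and delete it. Returns the number reclaimed."""
    live = []  # (path, file_id), oldest first
    reclaimed = 0
    for file_id in range(total_files):
        path = os.path.join(directory, f"f{file_id}")
        wrote = write_file(path, file_id, blocks_per_file)
        live.append((path, file_id))
        if wrote < blocks_per_file or len(live) > max_live:
            old_path, old_id = live.pop(0)
            if not verify_file(old_path, old_id):
                raise AssertionError(f"content mismatch in {old_path}")
            os.truncate(old_path, 0)  # truncate first, then delete
            os.remove(old_path)
            reclaimed += 1
    return reclaimed
```

For example, `run_cycle("/mnt/lustre/testdir", total_files=1000, blocks_per_file=256, max_live=50)` would exercise the cycle on a mounted file system; the path and parameter values are placeholders.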

          Activity

            yujian Jian Yu added a comment -

            Hi Jay,
            The patch is only needed by server.

            gerrit Gerrit Updater added a comment -

            Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/15904
            Subject: LU-6768 lvfs: unmap reallocated blocks
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: c8448bce0ad13aeb65c48905e080ddb0c536fc91

            jaylan Jay Lan (Inactive) added a comment - - edited

            Is the patch needed by server or client, or both?
            Looks like a server patch.

            jaylan Jay Lan (Inactive) added a comment -

            This may help us on LU-6925. Can we get a b2_5 backport? Thanks!

            pjones Peter Jones added a comment -

            Landed for 2.8

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15593/
            Subject: LU-6768 osd: unmap reallocated blocks
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bcef61a80ab4fa6cee847722184738ba4deeb971

            bzzz Alex Zhuravlev added a comment -

            Thanks for the report and testing. Please inspect the patch and help move it forward.

            jingwang Jingwang Zhang added a comment -

            I ran the reproducer for 7 hours with the patch applied and it did not fail, whereas without the patch it fails within minutes, so I believe the problem is fixed.

            jingwang Jingwang Zhang added a comment -

            Thanks for looking into this.

            I'm using CentOS 6.5 with kernel version 2.6.32.431.29.2. I will try the fix and get back to you later.

            bzzz Alex Zhuravlev added a comment -

            Jingwang, would you mind trying http://review.whamcloud.com/15593, please?

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: jingwang Jingwang Zhang
              Votes: 0
              Watchers: 8