Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16160

take ldlm lock when queue sync pages

Details

    • 3
    • 9223372036854775807

    Description

      osc_queue_sync_pages() add osc_extent to osc_object's IO extent list without taking ldlm locks, and then it calls osc_io_unplug_async() to queue the IO work for the client.

      I think the IO extent should take ldlm locks while waiting in the IO work queue.

      Attachments

        Issue Links

          Activity

            [LU-16160] take ldlm lock when queue sync pages
            paf0186 Patrick Farrell added a comment - - edited

            The original corruption problem is described in LU-14541, though it is hard to read.
            LU-14541 adds the removing uptodate
            That fix was reverted due to SIGBUS/EIO, which was fixed in LU-16160, and LU-16160 puts the removing uptodate fix back.

            Here is the data corruption case:
            The reason we need to remove uptodate is like this:
            We have a sequence like this:
            READ_1_START <-- NODE1
            release_page() called
            WRITE[START,END] <-- NODE2
            READ_2_START <-- NODE1
            READ_1_END
            READ_2_END
            (the order of READ_1_END and READ_2_END is not important)

            So, READ_1 concurrent with WRITE, READ_2 concurrent with READ_1 but after WRITE

            It is read2 which gets stale data, not seeing the write. (Read1 also does not see the write, but that is OK since the write and read1 overlap in time.)

            Here's what happens:
            Memory pressure flushes page (calls releasepage) while READ_1 is pending
            Lustre must remove its knowledge of the page, since the page is about to be destroyed - calling cl_page_delete()
            But page is still in the mapping (We cannot remove pages from the mapping in releasepage - the kernel removes the page a moment later and it assumes + asserts the page is still in the mapping)

            Now the page is no longer tracked by Lustre, but it is still findable in the mapping until it is removed, which happens later.

            At this point, the write starts on another node.

            Lustre write does not flush this page because Lustre is not responsible for the page any more and cannot find it
            Write completes (read1 is still in 'gap' between releasepage and remove_from_mapping)
            Read2() starts and finds pages kept alive by read1
            Read1() finishes without data from write (ok)
            Read2() finishes without data from write (not OK)

            Read2 starts after the write ends, and reads from pages kept alive by Read1.

            If cl_page_delete removes uptodate, this problem is avoided. However, read_iter and mmap_fault return errors (EIO or SIGBUS) if a !uptodate page is found. That is why there is the seqlock wrapper to retry those

            paf0186 Patrick Farrell added a comment - - edited The original corruption problem is described in LU-14541 , though it is hard to read. LU-14541 adds the removing uptodate That fix was reverted due to SIGBUS/EIO, which was fixed in LU-16160 , and LU-16160 puts the removing uptodate fix back. Here is the data corruption case: The reason we need to remove uptodate is like this: We have a sequence like this: READ_1_START <-- NODE1 release_page() called WRITE [START,END] <-- NODE2 READ_2_START <-- NODE1 READ_1_END READ_2_END (the order of READ_1_END and READ_2_END is not important) So, READ_1 concurrent with WRITE, READ_2 concurrent with READ_1 but after WRITE It is read2 which gets stale data, not seeing the write. (Read1 also does not see the write, but that is OK since the write and read1 overlap in time.) Here's what happens: Memory pressure flushes page (calls releasepage) while READ_1 is pending Lustre must remove its knowledge of the page, since the page is about to be destroyed - calling cl_page_delete() But page is still in the mapping (We cannot remove pages from the mapping in releasepage - the kernel removes the page a moment later and it assumes + asserts the page is still in the mapping) Now the page is no longer tracked by Lustre, but it is still findable in the mapping until it is removed, which happens later. At this point, the write starts on another node. Lustre write does not flush this page because Lustre is not responsible for the page any more and cannot find it Write completes (read1 is still in 'gap' between releasepage and remove_from_mapping) Read2() starts and finds pages kept alive by read1 Read1() finishes without data from write (ok) Read2() finishes without data from write (not OK) Read2 starts after the write ends, and reads from pages kept alive by Read1. If cl_page_delete removes uptodate, this problem is avoided. However, read_iter and mmap_fault return errors (EIO or SIGBUS) if a !uptodate page is found. That is why there is the seqlock wrapper to retry those
            bzzz Alex Zhuravlev added a comment -

            panda probably you can recall exact scenario for this problem with Uptodate flag?

            bzzz Alex Zhuravlev added a comment - panda probably you can recall exact scenario for this problem with Uptodate flag?
            bzzz Alex Zhuravlev added a comment -

            bobijam do you still recall why Uptodate should be kept? can you please describe in details?

            bzzz Alex Zhuravlev added a comment - bobijam do you still recall why Uptodate should be kept? can you please describe in details?

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50202/
            Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: b0a6d4d08e19d06661deabdb7278f07662d8b6e8

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50202/ Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim Project: fs/lustre-release Branch: b2_15 Current Patch Set: Commit: b0a6d4d08e19d06661deabdb7278f07662d8b6e8
            pjones Peter Jones added a comment -

            Looks like everything tracked under this ticket has landed for 2.16

            pjones Peter Jones added a comment - Looks like everything tracked under this ticket has landed for 2.16

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50202
            Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: f817fcd2c553c223fc89087f62c9d59517bb8e59

            gerrit Gerrit Updater added a comment - "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50202 Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: f817fcd2c553c223fc89087f62c9d59517bb8e59

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49647/
            Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: b4da788a819f82d35b685d6ee7f02809c05ca005

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49647/ Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim Project: fs/lustre-release Branch: master Current Patch Set: Commit: b4da788a819f82d35b685d6ee7f02809c05ca005

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49541/
            Subject: LU-16160 revert: "llite: clear stale page's uptodate bit"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 84c9618190f9e3a526ce51dc4995fcfa3a9ed265

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49541/ Subject: LU-16160 revert: "llite: clear stale page's uptodate bit" Project: fs/lustre-release Branch: master Current Patch Set: Commit: 84c9618190f9e3a526ce51dc4995fcfa3a9ed265
            pjones Peter Jones added a comment -

            panda Patrick ported your attached patch to master and pushed it into gerrit, so we can compare and contrast both similar approaches. To that end, please can you confirm that nothing was altered during the porting? Thanks!

            pjones Peter Jones added a comment - panda  Patrick ported your attached patch to master and pushed it into gerrit, so we can compare and contrast both similar approaches. To that end, please can you confirm that nothing was altered during the porting? Thanks!

            "Patrick Farrell <farr0186@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49647
            Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 371d94fe71b103c1712bf8f2b1bb1c026cf31de4

            gerrit Gerrit Updater added a comment - "Patrick Farrell <farr0186@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49647 Subject: LU-16160 llite: SIGBUS is possible on a race with page reclaim Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 371d94fe71b103c1712bf8f2b1bb1c026cf31de4

            People

              bobijam Zhenyu Xu
              bobijam Zhenyu Xu
              Votes:
              0 Vote for this issue
              Watchers:
              17 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: