[LU-15300] mirror resync can cause EIO to unrelated applications

Details

    Description

      I noticed that sanity-flr/200 sometimes hits a "checksum error"; here are some findings.

      First of all, the checksum error is caused by an incomplete preceding lfs mirror resync command (which doesn't return an error in some cases).

      In turn, the EIO that lfs hits is caused by the AS_EIO flag on the corresponding mapping.

      AS_EIO is set because an OST_WRITE sent with an incorrect layout version gets ESTALE (the client's version is smaller than the one on the OST).

      So far I've traced all this to a race between two processes:

      • lfs doing resync and changing layout generation
      • another process (say, multiop) doing regular write

      I will cite the logs in a subsequent comment.
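
To make the chain above easier to follow, here is a minimal toy model of the race. It is not Lustre code: `ToyOst`, `ToyClient`, `mapping_has_eio` and `race_scenario` are hypothetical names, and the model assumes only what the description states (the OST answers ESTALE when the client's layout version is smaller than its own, and the failed write leaves an EIO marker on the shared mapping that the resync process later hits).

```python
import errno


class ToyOst:
    """Hypothetical OST stand-in that remembers the current layout version."""

    def __init__(self, layout_version):
        self.layout_version = layout_version

    def write(self, client_layout_version):
        # Reject a write stamped with a smaller layout version than the OST's.
        if client_layout_version < self.layout_version:
            return -errno.ESTALE
        return 0


class ToyClient:
    """Hypothetical client: one cached layout version, one shared mapping."""

    def __init__(self, layout_version):
        self.layout_version = layout_version
        self.mapping_has_eio = False  # stands in for AS_EIO on the mapping

    def flush_write(self, ost):
        rc = ost.write(self.layout_version)
        if rc < 0:
            # The failed write marks the mapping, not the process that wrote.
            self.mapping_has_eio = True
        return rc

    def check_mapping(self):
        # Any process using the same mapping observes the marker as EIO.
        return -errno.EIO if self.mapping_has_eio else 0


def race_scenario(refresh_layout_first):
    ost = ToyOst(layout_version=1)
    client = ToyClient(layout_version=1)

    # Process A (lfs mirror resync) changes the layout generation.
    ost.layout_version = 2

    # Optionally model a client that picks up the new layout before writing.
    if refresh_layout_first:
        client.layout_version = ost.layout_version

    # Process B (regular writer, e.g. multiop) sends its write.
    write_rc = client.flush_write(ost)

    # Process A then touches the same mapping and sees the leftover error.
    resync_rc = client.check_mapping()
    return write_rc, resync_rc


if __name__ == "__main__":
    print(race_scenario(refresh_layout_first=False))
    print(race_scenario(refresh_layout_first=True))
```

In this sketch the stale-version case returns ESTALE for the write and EIO for the later check on the same mapping, while the refreshed case returns success for both.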

          Activity


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55464/
            Subject: LU-15300 mdt: refresh LOVEA with LL granted
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 9287b2c34d3c7c4d94d9db3a5a622d89be31ec6b

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55464/ Subject: LU-15300 mdt: refresh LOVEA with LL granted Project: fs/lustre-release Branch: b2_15 Current Patch Set: Commit: 9287b2c34d3c7c4d94d9db3a5a622d89be31ec6b

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55464
            Subject: LU-15300 mdt: refresh LOVEA with LL granted
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 272180003988ecd0d786392cd0fee1a800d5e336

            gerrit Gerrit Updater added a comment - "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55464 Subject: LU-15300 mdt: refresh LOVEA with LL granted Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: 272180003988ecd0d786392cd0fee1a800d5e336
            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46413/
            Subject: LU-15300 mdt: refresh LOVEA with LL granted
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 13557aa86904376e48a5e43256d5c1ab32c1c2d6

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46413/ Subject: LU-15300 mdt: refresh LOVEA with LL granted Project: fs/lustre-release Branch: master Current Patch Set: Commit: 13557aa86904376e48a5e43256d5c1ab32c1c2d6

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46413
            Subject: LU-15300 mdt: refresh LOVEA with LL granted
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 69
            Commit: efbe0f63eff8a9a7b192607382f6859e3b0088b8

            adilger Andreas Dilger added a comment - "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46413 Subject: LU-15300 mdt: refresh LOVEA with LL granted Project: fs/lustre-release Branch: master Current Patch Set: 69 Commit: efbe0f63eff8a9a7b192607382f6859e3b0088b8

            bzzz Alex Zhuravlev added a comment -

            What's next for this issue?

            Review and landing, hopefully; the patch has been in local testing for months.

            cfaber Colin Faber added a comment -

            Hi bzzz 

            What's next for this issue?

            nangelinas Nikitas Angelinas added a comment - +1 on master: https://testing.whamcloud.com/test_sets/32deb408-a813-480f-a6f3-1ae41c34ab56
            gerrit Gerrit Updater added a comment - edited

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46580
            Subject: LU-15300 mdt: fetch LOVEA after LL
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2f868486c9ad6f884b682173705e93d68ab6385a


            People

              bzzz Alex Zhuravlev
              bzzz Alex Zhuravlev