Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0

    Description

      OI Scrub should not be re-triggered repeatedly by the same FID. This currently happens for a variety of reasons: the same one or two FIDs keep causing OI Scrub to run, but running Scrub multiple times for them is not useful, and it should simply ignore those FIDs. For example, from LU-14831:

      [Tue Jul 6 03:29:22 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013fd7:0x10a7c:0x0]/1254916266 with flags 0x4a: rc = 0
      [Tue Jul 6 04:26:01 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1ada1:0x0]/148637947 with flags 0x4a: rc = 0
      [Tue Jul 6 05:18:33 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1888c:0x0]/148637947 with flags 0x4a: rc = 0
      [Tue Jul 6 06:12:26 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013fd7:0x10a7c:0x0]/1254916266 with flags 0x4a: rc = 0
      [Tue Jul 6 07:04:40 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1888c:0x0]/148637947 with flags 0x4a: rc = 0
      [Tue Jul 6 07:56:45 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1ada1:0x0]/148637947 with flags 0x4a: rc = 0
      

      It looks like it has found 3 different FIDs and is looping over them, but similar situations have been hit in many different cases.

      What should be done is to keep track of the FIDs that have triggered auto-scrub (in memory is probably enough), and not re-trigger auto-scrub on those same FIDs. This is useful regardless of what the root cause of the OI Scrub looping is. If the "bad FID list" is kept in memory only, then an OI Scrub would still be triggered when the MDS restarts, so the problem wouldn't be swept under the rug completely, but at least it wouldn't turn a small problem with one file into a major spike in server load that blocks access to files on the server.
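
      A minimal sketch of the "bad FID list" idea described above, to make the intended behaviour concrete. This is not the code from the landed patch: the type and function names (sketch_fid, bad_fid_list, scrub_should_trigger) and the fixed cap are assumptions made up for illustration, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a FID as printed in the logs: [seq:oid:ver]. */
struct sketch_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

#define BAD_FID_MAX 64	/* arbitrary cap chosen for this sketch */

/* Kept in memory only, so the list is empty again after a restart. */
struct bad_fid_list {
	struct sketch_fid fids[BAD_FID_MAX];
	size_t count;
};

static bool fid_eq(const struct sketch_fid *a, const struct sketch_fid *b)
{
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Linear search is fine for a short list of problem FIDs. */
static bool bad_fid_seen(const struct bad_fid_list *l,
			 const struct sketch_fid *f)
{
	for (size_t i = 0; i < l->count; i++)
		if (fid_eq(&l->fids[i], f))
			return true;
	return false;
}

/*
 * Return true if auto-scrub should be triggered for this FID, and remember
 * the FID so that a later hit on the same FID does not trigger it again.
 * If the list is full, the FID still triggers scrub but is not remembered.
 */
static bool scrub_should_trigger(struct bad_fid_list *l,
				 const struct sketch_fid *f)
{
	assert(l->count <= BAD_FID_MAX);
	if (bad_fid_seen(l, f))
		return false;
	if (l->count < BAD_FID_MAX)
		l->fids[l->count++] = *f;
	return true;
}
```

      With this check in front of the trigger, each of the three FIDs in the log above would start a scrub at most once per server lifetime, instead of roughly once an hour.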

          Activity

            [LU-15643] do not loop on OI Scrub on same FID

            bzzz Alex Zhuravlev added a comment - I think this patch causes LU-16380
            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46852/
            Subject: LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 558784caad491be50e93ae60a31d4219a1e038bc

            hxing Xing Huang added a comment -

            2022-12-03: The patch passed Maloo tests and Janitor tests, and is ready to land to master.

            hxing Xing Huang added a comment -

            2022-11-26: The patch passed Maloo tests and Janitor tests, and is being reviewed.

            gerrit Gerrit Updater added a comment -

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48940
            Subject: LU-15643 test: collect log for sanity-scrub test_8
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cc960131852b668238d251979bb42313ee94f0c9

            hxing Xing Huang added a comment -

            2022-10-22: The patch passed Maloo tests, and is being worked on to address Janitor failures.

            hxing Xing Huang added a comment - edited

            2022-09-10: Oleg confirmed that the fix patch introduces 100% failure in sanity-scrub tests

            gerrit Gerrit Updater added a comment -

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46852
            Subject: LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 65e10ed5237d1604c0bc75ec0e116989a80554d0

            People

              laisiyao Lai Siyao
              adilger Andreas Dilger
              Votes: 0
              Watchers: 10