Lustre / LU-15643

do not loop on OI Scrub on same FID

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • None
    • 3

    Description

      OI Scrub should not be re-triggered repeatedly by the same FID. This currently happens for a variety of reasons: the same one or two FIDs keep causing OI Scrub to run, but running Scrub again for them accomplishes nothing, so those FIDs should simply be ignored. For example, from LU-14831:

      [Tue Jul 6 03:29:22 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013fd7:0x10a7c:0x0]/1254916266 with flags 0x4a: rc = 0
      [Tue Jul 6 04:26:01 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1ada1:0x0]/148637947 with flags 0x4a: rc = 0
      [Tue Jul 6 05:18:33 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1888c:0x0]/148637947 with flags 0x4a: rc = 0
      [Tue Jul 6 06:12:26 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013fd7:0x10a7c:0x0]/1254916266 with flags 0x4a: rc = 0
      [Tue Jul 6 07:04:40 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1888c:0x0]/148637947 with flags 0x4a: rc = 0
      [Tue Jul 6 07:56:45 2021] Lustre: lustre-MDT0001: trigger OI scrub by RPC for [0x240013957:0x1ada1:0x0]/148637947 with flags 0x4a: rc = 0
      

      It looks like OI Scrub has found 3 different FIDs and keeps looping on them; similar situations have been hit in many other cases.

      What should be done is to keep track of the FIDs that have triggered auto-scrub (in memory is probably enough) and not re-trigger auto-scrub on those same FIDs. This is useful regardless of the root cause of the OI Scrub looping. If the "bad FID list" is kept in memory only, an OI Scrub would still be triggered when the MDS restarts, so the problem wouldn't be swept under the rug completely, but at least a small problem with one file would no longer turn into a major spike in server load that blocks access to files on the server.
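
The in-memory "bad FID list" idea could look roughly like the sketch below. This is only an illustration of the approach, not the code that landed for this ticket: `struct fid`, `struct bad_fid_list`, `BAD_FID_MAX` and `scrub_should_trigger()` are all hypothetical names, and the fixed-size array with oldest-entry replacement is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Lustre FID, printed as [seq:oid:ver]. */
struct fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

#define BAD_FID_MAX 64	/* arbitrary cap chosen for this sketch */

/*
 * Kept in memory only, so the list is lost on restart and a scrub
 * can be triggered again after the MDS comes back up.
 */
struct bad_fid_list {
	struct fid fids[BAD_FID_MAX];
	size_t count;	/* number of valid entries */
	size_t next;	/* slot to overwrite once the list is full */
};

static bool fid_eq(const struct fid *a, const struct fid *b)
{
	return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/*
 * Return true and remember the FID if it has not triggered auto-scrub
 * before; return false if it is already on the list, in which case the
 * caller should skip the scrub.
 */
static bool scrub_should_trigger(struct bad_fid_list *list,
				 const struct fid *fid)
{
	size_t i;

	assert(list != NULL && fid != NULL);

	for (i = 0; i < list->count; i++)
		if (fid_eq(&list->fids[i], fid))
			return false;

	/* Not seen yet: record it, replacing the oldest entry when full. */
	list->fids[list->next] = *fid;
	list->next = (list->next + 1) % BAD_FID_MAX;
	if (list->count < BAD_FID_MAX)
		list->count++;
	return true;
}
```

With a zero-initialized list, the first call for `[0x240013fd7:0x10a7c:0x0]` would return true and any later call for the same FID would return false, so the hourly repeats seen in the log above would be suppressed until the list is reset. A real implementation would also need locking around the list, which is omitted here.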

            People

              laisiyao Lai Siyao
              adilger Andreas Dilger