  1. Lustre
  2. LU-8367

delete orphan phase isn't started for multistriped file

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
    • None
    • 3

    Description

      The problem was discovered while testing OST failover. An OST pool with 10 OSTs was created, and a stripe count of -1 was assigned to it.
      Half of the OSTs (those with even indexes) failed during file creation.
      Object creation was blocked in several places, sometimes after an object had already been reserved on a failed OST. In that case the OSP threads were blocked from starting the delete-orphans phase: the allocation holds some reserved objects and cannot release the reservation, because it is itself blocked waiting for recovery on the next assigned OST. With several object allocations running in parallel, the MDT reaches a state in which each failed OST has its own reserved object, and object allocation is blocked for a long time, especially when it has to wait for all OSP timeouts (each one obd_timeout) to expire. This can take a large amount of time: half an hour or a full hour.

      The bug was introduced as a regression when the MDT-side code moved from LOV to LOD.
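
      The hold-and-wait pattern described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lustre code: the function name `stuck_osts`, the `holds` and `waits_on` mappings, and the rule that orphan cleanup on an OST starts only once no blocked allocation holds a reservation on it are all hypothetical simplifications of the behaviour reported in this ticket.

```python
def stuck_osts(holds, waits_on):
    """Toy model of the reported hang (hypothetical, not Lustre code).

    holds:    allocation name -> set of failed OST indexes on which that
              allocation has already reserved an object
    waits_on: allocation name -> failed OST index it is blocked on
    Returns the failed OSTs whose orphan cleanup cannot start until
    something external (for example a timeout) breaks the wait.
    """
    pending = set(waits_on)                      # allocations still blocked
    failed = set(waits_on.values())
    for reserved in holds.values():
        failed |= set(reserved)
    recovered = set()

    progress = True
    while progress:
        progress = False
        # Model assumption: cleanup on an OST may start once no blocked
        # allocation still holds a reservation there.
        for ost in failed - recovered:
            if not any(ost in holds.get(a, ()) for a in pending):
                recovered.add(ost)
                progress = True
        # An allocation unblocks once the OST it waits on has recovered,
        # which in turn releases the reservations it was holding.
        for alloc in list(pending):
            if waits_on[alloc] in recovered:
                pending.discard(alloc)
                progress = True
    return failed - recovered


if __name__ == "__main__":
    # Five failed OSTs with even indexes, five parallel allocations:
    # each holds an object on one failed OST and waits on the next one.
    even = [0, 2, 4, 6, 8]
    holds = {f"a{i}": {even[i]} for i in range(5)}
    waits_on = {f"a{i}": even[(i + 1) % 5] for i in range(5)}
    print(sorted(stuck_osts(holds, waits_on)))
```

      In this model the five allocations form a cycle, so no failed OST can begin cleanup on its own. If the cycle is broken (for instance, a single allocation that holds an object on OST 0 and waits on OST 2, with nothing reserved on OST 2), the model drains completely.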
      Original ticket is https://projectlava.xyratex.com/show_bug.cgi?id=18357

      Attachments

        1. test1
          3 kB
          Alexey Lyashkov

        Issue Links

          Activity

            [LU-8367] delete orphan phase isn't started for multistriped file
            adilger Andreas Dilger made changes -
            Link New: This issue is related to ATM-2973 [ ATM-2973 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to EX-7353 [ EX-7353 ]
            cfaber Colin Faber made changes -
            Link New: This issue is related to EX-7031 [ EX-7031 ]
            cfaber Colin Faber made changes -
            Link New: This issue is related to DDN-3766 [ DDN-3766 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-3761 [ DDN-3761 ]
            adilger Andreas Dilger made changes -
            Description: "OST poll" corrected to "OST pool" in the first sentence; the rest of the text is unchanged.
            adilger Andreas Dilger made changes -
            Link New: This issue is blocking EX-1053 [ EX-1053 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16425 [ LU-16425 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.16.0 [ 15190 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-10336 [ LU-10336 ]

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 14

              Dates

                Created:
                Updated:
                Resolved: