[LU-8367] delete orphan phase isn't started for multistriped file

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      The problem was discovered while testing OST failover. An OST pool with 10 OSTs was created, and striping with a stripe count of -1 was assigned to it.
      Half of the OSTs (the even indexes) failed during create.
      Object creation blocked in several places, sometimes after an object had already been reserved on a failed OST. In that case the OSP threads could not start the delete-orphans phase: the allocation held some reserved objects and could not release the reservation, because it was itself blocked waiting for recovery on the next assigned OST. With several object allocations running in parallel, the MDT reached a state where each failed OST had its own reserved object, and object allocation stayed blocked for a long time, especially when it had to wait for every OSP timeout (one obd_timeout each) to expire. This can take a large amount of time: half an hour to a full hour.
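
      The circular wait described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lustre code: the names Ost, allocate, recoverable and simulate are hypothetical, and the rule "orphan cleanup cannot start while a reservation is held" is taken directly from the description rather than from the OSP source.

```python
from dataclasses import dataclass


@dataclass
class Ost:
    """Hypothetical stand-in for one OST as seen from the MDT."""
    index: int
    failed: bool = False
    reserved: int = 0  # objects reserved by in-flight allocations


def allocate(osts, start):
    """Reserve one object per OST, walking round-robin from `start`.

    Assumption: the reservation on the first OST succeeds even though
    that OST is failed (it won the race with the failure). The walk then
    blocks on the next failed OST and keeps everything reserved so far.
    Returns the index of the OST it blocks on, or None if it completes.
    """
    n = len(osts)
    osts[start].reserved += 1
    for step in range(1, n):
        ost = osts[(start + step) % n]
        if ost.failed:
            return ost.index  # waits for recovery, reservations still held
        ost.reserved += 1
    return None


def recoverable(osts):
    """Failed OSTs whose orphan cleanup could start (no reservation held)."""
    return [o.index for o in osts if o.failed and o.reserved == 0]


def simulate():
    # 10 OSTs, even indexes failed, one parallel allocation per failed OST.
    osts = [Ost(i, failed=(i % 2 == 0)) for i in range(10)]
    blocked_on = {start: allocate(osts, start) for start in range(0, 10, 2)}
    return osts, blocked_on


if __name__ == "__main__":
    osts, blocked_on = simulate()
    print("allocation start -> OST it waits on:", blocked_on)
    print("failed OSTs able to start orphan cleanup:", recoverable(osts))
```

      In this model every failed OST ends up holding a reservation from an allocation that is itself waiting on a different failed OST, so no cleanup can begin until something outside the cycle (in the ticket, the OSP timeouts) releases a reservation.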

      This bug was introduced as a regression when the MDT side moved from LOV to LOD.
      Original ticket is https://projectlava.xyratex.com/show_bug.cgi?id=18357

          Activity

            adilger Andreas Dilger made changes -
            Link New: This issue is related to ATM-2973 [ ATM-2973 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to EX-7353 [ EX-7353 ]
            cfaber Colin Faber made changes -
            Link New: This issue is related to EX-7031 [ EX-7031 ]

            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50543/
            Subject: LU-8367 osp: remove unused fail_locs from sanity/27S,822
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e69eea5f60eec17ac32cea8d2a60768e0738a052

            gerrit Gerrit Updater added a comment -
            "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50543
            Subject: LU-8367 osp: unused fail_locs from sanity-27S
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 188103c3203355df222e7b43a45e02405bd8fe4a
            cfaber Colin Faber made changes -
            Link New: This issue is related to DDN-3766 [ DDN-3766 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-3761 [ DDN-3761 ]
            adilger Andreas Dilger made changes -
            Description: "OST poll" corrected to "OST pool" in the first sentence; the rest of the text is unchanged.
            adilger Andreas Dilger made changes -
            Link New: This issue is blocking EX-1053 [ EX-1053 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16425 [ LU-16425 ]

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 14
