Lustre / LU-10888

'lctl abort_recovery' should allow aborting recovery between MDTs

Details

    • New Feature
    • Resolution: Not a Bug
    • Major
    • None
    • Lustre 2.10.0, Lustre 2.11.0
    • None

    Description

      'lctl abort_recovery' does not abort recovery between MDTs. On a single-MDT system, aborting recovery only fails the unfinished operations, but aborting recovery between MDTs may break system consistency, so as a tradeoff Lustre chose consistency over availability. There are two major reasons why recovery between MDTs may fail to finish. The first is a network issue, in which case we can wait indefinitely for the network to recover. The second is a software bug, which is difficult for a user to fix manually on the backend filesystem.

      Now that lfsck is ready and can fix inconsistencies in the system, we should provide an option that allows the user to abort recovery between MDTs and then fix the resulting inconsistencies.
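
      The proposed workflow (abort the stuck recovery, then repair with lfsck) might look roughly like the sketch below. It only builds the command lines and does not run anything. The target name `lustre-MDT0000` and the helper `recovery_abort_plan` are placeholders for illustration. The sketch uses the existing `lctl --device <target> abort_recovery` and `lctl lfsck_start` commands; the new option for aborting recovery between MDTs requested in this ticket is not shown, because the ticket does not name it.

```python
import shlex


def recovery_abort_plan(mdt_device):
    """Return lctl command lines for 'abort recovery, then run LFSCK'.

    mdt_device is a placeholder target name such as 'lustre-MDT0000'.
    Nothing is executed here; the caller decides whether to run the commands.
    """
    return [
        # Abort the recovery that is still pending on this target.
        ["lctl", "--device", mdt_device, "abort_recovery"],
        # Start a namespace LFSCK on all MDTs (-A) to repair any
        # inconsistencies left behind by the aborted recovery.
        ["lctl", "lfsck_start", "-M", mdt_device, "-t", "namespace", "-A"],
        # Read the LFSCK namespace status for this target.
        ["lctl", "get_param", "-n", f"mdd.{mdt_device}.lfsck_namespace"],
    ]


if __name__ == "__main__":
    # Print the plan so an administrator can review it before running it.
    for argv in recovery_abort_plan("lustre-MDT0000"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

      Keeping the plan as data rather than executing it directly is a deliberate choice: aborting recovery can discard operations, so the commands should be reviewed before they are run on an MDS.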

            People

              hongchao.zhang Hongchao Zhang
              laisiyao Lai Siyao
