Lustre / LU-15961

umount shouldn't cause recovery abort

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      When a server is in recovery, a simple umount forces a recovery abort, and clients can then get I/O errors. One practical example was seen with corosync/pacemaker: a failed server returns, and HA decides to move the service that is still in recovery back to the old node:

      Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 1 client reconnects
      Lustre: Failing over lustre-MDT0000
      LustreError: 6277:0:(ldlm_lib.c:2876:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
      Lustre: 6186:0:(ldlm_lib.c:2283:target_recovery_overseer()) recovery is aborted, evict exports in recovery
      Lustre: lustre-MDT0000: Recovery over after 0:03, of 1 clients 0 recovered and 1 was evicted.
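
To make this condition easier to spot in console logs, here is a minimal sketch that scans log text for the "Aborting recovery" and "Recovery over" messages shown above and reports how many clients were evicted per target. The regular expressions are modeled only on this sample; the function name `summarize_recovery` and the result layout are hypothetical, and other Lustre versions may word these messages differently.

```python
import re

# Summary line printed when recovery ends, modeled on the sample log above.
# Assumption: the wording matches this sample; adjust for other versions.
RECOVERY_OVER = re.compile(
    r"(?P<target>\S+): Recovery over after (?P<min>\d+):(?P<sec>\d+), "
    r"of (?P<total>\d+) clients (?P<recovered>\d+) recovered "
    r"and (?P<evicted>\d+) (?:was|were) evicted"
)

# Message logged when recovery is aborted, again taken from the sample.
ABORTING = re.compile(r"(?P<target>\S+): Aborting recovery")


def summarize_recovery(log_text):
    """Return {target: stats} for every target with a 'Recovery over' line."""
    # Targets for which an explicit abort message was logged.
    aborted = {m.group("target") for m in ABORTING.finditer(log_text)}
    summary = {}
    for m in RECOVERY_OVER.finditer(log_text):
        target = m.group("target")
        summary[target] = {
            "duration_s": int(m.group("min")) * 60 + int(m.group("sec")),
            "total": int(m.group("total")),
            "recovered": int(m.group("recovered")),
            "evicted": int(m.group("evicted")),
            "aborted": target in aborted,
        }
    return summary


if __name__ == "__main__":
    import sys

    # Read console log text from stdin and print one line per target.
    for target, stats in summarize_recovery(sys.stdin.read()).items():
        print(target, stats)
```

For the log above, this reports lustre-MDT0000 with an aborted recovery of 3 seconds and 1 of 1 clients evicted, which is the outcome this ticket asks to avoid.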
      

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Alex Zhuravlev (bzzz)