Lustre: LU-5724

IR recovery doesn't behave properly with Lustre 2.5


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.5.3
    • MDS server running RHEL 6.5 with the ORNL 2.5.3 branch plus about 12 patches.
    • 2
    • 16076

    Description

      Today we experienced a hardware failure on our MDS. The MDS rebooted and then came back up. We restarted the MDS, but IR behaved strangely. Four clients were evicted, but when the recovery timer counted down to zero, IR restarted from the beginning. Then, once it reached the 700-second range, the timer started counting up. This happened a few times before the timer was finally allowed to run out. Even after the timer reached zero, the recovery state was still reported as in recovery. It remained that way for several more minutes before finally reaching the recovered state. In all, recovery took 54 minutes.

      Attachments

        1. atlas-mds1.log
          668 kB
          James A Simmons
        2. atlas-tds-kernel-logs_20141229.tar.gz
          265 kB
          James A Simmons
        3. atlas-tds-oss1_recovery_lustre-log.1418679242.16958
          0.3 kB
          James A Simmons
        4. rhea513_kern_12292014.log
          482 kB
          James A Simmons
        5. rhea-rtr1_kern_12292014.log
          366 kB
          James A Simmons

            People

              Hongchao Zhang (hongchao.zhang)
              James A Simmons (simmonsja)
              Votes: 0
              Watchers: 16