  1. Lustre
  2. LU-7938

Recovery on secondary OSS node stalled

Details

    • Bug
    • Resolution: Not a Bug
    • Blocker
    • Lustre 2.9.0
    • Lustre 2.8.0, Lustre 2.9.0
    • lola
      build: 2.8 GA + patches
    • 3

    Description

      The error occurred during soak testing of build '20160324' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160324). DNE is enabled. MDTs were formatted with ldiskfs, OSTs with zfs. MDS and OSS nodes are configured in an HA active-active failover configuration. Each MDS node operated with 1 MDT, while each OSS node ran 4 OSTs.
      Nodes lola-4 and lola-5 form an HA cluster.

      Event history

      • 2016-03-29 07:48:04,307:fsmgmt.fsmgmt:INFO triggering fault oss_failover of node lola-4
        power-cycle node
      • 2016-03-29 07:52:27,876:fsmgmt.fsmgmt:INFO lola-4 is up
      • 2016-03-29 07:53:09,557:fsmgmt.fsmgmt:INFO zpool import and mount of lola-4's OSTs complete
      • Recovery did not complete even after recovery_time had reached zero
      • 2016-03-29 08:03 (approximately) Aborted recovery manually (lctl --device ... abort_recovery)
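
The timeline above can be checked with a little arithmetic, and the "stalled" condition can be expressed as a simple test. The following is a minimal Python sketch, not the tooling used in the soak test. The three timestamps are copied from the event history. The sample status text is illustrative only (it is not taken from the attached logs) and assumes the usual `key: value` layout printed by `lctl get_param obdfilter.*.recovery_status`, with `status` and `time_remaining` fields. The function names are hypothetical.

```python
from datetime import datetime

# Timestamps copied verbatim from the event history above.
FAULT_TRIGGERED = "2016-03-29 07:48:04,307"
NODE_UP = "2016-03-29 07:52:27,876"
OSTS_MOUNTED = "2016-03-29 07:53:09,557"


def parse_ts(text):
    """Parse a soak-framework log timestamp (comma before the milliseconds)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


def parse_recovery_status(text):
    """Turn 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_recovery_stalled(fields):
    """Assumed stall criterion: still RECOVERING although no time remains."""
    return (fields.get("status") == "RECOVERING"
            and int(fields.get("time_remaining", "1")) == 0)


# Illustrative sample only, NOT copied from the attached logs.
SAMPLE_STATUS = """\
status: RECOVERING
time_remaining: 0
"""

if __name__ == "__main__":
    print("power cycle to node up: %.3f s"
          % elapsed_seconds(FAULT_TRIGGERED, NODE_UP))
    print("node up to OSTs mounted: %.3f s"
          % elapsed_seconds(NODE_UP, OSTS_MOUNTED))
    print("stalled:", is_recovery_stalled(parse_recovery_status(SAMPLE_STATUS)))
```

By this arithmetic the node was back about 4.4 minutes after the fault was triggered and the OSTs were mounted roughly 42 seconds later, so recovery had been pending for close to ten minutes when it was aborted at approximately 08:03.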

      Attached files:
      messages and console logs, plus debug logs from before (lustre-log-20160329-0759-recovery-stalled) and after (lustre-log-20160329-0803-recovery-aborted) recovery was aborted

      Attachments

        1. console-lola-5.log.bz2
          40 kB
          Frank Heckes
        2. console-lola-5.log-20160722.bz2
          59 kB
          Frank Heckes
        3. lola-4-console-20160415.bz2
          95 kB
          Frank Heckes
        4. lola-4-lustre-log-20160415-0420.bz2
          0.3 kB
          Frank Heckes
        5. lola-4-lustre-log-20160415-0430.bz2
          936 kB
          Frank Heckes
        6. lola-4-messages-20160415.bz2
          215 kB
          Frank Heckes
        7. lustre-log-20160329-0759-recovery-stalled.bz2
          0.3 kB
          Frank Heckes
        8. lustre-log-20160329-0803-recovery-aborted.bz2
          8 kB
          Frank Heckes
        9. lustre-log-lola-30-2016-07-25_0855-ost-recovery-stalled.bz2
          0.3 kB
          Frank Heckes
        10. lustre-log-lola-30-2016-07-25_0859-after-ost-recovery-aborted.bz2
          0.3 kB
          Frank Heckes
        11. lustre-log-lola-3-2016-07-25_0855-ost-recovery-stalled.bz2
          0.3 kB
          Frank Heckes
        12. lustre-log-lola-3-2016-07-25_0859-after-ost-recovery-aborted.bz2
          0.3 kB
          Frank Heckes
        13. lustre-log-lola-5-20160722_0434_ost_recovery_stalled.bz2
          0.3 kB
          Frank Heckes
        14. lustre-log-lola-5-20160722_0438_after_ost_recovery_aborted.bz2
          176 kB
          Frank Heckes
        15. lustre-log-lola-8-2016-07-25_0855-ost-recovery-stalled.bz2
          0.3 kB
          Frank Heckes
        16. lustre-log-lola-8-2016-07-25_0859-after-ost-recovery-aborted.bz2
          0.3 kB
          Frank Heckes
        17. messages-lola-5.log.bz2
          49 kB
          Frank Heckes
        18. messages-lola-5.log-20160722.bz2
          220 kB
          Frank Heckes
        19. ost-failover-setttings-20160726_0702
          16 kB
          Frank Heckes

        Activity

          People

            yong.fan nasf (Inactive)
            heckes Frank Heckes (Inactive)