Lustre / LU-1547

MDT remounted read-only, MDS hung, MDT corrupted

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not a Bug
    • Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)
    • Fix Version/s: None
    • Environment:
      RHEL 5.5 cluster; MDT and OSTs on LVM volumes on SAN storage (HP XP24k)
    • Severity:
      3
    • Rank (Obsolete):
      4000

      Description

      Our customer's MDT was remounted read-only after the MDS service was relocated from cluster node sklusp01b to sklusp01a.
      The OSS services were also relocated at the same time.
      When they noticed the read-only status, they tried to stop the MDS. The attempt failed: the server became unresponsive, and the other cluster node (sklusp01b) fenced the MDS server sklusp01a and took over the MDT. sklusp01b was stopped after the takeover, and
      they then ran fsck, which reported a huge number of errors. The repair was unsuccessful, and they ended up recreating the whole Lustre file system and restoring it from backup.
      Is it possible to determine the root cause from the logs?

        Attachments

        1. fsck.out
          1.17 MB
        2. sklusp01a-messages
          191 kB
        3. sklusp01b-messages
          1.32 MB


              People

              • Assignee:
                Niu Yawei (Inactive)
                Reporter:
                HP Slovakia team
              • Votes:
                0
                Watchers:
                6
