[LU-8953] ZFS-MDT 100% full. Request for verification of plan to fix

Details

    • Bug
    • Resolution: Done
    • Blocker
    • None
    • Lustre 2.5.3
    • None
    • Centos 6, Lustre from llnl chaos branch
    • 3
    • 9223372036854775807

    Description

      The MDT for one of our filesystems is full, and it's not possible to delete any files, rendering the filesystem unusable from the users' point of view.

      It's possible to manually map files that could be deleted, via their FIDs, to ZFS objects on disk, but we haven't found a way to delete objects via zdb. A recovery procedure along those lines would probably be good to have if more people run into this.

      Given that it's almost Christmas vacation, let's keep this simple and low risk. I've added more disks to the MDS. The pool for the affected filesystem currently looks like this:

      lustre-mdt0                   ONLINE 0 0 0
        mirror-0                    ONLINE 0 0 0
          mds9_sdm-mdt_fouo6_sdm    ONLINE 0 0 0
          mds9_sdn-mdt_fouo6_sdn    ONLINE 0 0 0
        mirror-1                    ONLINE 0 0 0
          mds9_sdo-mdt_fouo6_sdo    ONLINE 0 0 0
          mds9_sdp-mdt_fouo6_sdp    ONLINE 0 0 0

      Would it be safe (and would it fix the problem) to expand it by adding another mirror?

      zpool add lustre-mdt0 mirror /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
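
      One low-risk way to sanity-check a command like the one above is to ask zpool for a dry run first. The sketch below is only an illustration: it prints the dry-run form of the command by default, and the RUN variable and preview_cmd helper are names invented here, not part of any tool. The pool name and device paths are the ones quoted above.

```shell
#!/bin/sh
# Sketch: show what "zpool add" would do before doing it.
# POOL and the device paths come from the description; RUN and
# preview_cmd are hypothetical names used only for this example.
POOL="lustre-mdt0"

# Print the dry-run form of the expansion command. The -n flag makes
# zpool display the configuration that would be used without adding vdevs.
preview_cmd() {
    echo "zpool add -n $POOL mirror $*"
}

preview_cmd /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt

# Set RUN=1 to actually invoke the dry run on the MDS (requires root).
if [ "${RUN:-0}" = "1" ]; then
    zpool add -n "$POOL" mirror \
        /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
    zpool list "$POOL"
fi
```

      If the layout printed by the dry run shows a new mirror next to mirror-0 and mirror-1, the same command without -n should produce that layout.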

      (This is probably the same issue as LU-8856, so feel free to merge them if it makes sense.)

        Activity

          utopiabound Nathaniel Clark added a comment -

          zino,

          I'm glad that worked for you. I'll close this bug seeing as you've completed your expansion of the MDT. If I'm mistaken and you need something else from this bug, please feel free to re-open.

          – utopiabound

          zino Peter Bortas added a comment -

          After talking through the failure scenarios of shutting down the FS in this state, we decided to expand the pool after unmounting, since that is the procedure you know works. It seems to have worked, with no failures I can detect so far. For the record, this is what we did:

          umount lustre-mdt0/fouo6
          zpool add lustre-mdt0 mirror /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
          mount -t lustre lustre-mdt0/fouo6 /mnt/lustre/local/fouo6
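
          For anyone repeating the three steps above, they could be wrapped in a small script that defaults to a dry run, so the exact commands can be reviewed before anything touches the pool. This is a minimal sketch, not what was actually run on the MDS: the run and expand_mdt helpers and the DRY_RUN variable are made up for the example, while the pool, dataset, mount point and device paths are the ones from the record above.

```shell
#!/bin/sh
# Sketch of the offline expansion recorded above, with a dry-run default.
# run, expand_mdt and DRY_RUN are hypothetical names for this example.
set -eu

POOL="lustre-mdt0"
DATASET="$POOL/fouo6"
MOUNTPOINT="/mnt/lustre/local/fouo6"

# Echo the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unmount the MDT, add the new mirror, then mount the MDT again.
# With "set -e", a failed step stops the script before the next one.
expand_mdt() {
    run umount "$DATASET"
    run zpool add "$POOL" mirror "$@"
    run mount -t lustre "$DATASET" "$MOUNTPOINT"
}

expand_mdt /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
```

          Running it as-is only prints the three commands; setting DRY_RUN=0 executes them in order.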
          zino Peter Bortas added a comment -

          Nathaniel,

          When do you think you could run that test?


          utopiabound Nathaniel Clark added a comment -

          zino,

          I know taking the FS down and growing the MDT will alleviate your issue. I think growing the MDT live will be okay, but I would want to double check (run a test locally) before I could bless that course of action.

          – utopiabound
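
          A local check of the ZFS half of this could be done on a throwaway pool built from sparse files. The sketch below only prints a plan; the pool name testpool, the /var/tmp/zfs-grow-test directory and the grow_test_plan helper are assumptions made for illustration, and the plan exercises plain ZFS only, so it does not show how a mounted Lustre MDT reacts to the pool growing.

```shell
#!/bin/sh
# Sketch: print the commands for a throwaway, file-backed pool that mimics
# growing a two-mirror pool by a third mirror while the pool is imported.
# "testpool" and the directory below are made-up names for this example.
# This covers ZFS only; it does not put a Lustre MDT on the pool.
DIR="${DIR:-/var/tmp/zfs-grow-test}"
TESTPOOL="${TESTPOOL:-testpool}"

grow_test_plan() {
    echo "mkdir -p $DIR"
    # Six sparse files: four for the initial mirrors, two for the new one.
    for i in 1 2 3 4 5 6; do
        echo "truncate -s 256M $DIR/d$i.img"
    done
    echo "zpool create $TESTPOOL mirror $DIR/d1.img $DIR/d2.img mirror $DIR/d3.img $DIR/d4.img"
    echo "zpool list $TESTPOOL"
    # Grow the pool while it is imported, as in the live scenario.
    echo "zpool add $TESTPOOL mirror $DIR/d5.img $DIR/d6.img"
    echo "zpool list $TESTPOOL"
    echo "zpool destroy $TESTPOOL"
}

grow_test_plan
```

          After reviewing the printed plan, the output can be piped to sh as root on a scratch machine; comparing the two zpool list outputs shows whether the added mirror increased the pool size.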
          pjones Peter Jones added a comment -

          Nathaniel

          Could you please advise?

          Thanks

          Peter

          zino Peter Bortas added a comment -

          Addendum: Our intention is to expand the pool without shutting down Lustre. Either way should be fine, but expanding it live and giving Lustre at least the chance of completing any outstanding operations feels like the more sound way. Please let us know if you disagree.


          People

            utopiabound Nathaniel Clark
            zino Peter Bortas