[LU-8953] ZFS-MDT 100% full. Request for verification of plan to fix Created: 19/Dec/16  Updated: 20/Dec/16  Resolved: 20/Dec/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Peter Bortas Assignee: Nathaniel Clark
Resolution: Done Votes: 0
Labels: None
Environment:

Centos 6, Lustre from llnl chaos branch


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The MDT for one of our filesystems is full, and it's not possible to delete any files, rendering the filesystem unusable from the users' point of view.
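For anyone wanting to catch this condition earlier, a minimal sketch of a capacity check follows. It assumes the output of `zpool list -H -o name,capacity` (tab-separated pool name and a percentage such as `97%`) is piped in on stdin; the function name `pools_over_threshold` and the threshold value are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>capacity%" lines on stdin and prints
# every pool whose capacity is at or above the given threshold (in percent).
pools_over_threshold() {
    threshold="$1"
    while read -r name cap; do
        # Capacity is printed like "97%"; strip the trailing percent sign.
        pct=${cap%\%}
        case "$pct" in
            ''|*[!0-9]*) continue ;;   # skip values that are not plain integers
        esac
        if [ "$pct" -ge "$threshold" ]; then
            printf '%s %s%%\n' "$name" "$pct"
        fi
    done
}

# Example (not run here):
# zpool list -H -o name,capacity | pools_over_threshold 90
```

Pool-level capacity is only a rough proxy for how close the MDT is to refusing writes, so the threshold should be picked conservatively.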

It's possible to manually map files that could be deleted, via their FIDs, to ZFS objects on disk, but we haven't found a way to delete objects via zdb. A recovery procedure along those lines would probably be good to have if more people run into this.

Given that it's almost Christmas vacation, let's keep this simple and low risk. I've added some more disks to the MDS. The pool backing the affected filesystem looks like this:

lustre-mdt0 ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    mds9_sdm-mdt_fouo6_sdm ONLINE 0 0 0
    mds9_sdn-mdt_fouo6_sdn ONLINE 0 0 0
  mirror-1 ONLINE 0 0 0
    mds9_sdo-mdt_fouo6_sdo ONLINE 0 0 0
    mds9_sdp-mdt_fouo6_sdp ONLINE 0 0 0

Would it be safe (and fix the problem) to expand it by adding another mirror?

zpool add lustre-mdt0 mirror /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
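Before committing to the command above, the resulting layout can be previewed with the `-n` (dry-run) option of `zpool add`, which displays the configuration that would be used without changing the pool. This is worth doing because, on ZFS releases of this era, a top-level vdev added by mistake could not be removed again. The sketch below is illustrative only: the wrapper name `preview_add_mirror` and the `ZPOOL` override variable are hypothetical and not part of any tool.

```shell
#!/bin/sh
# ZPOOL may be overridden (for example ZPOOL=echo) to inspect the command line
# on a machine without ZFS. This variable is an assumption of this sketch.
ZPOOL="${ZPOOL:-zpool}"

# Show the layout that `zpool add` would produce; -n makes no changes.
preview_add_mirror() {
    pool="$1"; dev_a="$2"; dev_b="$3"
    "$ZPOOL" add -n "$pool" mirror "$dev_a" "$dev_b"
}

# Example with the pool and device names from this ticket (not run here):
# preview_add_mirror lustre-mdt0 /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
```

If the preview shows the new devices as a third mirror alongside mirror-0 and mirror-1, the same command without `-n` should produce that layout.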

(This is probably the same issue as LU-8856, so feel free to merge them if it makes sense.)



 Comments   
Comment by Peter Bortas [ 19/Dec/16 ]

Addendum: Our intention is to expand the pool without shutting down Lustre. Either way should be fine, but expanding it live and giving Lustre at least the chance of completing any outstanding operations feels like the more sound way. Please let us know if you disagree.

Comment by Peter Jones [ 19/Dec/16 ]

Nathaniel

Could you please advise?

Thanks

Peter

Comment by Nathaniel Clark [ 19/Dec/16 ]

zino,

I know taking the FS down and growing the MDT will alleviate your issue. I think growing the MDT live will be okay, but I would want to double check (run a test locally) before I could bless that course of action.


utopiabound

Comment by Peter Bortas [ 20/Dec/16 ]

Nathaniel,

When do you think you could run that test?

Comment by Peter Bortas [ 20/Dec/16 ]

After talking through the failure scenarios of shutting down the FS in this state, we decided to expand the pool after unmounting the MDT, since that is the procedure you know works. It seems to have worked without any failures I can detect so far. For the record, this is what we did:

umount lustre-mdt0/fouo6
zpool add lustre-mdt0 mirror /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
mount -t lustre lustre-mdt0/fouo6 /mnt/lustre/local/fouo6
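The three steps above could be wrapped so that each one runs only if the previous one succeeded. The following is a minimal sketch under stated assumptions, not the procedure as actually scripted on the MDS: the function name `expand_mdt_offline` and the `RUN` variable are hypothetical, and by default the commands are only printed.

```shell
#!/bin/sh
# RUN defaults to "echo", so the commands are printed rather than executed.
# Set RUN to the empty string before loading this file to execute them.
RUN="${RUN-echo}"

# Unmount the MDT, add a mirror vdev to its pool, then remount it.
# The chain stops at the first command that fails.
expand_mdt_offline() {
    pool="$1"; dataset="$2"; mountpoint="$3"; dev_a="$4"; dev_b="$5"
    $RUN umount "$dataset" &&
    $RUN zpool add "$pool" mirror "$dev_a" "$dev_b" &&
    $RUN mount -t lustre "$dataset" "$mountpoint"
}

# Example with the names from this ticket (prints the commands by default):
# expand_mdt_offline lustre-mdt0 lustre-mdt0/fouo6 /mnt/lustre/local/fouo6 \
#     /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt
```

Chaining with `&&` means a failed unmount leaves the pool untouched, and a failed `zpool add` leaves the MDT unmounted for inspection rather than remounting it in an unknown state.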

Comment by Nathaniel Clark [ 20/Dec/16 ]

zino,

I'm glad that worked for you. I'll close this bug, seeing as you've completed your expansion of the MDT. If I'm mistaken and you need something else from this bug, please feel free to reopen it.


utopiabound
